Learn how to create and modify a synthetic data model configuration before model training to support different generative models, data types, and privacy protections.
All Gretel models support a common interface for creating, training, and fine-tuning models as well as generating data. See the examples below for how to train and generate data with Gretel, or read the Models docs for model-specific parameters and tuning recommendations.

Model Creation

To create a synthetic data model, we add the models object to our configuration. The models object takes a list of keyed objects, starting with the tag for the generative model we wish to use.
schema_version: "1.0"
name: "my-awesome-model"
models:
  - synthetics:
      data_source: gretel_a63772a737c4412f9314fb998fa480e2_foo.csv
This configuration assumes that the data source was already uploaded as a project artifact, but data_source can be any valid URL that is accessible by the client. By default, no extra objects or parameters are required: Gretel uses default settings that work well for a variety of datasets. There are three primary sections to be aware of (params, validators, and privacy_filters):
schema_version: "1.0"
models:
  - synthetics:
      params:
        epochs: 100
        learning_rate: 0.001
      validators:
        in_set_count: 10
        pattern_count: 10
      privacy_filters:
        outliers: medium
        similarity: medium
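
The layout described in this section (a top-level models list whose entries are each keyed by a model tag) can also be mirrored as a plain Python dictionary. The following is a minimal sketch using only the standard library, not part of the Gretel SDK. The helper name model_tags is hypothetical, and the sketch assumes that each list entry holds exactly one model tag.

```python
# Minimal configuration from this section, written as a Python dict.
config = {
    "schema_version": "1.0",
    "name": "my-awesome-model",
    "models": [
        {
            "synthetics": {
                "data_source": "gretel_a63772a737c4412f9314fb998fa480e2_foo.csv",
            }
        }
    ],
}


def model_tags(cfg):
    """Return the model tag of each entry in the `models` list.

    Hypothetical helper for illustration; it is not a Gretel SDK function.
    """
    tags = []
    for entry in cfg.get("models", []):
        # Assumption: each entry is a single-key mapping of {tag: settings}.
        if not isinstance(entry, dict) or len(entry) != 1:
            raise ValueError(f"expected one model tag per entry, got: {entry!r}")
        tags.append(next(iter(entry)))
    return tags


print(model_tags(config))
```

A check like this can catch a malformed models list locally before the configuration is submitted for training.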

Model Parameters

The params object contains key-value pairs that set the parameters used to train a synthetic data model on the data_source. Parameters are specific to each supported model type. See a full list of supported parameters in the Models docs.
Gretel has configuration templates that may be helpful as starting points for creating your model.
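
When starting from a template or an existing configuration, training parameters can be adjusted before the model is created. The sketch below shows one way to do that on a configuration held as a Python dictionary. The helper with_params is a hypothetical name, not a Gretel SDK function, and the sketch assumes that params is nested under the model tag inside the models list.

```python
import copy


def with_params(cfg, tag, **overrides):
    """Return a copy of `cfg` with `overrides` merged into the `params`
    of the first model entry keyed by `tag`.

    Hypothetical helper for illustration; it is not a Gretel SDK function.
    """
    new_cfg = copy.deepcopy(cfg)  # leave the caller's configuration untouched
    for entry in new_cfg.get("models", []):
        if tag in entry:
            # Assumption: `params` lives directly under the model tag.
            entry[tag].setdefault("params", {}).update(overrides)
            return new_cfg
    raise KeyError(f"no model entry with tag {tag!r}")


base = {
    "schema_version": "1.0",
    "models": [
        {"synthetics": {"params": {"epochs": 100, "learning_rate": 0.001}}}
    ],
}

# Lower the epoch count while keeping the other parameters as they were.
tuned = with_params(base, "synthetics", epochs=50)
print(tuned["models"][0]["synthetics"]["params"])
```

Returning a modified copy rather than editing in place keeps the base configuration reusable for several training runs with different parameters.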

Data Generation

After a synthetic data model is trained, a sample synthetic dataset is created automatically. This dataset is used to generate the Synthetic Data Report and gives you sample synthetic data to explore on your own.
Once initial model training or fine-tuning is complete, you can retrieve the model artifacts (synthetic report, model archive, and sample data) and also schedule the generation of larger datasets. See examples in the SDK Notebooks and CLI Examples sections.