
Models

This section covers the generative machine learning models supported by Gretel APIs as well as core use cases and capabilities.

Supported Features

This section compares features of different generative data models supported by Gretel APIs.
✅ = Supported
✖️ = Not yet supported
| | LSTM | Gretel GPT | ACTGAN | Amplify | DGAN |
|---|---|---|---|---|---|
| Tag | synthetics | gpt_x | actgan | amplify | timeseries_dgan |
| Type | Language Model | Language Model | Generative Adversarial Network | Statistical | Generative Adversarial Network |
| Model | LSTM | Pre-trained Transformer | GAN | Statistical | GAN |
Privacy filters
✖️
✖️
✖️
Differential privacy
✖️
✖️
✖️
✖️
Synthetic quality report
✖️
✖️
Tabular
✖️
✖️
Time-series
✖️
✖️
✖️
Natural language
✖️
✖️
✖️
Conditional generation
✖️
✖️
Pre-trained
✖️
✖️
✖️
✖️
Gretel cloud
On-premises
Check out our GitHub repository, gretelai/gretel-synthetics, for research, source code, and examples, including our core synthetic data generation library for structured and unstructured text.

Create and train a model

Below is an example configuration that can be used to create and fine-tune a synthetic data model. Save the example configuration to model-config.yaml.
  • Replace [model_id] with the type of model you wish to train (e.g. synthetics, gpt_x, actgan, timeseries_dgan, amplify).
  • data_source must point to a valid and accessible file in CSV, JSON, or JSONL format.
    • Supported storage locations include S3, GCS, Azure Blob Storage, HDFS, WebHDFS, HTTP, HTTPS, SFTP, and the local filesystem.
    • data_source: __temp__ can be used when the source file is specified elsewhere using:
      • --in-data parameter via CLI,
      • parameter via SDK,
      • dataset button via Console.
schema_version: "1.0"
name: "my-model"
models:
  - [model_id]:
      data_source: foo.csv
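The configuration above can also be written from a shell script, which makes it easy to swap the model tag. The following is a minimal sketch, not part of the Gretel CLI: write_model_config is a hypothetical helper, and the list of accepted tags is simply the set named on this page.

```shell
# Hypothetical helper (not a Gretel command): writes a minimal config file
# for one of the model tags listed on this page.
write_model_config() {
  local model_id="$1" data_source="$2" out="${3:-model-config.yaml}"

  # Reject anything that is not one of the documented tags.
  case "$model_id" in
    synthetics|gpt_x|actgan|amplify|timeseries_dgan) ;;
    *) echo "unknown model tag: $model_id" >&2; return 1 ;;
  esac

  # Emit the same structure as the example configuration, with the
  # [model_id] placeholder and data_source filled in.
  cat > "$out" <<EOF
schema_version: "1.0"
name: "my-model"
models:
  - ${model_id}:
      data_source: ${data_source}
EOF
}

write_model_config actgan foo.csv model-config.yaml
```

Checking the tag before writing the file catches a mistyped model tag locally, before the config is submitted.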
Use the following CLI command to create and train the synthetic data model.
  • The exports are not required; they only keep the models create command shorter.
  • --in-data is optional, and can be used to override the data_source specified in the config.
export CONFIG_PATH=model-config.yaml
export DATASOURCE=foo.csv
gretel models create \
  --config "$CONFIG_PATH" \
  --runner cloud \
  --in-data "$DATASOURCE" > my-model.json
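Because data_source must be a readable CSV, JSON, or JSONL file, it can help to check a local file before submitting it. The sketch below assumes a local path and judges the format only by file extension; check_data_source is a hypothetical helper, and remote locations such as S3 or HTTPS are not covered.

```shell
# Hypothetical pre-flight check for a local data source (not a Gretel command).
# It looks only at the file extension and at read permission.
check_data_source() {
  local path="$1"

  # Accept only the formats named on this page.
  case "$path" in
    *.csv|*.json|*.jsonl) ;;
    *) echo "unsupported format: $path" >&2; return 1 ;;
  esac

  # The file must exist and be readable by the current user.
  if [ ! -r "$path" ]; then
    echo "cannot read: $path" >&2
    return 1
  fi
}

# Example use: run the check first, and only then call models create.
#   check_data_source "$DATASOURCE" && gretel models create ...
```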

Generate data from a model

Below is an example CLI command that may be used to generate data from a model.
  • --model-id accepts either a model uid or the JSON file that models create outputs.
  • --in-data (optional) allows you to specify a CSV file to prompt the model for conditional data generation tasks.
gretel models run --model-id my-model.json \
  --runner cloud \
  --in-data prompts.csv \
  --output .
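Since --in-data is optional, a script that handles both conditional and unconditional generation needs to add the flag only when a prompt file is given. The following dry-run sketch prints the command instead of running it; build_run_cmd is a hypothetical helper, it uses only the flags shown above, and it assumes paths without spaces.

```shell
# Hypothetical dry-run helper: prints the models run command for a given
# model reference, adding --in-data only when a prompt file is supplied.
build_run_cmd() {
  local model_ref="$1" prompts="${2:-}"
  local cmd="gretel models run --model-id $model_ref --runner cloud"

  # Conditional generation: pass the prompt CSV through --in-data.
  if [ -n "$prompts" ]; then
    cmd="$cmd --in-data $prompts"
  fi

  echo "$cmd --output ."
}

build_run_cmd my-model.json              # unconditional generation
build_run_cmd my-model.json prompts.csv  # conditional generation
```

Printing the command first makes it easy to review before executing it against the cloud runner.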