
Overview

Learn how to connect Gretel with your existing data sources.
Gretel Connectors are in beta and may be subject to change. Please contact us if you are planning to use them.
Gretel Connectors make it easier to connect your existing data sources to Gretel.

Terminology and Concepts

Each Connector consists of a source, a sink, and a model config. The connector reads raw data from the source, passes it to a Gretel Cloud or Local worker to transform each record (if using a Gretel Transform config) or synthesize new records based on it (if using a Gretel Synthetics config), and writes the result to the sink.
Currently, only Gretel Transform configs and AWS S3 sources and sinks are supported.
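As a rough mental model only, the source, model, and sink flow described above can be sketched in Python. The `run_pipeline` and `mask_email` names are hypothetical stand-ins, not part of any Gretel API. In a real connector, the model step is delegated to a Gretel Cloud or Local worker. The sketch models the Transform case, where each input record produces one output record. This matches the only config type currently supported.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]


def run_pipeline(
    source: Iterable[Record],
    model: Callable[[Record], Record],
    sink: List[Record],
) -> int:
    """Hypothetical sketch: read each record from the source, pass it
    through the model step, and append the result to the sink.
    Returns the number of records written."""
    written = 0
    for record in source:
        sink.append(model(record))
        written += 1
    return written


def mask_email(record: Record) -> Record:
    """Toy stand-in for a Transform config: redact the 'email' field."""
    out = dict(record)
    if "email" in out:
        out["email"] = "<redacted>"
    return out


if __name__ == "__main__":
    source = [{"name": "Ada", "email": "ada@example.com"}, {"name": "Alan"}]
    sink: List[Record] = []
    count = run_pipeline(source, mask_email, sink)
    print(count, sink)
```

A Synthetics config would break the one-record-in, one-record-out assumption, because it generates new records rather than rewriting existing ones.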

Connector Configuration

Gretel connectors are configured using YAML. A config consists of at least one source, one sink, and a connector that joins a source and a sink together into a pipeline.
version: 1
sources:
  - name: my_s3_source
    type: s3
    config:
      bucket: my-connector-source
      path_prefix: sandbox
sinks:
  - name: my_s3_sink
    type: s3
    config:
      bucket: my-connector-destination
      path_prefix: output/sandbox
connectors:
  - name: default
    version: dev
    max_active: 1
    source: my_s3_source
    sink: my_s3_sink
    model: transform/default
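Before you start a connector, you may want to check that every connector refers to a source and sink that actually exist. The following Python sketch is a local pre-flight check you would write yourself. It is not a feature of the Gretel CLI. It assumes the config has already been loaded into a plain dictionary, for example by PyYAML's `yaml.safe_load`. It also assumes `connectors` is a list, as in the example above. The function name `resolve_connectors` is hypothetical.

```python
from typing import Any, Dict, List


def resolve_connectors(config: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pair each connector with the source and sink definitions it names.

    Raises ValueError if a connector references an undefined source or sink.
    """
    sources = {s["name"]: s for s in config.get("sources", [])}
    sinks = {s["name"]: s for s in config.get("sinks", [])}
    pipelines = []
    for conn in config.get("connectors", []):
        if conn["source"] not in sources:
            raise ValueError(f"connector {conn['name']!r}: unknown source {conn['source']!r}")
        if conn["sink"] not in sinks:
            raise ValueError(f"connector {conn['name']!r}: unknown sink {conn['sink']!r}")
        pipelines.append({
            "name": conn["name"],
            "source": sources[conn["source"]],
            "sink": sinks[conn["sink"]],
            "model": conn["model"],
        })
    return pipelines


# Python equivalent of the YAML config above (e.g. as returned by yaml.safe_load).
EXAMPLE = {
    "version": 1,
    "sources": [{"name": "my_s3_source", "type": "s3",
                 "config": {"bucket": "my-connector-source", "path_prefix": "sandbox"}}],
    "sinks": [{"name": "my_s3_sink", "type": "s3",
               "config": {"bucket": "my-connector-destination", "path_prefix": "output/sandbox"}}],
    "connectors": [{"name": "default", "version": "dev", "max_active": 1,
                    "source": "my_s3_source", "sink": "my_s3_sink",
                    "model": "transform/default"}],
}

if __name__ == "__main__":
    for p in resolve_connectors(EXAMPLE):
        print(p["name"], p["source"]["config"]["bucket"], "->", p["sink"]["config"]["bucket"])
```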

Sources and Sinks

The sources and sinks properties in a config define where data is read from and written to. Source data is processed by a Gretel model, and the results are written to a sink.
A source or sink definition takes the following form:
name: source_or_sink_name
type: integration_name # e.g. s3
config:
  <integration-config>
  • name - The name of the source or sink. This name must be unique within the connector config.
  • type - Specifies the type of source or sink. Currently, only s3 is supported. For more information, please refer to the S3 Connector docs.
  • config - Any integration-specific configuration required by the chosen type. Please refer to each integration's documentation for details.
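The rules above can be checked locally with a short Python sketch. This is an assumption about how you might validate your own configs. It does not reflect Gretel's internal validation. The hypothetical `validate_endpoints` function checks that each definition has a `name`, a supported `type`, and a `config` mapping. It also checks that names do not repeat. `SUPPORTED_TYPES` reflects only what this page says is currently supported.

```python
from typing import Any, Dict, List

# Assumption: per this page, s3 is currently the only supported type.
SUPPORTED_TYPES = {"s3"}


def validate_endpoints(defs: List[Dict[str, Any]]) -> List[str]:
    """Return a list of problems found in source or sink definitions.

    Checks that each entry has a name, that names are unique, that the
    type is one this page documents as supported, and that config is a
    mapping.
    """
    problems: List[str] = []
    seen = set()
    for i, d in enumerate(defs):
        name = d.get("name")
        if not name:
            problems.append(f"entry {i}: missing name")
        elif name in seen:
            problems.append(f"entry {i}: duplicate name {name!r}")
        else:
            seen.add(name)
        if d.get("type") not in SUPPORTED_TYPES:
            problems.append(f"entry {i}: unsupported type {d.get('type')!r}")
        if not isinstance(d.get("config"), dict):
            problems.append(f"entry {i}: config must be a mapping")
    return problems
```

To enforce uniqueness across the whole connector config rather than within one list, pass the sources and sinks together, e.g. `validate_endpoints(config["sources"] + config["sinks"])`.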

Connectors

The connectors map is used to define a pipeline. Each connector config must define a default configuration.
  • version - Specifies the connector version to run. Valid options include: latest.
  • source - Specifies the source to read data from.
  • sink - Specifies the destination to write results back to.
  • max_active - Determines the maximum number of active jobs the connector will launch. This setting can be used to manage pipeline throughput.
  • model - This may be a model id or model config. Please see Specifying a Model for more details.
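To show the idea behind max_active, the following Python sketch uses a worker pool capped at `max_active`. This limits how many jobs can run at once. It is a conceptual illustration only. This page does not describe how the connector schedules jobs internally. The `run_jobs` function and its placeholder jobs are hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def run_jobs(job_ids: List[str], max_active: int) -> int:
    """Run placeholder jobs with at most `max_active` in flight at once.

    Returns the highest concurrency actually observed.
    """
    lock = threading.Lock()
    active = 0
    peak = 0

    def job(job_id: str) -> None:
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)  # stand-in for a job processing one piece of source data
        with lock:
            active -= 1

    # The pool never runs more than max_active jobs concurrently.
    with ThreadPoolExecutor(max_workers=max_active) as pool:
        list(pool.map(job, job_ids))
    return peak


if __name__ == "__main__":
    print(run_jobs([f"obj-{i}" for i in range(10)], max_active=3))
```

A lower max_active reduces concurrent load on Gretel workers and on the sink. A higher value lets the pipeline process more source data in parallel.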

Specifying a Model

When configuring a connector, you must select a model for each connector config. You may specify either an existing model or a model config.
When an existing model id is configured, that model will be re-run for every new data source in the pipeline. If a model config is specified, then a new model will be created or trained for every new data source in the pipeline.
In the example below, we configure a model using a model configuration provided by Gretel.
connectors:
  default:
    ...
    model: transform/default
For a complete list of available model templates, please see our gretelai/gretel-blueprints GitHub repo.
Model configurations can also be passed as s3://... or https://... URLs. For example:
connectors:
  default:
    ...
    model: s3://my-bucket/model_config.yaml
If you have an existing pre-trained model, you can pass that model's id instead.
connectors:
  default:
    ...
    model: 617c339e2fe5baa5b7765dd1
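The three forms of the `model` value described above can be told apart with a simple heuristic, sketched below in Python. The rule that a model id is a 24-character lowercase hex string is an assumption based only on the example id above. Anything that is neither a URL nor an id is treated as a template path such as `transform/default`. The function names are hypothetical. `trains_new_model` encodes this page's rule: a model config creates a new model for each new data source, while an existing model id is re-run.

```python
import re

# Assumption: model ids look like the 24-character hex string in the example above.
_MODEL_ID = re.compile(r"^[0-9a-f]{24}$")


def classify_model(value: str) -> str:
    """Classify a connector's `model` value (heuristic) as one of:
    'url'      - a remote model config (s3:// or https://)
    'model_id' - an existing model, re-run for every new data source
    'template' - a model config such as transform/default
    """
    if value.startswith(("s3://", "https://")):
        return "url"
    if _MODEL_ID.match(value):
        return "model_id"
    return "template"


def trains_new_model(value: str) -> bool:
    """True if the value is a model config (template or URL), which per
    this page creates or trains a new model for each new data source."""
    return classify_model(value) != "model_id"


if __name__ == "__main__":
    for v in ("transform/default",
              "s3://my-bucket/model_config.yaml",
              "617c339e2fe5baa5b7765dd1"):
        print(v, classify_model(v), trains_new_model(v))
```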

Running a Connector

Connectors are shipped as Docker containers and may be run via the Gretel CLI or deployed into existing container orchestration platforms such as Kubernetes.
If you are not planning to deploy the connector using the Gretel CLI or a Gretel-provided CloudFormation/Terraform template, please contact us for access to the container's Docker registry.

From the CLI

Given a connector configuration, you can use the gretel connectors start command to run a connector from Gretel's CLI:
gretel connectors start --config my_config.yaml
Note that to run the connector from the CLI, the host must have access to a running Docker daemon.
For a complete list of available parameters, run gretel connectors start --help.
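Because the CLI needs a running Docker daemon, a wrapper script can check for one before it starts the connector. The following Python sketch does this. Only `gretel connectors start --config <file>` comes from this page. Using `docker info` as the daemon check is an assumption: it exits non-zero when the daemon can't be reached. The helper names are hypothetical.

```python
import shutil
import subprocess
import sys
from typing import List


def build_start_command(config_path: str) -> List[str]:
    """Build the documented `gretel connectors start --config <file>` command."""
    return ["gretel", "connectors", "start", "--config", config_path]


def docker_daemon_available(timeout: float = 30.0) -> bool:
    """Return True if the docker CLI is on PATH and `docker info` succeeds,
    which requires a reachable Docker daemon."""
    if shutil.which("docker") is None:
        return False
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if not docker_daemon_available():
        sys.exit("No running Docker daemon found; the connector cannot start.")
    subprocess.run(build_start_command("my_config.yaml"), check=True)
```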