Overview

A connection defines the information required to connect to an external system.

Note: Connections are recommended for Data Collector and Transformer pipelines and for Transformer for Snowflake pipelines that run on a deployed engine. They are not applicable for Transformer for Snowflake pipelines that run on the IBM StreamSets hosted engine.

Pipelines communicate with external systems to read and write data. Most of these external systems require sensitive information, such as user names or passwords, to access the data. When you configure pipelines and pipeline fragments, you can enter the details needed to connect to the external system, or you can select an existing connection that contains the details.

Using connections provides the following benefits:
Increased security
When you use connections, you can limit the number of users that need to know the security credentials for external systems.
For example, you want to ensure that only the DevOps team knows the security credentials required to access external systems. A DevOps engineer logs in to Control Hub to create all connections to the external systems, and then shares the connections with the data engineers who design pipelines, granting them the ability to use the connections. Data engineers select the appropriate connection name for a pipeline stage, but cannot view the connection details.
Reusability
You can create a connection once and then reuse that connection in multiple pipelines. Reusing connections reduces the possibility of user errors and simplifies updates to connection values.
For example, you might create a single connection to your source data stored in Amazon S3. You name the connection SourceData. You develop multiple pipelines to process this source data. Each time you add an Amazon S3 origin to a pipeline, you simply select the existing SourceData connection. You do not need to re-enter the AWS authentication details for each Amazon S3 origin. When you need to update the authentication details, you make a single update to the connection. All Amazon S3 origins using that connection reflect the updated values in subsequent pipeline runs.
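
The reuse described above can be pictured with a small Python model. This is only an illustrative sketch, not Control Hub code: the Connection and S3Origin classes, their fields, and the placeholder credential values are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    """Hypothetical stand-in for a stored connection."""
    name: str
    access_key_id: str
    secret_access_key: str


@dataclass
class S3Origin:
    """Hypothetical stage that references a connection instead of copying its values."""
    pipeline: str
    connection: Connection

    def credentials(self):
        # Values are read from the shared connection each time the stage runs.
        return (self.connection.access_key_id, self.connection.secret_access_key)


# One connection, selected by the Amazon S3 origin in several pipelines.
source_data = Connection("SourceData", "OLD_KEY_ID", "OLD_SECRET")
origins = [S3Origin(f"pipeline-{i}", source_data) for i in range(3)]

# A single update to the connection...
source_data.access_key_id = "NEW_KEY_ID"
source_data.secret_access_key = "NEW_SECRET"

# ...is picked up by every origin the next time it reads its credentials.
updated = [origin.credentials() for origin in origins]
```

Because each origin holds a reference to the connection rather than its own copy of the values, there is only one place to make the change.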

Configuring connections requires an authoring Data Collector.

For more information on the supported connection types, see Connection Types Overview.

Connection Requirements

Before you create connections, note the following requirements:

Use an appropriate authoring Data Collector to create connections
You must select an available authoring Data Collector version 4.0.0 or later to create connections.
The Data Collector version and the stage libraries installed on the engine determine the connection types, such as Amazon S3 or JDBC, that you can create and the properties available in the connections.
For example, to use a MongoDB Atlas connection, you must use an authoring Data Collector version 5.2.0 or later, and have the MongoDB Atlas stage library installed on the engine. Similarly, to use Azure Managed Identity authentication in an Azure connection, you must use an authoring Data Collector version 5.5.0 or later and have the Azure stage library installed.
For a list of new connection types and properties supported with each Data Collector version, see Data Collector Versions.
Pipelines with the appropriate engine version can use connections
After creating connections, you can use those connections in pipelines.
For Data Collector and Transformer pipelines, the engine version that runs the pipeline must support the selected connection type and have the required stage library installed.
For example, say you create an Amazon S3 connection using Data Collector version 4.0.0 with the Amazon Web Services stage library installed. You can use this connection in a Transformer pipeline, as long as you design and run the pipeline using Transformer version 4.0.0 or later with the Amazon Web Services stage library installed.
For Transformer for Snowflake, the engine version that runs the pipeline must support the selected connection version. For details, see Transformer for Snowflake Versions.

Using connections in pipelines requires the following minimum engine versions:

  • Data Collector version 4.0.0 or later
  • Transformer version 4.0.0 or later
  • Transformer for Snowflake version 5.0.0 or later
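
If you track engine versions in your own tooling, the minimum versions above can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: the MIN_ENGINE_VERSION table and both function names are hypothetical helpers written for this example, and version strings are assumed to be plain dotted numbers such as 5.10.0.

```python
# Minimum engine versions for using connections, taken from the list above.
MIN_ENGINE_VERSION = {
    "Data Collector": (4, 0, 0),
    "Transformer": (4, 0, 0),
    "Transformer for Snowflake": (5, 0, 0),
}


def parse_version(text):
    """Turn a dotted version string such as '5.10.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def engine_supports_connections(engine_type, engine_version):
    """Return True if the engine version meets the documented minimum."""
    return parse_version(engine_version) >= MIN_ENGINE_VERSION[engine_type]
```

Versions are compared as tuples of integers rather than as strings, so that 5.10.0 correctly sorts after 5.9.0.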
Later versions introduce support for additional connection types, as listed below:

Data Collector Versions

The Data Collector version determines the connection types that you can use in Data Collector pipelines and the properties available in each connection.

Connection support was initially introduced with Data Collector 4.0.0. Later versions introduce support for additional connection types and properties.

The following table lists the new connection types and properties supported with each Data Collector version. Later Data Collector versions support all previously introduced connections and connection properties, unless stated otherwise:
Engine Version New Connection Types or Properties
Data Collector 5.11.0
  • Web Client

Data Collector 5.10.0

  • Snowflake support for optionally defining the warehouse, database, and schema.

Data Collector 5.9.0

  • Teradata

Data Collector 5.8.0

  • Couchbase

Data Collector 5.6.0

  • Aerospike
  • Snowflake support for the following features:
    • Private key file authentication
    • Private key content authentication
    • OAuth authentication
    • Private link URL
  • Orchestrator
Data Collector 5.5.0
  • Azure Data Lake Storage Gen2 support for Azure Managed Identity authentication

Data Collector 5.4.0

  • CONNX

Data Collector 5.2.0

  • MongoDB Atlas
Data Collector 5.1.0
  • Kafka support for custom authentication

Data Collector 5.0.0

  • Hive
  • MQTT
  • OPC UA Client

Data Collector 4.4.0

  • Amazon S3 support for custom endpoints
  • CoAP Client
  • InfluxDB
  • InfluxDB 2.x
  • Pulsar

Data Collector 4.2.0

  • Cassandra
  • SFTP/FTP/FTPS support for using a proxy to connect to the remote server

Data Collector 4.1.0

  • Amazon Redshift
  • MongoDB
  • RabbitMQ
  • Redis

Data Collector 4.0.0

  • Amazon Kinesis Firehose
  • Amazon Kinesis Streams
  • Amazon S3
  • Amazon SQS
  • Azure Data Lake Storage Gen2
  • Azure Synapse - Requires the Azure Synapse Enterprise stage library version 1.2.0 or later.
  • Databricks Delta Lake - Requires the Databricks Enterprise stage library version 1.2.0 or later.
  • Elasticsearch
  • Google BigQuery
  • Google Cloud Storage
  • Google Pub/Sub
  • JDBC
  • JMS
  • Kafka
  • Kudu
  • MySQL
  • Oracle
  • PostgreSQL
  • Salesforce
  • SFTP/FTP/FTPS
  • Snowflake - Requires the Snowflake Enterprise stage library version 1.7.0 or later.
  • Snowpipe - Requires the Snowflake Enterprise stage library version 1.7.0 or later.
  • SQL Server
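
Because later versions support the connection types introduced earlier, the table can be read cumulatively. The following Python sketch illustrates that reading. It is not part of the product: INTRODUCED_IN holds only a subset of the rows above, the helper names are hypothetical, and the sketch ignores stage library requirements and any "unless stated otherwise" exceptions.

```python
# Subset of the table above: connection type -> Data Collector version that introduced it.
INTRODUCED_IN = {
    "Amazon S3": "4.0.0",
    "JDBC": "4.0.0",
    "Amazon Redshift": "4.1.0",
    "Cassandra": "4.2.0",
    "Pulsar": "4.4.0",
    "Hive": "5.0.0",
    "MongoDB Atlas": "5.2.0",
    "CONNX": "5.4.0",
    "Aerospike": "5.6.0",
    "Couchbase": "5.8.0",
    "Teradata": "5.9.0",
    "Web Client": "5.11.0",
}


def as_tuple(version):
    """Convert '5.2.0' to (5, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def available_types(authoring_version):
    """List the connection types introduced at or before the given version."""
    current = as_tuple(authoring_version)
    return sorted(
        name
        for name, introduced in INTRODUCED_IN.items()
        if as_tuple(introduced) <= current
    )
```

For example, under these assumptions available_types("4.1.0") includes Amazon Redshift but not Cassandra, which was introduced in 4.2.0.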

Transformer Versions

The Transformer version determines the connection types that you can use in Transformer pipelines and the properties available in each connection.

Connection support was initially introduced with Transformer 4.0.0. Later versions introduce support for additional connection types and properties.

The following table lists the new connection types and properties supported with each Transformer version. Later Transformer versions support all previously introduced connections and connection properties, unless stated otherwise:
Engine Version New Connection Types or Properties

Transformer 5.8.0

  • Snowflake support for optionally defining the warehouse, database, and schema.

Using this feature requires an authoring Data Collector version 5.10.0 or later.

Transformer 5.7.0

  • Amazon EMR Cluster Manager support for using AWS Service Catalog to provision a cluster.

Using this feature requires an authoring Data Collector version 5.10.0 or later.

Transformer 5.4.0

  • Amazon EMR Cluster Manager support for specifying a cluster by name and tags.
  • Amazon EMR Serverless support for specifying an application by name and tags.

Using these features requires an authoring Data Collector version 5.4.0 or later.

Transformer 5.3.0

  • Amazon EMR Serverless - Creating this connection type requires an authoring Data Collector version 5.3.0 or later.

Transformer 4.0.0

  • Amazon EMR Cluster Manager
  • Amazon Redshift - Creating this connection type requires an authoring Data Collector version 4.1.0 or later.
  • Amazon S3
  • Elasticsearch
  • JDBC
  • Kafka
  • Kudu
  • MySQL
  • Oracle
  • PostgreSQL
  • Snowflake - Requires the Snowflake Enterprise stage library version 1.7.0 or later.
  • SQL Server
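
Several rows in this table carry two requirements: a Transformer version to run the pipeline and an authoring Data Collector version to create or configure the connection. The sketch below shows one way to check both together. The Requirement type, the REQUIREMENTS entries, and the function names are hypothetical and cover only a few rows of the table; versions are written as integer tuples.

```python
from typing import NamedTuple


class Requirement(NamedTuple):
    transformer: tuple               # minimum Transformer version
    authoring_data_collector: tuple  # minimum authoring Data Collector version


# A few rows from the table above, keyed by a descriptive label.
REQUIREMENTS = {
    "Amazon Redshift": Requirement((4, 0, 0), (4, 1, 0)),
    "Amazon EMR Serverless": Requirement((5, 3, 0), (5, 3, 0)),
    "Snowflake warehouse, database, and schema": Requirement((5, 8, 0), (5, 10, 0)),
}


def fmt(version):
    """Render (5, 10, 0) as '5.10.0'."""
    return ".".join(str(part) for part in version)


def missing_requirements(feature, transformer_version, authoring_dc_version):
    """Return a list of unmet requirements; an empty list means both are satisfied."""
    required = REQUIREMENTS[feature]
    problems = []
    if transformer_version < required.transformer:
        problems.append(f"Transformer {fmt(required.transformer)} or later")
    if authoring_dc_version < required.authoring_data_collector:
        problems.append(
            f"authoring Data Collector {fmt(required.authoring_data_collector)} or later"
        )
    return problems
```

Returning the list of unmet requirements, rather than a single true or false value, makes it clear which engine needs to be upgraded.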

Transformer for Snowflake Versions

When your organization uses Transformer for Snowflake deployed engines, you use Snowflake connections to provide connection information for the pipelines.

The Transformer for Snowflake version determines the connection versions that you can use in the pipelines, and the properties available in each connection. Connection support was initially introduced with Transformer for Snowflake 5.0.0. Later versions introduce support for additional functionality.

The following table lists the connection version and properties supported with each Transformer for Snowflake version. Later Transformer for Snowflake versions support all previously introduced functionality, unless stated otherwise:

Engine Version Connection Type or Properties

Transformer for Snowflake 5.6.0

  • Snowflake connection support for OAuth authentication

Transformer for Snowflake 5.0.0

  • Snowflake connection

    Using this connection requires an authoring Data Collector version 5.10.0 or later.

Working with Connections

The Connections view lists all connections that you have access to.

You can complete the following tasks in the Connections view:
  • Create connections.
  • Assign tags to connections.
  • Test that configured connection values are valid.
  • View connection details, including the connection type, assigned tags, and the list of pipelines and pipeline fragments that use the connection.
  • Edit connection details.
  • Duplicate connections.
  • Share connections with other users and groups.
  • Delete connections.
The following image shows a list of connections in the Connections view. Each connection is listed with its name, type, tags, and creator.
Tip: To resize, hide, or reorder the columns, see Customizing Table Columns.

Note the following icons that display when you select a connection. You'll use these icons frequently as you manage connections:

  • Add - Add a connection.
  • Refresh - Refresh the list of connections.
  • Toggle Filter Column - Toggle the display of the Filter column, where you can filter connections by connection type or tag. You can also search for connections by name or description.
  • Share - Share connections with other users and groups, as described in Permissions.
  • Delete - Delete the connection.
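
The filter and search behavior of the Filter column can be approximated in a short Python sketch, which may help if you keep your own inventory of connections. This does not reflect how Control Hub implements filtering: the ConnectionRow class, the filter_connections function, and the sample rows are hypothetical, and the sketch assumes that filters combine with AND and that search is a case-insensitive substring match.

```python
from dataclasses import dataclass, field


@dataclass
class ConnectionRow:
    """Hypothetical record for one row in a list of connections."""
    name: str
    connection_type: str
    tags: list = field(default_factory=list)
    description: str = ""


def filter_connections(rows, connection_type=None, tag=None, search=None):
    """Keep rows that match every filter that was supplied."""
    result = []
    for row in rows:
        if connection_type is not None and row.connection_type != connection_type:
            continue
        if tag is not None and tag not in row.tags:
            continue
        if search is not None:
            needle = search.lower()
            # Search looks at both the name and the description.
            if needle not in row.name.lower() and needle not in row.description.lower():
                continue
        result.append(row)
    return result


rows = [
    ConnectionRow("SourceData", "Amazon S3", ["prod"], "Raw source files"),
    ConnectionRow("Warehouse", "Snowflake", ["prod"], "Reporting warehouse"),
    ConnectionRow("Scratch", "Amazon S3", ["dev"], "Temporary bucket"),
]
s3_names = [row.name for row in filter_connections(rows, connection_type="Amazon S3")]
```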