Deployments Overview

A deployment is a group of identical engine instances deployed within an environment. A deployment defines the StreamSets engine type, version, and configuration to use. You can deploy and launch multiple instances of the configured engine.

Note: Deployments are required for Data Collector and Transformer pipelines. They are not applicable for Transformer for Snowflake pipelines.

When you create a deployment, you select an active environment for that deployment. You must create and activate environments before creating deployments.

A deployment allows you to manage all deployed engine instances with a single configuration change. You can update a deployment to install an additional stage library on the engine or to customize engine configuration properties. After a deployment update, you instruct Control Hub to restart all engine instances in the deployment to replicate the changes to each instance.

When you deploy StreamSets engines to on-premises or cloud computing machines that reside behind a firewall, you must allow the required inbound and outbound connections to each machine.

Important: A deployment is the primary unit of tenancy in the StreamSets platform. Resources configured for a deployment, such as credential stores or AWS instance profiles, are accessible by all authorized users of that deployment. When multiple groups use the same environment, you can restrict access to deployment resources by creating different deployments for each group in the environment and assigning the groups appropriate permissions on the deployments.

You can create the following types of deployments:

Self-managed
In a self-managed deployment, you take full control of procuring the resources needed to run engine instances. The resources can be local on-premises machines or cloud computing machines.
You must set up the machines and complete the installation prerequisites required by the engine type. You manually run an installation script to install and launch an engine instance on each machine that you have set up.
Control Hub-managed
In a Control Hub-managed deployment, Control Hub connects to the external system represented by the parent environment and automatically provisions the resources needed to run the engine type, ensuring that the resources meet engine requirements. Engine instances are then automatically deployed and launched on those resources.
After an administrator completes the required prerequisites, you can create the following types of Control Hub-managed deployments:

Engine Types

A deployment defines the type of engine to deploy and launch.

When you create a deployment, you select one of the following engine types:
Data Collector
Use a Data Collector engine to run data ingestion pipelines that can read from and write to a large number of heterogeneous origins and destinations. Data Collector pipelines perform record-based data transformations in streaming, CDC, or batch modes.
Transformer
Use a Transformer engine to run data processing pipelines on Apache Spark. Because Transformer pipelines run on Spark deployed on a cluster, the pipelines can perform set-based transformations such as joins, aggregates, and sorts on the entire data set.

Once you save a deployment, you cannot change the engine type.

For more information about the pipeline types, see Comparing StreamSets Pipelines.

Engine Versions

A deployment defines the engine version to deploy and launch. StreamSets recommends using the latest engine version to ensure that you have the latest updates and features.

New deployments support the following minimum engine versions:
Engine            Minimum Supported Version
Data Collector    4.3.0
Transformer       4.2.0
Note: Existing deployments can continue to use Data Collector 4.0.x to 4.2.x or Transformer 4.0.x to 4.1.x.
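
The minimum versions for new deployments can be checked with a short comparison. The following Python is a minimal sketch, not part of any StreamSets API: the names MIN_VERSION_NEW_DEPLOYMENT, parse_version, and supported_for_new_deployment are hypothetical, and the sketch assumes plain dotted release versions with no -SNAPSHOT suffix or Scala label.

```python
from typing import Dict, Tuple

# Minimum engine versions for NEW deployments, copied from the table above.
MIN_VERSION_NEW_DEPLOYMENT: Dict[str, Tuple[int, ...]] = {
    "Data Collector": (4, 3, 0),
    "Transformer": (4, 2, 0),
}


def parse_version(version: str) -> Tuple[int, ...]:
    """Convert a dotted version such as '4.3.0' into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def supported_for_new_deployment(engine_type: str, version: str) -> bool:
    """Return True if the version meets the minimum for a new deployment."""
    # Tuples compare element by element, so 4.10.0 correctly sorts after 4.3.0.
    return parse_version(version) >= MIN_VERSION_NEW_DEPLOYMENT[engine_type]
```

Comparing integer tuples rather than raw strings avoids treating 4.10.0 as older than 4.3.0.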

Transformer engines provide engine versions based on the Scala version that the engine is built with, in addition to the engine version. For example, for Transformer 4.2.0, you can choose between engine version 4.2.0 (Scala 2.11) and 4.2.0 (Scala 2.12). For information about choosing a Scala version, see Choosing an Engine Version.

Note: When allowed on the parent environment, a deployment can use nightly engine builds in addition to released engine versions. The version number of a nightly build includes a -SNAPSHOT suffix and the build number, for example 5.2.0-SNAPSHOT (Build 1013). Nightly builds are for testing features under development and should not be used in production systems.
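
To keep nightly builds out of production, a script can tell them apart from released versions by the suffix. The following Python is a minimal sketch that assumes version labels follow the display format of the example above. EngineVersion and parse_engine_version are hypothetical names, and labels that carry a Scala suffix are not handled.

```python
import re
from typing import NamedTuple, Optional

# Assumed label format, based only on the example "5.2.0-SNAPSHOT (Build 1013)".
_VERSION_RE = re.compile(
    r"^(?P<version>\d+\.\d+\.\d+)"
    r"(?P<snapshot>-SNAPSHOT)?"
    r"(?: \(Build (?P<build>\d+)\))?$"
)


class EngineVersion(NamedTuple):
    version: str          # for example "5.2.0"
    nightly: bool         # True when the label has a -SNAPSHOT suffix
    build: Optional[int]  # build number, when the label includes one


def parse_engine_version(label: str) -> EngineVersion:
    """Split an engine version label into version, nightly flag, and build."""
    match = _VERSION_RE.match(label.strip())
    if match is None:
        raise ValueError(f"Unrecognized engine version label: {label!r}")
    build = match.group("build")
    return EngineVersion(
        version=match.group("version"),
        nightly=match.group("snapshot") is not None,
        build=int(build) if build is not None else None,
    )
```

A production check could then reject any label for which the nightly field is True.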

Once you save a deployment, you cannot change the engine version. To upgrade to a later engine version, see Upgrading Engines for Self-Managed Deployments.

If you design and run pipelines across engine instances managed by different deployments, ensure that all engine versions are the same. Since engine functionality can differ from version to version, using a different engine version can result in a pipeline that is invalid. Use engine labels to ensure that you do not mix engine versions for a single pipeline.
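
The version check described above can be sketched as a grouping step. The following Python is a minimal sketch that assumes you already have a list of (deployment name, engine version) pairs from your own inventory. It does not call Control Hub, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_engine_version(
    deployments: Iterable[Tuple[str, str]]
) -> Dict[str, List[str]]:
    """Map each engine version to the names of the deployments that use it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, version in deployments:
        groups[version].append(name)
    return dict(groups)


def has_mixed_versions(deployments: Iterable[Tuple[str, str]]) -> bool:
    """Return True when the deployments use more than one engine version."""
    return len(group_by_engine_version(deployments)) > 1
```

For example, passing [("ingest-a", "5.2.0"), ("ingest-b", "5.1.0")] reports mixed versions, which suggests giving those engines distinct labels so that a single pipeline does not span both.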

Engine Java Version

All deployed StreamSets engines require that the appropriate Java version be installed on the engine machine. Each engine type has different Java version requirements. In addition, some stage libraries and use cases require specific Java versions. For details, see the following engine documentation:

When you configure a self-managed deployment using an engine tarball file, you are responsible for installing the appropriate Java version as a prerequisite before you run the installation script command that installs and launches the engine tarball.

For all other deployment types, Control Hub deploys and installs the appropriate Java version for you. For some deployment types, you can choose between supported Java versions. StreamSets recommends using the default Java version unless you have a specific need for another version.

You can define a Java version for the following deployment types:

Self-managed deployment using an engine Docker image
You can define the Java version in the following ways:
  • When creating the deployment

    In the Review and Launch step of the deployment wizard, select a version from the Java Version property under the generated installation script command.

  • When retrieving the installation script for an existing deployment

    By default, the Install Engine Script dialog box displays the version selected during deployment creation. You can alternatively select a different version from the Java Version property under the generated installation script command. The selection made in this dialog box is not saved.

Control Hub bundles the selected Java version into the Docker image.
Azure VM deployment
Define the Java version when you create the deployment. Alternatively, you can edit the Java version for an existing deployment when the deployment is deactivated. In the Configure Engine step of the deployment wizard, click Advanced Configuration, then click Java Configuration. Select a version from the Java Version property.
Control Hub sets up the selected Java version on the provisioned Azure VM instance.

At this time, all other deployment types use the default Java version.

Note: Deployments for Data Collector 5.9.x and earlier support selecting Java 8 only. Deployments for Transformer support selecting the Java version required for the Scala version associated with the Transformer engine version.
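
The Data Collector rule in the note above can be expressed as a version comparison. The following Python is a minimal sketch with a hypothetical function name. It assumes a dotted version with at least major and minor parts, and it says nothing about which Java versions later releases offer.

```python
def java_8_only(data_collector_version: str) -> bool:
    """Return True if this Data Collector version is 5.9.x or earlier."""
    # Compare major and minor as integers so that 5.10 sorts after 5.9.
    major, minor = (int(part) for part in data_collector_version.split(".")[:2])
    return (major, minor) <= (5, 9)
```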

Engine Configuration

A deployment defines the configuration of the engine to deploy and launch.

Note: As you get started, you can typically use the default engine configuration. You might need to modify the engine configuration as you further explore StreamSets.

You can define the engine configuration when you create or edit a deployment. If you edit the engine configuration for an existing deployment, you instruct Control Hub to restart all engine instances managed by the deployment to replicate the changes to the instances.

You can define the following engine configurations for a deployment: