Deployments Overview

A deployment is a group of identical engine instances deployed within an environment. A deployment defines the StreamSets engine type, version, and configuration to use. You can deploy and launch multiple instances of the configured engine.

When you create a deployment, you select an active environment for that deployment. You must create and activate environments before creating deployments.

A deployment allows you to manage all deployed engine instances with a single configuration change. You can update a deployment to install an additional stage library on the engine or to customize engine configuration properties. After a deployment update, you instruct Control Hub to restart all engine instances in the deployment to replicate the changes to each instance.
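
The update-then-restart flow can be pictured with a small model. The sketch below is purely illustrative: `Deployment`, `EngineInstance`, and their methods are hypothetical names and are not part of Control Hub or any StreamSets API. It shows only the idea that a change is recorded once on the deployment and reaches each instance when that instance restarts.

```python
from dataclasses import dataclass, field


@dataclass
class EngineInstance:
    """Hypothetical stand-in for one running engine instance."""
    name: str
    applied_revision: int = 0


@dataclass
class Deployment:
    """Hypothetical stand-in for a deployment: one configuration, many instances."""
    stage_libraries: set = field(default_factory=set)
    revision: int = 0
    instances: list = field(default_factory=list)

    def add_stage_library(self, library: str) -> None:
        # A single change on the deployment; no instance is touched yet.
        self.stage_libraries.add(library)
        self.revision += 1

    def stale_instances(self) -> list:
        # Instances still running a configuration older than the deployment's.
        return [i.name for i in self.instances if i.applied_revision < self.revision]

    def restart_all(self) -> None:
        # In this model, a restart is what replicates the current configuration.
        for instance in self.instances:
            instance.applied_revision = self.revision


deployment = Deployment(instances=[EngineInstance("engine-1"), EngineInstance("engine-2")])
deployment.add_stage_library("jdbc")   # "jdbc" is an arbitrary example name
pending = deployment.stale_instances() # both instances need a restart
deployment.restart_all()
```

In this model every instance stays on the old configuration until `restart_all` runs, which mirrors why the restart step is required after a deployment update.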

When you deploy StreamSets engines to on-premises or cloud computing machines that reside behind a firewall, you must allow the required inbound and outbound connections to each machine.

Important: A deployment is the primary unit of tenancy in the StreamSets platform. Resources configured for a deployment, such as credential stores or AWS instance profiles, are accessible to all authorized users of that deployment. When multiple groups use the same environment, you can restrict access to deployment resources by creating different deployments for each group in the environment and assigning the groups appropriate permissions on the deployments.

You can create the following types of deployments:

Self-managed
In a self-managed deployment, you take full control of procuring the resources needed to run engine instances. The resources can be local on-premises machines or cloud computing machines.
You must set up the machines and complete the installation prerequisites required by the engine type. You manually run an installation script to install and launch an engine instance on each machine that you have set up.
Control Hub-managed
In a Control Hub-managed deployment, Control Hub connects to the external system represented by the parent environment and automatically provisions the resources needed to run the engine type, ensuring that the resources meet engine requirements. Engine instances are then automatically deployed and launched on those resources.
After an administrator completes the required prerequisites, you can create the following types of Control Hub-managed deployments:

Engine Types

A deployment defines the type of engine to deploy and launch.

When you create a deployment, you select one of the following engine types:
Data Collector
Use a Data Collector engine to run data ingestion pipelines that can read from and write to a large number of heterogeneous origins and destinations. Data Collector pipelines perform record-based data transformations in streaming, CDC, or batch modes.
Transformer
Use a Transformer engine to run data processing pipelines on Apache Spark. Because Transformer pipelines run on Spark deployed on a cluster, the pipelines can perform set-based transformations such as joins, aggregates, and sorts on the entire data set.

Once you save a deployment, you cannot change the engine type.

For more information about the pipeline types, see Comparing StreamSets Pipelines.

Engine Versions

A deployment defines the engine version to deploy and launch. StreamSets recommends using the latest engine version so that you have the most recent updates and features.

New deployments support the following minimum engine versions:
Engine            Minimum Supported Version
Data Collector    4.3.0
Transformer       4.2.0
Note: Existing deployments can continue to use Data Collector 4.0.x to 4.2.x or Transformer 4.0.x to 4.1.x.
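
As a rough illustration of the minimum-version rule for new deployments, the sketch below compares a version string against the minimums listed above. The function names and the dictionary are hypothetical, and the code assumes plain `major.minor.patch` version strings with no suffix.

```python
# Minimum versions for new deployments, as listed above.
MINIMUM_VERSIONS = {
    "Data Collector": (4, 3, 0),
    "Transformer": (4, 2, 0),
}


def parse_version(version: str) -> tuple:
    """Turn a plain 'major.minor.patch' string into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def allowed_for_new_deployment(engine_type: str, version: str) -> bool:
    """Return True if the version meets the minimum for a new deployment."""
    return parse_version(version) >= MINIMUM_VERSIONS[engine_type]
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating `4.10.0` as older than `4.3.0`.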

For Transformer, each engine version is offered in variants that differ by the Scala version the engine is built with. For example, for Transformer 4.2.0, you can choose between engine version 4.2.0 (Scala 2.11) and 4.2.0 (Scala 2.12). For information about choosing a Scala version, see Choosing an Engine Version.

Note: When allowed on the parent environment, a deployment can use nightly engine builds in addition to released engine versions. The version number of a nightly build includes a -SNAPSHOT suffix and the build number. For example, 5.2.0-SNAPSHOT (Build 1013). Nightly builds are for testing features under development and should not be used in production systems.
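
If you need to keep nightly builds out of production tooling, a label check along the following lines may help. This is a sketch under one assumption: that nightly labels always follow the shape of the single example above, `5.2.0-SNAPSHOT (Build 1013)`. The function names are hypothetical.

```python
import re

# Pattern modeled on the one example label quoted above.
NIGHTLY_PATTERN = re.compile(r"^(\d+\.\d+\.\d+)-SNAPSHOT \(Build (\d+)\)$")


def parse_nightly(label: str):
    """Return (version, build_number) for a nightly label, or None otherwise."""
    match = NIGHTLY_PATTERN.match(label.strip())
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def safe_for_production(label: str) -> bool:
    """Nightly builds are for testing only, so only non-nightly labels pass."""
    return parse_nightly(label) is None
```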

Once you save a deployment, you cannot change the engine version. To upgrade to a later engine version, see Upgrading Engines for Self-Managed Deployments.

If you design and run pipelines across engine instances managed by different deployments, ensure that all engine versions are the same. Because engine functionality can differ between versions, a pipeline designed with one engine version can be invalid on another. Use engine labels to ensure that you do not mix engine versions for a single pipeline.
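
One way to audit this is to group engines by label and flag any label that covers more than one engine version. The sketch below assumes you already have a list of `(version, labels)` pairs from your own inventory. That input shape and the function name are hypothetical and do not represent a Control Hub data structure.

```python
from collections import defaultdict


def mixed_version_labels(engines):
    """Return the labels that cover more than one engine version.

    `engines` is an iterable of (version, labels) pairs; the result maps each
    offending label to the set of versions found under it.
    """
    versions_by_label = defaultdict(set)
    for version, labels in engines:
        for label in labels:
            versions_by_label[label].add(version)
    return {
        label: versions
        for label, versions in versions_by_label.items()
        if len(versions) > 1
    }


# Example inventory: the "dev" label spans two versions, "prod" does not.
inventory = [
    ("5.1.0", ["dev"]),
    ("5.2.0", ["dev", "prod"]),
    ("5.2.0", ["prod"]),
]
conflicts = mixed_version_labels(inventory)
```

An empty result means each label maps to a single engine version, so a pipeline that targets one label runs on one version.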

Engine Configuration

A deployment defines the configuration of the engine to deploy and launch.

Note: As you get started, you can typically use the default engine configuration. You might need to modify the engine configuration as you further explore StreamSets.

You can define the engine configuration when you create or edit a deployment. If you edit the engine configuration for an existing deployment, you instruct Control Hub to restart all engine instances managed by the deployment to replicate the changes to the instances.

You can define the following engine configurations for a deployment: