Deployments Overview
A deployment is a group of identical engine instances deployed within an environment. A deployment defines the StreamSets engine type, version, and configuration to use. You can deploy and launch multiple instances of the configured engine.
When you create a deployment, you select an active environment for that deployment. You must create and activate environments before creating deployments.
A deployment allows you to manage all deployed engine instances with a single configuration change. You can update a deployment to install an additional stage library on the engine or to customize engine configuration properties. After a deployment update, you instruct Control Hub to restart all engine instances in the deployment to replicate the changes to each instance.
You can create the following types of deployments:
- Self-managed
  In a self-managed deployment, you take full control of procuring the resources needed to run engine instances. The resources can be local on-premises machines or cloud computing machines.
- Control Hub-managed
  In a Control Hub-managed deployment, Control Hub connects to the external system represented by the parent environment and automatically provisions the resources needed to run the engine type, ensuring that the resources meet engine requirements. Engine instances are then automatically deployed and launched on those resources.
Engine Types
A deployment defines the type of engine to deploy and launch.
- Data Collector
  Use a Data Collector engine to run data ingestion pipelines that can read from and write to a large number of heterogeneous origins and destinations. Data Collector pipelines perform record-based data transformations in streaming, CDC, or batch modes.
- Transformer
  Use a Transformer engine to run data processing pipelines on Apache Spark. Because Transformer pipelines run on Spark deployed on a cluster, the pipelines can perform set-based transformations such as joins, aggregates, and sorts on the entire data set.
Once you save a deployment, you cannot change the engine type.
For more information about the pipeline types, see Comparing StreamSets Pipelines.
Engine Versions
A deployment defines the engine version to deploy and launch. StreamSets recommends using the latest engine version to ensure that you have the latest updates and features.
| Engine | Minimum Supported Version |
|---|---|
| Data Collector | 4.3.0 |
| Transformer | 4.2.0 |
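The minimum versions in the table can be checked programmatically before a deployment is created. The following is a minimal sketch, not part of any StreamSets API: the function names `parse_version` and `is_supported` are hypothetical, and the sketch assumes plain dotted version strings such as `4.3.0` with no suffix.

```python
from __future__ import annotations

# Minimum supported engine versions, copied from the table above.
MINIMUM_SUPPORTED = {
    "Data Collector": (4, 3, 0),
    "Transformer": (4, 2, 0),
}


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '4.3.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_supported(engine: str, version: str) -> bool:
    """Return True if the version meets the minimum for the engine type."""
    return parse_version(version) >= MINIMUM_SUPPORTED[engine]
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as treating `4.10.0` as older than `4.3.0`.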
Transformer engines provide engine versions based on the Scala version that the engine is built with, in addition to the engine version. For example, for Transformer 4.2.0, you can choose between engine version 4.2.0 (Scala 2.11) and 4.2.0 (Scala 2.12). For information about choosing a Scala version, see Choosing an Engine Version.
Once you save a deployment, you cannot change the engine version. To upgrade to a later engine version, see Upgrading Engines for Self-Managed Deployments.
If you design and run pipelines across engine instances managed by different deployments, ensure that all engine versions are the same. Because engine functionality can differ between versions, a pipeline designed on one engine version can be invalid on another. Use engine labels to ensure that you do not mix engine versions for a single pipeline.
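One way to audit for mixed versions is to group engines by label and flag any label that maps to more than one version. The sketch below assumes a hypothetical inventory of `(label, version)` pairs that you assemble yourself. It does not call Control Hub, and the function name `find_mixed_versions` is illustrative.

```python
from collections import defaultdict


def find_mixed_versions(engines):
    """Report labels whose engines do not all share one version.

    `engines` is an iterable of (label, version) pairs. This shape is a
    hypothetical inventory format, not a Control Hub export.
    """
    versions_by_label = defaultdict(set)
    for label, version in engines:
        versions_by_label[label].add(version)
    # Keep only the labels that map to more than one distinct version.
    return {
        label: sorted(versions)
        for label, versions in versions_by_label.items()
        if len(versions) > 1
    }
```

An empty result means that every label maps to a single engine version, so a pipeline assigned by label cannot land on a mismatched engine.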
Engine Java Version
When you configure a self-managed deployment that uses an engine tarball file, you must install the appropriate Java version before you run the installation script command that installs and launches the engine.
For all other deployment types, Control Hub deploys and installs the appropriate Java version for you. For some deployment types, you can choose between supported Java versions. StreamSets recommends using the default Java version unless you have a specific need for another version.
You can define a Java version for the following deployment types:
- Self-managed deployment using an engine Docker image
  You can define the Java version in the following ways:
  - When creating the deployment
    In the Review and Launch step of the deployment wizard, select a version from the Java Version property under the generated installation script command.
  - When retrieving the installation script for an existing deployment
    By default, the Install Engine Script dialog box displays the version selected during deployment creation. You can alternatively select a different version from the Java Version property under the generated installation script command. The selection made in this dialog box is not saved.
- Azure VM deployment
  Define the Java version when you create the deployment. Alternatively, you can edit the Java version for an existing deployment when the deployment is deactivated. In the Configure Engine step of the deployment wizard, click Advanced Configuration, then click Java Configuration. Select a version from the Java Version property.
At this time, all other deployment types use the default Java version.
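The Java rules in this section reduce to a small lookup, sketched below. The dictionary keys are hypothetical labels for the deployment types named above, not identifiers used by Control Hub, and the `JavaHandling` enum exists only for illustration.

```python
from enum import Enum


class JavaHandling(Enum):
    USER_INSTALLS = "install Java yourself before running the installation script"
    SELECTABLE = "Control Hub installs Java; you can choose a supported version"
    DEFAULT_ONLY = "Control Hub installs the default Java version"


# Keys are hypothetical labels for the deployment types named in this section.
JAVA_HANDLING = {
    "self-managed-tarball": JavaHandling.USER_INSTALLS,
    "self-managed-docker": JavaHandling.SELECTABLE,
    "azure-vm": JavaHandling.SELECTABLE,
}


def java_handling(deployment_type: str) -> JavaHandling:
    """Look up how Java is handled; unlisted types use the default version."""
    return JAVA_HANDLING.get(deployment_type, JavaHandling.DEFAULT_ONLY)
```

Falling back to `DEFAULT_ONLY` mirrors the statement that all other deployment types currently use the default Java version.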
Engine Configuration
A deployment defines the configuration of the engine to deploy and launch.
You can define the engine configuration when you create or edit a deployment. If you edit the engine configuration for an existing deployment, you instruct Control Hub to restart all engine instances managed by the deployment to replicate the changes to the instances.
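The relationship between a configuration edit and the restart that applies it can be modeled as follows. This is a conceptual sketch only: the `Deployment` and `EngineInstance` classes, their methods, and the property name `example.property` are hypothetical and do not reflect how Control Hub is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class EngineInstance:
    name: str
    # Configuration the instance loaded at its last (re)start.
    running_config: dict = field(default_factory=dict)


@dataclass
class Deployment:
    engine_type: str
    engine_version: str
    config: dict = field(default_factory=dict)
    instances: list = field(default_factory=list)

    def update_config(self, changes: dict) -> None:
        """Edit the deployment configuration; running engines are not touched."""
        self.config.update(changes)

    def restart_engines(self) -> None:
        """Restart every instance so each picks up the current configuration."""
        for instance in self.instances:
            instance.running_config = dict(self.config)

    def stale_instances(self) -> list:
        """Names of instances still running an older configuration."""
        return [i.name for i in self.instances if i.running_config != self.config]
```

In this model, `stale_instances()` lists every engine after `update_config()` and returns an empty list again only after `restart_engines()`, which is the behavior the paragraph above describes.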