Deployments Overview
A deployment is a group of identical engine instances deployed within an environment. A deployment defines the StreamSets engine type, version, and configuration to use. You can deploy and launch multiple instances of the configured engine.
When you create a deployment, you select an active environment for that deployment. You must create and activate environments before creating deployments.
A deployment allows you to manage all deployed engine instances with a single configuration change. You can update a deployment to install an additional stage library on the engine or to customize engine configuration properties. After a deployment update, you instruct Control Hub to restart all engine instances in the deployment to replicate the changes to each instance.
You can create the following types of deployments:
- Self-managed
- In a self-managed deployment, you take full control of procuring the resources needed to run engine instances. The resources can be local on-premises machines or cloud computing machines.
- Control Hub-managed
- In a Control Hub-managed deployment, Control Hub connects to the external system represented by the parent environment and automatically provisions the resources needed to run the engine type, ensuring that the resources meet engine requirements. Engine instances are then automatically deployed and launched on those resources.
Engine Types
A deployment defines the type of engine to deploy and launch.
- Data Collector
- Use a Data Collector engine to run data ingestion pipelines that can read from and write to a large number of heterogeneous origins and destinations. Data Collector pipelines perform record-based data transformations in streaming, CDC, or batch modes.
- Transformer
- Use a Transformer engine to run data processing pipelines on Apache Spark. Because Transformer pipelines run on Spark deployed on a cluster, the pipelines can perform set-based transformations such as joins, aggregates, and sorts on the entire data set.
Once you save a deployment, you cannot change the engine type.
For more information about the pipeline types, see Comparing StreamSets Pipelines.
Engine Versions
A deployment defines the engine version to deploy and launch. StreamSets recommends using the latest engine version to ensure that you have the latest updates and features.
Engine | Minimum Supported Version |
---|---|
Data Collector | 4.3.0 |
Transformer | 4.2.0 |
Transformer engines provide engine versions based on the Scala version that the engine is built with, in addition to the engine version. For example, for Transformer 4.2.0, you can choose between engine version 4.2.0 (Scala 11) and 4.2.0 (Scala 12). For information about choosing a Scala version, see Choosing an Engine Version.
Once you save a deployment, you cannot change the engine version. To upgrade to a later engine version, see Upgrading Engines for Self-Managed Deployments.
If you design and run pipelines across engine instances managed by different deployments, ensure that all engine versions are the same. Since engine functionality can differ from version to version, using a different engine version can result in a pipeline that is invalid. Use engine labels to ensure that you do not mix engine versions for a single pipeline.
Engine Configuration
A deployment defines the configuration of the engine to deploy and launch.
You can define the engine configuration when you create or edit a deployment. If you edit the engine configuration for an existing deployment, you instruct Control Hub to restart all engine instances managed by the deployment to replicate the changes to the instances.