Engines Overview
An engine runs pipelines.
You set up and deploy IBM StreamSets engines in your corporate network, which can be on-premises or on a protected cloud computing platform. Control Hub works with the engines when you design pipelines and when you run pipelines from jobs.
IBM StreamSets provides three engines: Data Collector, Transformer, and Transformer for Snowflake. Each engine can be deployed independently in your corporate network, and all of them are managed together in Control Hub. Engines serve two roles:
- Authoring
- To build pipelines in Control Hub, you select an available authoring engine. The selected authoring engine determines the stages, stage libraries, and functionality available in the pipeline canvas. When you create connections, the selected authoring Data Collector determines the connection types that you can create.
- Execution
- After you check in a pipeline and add it to a job, you start the job on an execution engine. The execution engine runs the pipeline: it reads data from the origin systems, processes the data, and writes it to the destination systems.
To set up engines, you create a deployment. A deployment defines the engine type, version, and configuration to deploy.
After engines are deployed and launched, they communicate with Control Hub and with the web browser. Control Hub monitors the resources that each engine uses and starts jobs only on engines that have not reached any of their resource thresholds.
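The scheduling rule described above can be illustrated with a minimal Python sketch. It is not Control Hub's implementation or API: the class and function names (`EngineStatus`, `ResourceThresholds`, `has_reached_threshold`, `eligible_engines`), the choice of CPU, memory, and running-pipeline count as the monitored resources, and all numeric values are hypothetical and chosen only for illustration. The sketch also assumes that "reaching" a threshold means a value at or above the limit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineStatus:
    """Hypothetical snapshot of the resources one engine is using."""
    engine_id: str
    cpu_percent: float
    memory_percent: float
    running_pipelines: int


@dataclass(frozen=True)
class ResourceThresholds:
    """Hypothetical per-engine limits; values here are illustrative only."""
    max_cpu_percent: float
    max_memory_percent: float
    max_running_pipelines: int


def has_reached_threshold(status: EngineStatus, limits: ResourceThresholds) -> bool:
    """Return True if any monitored resource is at or above its limit (assumed semantics)."""
    return (
        status.cpu_percent >= limits.max_cpu_percent
        or status.memory_percent >= limits.max_memory_percent
        or status.running_pipelines >= limits.max_running_pipelines
    )


def eligible_engines(statuses: list[EngineStatus], limits: ResourceThresholds) -> list[str]:
    """Keep only the engines on which a new job may be started."""
    return [s.engine_id for s in statuses if not has_reached_threshold(s, limits)]


# Illustrative usage with made-up engine IDs and metrics.
limits = ResourceThresholds(max_cpu_percent=80.0, max_memory_percent=90.0, max_running_pipelines=10)
statuses = [
    EngineStatus("dc-1", 35.0, 50.0, 3),   # below every limit
    EngineStatus("dc-2", 92.0, 40.0, 2),   # CPU above its limit
    EngineStatus("dc-3", 20.0, 30.0, 10),  # at the running-pipeline limit
]
print(eligible_engines(statuses, limits))  # ['dc-1']
```

Filtering on "any threshold reached" mirrors the wording of the rule: a single exhausted resource is enough to exclude an engine, regardless of how much headroom it has elsewhere.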
You can monitor the performance, metrics, and pipeline status of each engine.
Engine Types and Versions
When you create a deployment to set up engines, you define the following information:
- Engine type
- You create a deployment for one of the following engine types: Data Collector, Transformer, or Transformer for Snowflake. For more information, see Engine Types.
- Engine version
- You create a deployment for a specific engine version. Use the latest engine version to ensure that you have the latest updates and features.
After you save a deployment, you cannot change its engine type or version.
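The rule that a saved deployment keeps its engine type and version can be modeled with a minimal Python sketch. This is a hypothetical illustration, not the Control Hub API or the StreamSets SDK: the `Deployment` class, its `save()` method, the `config` dictionary, and the version strings are all assumptions. The sketch leaves `config` editable after saving only because this section restricts just the engine type and version; it makes no claim about which other deployment settings can change.

```python
from dataclasses import dataclass, field
from enum import Enum


class EngineType(Enum):
    DATA_COLLECTOR = "Data Collector"
    TRANSFORMER = "Transformer"
    TRANSFORMER_FOR_SNOWFLAKE = "Transformer for Snowflake"


@dataclass
class Deployment:
    """Hypothetical model of a deployment: engine type, version, and configuration."""
    name: str
    engine_type: EngineType
    engine_version: str
    config: dict = field(default_factory=dict)  # assumption: left editable in this sketch
    _saved: bool = field(default=False, init=False, repr=False)

    def save(self) -> None:
        """Mark the deployment as saved; engine type and version are locked afterward."""
        self._saved = True

    def __setattr__(self, attr, value):
        # Reject changes to the engine type or version once the deployment is saved.
        if attr in ("engine_type", "engine_version") and getattr(self, "_saved", False):
            raise AttributeError(f"cannot change {attr} after the deployment is saved")
        super().__setattr__(attr, value)


# Illustrative usage; the name and version strings are made up.
d = Deployment("example-deployment", EngineType.DATA_COLLECTOR, "1.2.3")
d.engine_version = "1.2.4"  # allowed: the deployment has not been saved yet
d.save()
try:
    d.engine_version = "1.2.5"
except AttributeError as err:
    print(err)  # cannot change engine_version after the deployment is saved
```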