Transformer Registration Overview

Transformer is an execution engine that works directly with StreamSets Control Hub. You install Transformer on a machine that is configured to submit Spark jobs to a cluster, such as a Hadoop edge or data node or a cloud virtual machine. You then register Transformer to work with Control Hub.

When you register Transformer, you assign labels to it. The labels determine which Control Hub jobs run on that Transformer.

You can install and register multiple instances of Transformer with Control Hub. For example, you might install multiple instances of Transformer to work with different Hadoop YARN clusters. Or you might use one Transformer installation as a test environment and another installation as a production environment.

You can use each registered Transformer for both authoring and execution in Control Hub. You design pipelines in the Control Hub Pipeline Designer after selecting an available authoring Transformer to use. When you run pipelines from Control Hub jobs, you assign labels to the jobs and to the Transformers to determine the execution Transformer that runs the pipeline.

Before you register Transformer, ensure that you have enabled HTTPS for Transformer.

Transformer Versions

StreamSets recommends using the latest version of Transformer with Control Hub to ensure that you can use the newest features.

You can register earlier Transformer versions, and you can register Transformers of different versions. However, because Transformer functionality can differ from version to version, use an authoring Transformer that is the same version as the execution Transformers that you intend to use to run the pipeline. Authoring with a different Transformer version can result in a pipeline that is invalid for the execution Transformers.

For example, if the authoring Transformer is a more recent version than the execution Transformer, pipelines might include a stage, stage library, or stage functionality that does not exist in the execution Transformer.
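The version mismatch described above can be checked before you author a pipeline. The following is a minimal sketch, not part of Control Hub or Transformer: the function names and the version strings are hypothetical, and it assumes versions are plain dotted numbers.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version string such as '3.14.0' into (3, 14, 0)."""
    return tuple(int(part) for part in version.split("."))


def authoring_is_newer(authoring_version: str, execution_version: str) -> bool:
    """Return True when the authoring Transformer is newer than the execution Transformer."""
    return parse_version(authoring_version) > parse_version(execution_version)


# Illustrative version strings only.
print(authoring_is_newer("3.14.0", "3.13.0"))  # True: pipeline may use stages the executor lacks
print(authoring_is_newer("3.9.0", "3.10.0"))   # False: numeric comparison, not string comparison
```

Comparing integer tuples rather than raw strings avoids treating "3.9.0" as newer than "3.10.0".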

In addition, ensure that all execution Transformers assigned the same label are the same Transformer version.
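One way to audit this rule is to group registered Transformers by label and flag any label that spans more than one version. The sketch below assumes a hand-built inventory; the dictionary keys (`id`, `version`, `labels`) and the sample values are hypothetical and do not come from a Control Hub API.

```python
from collections import defaultdict


def labels_with_mixed_versions(transformers):
    """Return {label: sorted versions} for labels whose Transformers differ in version."""
    versions_by_label = defaultdict(set)
    for transformer in transformers:
        for label in transformer["labels"]:
            versions_by_label[label].add(transformer["version"])
    return {
        label: sorted(versions)
        for label, versions in versions_by_label.items()
        if len(versions) > 1
    }


# Hypothetical inventory of registered Transformers.
registered = [
    {"id": "tx-1", "version": "3.14.0", "labels": ["prod"]},
    {"id": "tx-2", "version": "3.13.0", "labels": ["prod"]},
    {"id": "tx-3", "version": "3.14.0", "labels": ["test"]},
]

print(labels_with_mixed_versions(registered))  # {'prod': ['3.13.0', '3.14.0']}
```

An empty result means that every label maps to a single Transformer version.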

When you start a job, Control Hub runs it on the Transformer that has the same labels as the job and is running the fewest pipelines. Because any Transformer in the group might run the job, all Transformers that function as a group must be the same Transformer version and have identical configuration to ensure consistent processing.
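The selection rule for a started job can be pictured with a short sketch. This is an illustration, not Control Hub's actual implementation. It assumes that "the same labels" means the Transformer has every label assigned to the job, and the field names (`labels`, `running_pipelines`) are hypothetical.

```python
def pick_execution_transformer(job_labels, transformers):
    """Pick the matching Transformer that is running the fewest pipelines.

    A Transformer matches when it carries every label assigned to the job
    (an assumption about how label matching works). Returns None when
    no Transformer matches.
    """
    wanted = set(job_labels)
    candidates = [t for t in transformers if wanted <= set(t["labels"])]
    if not candidates:
        return None
    return min(candidates, key=lambda t: t["running_pipelines"])


# Hypothetical group of registered Transformers.
group = [
    {"id": "tx-1", "labels": ["prod"], "running_pipelines": 4},
    {"id": "tx-2", "labels": ["prod"], "running_pipelines": 1},
    {"id": "tx-3", "labels": ["test"], "running_pipelines": 0},
]

chosen = pick_execution_transformer(["prod"], group)
print(chosen["id"])  # tx-2
```

Because either `tx-1` or `tx-2` could be chosen depending on load, the two must behave identically, which is why the same version and configuration are required across the group.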

For the minimum Transformer versions that Control Hub supports, see Control Hub Requirements.