Labels Overview

Use labels to group engines. All engines of the same type that have the same label function as a group.

When you create a job, you assign labels to the job so Control Hub knows the group of associated engines to run the job on. For example, if you assign a sales label to a Transformer job, then the job can run on any Transformer with a sales label.

When applying labels, you can use any classification structure that you choose. For example, you might use labels to group Data Collectors by environment and to group Transformers by department.

Important: Labels are case sensitive.

When you start a job, Control Hub runs one pipeline instance on an available engine by default. For example, if three Data Collectors have all of the specified labels for a job that contains a standalone pipeline, by default Control Hub runs one pipeline instance on the Data Collector running the fewest pipelines.
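
The default selection described above can be sketched in a few lines of Python. This is an illustration only, not Control Hub's implementation: the Engine class, its fields, and the pick_engine function are hypothetical names, and the tie-break between equally loaded engines (first match wins) is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Engine:
    """Hypothetical stand-in for a registered engine."""
    name: str
    labels: set = field(default_factory=set)
    running_pipelines: int = 0


def pick_engine(engines, job_labels):
    """Return the eligible engine running the fewest pipelines, or None.

    An engine is eligible when it has every label assigned to the job.
    The comparison is exact, so "Sales" and "sales" are different labels.
    """
    required = set(job_labels)
    eligible = [e for e in engines if required <= e.labels]
    if not eligible:
        return None
    # min() keeps the first engine with the lowest count (assumed tie-break).
    return min(eligible, key=lambda e: e.running_pipelines)


engines = [
    Engine("dc-1", {"sales", "west"}, running_pipelines=3),
    Engine("dc-2", {"sales", "west"}, running_pipelines=1),
    Engine("dc-3", {"Sales", "west"}, running_pipelines=0),  # label case differs
]

chosen = pick_engine(engines, ["sales", "west"])
print(chosen.name)  # dc-2
```

In this sketch, dc-3 is idle but is skipped because its Sales label does not match the job's sales label.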

For Data Collector pipelines, you can increase the number of pipeline instances that Control Hub runs for a job. For Transformer pipelines, Control Hub runs a single pipeline instance for each job.

You can include one or more backup engines in a group to support pipeline failover for jobs.

Labels and Execution Components

You assign labels to the following execution components, using the same label for the components that you want to function as a group:

Data Collector: Each Data Collector can act as an authoring Data Collector used to design all types of pipelines and as an execution Data Collector used to run standalone and cluster pipelines.
Use labels to clearly designate which Data Collectors are dedicated to pipeline design. For example, assign an Authoring label to the authoring Data Collector used to design pipelines in Pipeline Designer. When you create jobs, avoid selecting the Authoring label to ensure that jobs are not started on the authoring Data Collector.
Use labels to group execution Data Collectors by any classification you choose.
When you start a job on a group of Data Collectors with the same label, any of the Data Collectors can run a pipeline instance for the job. As a result, all Data Collectors that function as a group must use the same Data Collector version and must have an identical configuration to ensure consistent processing.

Transformer: Each Transformer can act as an authoring Transformer and as an execution engine used to run Transformer pipelines. Use labels to group Transformers by any classification you choose.
When you start a job, the Transformer with matching labels that is running the fewest pipelines runs the job. Because any Transformer in the group might run the job, all Transformers that function as a group must use the same Transformer version and must have an identical configuration to ensure consistent processing.

Deployment: Labels that you assign to a deployment are assigned to all Data Collector containers provisioned by the deployment. After the provisioned Data Collectors are running and registered with Control Hub, you can assign additional labels to the provisioned Data Collectors just as you can for any registered Data Collector.
In most cases, a deployment automatically provisions execution Data Collectors used to run standalone or cluster pipelines. A deployment can also automatically provision authoring Data Collectors dedicated to pipeline design, as long as the authoring Data Collectors are provisioned from a unique deployment that doesn't include any execution Data Collectors.
Use labels to clearly designate a deployment that provisions authoring Data Collectors. Label deployments that provision execution Data Collectors by any classification you choose.
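
Because any engine in a label group might run a job, it can help to check a group for mixed versions before relying on it. The following sketch assumes that you already have an inventory of engines as plain records. The record fields (name, type, version, labels), the version numbers, and the function name are hypothetical and are not part of any Control Hub API. The sketch compares versions only; comparing configuration would need a similar check against whatever configuration data you have available.

```python
from collections import defaultdict


def find_mixed_version_groups(engines):
    """Return {(engine type, label): sorted versions} for groups with more than one version."""
    versions = defaultdict(set)
    for engine in engines:
        # Engines form a group only with engines of the same type and label.
        for label in engine["labels"]:
            versions[(engine["type"], label)].add(engine["version"])
    return {group: sorted(v) for group, v in versions.items() if len(v) > 1}


# Illustrative inventory; names and versions are made up.
inventory = [
    {"name": "dc-1", "type": "Data Collector", "version": "3.22.0", "labels": ["Production"]},
    {"name": "dc-2", "type": "Data Collector", "version": "3.21.0", "labels": ["Production"]},
    {"name": "tx-1", "type": "Transformer", "version": "3.18.0", "labels": ["Production"]},
]

for (engine_type, label), found in find_mixed_version_groups(inventory).items():
    print(f"{engine_type} group '{label}' mixes versions: {', '.join(found)}")
```

The Transformer in this inventory is not flagged, even though it shares the Production label, because it belongs to a separate group from the Data Collectors.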

Labels and Pipeline Type

Control Hub determines the engine used to run a pipeline based on the pipeline type and on the labels assigned to the job and to the engine.

For example, when you start a job for a Data Collector pipeline, Control Hub runs a pipeline instance on Data Collectors with labels that match those defined in the job.

Control Hub runs a pipeline only on the expected engine type. That is, it won't run a Data Collector pipeline on a Transformer. You can therefore use the same labels on different engine types without worrying that Control Hub might run a pipeline on the wrong engine.

For example, you assign the label WestDataCenter to a Data Collector and to a Transformer. When you run a job with the WestDataCenter label that contains a Data Collector pipeline, Control Hub runs the pipeline on the Data Collector only. When you run a job with the WestDataCenter label that contains a Transformer pipeline, Control Hub runs the pipeline on the Transformer only.
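
The WestDataCenter example can be read as a two-part filter: engine type first, then labels. The following Python sketch is illustrative only; the tuple layout, the engine names, and the eligible_engines function are hypothetical.

```python
def eligible_engines(engines, pipeline_type, job_labels):
    """Return names of engines of the matching type that have all of the job labels."""
    required = set(job_labels)
    return [
        name
        for name, engine_type, labels in engines
        if engine_type == pipeline_type and required <= set(labels)
    ]


# (name, engine type, labels); names are made up for illustration.
engines = [
    ("dc-west", "Data Collector", ["WestDataCenter"]),
    ("tx-west", "Transformer", ["WestDataCenter"]),
]

print(eligible_engines(engines, "Data Collector", ["WestDataCenter"]))  # ['dc-west']
print(eligible_engines(engines, "Transformer", ["WestDataCenter"]))     # ['tx-west']
```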

Label Examples

Let's look at some ways that you can use labels to group engines:

Labels by geographic region: Your organization has multiple data centers located in different geographic regions, and one central location that manages the flow of data across all of the data centers. Data engineers in the central location design pipelines used for all of the data centers. You assign an Authoring label to the single authoring Data Collector that runs in the central location.
You create a unique label for each of your data centers to designate the Data Collectors that run in those data centers.
You assign the label WestDataCenter to the Data Collectors installed in the data center located in the western region, and assign the label EastDataCenter to the Data Collectors installed in the eastern data center. When you create jobs, you select the appropriate data center label to ensure that the jobs are started on the group of Data Collectors installed in that data center.

Labels by environment: Your organization uses development and test environments to design and test pipelines before replicating the final pipelines in the production environment. You assign an Authoring label to an authoring Data Collector used to design Data Collector pipelines. You also assign the Authoring label to an authoring Transformer used to design Transformer pipelines.
You create Test and Production labels to designate the Data Collectors and Transformers that run pipelines in the two environments.
You assign the Test label to Data Collectors and Transformers used to run test pipelines. You assign the Production label to Data Collectors and Transformers used to run production pipelines. When you create jobs, you select the appropriate label to ensure that the jobs run in the correct environment.

Labels by project: Your organization needs to build some Transformer pipelines for the Marketing department and for the Finance department.
Since you can use the same Transformer for pipeline design and pipeline execution, you can skip the Authoring label. Instead, you assign the Marketing label to the Transformers dedicated to the Marketing department, and you assign the Finance label to the Transformers dedicated to the Finance department.
When designing pipelines, you use a Transformer with the Marketing label to design the Marketing pipelines, and a Transformer with the Finance label to design the Finance pipelines.
When you create jobs, you select the appropriate department label to ensure that the jobs run on one of the Transformers dedicated to that department.