Labels Overview
Use labels to group engines. All engines of the same type that have the same label function as a group.
When you create a job, you assign labels to the job so Control Hub knows the group of associated engines to run the job on. For example, if you assign a sales label to a Transformer job, then the job can run on any Transformer with a sales label.
When you start a job, Control Hub runs one pipeline instance on an available engine by default. For example, if three Data Collectors have all of the specified labels for a job that contains a standalone pipeline, by default Control Hub runs one pipeline instance on the Data Collector that is running the fewest pipelines.
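The default selection described above can be sketched in a few lines of Python. This is only an illustrative model, not Control Hub code or its API: the `Engine` class, the `pick_engine` function, the engine names, and the type strings are all hypothetical, and the tie-breaking behavior (first match wins) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Engine:
    # Hypothetical model of a registered engine; not a Control Hub object.
    name: str
    engine_type: str          # assumed values: "DATA_COLLECTOR" or "TRANSFORMER"
    labels: frozenset
    running_pipelines: int = 0


def pick_engine(engines, engine_type, job_labels):
    """Return the matching engine that runs the fewest pipelines, or None."""
    wanted = frozenset(job_labels)
    # An engine qualifies only if it has the right type and ALL job labels.
    candidates = [
        e for e in engines
        if e.engine_type == engine_type and wanted <= e.labels
    ]
    if not candidates:
        return None
    # Assumption: ties go to the first candidate in the list.
    return min(candidates, key=lambda e: e.running_pipelines)


engines = [
    Engine("sdc-1", "DATA_COLLECTOR", frozenset({"sales", "west"}), 3),
    Engine("sdc-2", "DATA_COLLECTOR", frozenset({"sales", "west"}), 1),
    Engine("sdc-3", "DATA_COLLECTOR", frozenset({"sales"}), 0),
]

chosen = pick_engine(engines, "DATA_COLLECTOR", ["sales", "west"])
print(chosen.name)  # sdc-2
```

In this sketch, `sdc-3` is idle but is skipped because it lacks the `west` label, so the job lands on `sdc-2`, the least busy engine that has every label.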
For jobs that contain Data Collector pipelines, you can increase the number of pipeline instances that Control Hub runs. For Transformer pipelines, Control Hub runs a single pipeline instance for each job.
You can include one or more backup engines in a group to support pipeline failover for jobs.
Labels and Execution Component
You assign labels to the following execution components, using the same label for the components that you want to function as a group:
- Data Collector
- Each Data Collector can act as an authoring Data Collector used to design all types of pipelines and as an execution Data Collector used to run standalone and cluster pipelines.
- Transformer
- Each Transformer can act as an authoring Transformer and as an execution engine used to run Transformer pipelines. Use labels to group Transformers by any classification you choose.
- Deployment
- Labels that you assign to a deployment are assigned to all Data Collector containers provisioned by the deployment. After the provisioned Data Collectors are running and registered with Control Hub, you can assign additional labels to the provisioned Data Collectors just as you can for any registered Data Collector.
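As a rough illustration of how deployment labels and later additions combine, the labels on a provisioned Data Collector can be thought of as a set union. The `effective_labels` function and the label names below are hypothetical and exist only for this sketch.

```python
def effective_labels(deployment_labels, extra_labels=()):
    """Labels on a provisioned Data Collector: inherited plus any added later."""
    return set(deployment_labels) | set(extra_labels)


# Labels inherited from the deployment at provisioning time.
provisioned = effective_labels({"WestDataCenter", "prod"})

# After registration, an additional label is assigned to the same engine.
after_registration = effective_labels({"WestDataCenter", "prod"}, {"sales"})

print(sorted(after_registration))  # ['WestDataCenter', 'prod', 'sales']
```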
Labels and Pipeline Type
Control Hub determines the engine used to run a pipeline based on the pipeline type and on the labels assigned to the job and to the engine.
For example, when you start a job for a Data Collector pipeline, Control Hub runs a pipeline instance on Data Collectors with labels that match those defined in the job.
Control Hub only runs a pipeline on the expected engine. That is, it won't run a Data Collector pipeline on a Transformer. You can, therefore, use the same labels on different engines, without worrying about whether Control Hub runs a pipeline on the wrong engine.
For example, you assign the label WestDataCenter to a Data Collector and to a Transformer. When you run a job with the WestDataCenter label that contains a Data Collector pipeline, Control Hub runs the pipeline on the Data Collector only. When you run a job with the WestDataCenter label that contains a Transformer pipeline, Control Hub runs the pipeline on the Transformer only.
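The WestDataCenter example can be modeled as a filter on both engine type and labels. The sketch below is a simplified illustration rather than Control Hub behavior in full: the `ENGINES` registry, the `eligible_engines` function, and the engine names are hypothetical.

```python
# Hypothetical in-memory registry: (engine name, engine type, labels).
ENGINES = [
    ("sdc-west", "Data Collector", {"WestDataCenter"}),
    ("transformer-west", "Transformer", {"WestDataCenter"}),
]


def eligible_engines(pipeline_type, job_labels):
    """Names of engines whose type matches the pipeline type
    and that carry every label assigned to the job."""
    return [
        name for name, engine_type, labels in ENGINES
        if engine_type == pipeline_type and set(job_labels) <= labels
    ]


print(eligible_engines("Data Collector", ["WestDataCenter"]))  # ['sdc-west']
print(eligible_engines("Transformer", ["WestDataCenter"]))     # ['transformer-west']
```

Because the engine type is checked before the labels matter, sharing one label across engine types never sends a pipeline to the wrong kind of engine.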
Label Examples
Let's look at some ways that you can use labels to group engines:
- Labels by geographic region
- Your organization has multiple data centers located in different geographic regions, and one central location that manages the flow of data across all of the data centers. Data engineers in the central location design pipelines used for all of the data centers. You assign an Authoring label to the single authoring Data Collector that runs in the central location.
- Labels by environment
- Your organization uses development and test environments to design and test pipelines before replicating the final pipelines in the production environment. You assign an Authoring label to an authoring Data Collector used to design Data Collector pipelines. You also assign the Authoring label to an authoring Transformer used to design Transformer pipelines.
- Labels by project
- Your organization needs to build some Transformer pipelines for the Marketing department and for the Finance department.