Using Labels to Support Pipeline Failover

When deciding how to group execution engines, whether Data Collectors or Transformers, consider whether to include one or more backup engines in the group to support pipeline failover for jobs.

You can enable pipeline failover for a job to minimize downtime caused by unexpected pipeline failures and to help you achieve high availability. When you enable failover, you first define a group of execution engines that the job can start on, reserving at least one engine in the group as a backup.
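For self-managed engines, labels are commonly assigned in the engine configuration file. A minimal sketch, assuming a Data Collector configured through sdc.properties (labels can also be managed from the Control Hub UI; the label value shown is illustrative):

```properties
# Labels reported to Control Hub when this Data Collector registers.
# Jobs assigned matching labels can start pipeline instances on this engine.
dpm.remote.control.job.labels=Production
```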

When you enable failover for a Data Collector job, you must set the number of pipeline instances to fewer than the number of Data Collectors that are assigned all of the job's labels, so that at least one matching Data Collector remains available as a backup. For more information, see Data Collector Pipeline Failover.

For example, suppose you use Test and Production labels to designate the Data Collectors that run in those environments, and each production job must run two remote pipeline instances. To minimize downtime in the production environment, you assign the Production label to four Data Collectors and configure each production job to run two pipeline instances. When you start a job, Control Hub identifies two available Data Collectors and starts a pipeline instance on each. The third and fourth Data Collectors serve as backups, ready to continue processing if a running Data Collector shuts down or a pipeline encounters an error.
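The selection and failover behavior described above can be sketched in a few lines. This is a hypothetical model, not Control Hub's actual API; the engine names, the `start_job` helper, and its parameters are illustrative assumptions:

```python
# Hypothetical sketch of how a failover-enabled Data Collector job picks
# engines: instances start on some label-matching engines, and the rest
# of the group stands by as backups.

def eligible_engines(engines, job_labels):
    """An engine is eligible only if it carries every label assigned to the job."""
    return [name for name, labels in engines.items() if job_labels <= labels]

def start_job(engines, job_labels, num_instances):
    candidates = eligible_engines(engines, job_labels)
    if num_instances >= len(candidates):
        raise ValueError("Failover requires instances < matching engines")
    # Start pipeline instances on the first available engines;
    # the remaining engines stay on standby as failover backups.
    return candidates[:num_instances], candidates[num_instances:]

engines = {
    "sdc-1": {"Production"},
    "sdc-2": {"Production"},
    "sdc-3": {"Production"},
    "sdc-4": {"Production"},
    "sdc-5": {"Test"},
}
running, backups = start_job(engines, {"Production"}, num_instances=2)
print(running)  # two Production-labeled engines run pipeline instances
print(backups)  # the other two Production-labeled engines stand by

# If a running engine shuts down, a backup takes over its pipeline instance.
failed = running.pop(0)          # the engine that went down
running.append(backups.pop(0))   # a backup resumes the pipeline
```

Note that the Test-labeled engine is never a candidate: a job only starts on engines that carry all of the job's labels.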

For Transformer jobs, you cannot increase the number of pipeline instances. Control Hub always runs a single pipeline instance on one Transformer for each job. When you enable failover for a Transformer job, you must ensure that the job is configured to start on a group of at least two Transformers. For more information, see Transformer Pipeline Failover.
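Because a Transformer job always runs exactly one pipeline instance, the only failover requirement is on the size of the engine group. A minimal pre-flight check, assuming a hypothetical helper (the function and names are illustrative, not a Control Hub API):

```python
# Hypothetical check before enabling failover on a Transformer job:
# one Transformer runs the single pipeline instance, so the group of
# Transformers matching the job's labels must include at least one more.

def can_enable_transformer_failover(matching_transformers):
    return len(matching_transformers) >= 2

print(can_enable_transformer_failover(["transformer-1"]))                   # a single engine leaves no backup
print(can_enable_transformer_failover(["transformer-1", "transformer-2"]))  # one runs, one stands by
```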