Job Instances Overview

A job instance is the execution of a pipeline. A job instance defines the pipeline to run and the execution engine that runs the pipeline.

After you publish pipelines, you create a job to specify the published pipeline to run. You can create the job from a pipeline or from a job template. You also assign labels to the job so that Control Hub knows which group of execution engines should run the pipeline.

By default, when you start a job that contains a Data Collector pipeline, Control Hub sends an instance of the pipeline to one Data Collector that has all of the labels assigned to the job, after verifying that the Data Collector does not exceed its resource thresholds. The Data Collector runs the pipeline instance remotely. You can increase the number of pipeline instances that Control Hub runs for a Data Collector job.

In contrast, you cannot increase the number of pipeline instances that Control Hub runs for a Transformer or Transformer for Snowflake job.
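
The default selection of a Data Collector described above can be pictured with a small sketch. This is an illustrative model only, not Control Hub code: the Engine class, its fields, the single CPU threshold, and the "fewest running pipelines first" ordering are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Engine:
    """Hypothetical model of a registered Data Collector."""
    name: str
    labels: set
    cpu_load: float          # current CPU load in percent (assumed metric)
    max_cpu_load: float      # assumed resource threshold in percent
    running_pipelines: int = 0


def is_eligible(engine, job_labels):
    """An engine qualifies if it has ALL job labels and is within its threshold."""
    has_all_labels = set(job_labels) <= engine.labels
    within_threshold = engine.cpu_load <= engine.max_cpu_load
    return has_all_labels and within_threshold


def pick_engines(engines, job_labels, instances=1):
    """Return up to `instances` eligible engines.

    Ordering by fewest running pipelines is an assumption of this sketch.
    """
    candidates = [e for e in engines if is_eligible(e, job_labels)]
    candidates.sort(key=lambda e: e.running_pipelines)
    return candidates[:instances]


engines = [
    Engine("dc1", {"west", "prod"}, cpu_load=50, max_cpu_load=80, running_pipelines=2),
    Engine("dc2", {"west"}, cpu_load=10, max_cpu_load=80),                # missing a label
    Engine("dc3", {"west", "prod"}, cpu_load=90, max_cpu_load=80),        # over threshold
    Engine("dc4", {"west", "prod", "gpu"}, cpu_load=20, max_cpu_load=80, running_pipelines=1),
]

# One pipeline instance by default; more only if the job is configured for it.
default_choice = [e.name for e in pick_engines(engines, ["west", "prod"])]
scaled_choice = [e.name for e in pick_engines(engines, ["west", "prod"], instances=2)]
```

In this model, an engine with extra labels still qualifies, while an engine missing any one job label or exceeding its threshold is skipped.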

To minimize downtime due to unexpected failures, enable pipeline failover for jobs. Control Hub manages pipeline failover differently based on the engine type.

If a job includes a pipeline that uses runtime parameters, you specify the parameter values that the job uses for the pipeline instances.
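
As a rough illustration of how job-level parameter values relate to a pipeline, the sketch below assumes that the pipeline declares runtime parameters with default values and that the job supplies values for some of them. The function name, the parameter names, and the choice to reject unknown names are hypothetical and exist only for this example.

```python
def resolve_parameters(pipeline_defaults, job_values):
    """Combine pipeline defaults with the values specified on the job.

    Rejecting names that the pipeline does not declare is a design choice
    of this sketch, not documented Control Hub behavior.
    """
    unknown = set(job_values) - set(pipeline_defaults)
    if unknown:
        raise KeyError(f"parameters not defined in the pipeline: {sorted(unknown)}")
    # Values given on the job take precedence over pipeline defaults.
    return {**pipeline_defaults, **job_values}


# Hypothetical parameters declared by a pipeline.
pipeline_defaults = {"BATCH_SIZE": 1000, "TARGET_DIR": "/tmp/out"}

# Values the job specifies for its pipeline instances.
job_values = {"BATCH_SIZE": 500}

resolved = resolve_parameters(pipeline_defaults, job_values)
```

Every pipeline instance started for the job would then run with the same resolved values.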

When you stop a job, Control Hub instructs all execution engines running pipelines for the job to stop the pipelines.

After you create jobs, you create a topology to map multiple related jobs into a single view. Topologies are the end-to-end view of multiple dataflows. From a single topology view, you can start, stop, monitor, and synchronize all jobs included in the topology.