Job Instances Overview

A job instance is the execution of a pipeline. A job instance defines the pipeline to run and the engine that runs the pipeline.

After you publish pipelines, you create a job to specify the published pipeline to run. You can create the job from a pipeline or from a job template. You also assign labels to the job so that Control Hub knows which group of execution engines should run the pipeline.

By default, when you start a job that contains a Data Collector pipeline, Control Hub sends an instance of the pipeline to one Data Collector that has all of the labels assigned to the job, after verifying that the Data Collector does not exceed its resource thresholds. The Data Collector remotely runs the pipeline instance. You can increase the number of pipeline instances that Control Hub runs for a Data Collector job.
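The selection rule described above can be illustrated with a minimal sketch. This is not Control Hub code or its API: the Engine class, the CPU-based threshold, and the choice of the first eligible engine are all assumptions made for illustration. It only shows the two checks named in the text, that the engine has every label assigned to the job and that it does not exceed its resource threshold.

```python
from dataclasses import dataclass, field


@dataclass
class Engine:
    """Hypothetical stand-in for a registered execution engine."""
    name: str
    labels: set = field(default_factory=set)
    cpu_load_pct: float = 0.0       # current usage (illustrative metric)
    max_cpu_load_pct: float = 80.0  # illustrative resource threshold


def eligible_engines(job_labels, engines):
    """Return engines that have every job label and are within their threshold."""
    required = set(job_labels)
    return [
        e for e in engines
        if required <= e.labels and e.cpu_load_pct <= e.max_cpu_load_pct
    ]


def pick_engine(job_labels, engines):
    """Pick one eligible engine, or None when no engine qualifies.

    Taking the first candidate is an assumption of this sketch, not a
    statement about how Control Hub chooses among eligible engines.
    """
    candidates = eligible_engines(job_labels, engines)
    return candidates[0] if candidates else None


engines = [
    Engine("dc-1", {"western", "dev"}, cpu_load_pct=35.0),
    Engine("dc-2", {"western"}, cpu_load_pct=10.0),         # missing "dev"
    Engine("dc-3", {"western", "dev"}, cpu_load_pct=95.0),  # over threshold
]

chosen = pick_engine(["western", "dev"], engines)
print(chosen.name if chosen else "no eligible engine")  # prints "dc-1"
```

In this example, dc-2 is skipped because it lacks one of the job labels, and dc-3 is skipped because it exceeds its threshold, so the single pipeline instance goes to dc-1.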

In contrast, you cannot increase the number of pipeline instances that Control Hub runs for a Transformer job. When you start a job that contains a Transformer pipeline, Control Hub sends an instance of the pipeline to one Transformer that has all of the labels assigned to the job, after verifying that the Transformer does not exceed its resource thresholds. Transformer remotely runs the pipeline instance on Apache Spark deployed to a cluster. Because Transformer runs pipelines on Spark, Spark runs the pipeline just as it runs any other application, distributing the processing across nodes in the cluster.

To minimize downtime due to unexpected failures, enable pipeline failover for jobs. Control Hub manages pipeline failover differently for Data Collector and Transformer jobs.

If a job includes a pipeline that uses runtime parameters, you specify the parameter values that the job uses for the pipeline instances.
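As a rough illustration of how job-level parameter values relate to a pipeline, the following sketch merges values supplied for a job over defaults defined with the pipeline. The parameter names, the JSON input, and the rule that job values take precedence over pipeline defaults are assumptions of this sketch, not a description of Control Hub internals.

```python
import json

# Hypothetical defaults defined with the pipeline (names are illustrative).
pipeline_defaults = {"ORIGIN_DIR": "/data/in", "BATCH_SIZE": 1000}


def resolve_parameters(defaults, job_values_json):
    """Merge job-level values over pipeline defaults.

    Assumes that a value specified for the job overrides the default
    with the same name, and that unspecified parameters keep their defaults.
    """
    job_values = json.loads(job_values_json)
    return {**defaults, **job_values}


# Values specified for the job, written as a JSON object.
resolved = resolve_parameters(pipeline_defaults, '{"ORIGIN_DIR": "/data/west"}')
print(resolved)  # {'ORIGIN_DIR': '/data/west', 'BATCH_SIZE': 1000}
```

Each pipeline instance run from the job would then use the resolved values, so the same published pipeline can be reused by several jobs with different settings.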

When you stop a job, Control Hub instructs all execution engines running pipelines for the job to stop the pipelines.

After you create jobs, you create a topology to map multiple related jobs into a single view. Topologies are the end-to-end view of multiple dataflows. From a single topology view, you can start, stop, monitor, and synchronize all jobs included in the topology.

Working with Job Instances

The Job Instances view lists all job instances that have been created for your organization.

You can complete the following tasks in the Job Instances view:

  • View job details, including the pipeline version, the job status, and the engine that runs the pipeline.
  • Create jobs.
  • Create and start job instances from job templates.
  • Search for job instances.
  • Duplicate jobs.
  • Start and stop jobs.
  • View the job status and the status of remote pipeline instances run from the job.
  • Monitor active jobs.
  • Upgrade a job to use the latest pipeline version.
  • Reset the origin and metrics for jobs.
  • Enable pipeline failover for jobs.
  • Balance a job enabled for pipeline failover to redistribute the pipeline load across available engines.
  • Synchronize an active job after you update the labels assigned to engines.
  • Schedule jobs to start, stop, or upgrade on a regular basis.
  • Create a topology for selected jobs, as described in Create Topologies.
  • Import and export jobs.
  • Share a job with other users and groups.
  • Delete jobs.

The following image shows a list of jobs in the Job Instances view. Each job is listed with the job name, pipeline name, pipeline version, job status, and pipeline status:

Note the following icons that display in the Job Instances view or when you hover over a single job. You'll use these icons frequently as you manage jobs:

Icon name             Description
Add Job               Add a job.
Import Jobs           Import jobs.
Refresh               Refresh the list of jobs in the view.
Duplicate Job         Duplicate a job.
Start Job             Start the job.
Synchronize Job       Synchronize an active job after you have updated the labels assigned to engines.
Balance Job           Balance a job enabled for pipeline failover to redistribute the pipeline load across available engines.
Stop Job              Stop the job.
Acknowledge Error     Acknowledge error messages for the job.
Upload Offset         Upload an initial offset file for the job.
Schedule Job          Schedule the job to start on a regular basis.
Share                 Share the job with other users and groups, as described in Permissions.
New Pipeline Version  Upgrade a job to use the latest pipeline version.
Edit                  Edit an inactive job.
Delete                Delete an inactive job.
Export Jobs           Export the selected jobs.

Requirement for Jobs

Before you create a job, you need to publish the pipeline that you want to use.

To publish a pipeline, use the Check In icon in the pipeline canvas.