Managing Jobs

After publishing pipelines, create a job for each pipeline that you want to run. When you create a job that includes a pipeline with runtime parameters, you can enable the job to work as a job template.

When you start a job, Control Hub sends an instance of the pipeline to an execution engine that has all of the labels assigned to the job.

When a job is active, you can synchronize or stop the job.

When a job is inactive, you can reset the origin for the job, edit the job, or delete the job.

When a job is active or inactive, you can edit the latest pipeline version, upgrade the job to use the latest pipeline version, or schedule the job to start, stop, or upgrade on a regular basis.

Creating Jobs and Job Templates

After publishing pipelines, create a job for each pipeline that you want to run. You can create a job for each published pipeline version.

When you create a job that includes a pipeline with runtime parameters, you can enable the job to work as a job template.

You can create jobs and job templates from the Jobs view or from the Pipelines view. You can also create a job or job template from a published pipeline in the pipeline canvas.

  1. From the Jobs view, click the Add Job icon.
  2. Select the published pipeline to run, and then click Next.
  3. On the Add Job window, configure the following properties:
    Job Property Description
    Job Name Job name.
    Description Optional description of the job.
    Pipeline Published pipeline that you want to run.
    Pipeline Commit / Tag Pipeline commit or pipeline tag assigned to the published pipeline version that you want to run. You can create a job for any published pipeline version.

    By default, Control Hub displays the latest published pipeline version.

    Execution Engine Labels Label or labels that determine the group of execution engines that run the pipeline.

    Labels are case sensitive.

    Job Tags Tags that identify similar jobs or job templates. Use job tags to easily search and filter jobs and job templates.

    Enter nested tags using the following format:

    <tag1>/<tag2>/<tag3>

    Enable Job Template Enables the job to work as a job template. A job template lets you run multiple job instances with different runtime parameter values from a single job definition.

    Enable only for jobs that include pipelines that use runtime parameters.

    Note: You can enable a job to work as a job template only during job creation. You cannot enable an existing job to work as a job template.
    Statistics Refresh Interval (ms) Milliseconds to wait before automatically refreshing statistics when you monitor the job.

    The minimum and default value is 60,000 milliseconds.

    Enable Time Series Analysis Enables Control Hub to store time series data which you can analyze when you monitor the job.

    When time series analysis is disabled, you can still view the total record count and throughput for a job, but you cannot view the data over a period of time. For example, you can’t view the record count for the last five minutes or for the last hour.

    Number of Instances Number of pipeline instances to run for the job. Increase the value only when the pipeline is designed for scaling out.

    Default is 1, which runs one pipeline instance on the available Data Collector that is running the fewest pipelines. An available Data Collector is an engine that has all of the labels specified for the job.

    Available for Data Collector jobs only.

    Enable Failover

    Enables Control Hub to restart a pipeline on another available engine when the original engine shuts down unexpectedly.

    Default is disabled.

    Control Hub manages pipeline failover differently based on the engine type.
    Failover Retries per Data Collector Maximum number of pipeline failover retries to attempt on each available Data Collector.

    When a Data Collector reaches the maximum number of failover retries, Control Hub does not attempt to restart additional failed pipelines for the job on that Data Collector.

    Use -1 to retry indefinitely.

    Available for Data Collector jobs when failover is enabled.

    Global Failover Retries Maximum number of pipeline failover retries to attempt across all available engines. When the maximum number of global failover retries is reached, Control Hub stops the job.

    Use -1 to retry indefinitely.

    Control Hub manages failover retries differently based on the engine type.

    Available when failover is enabled.

    Require Job Error Acknowledgement Requires that users acknowledge an inactive error status due to connectivity issues before the job can be restarted.
    Clear the property for a scheduled job so the job can automatically be restarted without requiring user intervention.
    Important: Clear the property with caution, as doing so might hide errors that the job has encountered.
    Pipeline Force Stop Timeout (ms) Number of milliseconds to wait before forcing remote pipeline instances to stop.

    In some situations when you stop a job, a remote pipeline instance can remain in a Stopping state. For example, if a scripting processor in the pipeline includes code with a timed wait or an infinite loop, the pipeline remains in a Stopping state until it is force stopped.

    Default is 120,000 milliseconds, or 2 minutes.

    Runtime Parameters Runtime parameter values to start the pipeline instances with. Overrides the default parameter values defined for the pipeline.

    Click Get Default Parameters to display the parameters and default values as defined in the pipeline, and then override the default values.

    You can configure parameter values using simple or bulk edit mode. In bulk edit mode, configure parameter values in JSON format.

  4. To add another job for the same pipeline, click Add Another and then configure the properties for the additional job.
  5. Click Save when you have finished configuring all jobs.
    Control Hub displays the job in the Jobs view.
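The two failover retry properties act as separate budgets. Failover Retries per Data Collector caps restarts on each engine. Global Failover Retries caps restarts across all engines and stops the job when the cap is reached. A value of -1 removes either cap. The Python class below is a hypothetical model of that counting only. `FailoverBudget` and its method names are invented for illustration and do not correspond to a Control Hub API.

```python
from collections import defaultdict

UNLIMITED = -1  # -1 means "retry indefinitely" for either property


class FailoverBudget:
    """Hypothetical model of per-engine and global failover retry limits."""

    def __init__(self, per_engine_retries: int, global_retries: int) -> None:
        self.per_engine_retries = per_engine_retries
        self.global_retries = global_retries
        self.used_per_engine = defaultdict(int)  # engine ID -> retries attempted there
        self.used_global = 0                     # retries attempted across all engines

    @staticmethod
    def _within(used: int, limit: int) -> bool:
        return limit == UNLIMITED or used < limit

    def job_should_stop(self) -> bool:
        # Reaching the global limit stops the job.
        return not self._within(self.used_global, self.global_retries)

    def can_retry_on(self, engine_id: str) -> bool:
        # An engine that used up its own retries gets no more failed pipelines.
        return (not self.job_should_stop()
                and self._within(self.used_per_engine[engine_id],
                                 self.per_engine_retries))

    def record_retry(self, engine_id: str) -> None:
        self.used_per_engine[engine_id] += 1
        self.used_global += 1


budget = FailoverBudget(per_engine_retries=1, global_retries=2)
budget.record_retry("sdc-1")
print(budget.can_retry_on("sdc-1"), budget.can_retry_on("sdc-2"))  # False True
budget.record_retry("sdc-2")
print(budget.job_should_stop())  # True
```

Keeping the per-engine and global counters separate mirrors the two properties. Either limit can be set to -1 on its own.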

Starting Jobs and Job Templates

When you start a job, Control Hub sends an instance of the pipeline to an execution engine that has all of the labels assigned to the job.

When you start a job template, you create and start one or more job instances from the template. Control Hub sends an instance of the pipeline to an execution engine for each job instance that you create.

Tip: You can also start a job from a topology. Or, instead of manually starting jobs, you can also use the Control Hub scheduler to schedule jobs to start on a regular basis.

Before sending an instance of a pipeline to an execution engine, Control Hub verifies that the engine does not exceed its resource thresholds.
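The following minimal Python sketch shows how an engine might be chosen for a single pipeline instance. The engine must carry every label assigned to the job, compared case sensitively, and must not exceed its resource thresholds. Among the remaining engines, the one running the fewest pipelines is chosen, as described for the default Number of Instances value. The `Engine` class, its fields, and `pick_engine` are hypothetical. Resource thresholds are collapsed into a single boolean, and the sketch does not describe how Control Hub is implemented internally.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Engine:
    """Hypothetical stand-in for a registered execution engine."""
    engine_id: str
    labels: set[str] = field(default_factory=set)
    running_pipelines: int = 0
    # Simplification: one flag instead of real CPU, memory, and pipeline thresholds.
    over_resource_threshold: bool = False


def is_available(engine: Engine, job_labels: set[str]) -> bool:
    # Labels are case sensitive, so raw strings are compared.
    # The engine needs *all* job labels; extra engine labels do not matter.
    return job_labels <= engine.labels and not engine.over_resource_threshold


def pick_engine(engines: list[Engine], job_labels: set[str]) -> Optional[Engine]:
    """Return the available engine running the fewest pipelines, or None."""
    candidates = [e for e in engines if is_available(e, job_labels)]
    if not candidates:
        return None
    return min(candidates, key=lambda e: e.running_pipelines)


engines = [
    Engine("sdc-1", {"Test", "EU"}, running_pipelines=4),
    Engine("sdc-2", {"Test"}, running_pipelines=1),
    Engine("sdc-3", {"test"}),                                 # lowercase label: no match
    Engine("sdc-4", {"Test"}, over_resource_threshold=True),  # over its thresholds
]
print(pick_engine(engines, {"Test"}).engine_id)        # sdc-2
print(pick_engine(engines, {"Test", "EU"}).engine_id)  # sdc-1
```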

Starting Jobs

When you start a job, you start a single job instance.

  1. In the Navigation panel, click Jobs.
  2. To start a single job, hover over the inactive job and then click the Start Job icon.
  3. To start multiple jobs, select inactive jobs in the list and then click the Start Job icon above the job list.

Starting Job Templates

When you start a job template, you create and start one or more job instances from the template.

You specify a suffix to uniquely name each job instance, the number of job instances to create from the template, and the parameter values to use for each job instance.

  1. In the Navigation panel, click Jobs.
  2. Hover over the job template and then click the Start Job icon.
  3. On the Create and Start Job Instances window, configure the following properties:
    Job Instances Property Description
    Instance Name Suffix Suffix used to uniquely name each job instance:
    • Counter
    • Timestamp
    • Parameter Value
    The suffix is added to the job template name after a hyphen, as follows:
    <job template name> - <suffix>
    Runtime Parameters for Each Instance Runtime parameter values for each job instance. Overrides the default parameter values defined for the pipeline.

    Define the parameter values in bulk edit mode in JSON format. Or, define them in a file in JSON format and upload the file.

    Add Another Instance When using bulk edit mode to define parameter values, click Add Another Instance to create another job instance.

    Control Hub adds another group of runtime parameters and values to the list of runtime parameters.

    For example, the following image shows a job template that creates and starts two job instances using a counter for the suffix name:

  4. Click Create and Start.
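As a rough illustration, the Python sketch below builds job instance names in the `<job template name> - <suffix>` form and serializes one set of runtime parameter values per instance. It rests on several assumptions:

- The parameter names and values are hypothetical.
- The JSON array with one object per instance is an assumed shape for bulk edit mode, not a verified Control Hub format.
- The counter start value and the timestamp format are guesses.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def instance_name(template_name: str, index: int, params: dict,
                  suffix_type: str = "Counter",
                  suffix_param: Optional[str] = None) -> str:
    """Build '<job template name> - <suffix>' for the job instance at `index`."""
    if suffix_type == "Counter":
        suffix = str(index + 1)  # assumption: the counter starts at 1
    elif suffix_type == "Timestamp":
        # assumption: the real timestamp format may differ
        suffix = datetime.now(timezone.utc).strftime("%Y-%m-%d-%H%M%S")
    elif suffix_type == "Parameter Value":
        suffix = str(params[suffix_param])  # value of the chosen runtime parameter
    else:
        raise ValueError(f"unknown suffix type: {suffix_type}")
    return f"{template_name} - {suffix}"


# Hypothetical runtime parameters: one object per job instance.
instances = [
    {"TABLE": "orders", "BATCH_SIZE": 1000},
    {"TABLE": "customers", "BATCH_SIZE": 500},
]
bulk_json = json.dumps(instances, indent=2)  # text for bulk edit mode or an uploaded file

print([instance_name("Load Tables", i, p) for i, p in enumerate(instances)])
# ['Load Tables - 1', 'Load Tables - 2']
print([instance_name("Load Tables", i, p, "Parameter Value", "TABLE")
       for i, p in enumerate(instances)])
# ['Load Tables - orders', 'Load Tables - customers']
```

A Parameter Value suffix is easiest to read when the chosen parameter is unique per instance, as `TABLE` is here.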

Filtering Jobs and Job Templates

In the Jobs view, you can filter the list of displayed jobs and job templates by engine type, job status, job status color, execution engine label, or job tag. Or you can search for jobs and job templates by name.

  1. In the Navigation panel, click Jobs.
  2. If the Filter column does not display, click the Toggle Filter Column icon.

    The following image displays the Filter column in the Jobs view:

  3. To search for jobs and job templates by name, enter text in the search field, and then press Return.
  4. Select an engine type, job status, job status color, execution engine label, or job tag to additionally filter the list of jobs and job templates.
  5. Select the Keep Filter Persistent checkbox to retain the filter when you return to the view.
    Tip: To share the applied filter, copy the URL and send it to another user in your organization.
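The Python sketch below approximates the filtering behavior. It applies the name search and several of the filter types to a list of job records. The record fields and status strings are hypothetical, and job status color is omitted for brevity. Two behaviors are assumptions rather than documented facts. First, selected filters are combined, so a job must match all of them. Second, selecting a parent tag such as `sales` also matches nested tags such as `sales/orders`.

```python
from typing import Iterable, Optional


def filter_jobs(jobs: Iterable[dict], name_query: str = "",
                engine_type: Optional[str] = None, status: Optional[str] = None,
                label: Optional[str] = None, tag: Optional[str] = None) -> list[dict]:
    """Return jobs that match the search text and every selected filter."""
    result = []
    for job in jobs:
        # Assumption: the name search is a case-insensitive substring match.
        if name_query and name_query.lower() not in job["name"].lower():
            continue
        if engine_type and job["engine_type"] != engine_type:
            continue
        if status and job["status"] != status:
            continue
        if label and label not in job["labels"]:
            continue
        # Nested tags use '<tag1>/<tag2>/...'; a parent tag is assumed to
        # match its children, but a partial tag name matches nothing.
        if tag and not any(t == tag or t.startswith(tag + "/") for t in job["tags"]):
            continue
        result.append(job)
    return result


jobs = [
    {"name": "Orders CDC", "engine_type": "Data Collector", "status": "ACTIVE",
     "labels": ["Test"], "tags": ["sales/orders"]},
    {"name": "Orders Backfill", "engine_type": "Transformer", "status": "INACTIVE",
     "labels": ["Prod"], "tags": ["sales"]},
]
print([j["name"] for j in filter_jobs(jobs, name_query="orders", tag="sales")])
# ['Orders CDC', 'Orders Backfill']
```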

Synchronizing Jobs

Synchronize a job when you've changed the labels assigned to Data Collectors and the job is actively running on those engines. Or, synchronize a job to trigger a restart of a pipeline that stopped running after encountering an error.

Note: You cannot synchronize a Transformer job.

When you synchronize an active job, Control Hub performs the following actions:
  • Stops the job so that all running pipeline instances are stopped, and then waits until each Data Collector sends the last-saved offset back to Control Hub. Control Hub maintains the last-saved offsets for all pipeline instances in the job.
  • Reassigns the pipeline instances to Data Collectors as follows, sending the last-saved offset for each pipeline instance to a Data Collector:
    • Assigns pipeline instances to additional Data Collectors that match the same labels as the job and that have not exceeded any resource thresholds.
    • Does not assign pipeline instances to Data Collectors that no longer match the same labels as the job.
    • Reassigns pipeline instances on the same Data Collector that matches the same labels as the job and that has not exceeded any resource thresholds. For example, a pipeline might have stopped running after encountering an error or after being deleted from that Data Collector.
  • Starts the job, which restarts the pipeline instances from the last-saved offsets so that processing can continue from where the pipelines last stopped.

For example, let’s say a job is active on three Data Collectors with label Test. If you remove label Test from one of the Data Collectors, synchronize the active job so that the pipeline stops running on that Data Collector. Or, let's say that one of the three pipelines running for the job has encountered an error and has stopped running. If you synchronize the active job, Control Hub triggers a restart of the pipeline on that same Data Collector.
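The example above can be replayed with a simplified Python model of the synchronize steps:

- Every instance's last-saved offset is kept.
- Instances stay on engines that still match the job labels and are under their resource thresholds.
- Displaced instances move to additional eligible engines, if any exist.
- Instances with nowhere to go are not restarted.

The function name, the dictionary layout, and the alphabetical choice of spare engine are hypothetical. The model covers Data Collector jobs only, because Transformer jobs cannot be synchronized.

```python
from typing import Optional


def synchronize(job_labels: set[str], engines: dict[str, dict],
                assignments: dict[str, str],
                offsets: dict[str, str]) -> dict[str, tuple[str, Optional[str]]]:
    """Return {instance_id: (engine_id, offset to resume from)} after a sync.

    `engines` maps engine IDs to {"labels": set, "over_threshold": bool};
    `assignments` maps each pipeline instance to the engine that last ran it;
    `offsets` holds the last-saved offsets collected when the job was stopped.
    """
    def eligible(engine_id: str) -> bool:
        engine = engines.get(engine_id)
        return (engine is not None
                and job_labels <= engine["labels"]
                and not engine["over_threshold"])

    plan: dict[str, tuple[str, Optional[str]]] = {}
    displaced = []
    # Restart instances on their original engine when it still qualifies.
    for instance_id, engine_id in assignments.items():
        if eligible(engine_id):
            plan[instance_id] = (engine_id, offsets.get(instance_id))
        else:
            displaced.append(instance_id)
    # Move displaced instances to additional qualifying engines, if any.
    used = {engine_id for engine_id, _ in plan.values()}
    spare = [e for e in sorted(engines) if eligible(e) and e not in used]
    for instance_id, engine_id in zip(displaced, spare):
        plan[instance_id] = (engine_id, offsets.get(instance_id))
    # Displaced instances without a spare engine are not restarted.
    return plan


# The example above: label Test removed from sdc-3.
sync_engines = {
    "sdc-1": {"labels": {"Test"}, "over_threshold": False},
    "sdc-2": {"labels": {"Test"}, "over_threshold": False},
    "sdc-3": {"labels": set(), "over_threshold": False},
}
assignments = {"p1": "sdc-1", "p2": "sdc-2", "p3": "sdc-3"}
offsets = {"p1": "o1", "p2": "o2", "p3": "o3"}
print(synchronize({"Test"}, sync_engines, assignments, offsets))
# {'p1': ('sdc-1', 'o1'), 'p2': ('sdc-2', 'o2')}
```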

Note: To redistribute the pipeline load for a job enabled for failover, you must balance the job. For a comparison of the key differences between balancing and synchronizing jobs, see Comparing Balance Jobs and Synchronize Jobs.

To synchronize active jobs from the Jobs view, select jobs in the list, and then click the Sync Job icon. Or, to synchronize an active job when monitoring the job, click the Sync Job icon.
Tip: You can also synchronize jobs from a topology.

Job Offsets

Just as execution engines maintain the last-saved offset for some origins when you stop a pipeline, Control Hub maintains the last-saved offset for the same origins when you stop a job.

Control Hub maintains offsets for Transformer pipelines the same way it maintains them for Data Collector pipelines. The following steps describe the process for Data Collector pipelines:

  1. When you start a job, Control Hub can run a remote pipeline instance on each Data Collector that has all of the labels assigned to the job. As a Data Collector runs a pipeline instance, it periodically sends the latest offset to Control Hub. If a Data Collector becomes disconnected from Control Hub, the Data Collector maintains the offset and updates Control Hub with the latest offset as soon as it reconnects.
  2. When you stop a job, Control Hub instructs all Data Collectors running pipelines for the job to stop the pipelines. The Data Collectors send the last-saved offsets back to Control Hub. Control Hub maintains the last-saved offsets for all pipeline instances in that job.
  3. When you restart the job, Control Hub sends the last-saved offset for each pipeline instance to a Data Collector so that processing can continue from where the pipeline last stopped. Control Hub determines the Data Collector to use on restart based on whether failover is enabled for the job:
    • Failover is disabled - Control Hub sends the offset to the same Data Collector that originally ran the pipeline instance. In other words, Control Hub associates each pipeline instance with the same Data Collector.
    • Failover is enabled - Control Hub sends the offset to a different Data Collector with matching labels.
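The three steps above can be summarized in a small Python model. Engines report offsets while running and when the job stops, and the most recent report wins. On restart, each offset is routed according to the failover setting. `JobOffsetStore` and the offset strings are hypothetical. In the failover case, the sketch picks the first matching engine other than the original one, as a stand-in for whatever placement logic Control Hub actually uses.

```python
from typing import Optional


class JobOffsetStore:
    """Hypothetical model of the last-saved offsets kept for one job."""

    def __init__(self) -> None:
        # pipeline instance ID -> (engine that ran it, last-saved offset)
        self.offsets: dict[str, tuple[str, str]] = {}

    def report(self, instance_id: str, engine_id: str, offset: str) -> None:
        # Engines report offsets while running and once more when the job stops.
        self.offsets[instance_id] = (engine_id, offset)

    def restart_plan(self, matching_engines: list[str],
                     failover_enabled: bool) -> dict[str, tuple[Optional[str], str]]:
        """Return {instance_id: (target engine, offset to resume from)}."""
        plan = {}
        for instance_id, (engine_id, offset) in self.offsets.items():
            if not failover_enabled:
                target: Optional[str] = engine_id  # pinned to the original engine
            else:
                # Simplification: first matching engine other than the original.
                others = [e for e in matching_engines if e != engine_id]
                target = others[0] if others else None
            plan[instance_id] = (target, offset)
        return plan


store = JobOffsetStore()
store.report("p1", "sdc-1", "orders:1042")
store.report("p1", "sdc-1", "orders:1100")  # a later report replaces the earlier one
print(store.restart_plan(["sdc-1", "sdc-2"], failover_enabled=False))
# {'p1': ('sdc-1', 'orders:1100')}
print(store.restart_plan(["sdc-1", "sdc-2"], failover_enabled=True))
# {'p1': ('sdc-2', 'orders:1100')}
```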

You can view the last-saved offset sent by each execution engine in the job History view.

If you want the execution engines to process all available data instead of processing data from the last-saved offset, reset the origin for the job before restarting the job. When you reset the origin for a job, you also reset the job metrics.

Note: If you edit the job so that it contains a new pipeline version with a different origin, reset the origin before restarting the job.

Origins that Maintain Offsets

Control Hub maintains the last-saved offset for the same origins as execution engines. Execution engines maintain offsets for some origins only.

Data Collector Origins

Data Collector maintains offsets for the following origins:

  • Amazon S3
  • Aurora PostgreSQL CDC Client
  • Azure Blob Storage
  • Azure Data Lake Storage Gen1
  • Azure Data Lake Storage Gen2
  • Azure Data Lake Storage Gen2 (Legacy)
  • Directory
  • Elasticsearch
  • File Tail
  • Google Cloud Storage
  • Groovy Scripting
  • Hadoop FS Standalone
  • HTTP Client
  • JavaScript Scripting
  • JDBC Multitable Consumer
  • JDBC Query Consumer
  • Jython Scripting
  • Kinesis Consumer
  • MapR DB JSON
  • MapR FS Standalone
  • MongoDB
  • MongoDB Atlas
  • MongoDB Atlas CDC
  • MongoDB Oplog
  • MySQL Binary Log
  • Oracle CDC
  • Oracle CDC Client
  • PostgreSQL CDC Client
  • Salesforce
  • Salesforce Bulk API 2.0
  • SAP HANA Query Consumer
  • SFTP/FTP/FTPS Client
  • SQL Server 2019 BDC Multitable Consumer
  • SQL Server CDC Client
  • SQL Server Change Tracking
  • Teradata Consumer
  • Windows Event Log

Transformer Origins

Transformer maintains offsets for all origins that can be included in both batch and streaming pipelines as long as the origin has the Skip Offset Tracking property cleared.

Transformer does not maintain offsets for the following origins that can be included in batch pipelines only:
  • Delta Lake
  • Kudu
  • Whole Directory

Resetting the Origin for Jobs

Reset the origin when you want the execution engines running the pipeline to process all available data instead of processing data from the last-saved offset.

You can reset the origin for all inactive jobs. When you reset an origin that maintains the offset, you reset both the origin and the metrics for the job. When you reset an origin that does not maintain the offset, you reset only the metrics for the job.
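The following minimal Python sketch models the reset rule using a hypothetical job record. Metrics are always cleared. Offsets are cleared only when the origin is one that maintains offsets. For brevity, the origin set copies only a handful of names from the Data Collector list above. The Transformer rule follows the description in Transformer Origins.

```python
# Hypothetical subset of the Data Collector origins that maintain offsets.
DC_OFFSET_ORIGINS = {"Amazon S3", "Directory", "JDBC Multitable Consumer",
                     "Kinesis Consumer", "Oracle CDC Client"}
# Transformer origins that can be used in batch pipelines only.
TRANSFORMER_BATCH_ONLY = {"Delta Lake", "Kudu", "Whole Directory"}


def maintains_offset(engine_type: str, origin: str,
                     skip_offset_tracking: bool = False) -> bool:
    if engine_type == "Data Collector":
        return origin in DC_OFFSET_ORIGINS
    if engine_type == "Transformer":
        # Batch-and-streaming origins keep offsets unless Skip Offset Tracking is selected.
        return origin not in TRANSFORMER_BATCH_ONLY and not skip_offset_tracking
    raise ValueError(f"unknown engine type: {engine_type}")


def reset_origin(job: dict) -> dict:
    """Reset an inactive job: metrics always, offsets only if the origin keeps them."""
    if job["status"] != "INACTIVE":
        raise ValueError("only inactive jobs can be reset")
    job["metrics"] = {}
    if maintains_offset(job["engine_type"], job["origin"],
                        job.get("skip_offset_tracking", False)):
        job["offsets"] = {}
    return job


job = {"status": "INACTIVE", "engine_type": "Data Collector", "origin": "Directory",
       "metrics": {"records": 5000}, "offsets": {"p1": "file-0042.csv"}}
print(reset_origin(job))
```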

You can reset the origin from the Jobs view.
Tip: You can also reset the origin for a job from a topology.

To reset origins from the Jobs view, select jobs in the list, click the More icon, and then click Reset Origin.

Uploading an Initial Offset File

You can upload an initial offset file for a job.

Upload an initial offset file when you first run a pipeline in Data Collector or Transformer, publish the pipeline to Control Hub, and then want to continue running the pipeline from the Control Hub job using the last-saved offset maintained by the execution engine.

When you run pipelines from Control Hub jobs only, you won't need to upload an initial offset file for the job. Control Hub maintains the last-saved offset when you stop a job.

You can upload an initial offset file for a job when all of the following are true:
  • The job is inactive.
  • The job runs a single pipeline instance.
  • The job has never been started, or the job has been started and stopped and is enabled for pipeline failover.

  1. Locate the pipeline offset file on the Data Collector or Transformer machine.
    Each pipeline offset file is named offset.json. Data Collector offset files are located in the following directory:
    $SDC_DATA/runinfo/<pipelineID>/0
    Transformer offset files are located in the following directory:
    $TRANSFORMER_DATA/runinfo/<pipelineID>/0
  2. In the Control Hub Navigation panel, click Jobs.
  3. Select a job and then click the Upload Offset icon.
  4. In the Upload Offset window, select the pipeline offset file to upload.
  5. Click Import.

    When you monitor the job, the History tab displays the initial offset that you uploaded. When you start the job, the job uses the offset saved in the file as the initial offset.
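A short Python sketch can check the eligibility conditions and locate the offset file before you open the Upload Offset window. The function names are hypothetical, and the pipeline ID in the example is a placeholder. The sketch only parses the file as JSON and makes no assumption about the structure of the offset data inside it.

```python
import json
import os
from pathlib import Path
from typing import Any, Optional

DATA_DIR_VARS = {"Data Collector": "SDC_DATA", "Transformer": "TRANSFORMER_DATA"}


def can_upload_initial_offset(status: str, pipeline_instances: int,
                              ever_started: bool, failover_enabled: bool) -> bool:
    """All three documented conditions must hold."""
    return (status == "INACTIVE"
            and pipeline_instances == 1
            and (not ever_started or failover_enabled))


def offset_file_path(engine_type: str, pipeline_id: str,
                     data_dir: Optional[str] = None) -> Path:
    """Return <data dir>/runinfo/<pipelineID>/0/offset.json."""
    if data_dir is None:
        data_dir = os.environ[DATA_DIR_VARS[engine_type]]  # KeyError if unset
    return Path(data_dir) / "runinfo" / pipeline_id / "0" / "offset.json"


def load_offset_file(path: Path) -> Any:
    # Parse the file before uploading it to catch truncated or corrupt JSON early.
    with path.open(encoding="utf-8") as f:
        return json.load(f)


print(can_upload_initial_offset("INACTIVE", 1, ever_started=True, failover_enabled=False))  # False
print(offset_file_path("Data Collector", "myPipelineId", data_dir="/data/sdc"))
# /data/sdc/runinfo/myPipelineId/0/offset.json on Linux or macOS
```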

Editing the Latest Pipeline Version

While viewing an inactive job or monitoring an active job, you can access the latest version of the pipeline to edit the pipeline.

When you view or monitor a job, Control Hub displays a read-only view of the pipeline in the pipeline canvas. To edit the latest version of the pipeline, click the Edit icon next to the job name, and then click Edit Latest Version of Pipeline.

Control Hub creates a new draft of the latest version of the pipeline, and opens the draft in edit mode in the pipeline canvas.

When you edit a pipeline from a job, the job is not automatically updated to use the newly edited version. You must upgrade the job to use the latest published pipeline version. When working with job templates, you upgrade the job template to use the latest version.

Upgrading to the Latest Pipeline Version

You can upgrade a job or a job template to use the latest published pipeline version.

When a job or job template includes a pipeline that has a later published version, Control Hub notifies you by displaying the New Pipeline Version icon next to the job or template.

You can click the icon to upgrade the job or job template to use the latest pipeline version. Or, you can select jobs or job templates in the Jobs view, click the More icon, and then click Use Latest Pipeline Version.

When you upgrade to the latest pipeline version, the tasks that Control Hub completes depend on the following job types:

Inactive job or a job template
When you upgrade an inactive job or a job template, Control Hub updates the job or job template to use the latest pipeline version.
When working with job templates, you must stop and restart the job instances so that they use the latest published pipeline version included in the job template.
Active job
When you upgrade an active job, Control Hub stops the job, updates the job to use the latest pipeline version, and then restarts the job. During the process, Control Hub displays a temporary Upgrading status for the job.

Stopping Jobs

Stop a job when you want to stop processing data for the pipeline included in the job.

When stopping a job, Control Hub waits for the pipeline to gracefully complete all tasks for the in-progress batch. In some situations, this can take several minutes.

For example, if a scripting processor includes code with a timed wait, Control Hub waits for the scripting processor to complete its task. Then, Control Hub waits for the rest of the pipeline to complete all tasks before stopping the pipeline.

When you stop a job that includes an origin that can be reset, Control Hub maintains the last-saved offset for the job. For more information, see Job Offsets.

  1. In the Navigation panel, click Jobs.
  2. Select active jobs in the list, and then click the Stop icon. Or, when monitoring an active job, click the Stop icon.
    Tip: You can also stop a job from a topology.
  3. In the confirmation dialog box that appears, click Yes.

    Depending on the pipeline complexity, the job might take some time to stop.

    When a job remains in a Deactivating state for an unexpectedly long period of time, you can force Control Hub to stop the job immediately.
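Python threads offer an analogy for the interplay between a graceful stop and the Pipeline Force Stop Timeout property. The stop request is signaled, and the caller waits up to the timeout. A pipeline that is still running afterward, such as one whose scripting processor loops forever, has to be forced to stop. This is an analogy rather than engine code. The names are hypothetical, and because Python cannot kill a thread, the sketch only reports that a force stop is required.

```python
import threading
import time


def stop_pipeline(worker: threading.Thread, stop_requested: threading.Event,
                  force_stop_timeout_ms: int = 120_000) -> str:
    """Request a graceful stop, then wait at most the force stop timeout."""
    stop_requested.set()                        # ask the pipeline to finish and stop
    worker.join(timeout=force_stop_timeout_ms / 1000)
    if worker.is_alive():
        # Still "Stopping": a real engine would force stop the pipeline here.
        # Python cannot kill a thread, so the sketch only reports it.
        return "FORCE_STOP_REQUIRED"
    return "STOPPED"


def well_behaved(stop: threading.Event) -> None:
    while not stop.is_set():
        time.sleep(0.01)  # stands in for processing a batch


def stuck(stop: threading.Event) -> None:
    while True:           # like a script with an infinite loop: ignores the request
        time.sleep(0.01)


for target in (well_behaved, stuck):
    stop = threading.Event()
    worker = threading.Thread(target=target, args=(stop,), daemon=True)
    worker.start()
    print(target.__name__, stop_pipeline(worker, stop, force_stop_timeout_ms=200))
# well_behaved STOPPED
# stuck FORCE_STOP_REQUIRED
```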

Forcing a Job to Stop

When a job remains in a Deactivating state, you can force Control Hub to stop the job immediately.

When forcing a job to stop, Control Hub often stops pipeline processes before they complete, which can lead to unexpected results.
Important: Jobs can take a long time to stop gracefully, depending on the processing logic included in the pipeline. Use this option only after waiting an appropriate amount of time for the job to come to a graceful stop.
  1. In the Navigation panel, click Jobs.
  2. Select the job in the list, click the More icon, and then click Force Stop. Or, from the job monitoring view, click Force Stop.

    A confirmation dialog box appears.

  3. To force stop the job, click Yes.

Scheduling Jobs

You can use the Control Hub scheduler to schedule jobs to start, stop, or upgrade to the latest pipeline version on a regular basis.

For more information about using the scheduler to schedule jobs, see Scheduled Task Types.

Editing Jobs

You can edit inactive jobs to change the job definition. When job instances are started from a job template, edit the job template to change the job definition. You cannot edit inactive job instances started from a job template.

Edit inactive jobs or job templates from the Jobs view. Hover over the inactive job or job template, and then click the Edit icon.

You can edit inactive jobs or job templates to change the following information:
  • Description
  • Pipeline commit/tag - You can select a different pipeline version to run.

    For example, after you start a job, you realize that the developer forgot to enable a metric rule for the pipeline, so you stop the job. You inform your developer, who edits the pipeline rules in the pipeline canvas and republishes the pipeline as another version. You edit the inactive job to select that latest published version of the pipeline, and then start the job again.

    Important: If you edit the job so that it contains a new pipeline version with a different origin, you must reset the origin before restarting the job.
  • Execution Engine Labels - You can assign and remove labels from the job to change the group of execution engines that run the pipeline.
  • Job Tags - You can assign and remove tags from the job to identify the job in a different way.
  • Statistics Refresh Interval - You can change the number of milliseconds that Control Hub waits before refreshing statistics when you monitor the job.
  • Enable Time Series Analysis - You can change whether Control Hub stores time series data which you can analyze when you monitor the job.
  • Number of Instances - You can change the number of pipeline instances run for Data Collector jobs.
  • Pipeline Force Stop Timeout - You can change the number of milliseconds to wait before Control Hub forces remote pipeline instances to stop.
  • Runtime Parameters - You can change the values used for the runtime parameters defined in the pipeline.
  • Enable or disable failover - You can enable or disable pipeline failover for jobs.
    Control Hub manages pipeline failover differently based on the engine type.

Duplicating Jobs

Duplicate a job or job template to create one or more exact copies of an existing job or job template. You can then change the configuration and runtime parameters of the copies.

You duplicate jobs and job templates from the Jobs view in Control Hub.

  1. In the Navigation panel, click Jobs.
  2. Select a job or job template in the list and then click the Duplicate icon.
  3. Enter a name for the duplicate and the number of copies to make.

    When you create multiple copies, Control Hub appends an integer to the job or job template name. For example, if you enter the name MyJob and create two copies of a job, Control Hub names the duplicate jobs MyJob1 and MyJob2.

  4. Click Duplicate.
    Control Hub adds the duplicated jobs or job templates to the list of jobs. You can edit them as necessary.
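The naming rule for multiple copies fits in a few lines of Python. Only the multiple-copy case is documented. The sketch assumes that a single copy keeps the name exactly as entered, and the function name is hypothetical.

```python
def duplicate_names(name: str, copies: int) -> list[str]:
    """Names for duplicated jobs: integers are appended when copies > 1."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    if copies == 1:
        return [name]  # assumption: a single copy keeps the entered name
    return [f"{name}{i}" for i in range(1, copies + 1)]


print(duplicate_names("MyJob", 2))  # ['MyJob1', 'MyJob2']
```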

Deleting Jobs

You can delete inactive jobs and job templates. Control Hub automatically deletes inactive job instances older than 365 days that have never been run. Before you delete a job template, delete all inactive job instances created from that template.
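The following hypothetical Python model covers the two deletion rules. The first is the automatic cleanup of inactive job instances that have never been run and are older than 365 days. The second is the requirement to delete a template's job instances before the template itself. The record fields are invented, and measuring age from the creation time is an assumption.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=365)


def auto_deletable(job: dict, now: datetime) -> bool:
    """True for inactive job instances that were never run and are older than 365 days."""
    return (job["template_id"] is not None        # created from a job template
            and job["status"] == "INACTIVE"
            and job["last_run"] is None
            and now - job["created"] > RETENTION)  # assumption: age from creation


def can_delete_template(template_id: str, jobs: list[dict]) -> bool:
    """A template can be deleted only after its job instances are gone."""
    return not any(j["template_id"] == template_id for j in jobs)


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
jobs = [
    {"template_id": "tpl-1", "status": "INACTIVE", "last_run": None,
     "created": now - timedelta(days=400)},
    {"template_id": "tpl-1", "status": "INACTIVE",
     "last_run": now - timedelta(days=10), "created": now - timedelta(days=400)},
]
print([auto_deletable(j, now) for j in jobs])  # [True, False]
print(can_delete_template("tpl-1", jobs))      # False
```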

Tip: As a best practice, delete any scheduled task or data delivery report based on the job before you delete the job. Scheduled tasks and report generations based on a deleted job will fail.
  1. In the Navigation panel, click Jobs.
  2. Select jobs or templates in the list, and then click the Delete icon.