Job Offsets
Just as execution engines maintain the last-saved offset for some origins when you stop a pipeline, Control Hub maintains the last-saved offset for the same origins when you stop a job.
Let's look at how Control Hub maintains the offset for Data Collector pipelines. Control Hub maintains the offset for Data Collector Edge and Transformer pipelines the same way:
- When you start a job, Control Hub can run a remote pipeline instance on each Data Collector assigned all labels assigned to the job. As a Data Collector runs a pipeline instance, it periodically sends the latest offset to Control Hub. If a Data Collector becomes disconnected from Control Hub, the Data Collector maintains the offset. It updates Control Hub with the latest offset as soon as it reconnects to Control Hub.
- When you stop a job, Control Hub instructs all Data Collectors running pipelines for the job to stop the pipelines. The Data Collectors send the last-saved offsets back to Control Hub. Control Hub maintains the last-saved offsets for all pipeline instances in that job.
- When you restart the job, Control Hub sends the last-saved offset for each pipeline instance to a Data Collector so that processing can continue from where the pipeline last stopped. Control Hub determines the Data Collector to use on restart based on whether failover is enabled for the job:
- Failover is disabled - Control Hub sends the offset to the same Data Collector that originally ran the pipeline instance. In other words, Control Hub associates each pipeline instance with the same Data Collector.
- Failover is enabled - Control Hub sends the offset to a different Data Collector with matching labels.
Or, your system administrator can optionally configure Control Hub to always migrate job offsets to different Data Collectors when you restart a job.
You can view the last-saved offset sent by each execution engine in the job History view.
If you want the execution engines to process all available data instead of processing data from the last-saved offset, simply reset the origin for the job before restarting the job. When you reset the origin for a job, you also reset the job metrics.