Scheduled Task Types

A scheduled task periodically triggers an action on one of the following task types:

Jobs

Use the scheduler to start, stop, or upgrade jobs on a regular basis. A single scheduled task cannot complete multiple types of actions.

When you define a scheduled task for a job, you specify one of the following actions that the task completes:

Start: Starts the job at the specified frequency.
Stop: Stops the job at the specified frequency.
Upgrade: Upgrades the job to use the latest pipeline version at the specified frequency.

When a scheduled task upgrades an inactive job, Control Hub updates the job to use the latest pipeline version. When a scheduled task upgrades an active job, Control Hub stops the job, updates the job to use the latest pipeline version, and then restarts the job.

For example, you might use the Control Hub scheduler to create a scheduled task that runs every Saturday at 1:00 AM to check if an active job has a later pipeline version. If a later pipeline version exists, the scheduled task stops the job, updates the job to use the latest pipeline version, and then restarts the job.

If a scheduled task triggers a job start when the job is already active, a job stop when the job is already inactive, or a job upgrade when no later pipeline version exists, then no action is performed. The scheduled task simply logs that it was not able to start, stop, or upgrade the job. The task then continues running until the next scheduled time when it triggers another job start, stop, or upgrade.
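
The start, stop, and upgrade rules described above can be summarized as a small decision function. The following Python sketch is illustrative only and is not Control Hub code: the names Action and plan_steps, the step strings, and the use of integers for pipeline versions are all assumptions made for this example.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical names for the three scheduled job actions."""
    START = "start"
    STOP = "stop"
    UPGRADE = "upgrade"


def plan_steps(action, job_active, current_version, latest_version):
    """Return the ordered steps for one trigger, or [] when nothing is done.

    Pipeline versions are modeled as plain integers (an assumption).
    """
    if action is Action.START:
        # Starting an already active job is a no-op.
        return [] if job_active else ["start job"]
    if action is Action.STOP:
        # Stopping an already inactive job is a no-op.
        return ["stop job"] if job_active else []
    if action is Action.UPGRADE:
        if latest_version <= current_version:
            # No later pipeline version exists, so nothing happens.
            return []
        if job_active:
            return ["stop job", "update pipeline version", "restart job"]
        return ["update pipeline version"]
    raise ValueError(f"unknown action: {action!r}")


if __name__ == "__main__":
    print(plan_steps(Action.UPGRADE, True, 1, 2))
    print(plan_steps(Action.START, True, 1, 1))
```

An empty list corresponds to the case where the scheduled task only logs that it could not act and then waits for its next scheduled time.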

Batch and Streaming Jobs

Before scheduling a job to start or stop, consider whether the job is a batch job or a streaming job:

Batch job

A batch job includes a pipeline that processes all available data, and then stops. Create schedules for batch jobs to start the jobs on a regular basis.

For example, let's say that your dataflow topology updates a database table daily at 4:00 AM. Rather than have the pipeline process the data in a few minutes and then sit idle for the rest of the day, you want to kick off the pipeline, have it process all available data, and then stop, just as in traditional batch processing. You use the Pipeline Finisher executor in the pipeline to stop the pipeline when all data is processed.

You add the pipeline to a job and schedule the job to run daily at 4:00 AM. The scheduler starts the job daily at the specified time. After the remote pipeline instance transitions to a finished state, the job also transitions to an inactive state. The next day, the scheduler starts the job again so that the pipeline can process the new set of data.

When you schedule batch jobs, you typically schedule them as recurring events.
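
To make the recurring daily schedule concrete, the following Python sketch computes when a task that runs daily at 4:00 AM would next fire. It is a simplified illustration and not the scheduler's implementation: the function name next_daily_run is hypothetical, and time zones are ignored by using naive datetime values.

```python
from datetime import datetime, time, timedelta


def next_daily_run(now, run_at=time(4, 0)):
    """Return the next occurrence of run_at strictly after now.

    Uses naive datetimes, so time zones and daylight saving
    changes are not handled (an assumption of this sketch).
    """
    candidate = datetime.combine(now.date(), run_at)
    if candidate <= now:
        # Today's run time has already passed; use tomorrow's.
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    print(next_daily_run(datetime(2024, 1, 1, 5, 30)))
```

Between runs, the batch job sits in an inactive state, so each trigger finds a job that it can start.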

Streaming job

A streaming job includes a pipeline that maintains a connection to the origin system and processes data as it becomes available. Because you expect data to arrive continuously, the pipeline runs until you manually stop it. In most cases, there's no need to schedule streaming jobs.

However, you might want to schedule a streaming job so that the job initially starts at some point in the future. For example, you might want the job to start for the first time next Saturday at midnight, when no DevOps engineer is available to start it manually.

In this case, you would schedule the start of the streaming job as a one-time event.

Or, you might want to schedule a streaming job to start and stop on a regular basis. For example, you want to run a streaming job continuously every day of the week except for Sunday. You create one scheduled task that starts the job every Monday at 12:00 AM. Then, you create another scheduled task that stops the same job every Sunday at 12:00 AM. The next Monday at 12:00 AM, the scheduler starts the job again so that the pipeline can continue running.

In this case, you would schedule both the start and the stop of the streaming job as recurring events.
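
The paired start and stop tasks define a weekly window during which the job is active. The following Python sketch checks whether a given moment falls inside that window. It is an illustration under stated assumptions: the helper names minute_of_week and is_job_active are hypothetical, both tasks are assumed to have already fired at least once, and time zones are ignored.

```python
from datetime import datetime

MINUTES_PER_WEEK = 7 * 24 * 60


def minute_of_week(weekday, hour=0, minute=0):
    """Minutes elapsed since Monday 12:00 AM (weekday 0 is Monday)."""
    return (weekday * 24 + hour) * 60 + minute


def is_job_active(moment, start, stop):
    """Return True if moment lies in the weekly window [start, stop).

    start and stop are minute-of-week offsets for the start task and
    the stop task. The modulo arithmetic lets the window wrap around
    the end of the week.
    """
    now = minute_of_week(moment.weekday(), moment.hour, moment.minute)
    elapsed = (now - start) % MINUTES_PER_WEEK
    window = (stop - start) % MINUTES_PER_WEEK
    return elapsed < window


# Start every Monday at 12:00 AM, stop every Sunday at 12:00 AM.
START = minute_of_week(0)
STOP = minute_of_week(6)

if __name__ == "__main__":
    # January 7, 2024 is a Sunday, so the job is inactive at noon.
    print(is_job_active(datetime(2024, 1, 7, 12, 0), START, STOP))
```

With these two offsets, the job is active from Monday through Saturday and inactive for all of Sunday, which matches the example above.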