Transformer Pipeline Failover
You can enable a Transformer job for pipeline failover for some cluster types. Enable pipeline failover to prevent Spark applications from failing due to an unexpected Transformer shutdown.
When you start a Transformer job, Control Hub sends an instance of the pipeline to one Transformer with all labels specified for the job. Transformer remotely runs the pipeline instance on Apache Spark deployed to a cluster. Spark runs the application just as it runs any other application, distributing the processing across nodes in the cluster and automatically handling failover in the cluster as needed.
As the pipeline runs, Spark sends Transformer the status, metrics, and offsets for the running pipeline. Transformer then passes this information to Control Hub. If Transformer unexpectedly shuts down, Spark continues to run the application and attempts to reconnect to Transformer for several minutes. If Spark cannot reconnect to Transformer before Control Hub considers the engine unresponsive, the Spark application fails.
When a job is enabled for failover, Control Hub can reassign the job to a backup Transformer when the initial Transformer becomes unresponsive. In this case, Spark continues to run the application and attempts to reconnect to Transformer for 10 minutes by default, which is twice the amount of time configured in the execution engine heartbeat interval. If Spark can reconnect to an available backup Transformer during this time, Spark continues running the application and sends all information about the running pipeline to the backup Transformer, resulting in no loss of processing. If Spark cannot reconnect to a backup Transformer during this time, the Spark application fails.
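The reconnect window described above is simple arithmetic, and the following Python sketch makes it concrete. It is illustrative only and is not part of Control Hub or Transformer. The function names are hypothetical, and the 5-minute default heartbeat interval is inferred from the statement that the 10-minute default window is twice the interval. Treating a reconnect at exactly the end of the window as successful is also an assumption.

```python
from datetime import timedelta

# Assumption: a 5-minute default interval, inferred from the documented
# 10-minute default reconnect window being twice the heartbeat interval.
DEFAULT_HEARTBEAT_INTERVAL = timedelta(minutes=5)


def reconnect_window(heartbeat_interval=DEFAULT_HEARTBEAT_INTERVAL):
    """Return how long Spark keeps trying to reconnect: twice the interval."""
    return 2 * heartbeat_interval


def application_survives(time_until_backup, heartbeat_interval=DEFAULT_HEARTBEAT_INTERVAL):
    """Return True if a backup Transformer becomes reachable within the window.

    Hypothetical helper: the boundary case (exactly at the end of the
    window) is treated as a successful reconnect, which is an assumption.
    """
    return time_until_backup <= reconnect_window(heartbeat_interval)
```

For example, under these assumptions a backup that becomes reachable after 7 minutes keeps the application running with the default interval, whereas the same delay with a 2-minute interval (a 4-minute window) would cause the Spark application to fail.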
An available Transformer includes any Transformer that is assigned all labels specified for the job and that has not exceeded any resource thresholds. When multiple Transformers are available as a backup, Control Hub prioritizes Transformers that are currently running the fewest pipelines.
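The selection rules above can be sketched in Python as follows. This is a minimal illustration of the described behavior, not Control Hub's actual implementation. The Transformer class, its fields, and the pick_backup function are hypothetical names. How Control Hub breaks ties between equally loaded Transformers is not stated, so this sketch simply keeps the first candidate in list order.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Transformer:
    """Hypothetical model of a registered Transformer engine."""
    name: str
    labels: Set[str]
    running_pipelines: int
    over_resource_threshold: bool = False


def pick_backup(job_labels: Set[str], transformers: List[Transformer]) -> Optional[Transformer]:
    """Return the available Transformer running the fewest pipelines, or None."""
    # A Transformer is available if it has every label specified for the job
    # and has not exceeded any resource threshold.
    available = [
        t for t in transformers
        if job_labels <= t.labels and not t.over_resource_threshold
    ]
    if not available:
        return None
    # Prefer the least loaded engine; ties fall to list order (an assumption).
    return min(available, key=lambda t: t.running_pipelines)
```

A Transformer that carries extra labels beyond those of the job still qualifies, because the check only requires that the job's labels are a subset of the Transformer's labels.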