Understanding the Spark Cluster Callback URL

To run a Transformer pipeline, Spark requires a valid cluster callback URL to communicate with Transformer.

For cluster pipelines, the Spark cluster must be able to access Transformer to send the status, metrics, and offsets for running pipelines. Similarly, for local pipelines, the local Spark installation must be able to access the local Transformer instance.

Use a Transformer configuration property to define the cluster callback URL that Transformer uses for all pipelines by default. When needed, you can override this URL in individual pipelines.

Cluster Callback URL Properties

Transformer uses the following properties to locate a valid Spark cluster callback URL:
Transformer property: Driver Callback URL
The Transformer driver callback URL property, transformer.driver.callback.url, defines the cluster callback URL that Spark uses to communicate with Transformer.
Use this Transformer property to specify the cluster callback URL that is used with all pipelines by default. You can override the specified URL in individual pipelines when needed.
For information about configuring the driver callback URL property, see Granting the Spark Cluster Access to Transformer.
Pipeline override property: Cluster Callback URL
When needed, you can override the Spark cluster callback URL defined in Transformer properties by configuring the Cluster Callback URL property in individual pipelines.
Use this property when a pipeline must use a cluster callback URL other than the one defined in Transformer properties.
Important: Do not configure the Cluster Callback URL property when you plan to enable pipeline failover for the job that includes this pipeline. To support failover, the pipeline must use a cluster callback URL defined in the Transformer configuration properties.
For more information, see Cluster Callback URL.
Backup properties
Best practice is to specify an appropriate Spark cluster callback URL using the properties described above.
If those properties are not defined, Transformer attempts to use a URL defined elsewhere. The URLs in these backup options can function as a Spark cluster callback URL only under certain circumstances.
Transformer uses the following backup options when trying to find a URL to act as a Spark cluster callback URL:
  • Transformer base HTTP URL property, transformer.base.http.url - Defines the URL that Control Hub uses to communicate with Transformer when your Control Hub organization is configured to use direct engine REST APIs instead of WebSocket tunneling.

    The base HTTP URL can act as the Spark cluster callback URL when your network architecture allows using the same URL to communicate with Spark and Control Hub.

    For information about configuring the base URL property, see Enabling HTTPS.

  • Transformer HTTP bind host property, http.bindHost - Defines the host name or IP address that Transformer binds to. Use it to make Transformer listen for HTTP requests on a specific network interface.
  • Host name of the Transformer machine - If none of the properties above are defined, Transformer attempts to use the host name of the Transformer machine as reported by the operating system.
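
The lookup order described above can be sketched as a short Python function. This is a hedged illustration under stated assumptions, not Transformer's implementation. The function name resolve_callback_url, the pipeline_override and os_hostname parameters, and the use of a plain dictionary for Transformer properties are all hypothetical. The property names come from this page. For the bind host and the machine host name, the sketch returns only the host value. It does not model how Transformer builds a full URL (scheme, port) from a host.

```python
import socket
from typing import Mapping, Optional, Tuple

# Hypothetical sketch of the documented lookup order; not Transformer's actual code.
# Only the property names come from the documentation.
LOOKUP_ORDER = (
    "transformer.driver.callback.url",  # preferred Transformer property
    "transformer.base.http.url",        # backup: works only if Spark can reach this URL
    "http.bindHost",                    # backup: a host name or IP address, not a full URL
)


def resolve_callback_url(
    props: Mapping[str, str],
    pipeline_override: Optional[str] = None,
    os_hostname: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (source, value) for the first defined cluster callback setting."""
    # 1. A pipeline-level Cluster Callback URL overrides the Transformer properties.
    if pipeline_override:
        return ("pipeline: Cluster Callback URL", pipeline_override)
    # 2-4. Transformer properties in documented order; blank values count as undefined.
    for name in LOOKUP_ORDER:
        value = props.get(name, "").strip()
        if value:
            return (name, value)
    # 5. Last resort: host name of the Transformer machine as reported by the OS.
    return ("machine host name", os_hostname or socket.gethostname())


if __name__ == "__main__":
    # Only the base HTTP URL is set, so it is used as the backup callback URL.
    props = {"transformer.base.http.url": "https://transformer.example.com"}
    print(resolve_callback_url(props))
    # ('transformer.base.http.url', 'https://transformer.example.com')
```

Returning the source together with the value makes it easy to see which backup option was used. This matters because the backup URLs work as a callback URL only when Spark can actually reach them.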

Example

Say you set up a reverse proxy or a Kubernetes Ingress service for Transformer. You set the Transformer base URL property, transformer.base.http.url, to the reverse proxy or Ingress service URL. This way, Control Hub, running in your web browser, can access Transformer as an authoring Transformer for pipeline design.

A Spark cluster that runs inside the internal network cannot access Transformer using the reverse proxy or Ingress service URL. So you define the Transformer driver callback URL property, transformer.driver.callback.url, to enable Spark to access Transformer directly. The URL defined for this property acts as the cluster callback URL for all Transformer pipelines by default.

When certain pipelines need a different cluster callback URL, you configure the Cluster Callback URL property in those pipelines.
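
The example scenario can be summarized in a small Python sketch that records which URL each party uses. All host names, pipeline names, and the url_used_by_spark helper are hypothetical placeholders. The sketch encodes only the documented rule: a pipeline-level Cluster Callback URL, when configured, takes precedence over transformer.driver.callback.url.

```python
from typing import Dict, Optional

# Hypothetical values for the reverse proxy / Ingress example; replace with your own.
TRANSFORMER_PROPS: Dict[str, str] = {
    # Used by Control Hub in the web browser, through the reverse proxy or Ingress.
    "transformer.base.http.url": "https://transformer.example.com",
    # Used by Spark inside the internal network to reach Transformer directly.
    "transformer.driver.callback.url": "http://transformer.internal.example:8080",
}

# Hypothetical pipeline-level Cluster Callback URL settings; None means not configured.
PIPELINE_OVERRIDES: Dict[str, Optional[str]] = {
    "orders_etl": None,
    "edge_cluster_job": "http://10.20.0.15:8080",
}


def url_used_by_spark(pipeline: str) -> str:
    """Return the cluster callback URL Spark uses for a pipeline in this example."""
    override = PIPELINE_OVERRIDES.get(pipeline)
    # A configured pipeline override wins; otherwise the Transformer-wide default applies.
    return override or TRANSFORMER_PROPS["transformer.driver.callback.url"]


if __name__ == "__main__":
    print("Control Hub uses:", TRANSFORMER_PROPS["transformer.base.http.url"])
    for name in PIPELINE_OVERRIDES:
        print(f"Spark callback for {name}:", url_used_by_spark(name))
```

In this sketch, edge_cluster_job sets its own Cluster Callback URL. Because of that, it should not belong to a job that has pipeline failover enabled.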