Understanding the Spark Cluster Callback URL
To run a Transformer pipeline, Spark requires a valid cluster callback URL to communicate with Transformer.
For cluster pipelines, the Spark cluster must be able to access Transformer to send the status, metrics, and offsets for running pipelines. Similarly, for local pipelines, the local Spark installation must be able to access the local Transformer instance.
Define a Transformer configuration property to specify the cluster callback URL that Transformer uses by default for all pipelines. When needed, you can override that URL in individual pipelines.
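Because Spark must be able to reach Transformer at the callback URL, it can help to test connectivity from a Spark node before running a pipeline. The following is a minimal sketch, not part of Transformer: the function name can_reach and the sample URL in the comment are hypothetical, and the check only confirms that a TCP connection to the host and port succeeds, not that Transformer accepts the callback.

```python
import socket
from urllib.parse import urlsplit


def can_reach(callback_url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(callback_url)
    if parts.hostname is None:
        raise ValueError(f"no host found in URL: {callback_url!r}")
    # Fall back to the scheme's default port when the URL omits one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Example call, run on a Spark node (the URL is a hypothetical placeholder):
# can_reach("http://transformer.internal.example:19630")
```

A result of False from a Spark node suggests that the URL is not suitable as the cluster callback URL for pipelines that run on that cluster.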
Cluster Callback URL Properties
- Transformer property: Driver Callback URL
- The Transformer driver callback URL property, transformer.driver.callback.url, defines the cluster callback URL that Spark uses to communicate with Transformer.
- Pipeline override property: Cluster Callback URL
- When needed, you can override the Spark cluster callback URL defined in Transformer properties by configuring the Cluster Callback URL property in individual pipelines.
- Backup properties
- Best practice is to specify an appropriate Spark cluster callback URL using the properties described above.
Example
Say you registered Transformer to work with Control Hub and you set up a reverse proxy or a Kubernetes Ingress service for Transformer. You set the Transformer base URL property, transformer.base.http.url, to the reverse proxy or Ingress service URL. This way, the Control Hub web browser can access Transformer as an authoring Transformer for pipeline design.
A Spark cluster that runs inside the internal network cannot access Transformer using the reverse proxy or Ingress service URL. So you define the Transformer driver callback URL property, transformer.driver.callback.url, to enable Spark to access Transformer directly. The URL defined for the property acts as the cluster callback URL for all Transformer pipelines, by default.
When you need to use a different cluster callback URL for certain pipelines, you configure the Cluster Callback URL property in those pipelines to use a different URL.
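The precedence in this example can be sketched as a small lookup: a non-empty pipeline-level Cluster Callback URL wins, otherwise the Transformer driver callback URL property applies. This is an illustration of the described behavior, not Transformer's implementation. The function name effective_callback_url and all URLs are hypothetical, and the backup properties are deliberately left out of the sketch.

```python
from typing import Optional


def effective_callback_url(
    transformer_props: dict,
    pipeline_callback_url: Optional[str] = None,
) -> Optional[str]:
    """Pick the callback URL: pipeline override first, then the Transformer property."""
    if pipeline_callback_url and pipeline_callback_url.strip():
        return pipeline_callback_url.strip()
    default_url = transformer_props.get("transformer.driver.callback.url", "").strip()
    # None means neither property is set; backup properties are not modeled here.
    return default_url or None


# Hypothetical values for the scenario above.
props = {
    # Reverse proxy or Ingress URL, used by the browser, not by Spark.
    "transformer.base.http.url": "https://transformer.example.com",
    # Direct internal address that Spark can reach.
    "transformer.driver.callback.url": "http://10.0.0.12:19630",
}

print(effective_callback_url(props))
# http://10.0.0.12:19630
print(effective_callback_url(props, "http://10.0.0.40:19630"))
# http://10.0.0.40:19630
```

Note that the base URL never takes part in the lookup: it serves browser access through the proxy, while the callback URL serves Spark.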