When Transformer works with a Spark installation that runs on a cluster, the Spark cluster must be able to access Transformer to send the status, metrics, and offsets for running pipelines.
Note: Granting the Spark cluster access to Transformer involves configuring the default Transformer URL. When needed, you can configure a cluster callback URL for a pipeline to override the default URL.
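The override relationship described in the note can be sketched as follows. This only illustrates the precedence between the two settings and is not Transformer's implementation; the function and parameter names are hypothetical.

```python
from typing import Optional


def effective_callback_url(default_url: str,
                           pipeline_callback_url: Optional[str] = None) -> str:
    """Return the URL the Spark cluster would use to reach Transformer.

    Hypothetical helper: a cluster callback URL set on a pipeline, when
    present and non-blank, takes precedence over the default Transformer URL.
    """
    if pipeline_callback_url and pipeline_callback_url.strip():
        return pipeline_callback_url.strip()
    return default_url
```

For example, `effective_callback_url("http://myhost:19630")` returns the default URL unchanged, while passing a second argument returns that pipeline-level value instead.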
-
When using one of the cloud service provider integrations that StreamSets provides, such as an Amazon EC2 or a Google Compute Engine (GCE) deployment, locate the public IP address of the provisioned instance:
  -
  Launch the deployment to provision the instance.
  -
  Use the console for your cloud service provider to locate the provisioned instance.
  -
  Copy the public IP address of the instance.
-
In Control Hub, edit the deployment. In the Configure Engine section, click Advanced Configuration. Then, click Transformer Configuration.
-
Uncomment the transformer.base.http.url property and set it to the Transformer URL.
For example, if using an EC2 or GCE deployment with the default Transformer port, use the public IP address that you copied from the cloud service provider console to define the property as follows:
transformer.base.http.url=http://<IP address>:19630
If using a self-managed deployment with the default Transformer port on a host machine named myhost, define the property as follows:
transformer.base.http.url=http://myhost:19630
Important: If a self-managed Transformer runs on a cloud-computing platform, set the property to the publicly accessible URL of that instance.
-
Save the changes to the deployment and restart all engine instances.
-
Grant the Spark cluster access to Transformer at this URL.
For information about granting the Spark cluster access to other machines, see the documentation for your Spark vendor.
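As a rough aid for the steps above, the following Python sketch composes the property line and checks, from a Spark cluster node, whether the host and port in the Transformer URL accept TCP connections. The helper names are hypothetical and not part of Transformer or Control Hub. A successful TCP connection only suggests basic network reachability; it does not confirm that Spark can report pipeline status to Transformer.

```python
import socket
from urllib.parse import urlsplit

DEFAULT_PORT = 19630  # default Transformer port used in the examples above


def base_url_property(host: str, port: int = DEFAULT_PORT,
                      scheme: str = "http") -> str:
    """Build the transformer.base.http.url line for the Transformer configuration."""
    return f"transformer.base.http.url={scheme}://{host}:{port}"


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"no host in URL: {url!r}")
    # Fall back to the scheme's well-known port when the URL omits one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Calling `base_url_property("myhost")` yields the same line as the self-managed example above. Running `is_reachable("http://myhost:19630")` on a Spark cluster node then gives a quick first indication of whether firewall or security group rules still need to be adjusted.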