Granting the Spark Cluster Access to Transformer
When Transformer works with a Spark installation that runs on a cluster, the Spark cluster must be able to access Transformer to send the status, metrics, and offsets for running pipelines.
- When using one of the cloud service provider integrations that StreamSets provides, such as an Amazon EC2 or a Google Compute Engine (GCE) deployment, locate the public IP address of the provisioned instance:
- Launch the deployment to provision the instance.
- Use the console for your cloud service provider to locate the provisioned instance.
- Copy the public IP address of the instance.
- In Control Hub, edit the deployment. In the Configure Engine section, click Advanced Configuration. Then, click Transformer Configuration.
- Uncomment the transformer.base.http.url property and set it to the Transformer URL. For example, if using an EC2 or GCE deployment with the default Transformer port, use the public IP address that you copied from the cloud service provider console to define the property as follows:
transformer.base.http.url=http://<IP address>:19630
If using a self-managed deployment with the default Transformer port on a host machine named myhost, define the property as follows:
transformer.base.http.url=http://myhost:19630
Important: If a self-managed Transformer runs on a cloud-computing platform, define the publicly accessible URL to that instance.
- Save the changes to the deployment and restart all engine instances.
- Grant the Spark cluster access to Transformer at this URL.
For information about granting the Spark cluster access to other machines, see the documentation for your Spark vendor.
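As a rough aid for the steps above, the following Python sketch builds the transformer.base.http.url line and checks whether a machine can open a TCP connection to that URL. It is not part of Transformer or Control Hub. The helper names base_url_property and can_reach are hypothetical, and the sketch assumes the default port 19630 shown in the examples. A successful connection only shows that the port is reachable from the machine running the check, not that the cluster is otherwise configured correctly.

```python
import socket
from urllib.parse import urlsplit

# Default Transformer port, as used in the examples in this section.
DEFAULT_TRANSFORMER_PORT = 19630


def base_url_property(host: str, port: int = DEFAULT_TRANSFORMER_PORT) -> str:
    """Hypothetical helper: format the transformer.base.http.url line."""
    return f"transformer.base.http.url=http://{host}:{port}"


def can_reach(url: str, timeout: float = 3.0) -> bool:
    """Hypothetical helper: True if a TCP connection to the URL succeeds.

    This only tests network reachability of the host and port; it does
    not send an HTTP request or authenticate with Transformer.
    """
    parts = urlsplit(url)
    # Fall back to the scheme's standard port when the URL omits one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, base_url_property("myhost") produces the self-managed line shown earlier. Running can_reach("http://myhost:19630") on a Spark cluster node gives a quick indication of whether that node can reach Transformer at the configured URL.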