Granting the Spark Cluster Access to Transformer
When Transformer works with a Spark installation that runs on a cluster, the Spark cluster must be able to access Transformer to send the status, metrics, and offsets for running pipelines.
Granting the Spark cluster access to Transformer involves specifying a cluster callback URL that Spark uses to communicate with Transformer. The steps depend on the type of deployment that the Transformer engine belongs to:
- Self-managed deployment
- Cloud service provider deployment, including an Amazon EC2, Azure VM, or GCE deployment
- Kubernetes deployment
Granting Access for a Self-Managed Deployment
Complete the following steps when the Transformer engine belongs to a self-managed deployment.
- In Control Hub, edit the deployment. In the Configure Engine section, click Advanced Configuration. Then, click Transformer Configuration.
- To specify a cluster callback URL, uncomment the transformer.driver.callback.url property and set it to the Transformer URL.
  For example, if using the default Transformer port on a host machine named myhost, define the property as follows:
  transformer.driver.callback.url=http://myhost:19630
  For more information about specifying a cluster callback URL or overriding the URL in individual pipelines, see Understanding the Spark Cluster Callback URL.
- Save the changes to the deployment and restart all engine instances.
- Grant the Spark cluster access to Transformer at this URL.
  For information about granting the Spark cluster access to other machines, see the documentation for your Spark vendor.
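After granting access, it can help to confirm from a Spark cluster node that the callback URL accepts connections. The following is a minimal sketch, not part of Transformer: the function names callback_endpoint and can_reach are hypothetical, and the check only opens a TCP connection to the host and port in the URL. It does not verify that Transformer itself answers on that port.

```python
import socket
from urllib.parse import urlsplit


def callback_endpoint(url):
    """Return (host, port) for a callback URL such as http://myhost:19630."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not an HTTP(S) URL: {url!r}")
    # Fall back to the scheme's standard port when the URL omits one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def can_reach(url, timeout=3.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = callback_endpoint(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and failed name lookups.
        return False
```

For example, running can_reach("http://myhost:19630") on a cluster node returns False when a firewall or security rule blocks the port, which suggests that the access step still needs attention.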
Granting Access for Cloud Service Provider Deployments
Complete the following steps when the Transformer engine belongs to a cloud service provider deployment, including an Amazon EC2, Azure VM, or GCE deployment.
- Locate the public IP address of the provisioned instance.
  - Launch the deployment to provision the instance.
  - Use the console for your cloud service provider to locate the provisioned instance.
  - Copy the public IP address of the instance.
- In Control Hub, edit the deployment. In the Configure Engine section, click Advanced Configuration. Then, click Transformer Configuration.
- To specify a cluster callback URL, uncomment the transformer.driver.callback.url property and set it to the Transformer URL.
  For example, if using the default Transformer port on a host machine named myhost, define the property as follows:
  transformer.driver.callback.url=http://myhost:19630
  For more information about specifying a cluster callback URL or overriding the URL in individual pipelines, see Understanding the Spark Cluster Callback URL.
- Save the changes to the deployment and restart all engine instances.
- Grant the Spark cluster access to Transformer at this URL.
  For information about granting the Spark cluster access to other machines, see the documentation for your Spark vendor.
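The property line for a provisioned instance can be assembled from the address copied in the first step. The following sketch assumes that the callback URL for a cloud deployment is built from the public IP address of the instance, which the steps imply but do not state. The function name callback_property is hypothetical, and 203.0.113.10 is a reserved documentation address, not a real instance.

```python
import ipaddress

PROPERTY = "transformer.driver.callback.url"
DEFAULT_PORT = 19630  # default Transformer port used in the example above


def callback_property(host, port=DEFAULT_PORT, scheme="http"):
    """Return the property line for a host name or IP address."""
    if scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {scheme!r}")
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    try:
        # IPv6 literals must be enclosed in brackets inside a URL.
        if ipaddress.ip_address(host).version == 6:
            host = f"[{host}]"
    except ValueError:
        pass  # not an IP literal, so treat it as a host name
    return f"{PROPERTY}={scheme}://{host}:{port}"


print(callback_property("203.0.113.10"))
# prints transformer.driver.callback.url=http://203.0.113.10:19630
```

Paste the resulting line into the Transformer configuration in place of the commented property.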
Granting Access for a Kubernetes Deployment
Granting the Spark cluster access to Transformer when using a Kubernetes deployment involves exposing the Transformer container outside the cluster using a Kubernetes service.
You can also optionally associate an Ingress with the service. An Ingress can provide load balancing, SSL termination, and name-based virtual hosting to the services in a Kubernetes cluster.
For more information, see the Kubernetes services and Ingress documentation.
- In Control Hub, edit the deployment. In the Configure Engine section, click Advanced Configuration. Then, click Transformer Configuration.
- To specify a cluster callback URL, uncomment the transformer.driver.callback.url property and set it to the Transformer URL.
  For example, if using the default Transformer port on a host machine named myhost, define the property as follows:
  transformer.driver.callback.url=http://myhost:19630
  For more information about specifying a cluster callback URL or overriding the URL in individual pipelines, see Understanding the Spark Cluster Callback URL.
- Click Save, and then click Save & Next.
- In the Configure Kubernetes Deployment section, click Advanced Mode.
  Modify the generated YAML to add a service that exposes the Transformer container. The following sample YAML includes the required modifications:

apiVersion: v1
kind: Service
metadata:
  name: transformer-service
  namespace: streamsets
spec:
  selector:
    app: streamsets-deployment-af79941c-ce31-42a5-aca4-16289259c2ff
  ports:
  - name: transformer-port
    protocol: TCP
    port: 19630
    targetPort: 19630
  clusterIP: None
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: streamsets-deployment-af79941c-ce31-42a5-aca4-16289259c2ff
  name: streamsets-deployment-af79941c-ce31-42a5-aca4-16289259c2ff
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: streamsets-deployment-af79941c-ce31-42a5-aca4-16289259c2ff
  template:
    metadata:
      labels:
        app: streamsets-deployment-af79941c-ce31-42a5-aca4-16289259c2ff
    spec:
      containers:
      - env:
        - name: STREAMSETS_DEPLOYMENT_ID
          value: 7d66fre4-dd0e-4d39-8747-e1aaa31561fd:9143b710-04d2-11ec-b891-41da57d4f127
        - name: STREAMSETS_DEPLOYMENT_TOKEN
          value: eyJ0eXAiOiJKV1QiLCJhbGci5lIn0.eyJzIjoiNjdiZWIzM2NkOTA1ZmCJhbGci5lIMyMWRmNjBmNTVhCJhbGci5lIMTM0NTY4MMDQxYU3OThZGECJhbGci5lI1N2Q0ZjEyNyJ9.
        - name: STREAMSETS_DEPLOYMENT_SCH_URL
          value: https://na01.hub.streamsets.com
        image: streamsets/transformer:scala-2.12_5.3.0
        name: streamsets-deployment-af71641c-ce31-43a5-aca4-18288259c2ff
        ports:
        - containerPort: 19630
        resources:
          requests:
            memory: 1Gi
            cpu: 1
      dnsPolicy: ClusterFirstWithHostNet
- When you finish modifying the YAML, click Save & Next.
- Save the changes to the deployment and restart all engine instances.
- Grant the Spark cluster access to Transformer at this URL.
  For information about granting the Spark cluster access to other machines, see the documentation for your Spark vendor.
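Before saving the YAML, it may be worth checking that the service actually points at the Transformer pods. The following sketch is one possible consistency check, not a StreamSets tool. It assumes the two manifests are already loaded as Python dictionaries, and the function name service_problems and the short name my-deployment are hypothetical.

```python
def service_problems(service, deployment):
    """Return a list of mismatches between a Service and a Deployment."""
    problems = []
    pod = deployment["spec"]["template"]
    labels = pod["metadata"].get("labels", {})
    selector = service["spec"].get("selector", {})

    # Every selector key and value must appear in the pod template labels.
    for key, value in selector.items():
        if labels.get(key) != value:
            problems.append(f"selector {key}={value} does not match pod labels")

    # A Service selects pods only in its own namespace.
    svc_ns = service["metadata"].get("namespace", "default")
    dep_ns = deployment["metadata"].get("namespace", "default")
    if svc_ns != dep_ns:
        problems.append(f"namespace mismatch: {svc_ns} vs {dep_ns}")

    # containerPort is informational in Kubernetes, so this last test is a
    # consistency check rather than a hard requirement.
    container_ports = {
        p["containerPort"]
        for c in pod["spec"]["containers"]
        for p in c.get("ports", [])
    }
    for p in service["spec"].get("ports", []):
        target = p.get("targetPort", p["port"])
        if isinstance(target, int) and target not in container_ports:
            problems.append(f"targetPort {target} is not a containerPort")
    return problems


service = {
    "metadata": {"name": "transformer-service", "namespace": "streamsets"},
    "spec": {
        "selector": {"app": "my-deployment"},
        "ports": [{"name": "transformer-port", "port": 19630, "targetPort": 19630}],
    },
}
deployment = {
    "metadata": {"name": "my-deployment", "namespace": "streamsets"},
    "spec": {"template": {
        "metadata": {"labels": {"app": "my-deployment"}},
        "spec": {"containers": [{"ports": [{"containerPort": 19630}]}]},
    }},
}
print(service_problems(service, deployment))  # prints []
```

An empty list means the selector, namespace, and port agree. Any returned message names the field to review in the YAML.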