***

You can run Transformer pipelines using Spark deployed on a SQL Server Big Data Cluster (BDC). BDC uses Apache Livy to submit Spark jobs.

To run a pipeline on BDC, configure the pipeline to use BDC as the cluster manager type on the Cluster tab of the pipeline properties.

Important: The cluster must be able to access Transformer to send the status, metrics, and offsets for running pipelines. Grant the cluster access to the Transformer URL, as described in "Granting the Spark Cluster Access to Transformer" in the installation instructions.

You specify the Livy endpoint, as well as the user name and password to access the cluster through the endpoint.
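Before entering these values in the pipeline, it can help to confirm that the endpoint and credentials work. The following is a minimal sketch, not part of Transformer, that builds a request for the Livy REST API batch listing (`GET /batches`). It assumes the endpoint accepts HTTP Basic authentication; the function name `build_livy_request`, the example URL, and the credentials are hypothetical.

```python
import base64
import urllib.request


def build_livy_request(livy_endpoint: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET request for the Livy batch listing using HTTP Basic auth.

    Assumption: the Livy endpoint accepts Basic authentication.
    """
    url = livy_endpoint.rstrip("/") + "/batches"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


if __name__ == "__main__":
    # Hypothetical values; substitute the endpoint and credentials for your cluster.
    req = build_livy_request("https://livy.example.com/livy/v1", "user", "secret")
    print(req.full_url)
    # To send the request: urllib.request.urlopen(req, timeout=10)
```

If sending the request returns a JSON list of batch sessions, the same endpoint, user name, and password should be usable in the pipeline properties. A cluster that uses a self-signed certificate would need additional TLS configuration, which this sketch does not cover.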

You also define the staging directory within the cluster to store the StreamSets libraries and resources needed to run the pipeline.

Important: Due to an unresolved issue, you must complete the following task before running a pipeline. On the cluster, remove the mssql-mleap-lib-assembly-1.0.jar file from the following HDFS ZIP file: /system/spark/spark_libs.zip. This issue should be fixed in the next release.
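One way to carry out this task is to download the ZIP file from HDFS, rewrite it without the jar, and upload the result. The sketch below covers only the local rewrite step with the Python standard library. The helper name `remove_zip_entry` is hypothetical, and because the folder that holds the jar inside the archive is not stated here, the helper matches on the file's base name.

```python
import zipfile


def remove_zip_entry(src_zip: str, dst_zip: str, jar_name: str) -> int:
    """Copy src_zip to dst_zip, skipping entries whose base name is jar_name.

    Returns the number of entries that were skipped.
    """
    removed = 0
    with zipfile.ZipFile(src_zip) as src, \
            zipfile.ZipFile(dst_zip, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            # Compare the base name only; the jar's folder in the archive is an unknown.
            if info.filename.rsplit("/", 1)[-1] == jar_name:
                removed += 1
                continue
            dst.writestr(info, src.read(info.filename))
    return removed


if __name__ == "__main__":
    # Assumed workflow, run on a host with HDFS client access:
    #   hdfs dfs -get /system/spark/spark_libs.zip spark_libs.zip
    count = remove_zip_entry(
        "spark_libs.zip", "spark_libs_fixed.zip", "mssql-mleap-lib-assembly-1.0.jar"
    )
    print(f"Removed {count} entries")
    #   hdfs dfs -put -f spark_libs_fixed.zip /system/spark/spark_libs.zip
```

Writing to a new file leaves the downloaded archive untouched, so the original can be restored if the cluster behaves unexpectedly after the upload.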

The following image displays a pipeline configured to run using Spark deployed on BDC at the specified Livy endpoint:

Note: The first time that you run a pipeline on BDC, it can take 5-10 minutes for the pipeline to start, because Transformer must deploy its files across the cluster. This delay should occur only the first time that you run a Transformer pipeline on the cluster.

StreamSets provides a quick start deployment script that enables you to try using BDC as a cluster manager for Transformer pipelines without additional configuration. For example, you might use the script if you want to try BDC as a cluster manager but aren't ready to upgrade to Transformer 3.13.x or later.