SQL Server 2019 Big Data Cluster
You can run Transformer pipelines using Spark deployed on SQL Server 2019 Big Data Cluster (BDC). Transformer supports SQL Server 2019 Cumulative Update 4 or later. SQL Server 2019 BDC uses Apache Livy to submit Spark jobs.
You specify the Livy endpoint, as well as the user name and password to access the cluster through the endpoint. When you start the pipeline, Transformer uses these credentials to launch the Spark application.
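To make the role of the endpoint and credentials concrete, the following is a minimal sketch of what a batch submission to Livy's REST API (`POST /batches`) can look like. It is not Transformer's actual implementation. The endpoint URL, application file path, class name, and the helper `build_livy_batch_request` are hypothetical, and the sketch assumes that the cluster gateway accepts HTTP basic authentication. The request is only built, not sent.

```python
import base64
import json
import urllib.request


def build_livy_batch_request(livy_endpoint, user, password,
                             app_file, class_name, args=None):
    """Build (but do not send) a POST request for Livy's /batches resource.

    Hypothetical helper for illustration only.
    """
    # Livy batch payload: the application file, its main class, and arguments.
    payload = {
        "file": app_file,
        "className": class_name,
        "args": list(args or []),
    }
    # Assumption: the gateway in front of Livy uses HTTP basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=livy_endpoint.rstrip("/") + "/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


# Example values are placeholders, not real cluster settings.
request = build_livy_batch_request(
    "https://bdc.example.com:30443/gateway/default/livy/v1",
    "admin",
    "secret",
    "hdfs:///staging/example-app.jar",
    "com.example.Main",
    ["--mode", "batch"],
)
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send the submission. Building the request separately keeps the sketch safe to run without a cluster.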
You also define the staging directory within the cluster to store the StreamSets libraries and resources needed to run the pipeline.
Remove the mssql-mleap-lib-assembly-1.0.jar file from the following HDFS ZIP file: /system/spark/spark_libs.zip. This issue should be fixed in the next SQL Server 2019 BDC release.
The following image displays a pipeline configured to run using Spark deployed on SQL Server 2019 BDC at the specified Livy endpoint:
StreamSets provides a quick start deployment script that lets you try SQL Server 2019 BDC as a cluster manager for Transformer pipelines without additional configuration. For example, you might use the script if you want to try SQL Server 2019 BDC as a cluster manager but aren't ready to upgrade to Transformer 3.13.x or later.