Spark Versions and Available Features
The Spark version on a cluster determines the Transformer features that you can use in pipelines that the cluster runs. The Spark version that you install on the Transformer machine determines the features that you can use in local and standalone pipelines.
Transformer does not need a local Spark installation to run cluster pipelines. However, Transformer does require a local Spark installation to perform certain tasks, such as using embedded Spark libraries to preview or validate pipelines, and starting pipelines in client deployment mode on Hadoop YARN clusters.
Important: StreamSets does not provide support for Spark installations.
The following table describes the features available with different Spark versions:

Spark Version | Features |
---|---|
Apache Spark 2.3.x | Provides access to all Transformer features, except those listed below. |
Apache Spark 2.4.0 and later | Provides access to the following additional features: |
Apache Spark 2.4.2 and later | Provides access to the following additional features: |
Apache Spark 2.4.4 and later | Provides access to the following additional feature: |
Apache Spark 3.0.0 and later | When you use Spark 3.0.0 or later, the following features are not available at this time: |
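To check which rows of the Spark version table apply to a particular installation, you can compare the installed version string against each row's minimum version. The following Python sketch is an illustration only. It assumes that each "and later" row applies cumulatively to every later version and that the 2.3.x row is the baseline. The names `TIERS`, `parse_spark_version`, and `applicable_rows` are hypothetical and are not part of Transformer. The sketch returns row labels only and does not reproduce the feature lists.

```python
import re

# Hypothetical lookup: minimum Spark version for each row of the table,
# in ascending order. Labels mirror the "Spark Version" column.
TIERS = [
    ((2, 3, 0), "Apache Spark 2.3.x"),
    ((2, 4, 0), "Apache Spark 2.4.0 and later"),
    ((2, 4, 2), "Apache Spark 2.4.2 and later"),
    ((2, 4, 4), "Apache Spark 2.4.4 and later"),
    ((3, 0, 0), "Apache Spark 3.0.0 and later"),
]


def parse_spark_version(version: str) -> tuple:
    """Parse strings such as '2.4.4' or '3.0.1-vendor' into (major, minor, patch)."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized Spark version: {version!r}")
    return tuple(int(part) for part in match.groups())


def applicable_rows(version: str) -> list:
    """Return the table rows whose minimum version is <= the given version.

    Assumption: rows are cumulative, so a later version also matches every
    earlier row. The 3.0.0 row lists features that are *not* available.
    """
    parsed = parse_spark_version(version)
    if parsed < (2, 3, 0):
        raise ValueError("Spark versions earlier than 2.3 are not listed in the table")
    return [label for minimum, label in TIERS if parsed >= minimum]


if __name__ == "__main__":
    for v in ("2.3.2", "2.4.3", "3.0.1"):
        print(v, "->", applicable_rows(v))
```

For example, for Spark 2.4.3 the function returns the 2.3.x, 2.4.0, and 2.4.2 rows. It does not return the 2.4.4 row, because 2.4.3 is below that row's minimum version. To get the version string for a local installation, you can run `spark-submit --version`.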