Overview

Transformer pipelines run on Spark. Typically, you run them on Spark deployed on a cluster to take advantage of the performance and scale that Spark offers. When needed, you can instead run a local pipeline on the Transformer machine.

When running a pipeline on a cluster, Transformer submits the pipeline as a Spark application to the cluster. Spark distributes the processing across the nodes in the cluster.
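As a rough illustration of the difference between a local run and a cluster run, the sketch below builds a generic spark-submit command line for each case. It does not show how Transformer itself submits pipelines. The function name, JAR name, and main class are hypothetical, and YARN is assumed as the cluster manager purely for the example.

```python
from typing import List


def build_spark_submit(master: str, app_jar: str, main_class: str,
                       deploy_mode: str = "client") -> List[str]:
    """Return a spark-submit argument list for the given master URL.

    Hypothetical helper for illustration; not part of Transformer.
    """
    cmd = ["spark-submit", "--master", master]
    # A local master runs the whole application on the submitting machine,
    # so the deploy mode is only added when a cluster manager is targeted.
    if not master.startswith("local"):
        cmd += ["--deploy-mode", deploy_mode]
    cmd += ["--class", main_class, app_jar]
    return cmd


# Hypothetical application JAR and main class.
local_cmd = build_spark_submit("local[*]", "pipeline.jar", "example.PipelineMain")
cluster_cmd = build_spark_submit("yarn", "pipeline.jar", "example.PipelineMain",
                                 deploy_mode="cluster")
```

With a local master, all processing stays on one machine. With a cluster manager as the master, Spark schedules the work across the cluster nodes.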

On the Cluster tab of the pipeline properties, you specify the cluster to run the pipeline on. Then, you configure the related cluster properties.

Transformer can run pipelines on the following cluster types:

For more information about supported versions and distributions, see the Installation chapter.