Overview

Transformer works with Apache Spark running either locally on a single machine or on a cluster.

To get started with Transformer in a development environment, install both Transformer and Spark on the same machine and run Spark locally.

In a production environment, use a Spark installation that runs on a cluster to leverage the performance and scale that Spark offers. Install Transformer on a machine that is configured to submit Spark jobs to a cluster. You can install Transformer on premises or in the cloud.
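The difference between the two setups can be sketched in terms of the Spark master URL that a job is submitted to. The following is a minimal illustration, not a Transformer API: the helper name `spark_master`, the environment labels, and the choice of YARN as the cluster manager are assumptions made for the example. The values `local[*]` and `yarn` are standard Spark master URLs.

```python
# Hypothetical helper: maps a deployment environment to a Spark master URL.
# The environment names and the use of YARN are assumptions for illustration.
SPARK_MASTERS = {
    "development": "local[*]",  # run Spark on this machine using all available cores
    "production": "yarn",       # submit jobs to a YARN-managed cluster
}


def spark_master(environment: str) -> str:
    """Return the Spark master URL for the given environment label."""
    try:
        return SPARK_MASTERS[environment]
    except KeyError:
        raise ValueError(f"unknown environment: {environment!r}") from None
```

In a local development setup, Spark runs on the same machine as Transformer, so no cluster configuration is needed. In production, the machine only needs to be able to submit jobs to the cluster that performs the work.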

After installation, enable HTTPS for Transformer. HTTPS secures communication with the Transformer UI and REST API and is needed to use Transformer with StreamSets Control Hub.

Some stages also have prerequisites that you must complete before using those stages in a pipeline.