Installing when Spark Runs Locally

To get started with Transformer in a development environment, install both Transformer and Spark on the same machine. This allows you to easily develop and test local pipelines, which run on the local Spark installation.

All users can install Transformer from a tarball and run it manually. Users with an enterprise account can install Transformer from an RPM package and run it as a service. Installing an RPM package requires root privileges.

When you install from an RPM package, Transformer uses the default directories and runs as the default system user and group. The default system user and group are named transformer. If a transformer user and a transformer group do not exist on the machine, the installation creates the user and group for you and assigns them the next available user ID and group ID.
Tip: To use specific IDs for the transformer user and group, create the user and group before installation and specify the IDs that you want to use. For example, if you're installing Transformer on multiple machines, you might want to create the system user and group before installation to ensure that the user ID and group ID are consistent across the machines.
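
As a rough illustration of the tip above, the following sketch prints the commands that would create the transformer group and user with fixed IDs. The ID value 1500 is a hypothetical placeholder, and the --system flag is an assumption based on the description of transformer as a system user. Review the printed commands, then run them as root on each machine before installing the RPM package.

```shell
#!/bin/sh
# Hypothetical IDs: choose values that are unused on every target machine.
TRANSFORMER_GID=1500
TRANSFORMER_UID=1500

# Print (do not run) the commands that create the group and user with fixed IDs.
# Arguments: <group ID> <user ID>
print_account_commands() {
  echo "groupadd --system --gid $1 transformer"
  echo "useradd --system --uid $2 --gid transformer transformer"
}

print_account_commands "$TRANSFORMER_GID" "$TRANSFORMER_UID"
```

After creating the accounts, running id transformer on each machine is one way to confirm that the user ID and group ID match.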

Before you start, ensure that the machine meets all installation requirements and choose the installation package that you want to use.

  1. Download the Transformer installation package.

    If using the RPM package, download the appropriate package for your operating system:

    • For CentOS 6, Oracle Linux 6, or Red Hat Enterprise Linux 6, download the RPM EL6 package.
    • For CentOS 7, Oracle Linux 7, or Red Hat Enterprise Linux 7, download the RPM EL7 package.
  2. If you downloaded the tarball, use the following command to extract the tarball to the desired location:
    tar xf streamsets-transformer-all_<scala version>-<transformer version>.tgz -C <extraction directory>

    For example, to extract Transformer version 4.1.0 prebuilt with Scala 2.11.x, use the following command:

    tar xf streamsets-transformer-all_2.11-4.1.0.tgz -C /opt/streamsets-transformer/
  3. If you downloaded the RPM package, complete the following steps to extract and install the package:
    1. Use the following command to extract the package to the desired location:
      tar xf streamsets-transformer-<transformer version>-<operating system>-all-rpms.tar
      For example, to extract Transformer version 4.1.0 on CentOS 7, use the following command:
      tar xf streamsets-transformer-4.1.0-el7-all-rpms.tar
    2. To install the package, use the following command from the directory where you extracted the package:
      yum localinstall streamsets*.rpm
  4. Download Apache Spark from the Apache Spark Download page to the same machine as the Transformer installation.

    Download a supported Spark version that is valid for the Transformer features that you want to use.

  5. Extract the downloaded Spark file.
  6. Add the SPARK_HOME environment variable to the Transformer environment configuration file to define the Spark installation path on the Transformer machine.

    Modify environment variables using the method required by your installation type.

    Set the environment variable as follows:
    export SPARK_HOME=<Spark path>
    For example:
    export SPARK_HOME=/opt/spark-2.4.0-bin-hadoop2.7/
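
The tarball path through the steps above (steps 2, 5, and 6) can be sketched with a few small helper functions. This is a minimal sketch, not part of the product: the function names are hypothetical, the paths are taken from the examples above, and checking for an executable bin/spark-submit is only an assumed heuristic for recognizing a Spark installation directory.

```shell
#!/bin/sh
# Build the Transformer tarball name from the Scala and Transformer versions.
# Arguments: <scala version> <transformer version>
transformer_tarball() {
  echo "streamsets-transformer-all_$1-$2.tgz"
}

# Extract an archive into a target directory. The directory is created first
# because tar -C fails when the directory does not exist.
# Arguments: <archive> <extraction directory>
extract_to() {
  mkdir -p "$2" && tar xf "$1" -C "$2"
}

# Succeed only if the directory looks like a Spark installation, judged here
# by the presence of an executable bin/spark-submit (an assumed heuristic).
# Arguments: <Spark path>
check_spark_home() {
  [ -x "$1/bin/spark-submit" ]
}

# Example usage, commented out so that loading the functions has no side effects:
# extract_to "$(transformer_tarball 2.11 4.1.0)" /opt/streamsets-transformer/
# extract_to spark-2.4.0-bin-hadoop2.7.tgz /opt/
# check_spark_home /opt/spark-2.4.0-bin-hadoop2.7 && echo "Spark path looks valid"
```

Validating the Spark path before adding the export SPARK_HOME line to the Transformer environment configuration file can catch a mistyped or incomplete path early.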