Installing when Spark Runs on a Cluster
All users can install Transformer from a tarball and run it manually. Users with an enterprise account can install Transformer from an RPM package and run it as a service. Installing an RPM package requires root privileges.
When you install from the RPM package, the installation creates a system user and group named transformer. If a transformer user and a transformer group do not exist on the machine, the installation creates the user and group for you and assigns them the next available user ID and group ID.
To use specific IDs for the transformer user and group, create the user and group before installation and specify the IDs that you want to use. For example, if you're installing Transformer on multiple machines, you might want to create the system user and group before installation to ensure that the user ID and group ID are consistent across the machines.
Before you start, ensure that the machine meets all installation requirements and choose the installation package that you want to use.
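For example, here is a minimal sketch of pre-creating the transformer system user and group with explicit IDs. The ID value 20169 is a placeholder; pick IDs that are unused on all of your machines:
groupadd -r -g 20169 transformer
useradd -r -u 20169 -g transformer transformer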
- Download the Transformer installation package from one of the following locations:
  - StreamSets Support portal if you have an enterprise account.
  - StreamSets website if you do not have an enterprise account.
If using the RPM package, download the appropriate package for your operating system:
- For CentOS 6, Oracle Linux 6, or Red Hat Enterprise Linux 6, download the RPM EL6 package.
- For CentOS 7, Oracle Linux 7, or Red Hat Enterprise Linux 7, download the RPM EL7 package.
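If you are not sure which package applies, you can check the operating system release first; on Red Hat-based systems, for example:
cat /etc/redhat-release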
- If you downloaded the tarball, use the following command to extract it to the desired location:
tar xf streamsets-transformer-all_<scala version>-<transformer version>.tgz -C <extraction directory>
For example, to extract Transformer version 4.1.0 prebuilt with Scala 2.11.x, use the following command:
tar xf streamsets-transformer-all_2.11-4.1.0.tgz -C /opt/streamsets-transformer/
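Note that the -C option expects the extraction directory to already exist, so create it first if necessary:
mkdir -p /opt/streamsets-transformer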
- If you downloaded the RPM package, extract the package and then install the RPM files, as in the sketch below.
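The exact file names depend on the package you downloaded; as a hedged sketch, assuming the EL7 package for Transformer 4.1.0 prebuilt with Scala 2.11.x and a naming pattern similar to the tarball above:
tar xf streamsets-transformer-el7-all_2.11-4.1.0.tgz
sudo yum localinstall streamsets-transformer*.rpm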
- Edit the Transformer configuration file, $TRANSFORMER_CONF/transformer.properties.
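For example, one setting you might review is the HTTP URL that Transformer advertises to the cluster. This is a hedged sketch only; the property name and default port are assumptions, so confirm them against the documentation for your version:
# Assumed property and port: the URL that cluster nodes use to reach this
# Transformer instance; adjust the host and port to your environment.
transformer.base.http.url=http://transformer-host.example.com:19630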
- Add the following environment variables to the Transformer environment configuration file, modifying them using the method required by your installation type. A sketch of typical entries follows this list.
  - JAVA_HOME - Path to the Java installation on the machine.
  - SPARK_HOME - Path to the Spark installation on the machine. Required for Hadoop YARN and Spark standalone clusters only. Clusters can include multiple Spark installations, so be sure to point to a supported Spark version that is valid for the Transformer features that you want to use.
    On Cloudera clusters, Spark is generally installed into the parcels directory. For example, for CDH 5.11, you might use /opt/cloudera/parcels/SPARK2/lib/spark2.
    Tip: To verify the version of a Spark installation, run the spark-shell command, which prints the Spark version as it starts. Then, use sc.getConf.get("spark.home") to return the installation location.
  - HADOOP_CONF_DIR or YARN_CONF_DIR - Directory that contains the client-side configuration files for the Hadoop cluster. Required for Hadoop YARN and Spark standalone clusters only.
  For more information about these environment variables, see the Apache Spark documentation.
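For a tarball installation, these variables are typically exported from the environment configuration file, assumed here to be $TRANSFORMER_CONF/transformer-env.sh; the paths are examples to adjust for your machines:
# Example paths only; point these at your actual Java installation,
# Spark installation, and Hadoop client configuration directory.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk
export SPARK_HOME=/opt/cloudera/parcels/SPARK2/lib/spark2
export HADOOP_CONF_DIR=/etc/hadoop/conf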