Upgrade an Installation from the Tarball
Use the same procedure to upgrade an installation from the tarball when Spark runs locally on the Transformer machine or when Spark runs on a cluster.
Step 1. Shut Down the Previous Version
Stop all running pipelines and then shut down the previous version of Transformer.
- Use one of the following methods to stop all running pipelines:
  - If Transformer is not registered to work with StreamSets Control Hub, stop the pipelines using the Transformer UI.
    From the Transformer Home page, select all running pipelines in the list and then click the Stop icon.
  - If Transformer is registered to work with StreamSets Control Hub, stop all jobs running on Transformers using the Control Hub UI.
    From the Control Hub Jobs page, filter the jobs by engine and by engine label. Select all active jobs in the list and then click the Stop Jobs icon.
- In the Transformer UI, click .
  When the confirmation dialog box appears, click Yes.
Step 2. Back Up the Previous Version
Before you install the new version, create a backup of the files in the previous version by copying and renaming the configuration, data, and resource directories. That way, you can continue to run the previous version if needed.
- TRANSFORMER_EXTERNAL_RESOURCES - The Transformer directory for external resources.
- TRANSFORMER_RESOURCES - The Transformer directory for runtime resource files.
- STREAMSETS_LIBRARIES_EXTRA_DIR - The Transformer directory for external libraries.
For example, if you are upgrading from version 3.12.0, back up the Transformer configuration directory and name it as follows: /etc/transformer3120
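The backup can be sketched as a small shell helper. This is a minimal sketch, not the documented procedure: the function name backup_dir is hypothetical, and the example assumes the previous configuration directory is /etc/transformer, which the text above does not state.

```shell
#!/bin/sh
# Hypothetical helper: copy a directory to a versioned backup location,
# preserving permissions and timestamps. Refuses to overwrite an
# existing backup so that an earlier copy is never clobbered.
backup_dir() {
    src=$1
    dest=$2
    if [ ! -d "$src" ]; then
        echo "source directory not found: $src" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "backup target already exists: $dest" >&2
        return 1
    fi
    cp -Rp "$src" "$dest"
}

# Example for an upgrade from 3.12.0 (source path is an assumption):
#   backup_dir /etc/transformer /etc/transformer3120
```

The same helper can be called once for each directory you need to back up, such as the data and resource directories.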
Step 3. Install the New Version
Install the new version of the tarball on the same machine as the previous version.
- Download the Transformer tarball from one of the following locations:
  - StreamSets Support portal if you have an enterprise account.
  - StreamSets website if you do not have an enterprise account.
- Extract the tarball to a different directory than the previous version.
- Use the following command to set the TRANSFORMER_DIST environment variable to the location where you extracted the tarball:
  export TRANSFORMER_DIST=<extraction directory>
  For example:
  export TRANSFORMER_DIST=/transformer/streamsets-transformer-5.9.0
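The extraction step might look like the following sketch. The helper name extract_tarball and the tarball file name in the usage comment are assumptions for illustration; use the name of the file that you actually downloaded.

```shell
#!/bin/sh
# Hypothetical helper: extract a gzip-compressed tarball into a target
# directory, creating the directory first if it does not exist.
extract_tarball() {
    tarball=$1
    target=$2
    if [ ! -f "$tarball" ]; then
        echo "tarball not found: $tarball" >&2
        return 1
    fi
    mkdir -p "$target" && tar -xzf "$tarball" -C "$target"
}

# Usage (file name is an assumption, paths follow the example above):
#   extract_tarball streamsets-transformer-5.9.0.tgz /transformer
#   export TRANSFORMER_DIST=/transformer/streamsets-transformer-5.9.0
```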
Step 4. Update Environment Variables
Update the Transformer environment configuration file so that the new version of Transformer uses a new configuration directory but the same data, log, resource, Java, and Spark directories as the previous version.
For example, suppose your previous Transformer version used the directory /var/lib/transformer to store the data files for pipeline configuration and run details. When you upgrade, you configure the new version of Transformer to use the same working directory /var/lib/transformer for the data files. As a result, the new version has access to the pipelines created in the previous version.
For more information about modifying Transformer environment variables, see Modifying Environment Variables.
- Use a text editor to open $TRANSFORMER_DIST/libexec/transformer-env.sh, the environment configuration file used by a tarball installation.
- Update the directory environment variables to use the following values:
  - TRANSFORMER_CONF - New location outside of the base Transformer runtime directory and different from the previous renamed directory. For example, if you renamed the previous configuration directory to /etc/transformer3120, use the value /etc/transformer.
  - TRANSFORMER_DATA - Same directory that the previous version used.
  - TRANSFORMER_LOG - Same directory that the previous version used.
  - TRANSFORMER_RESOURCES - Same directory that the previous version used.
- Add the following environment variables to the file based on whether Spark runs locally or on a cluster, and set them to use the following values:
  - SPARK_HOME - Spark installation: local, or cluster (required for Hadoop YARN and Spark standalone clusters only). Value: same directory that the previous version used.
  - JAVA_HOME - Spark installation: cluster. Value: same directory that the previous version used.
  - HADOOP_CONF_DIR or YARN_CONF_DIR - Spark installation: cluster (required for Hadoop YARN and Spark standalone clusters only). Value: same directory that the previous version used.
- If you use any of the following environment variables, add them to the file, and set them to the same directory used in the previous version:
  - TRANSFORMER_EXTERNAL_RESOURCES
  - STREAMSETS_LIBRARIES_EXTRA_DIR
- Update the file with any other customized environment variable values that you defined in the previous version.
- Use the following command to create the Transformer configuration directory at /etc/transformer:
  mkdir /etc/transformer
- Use the following command from the directory where you extracted the tarball to copy all files from etc into the Transformer configuration directory that you just created:
  cp -R etc/* /etc/transformer
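As an illustration of the environment variable settings described in this step, the relevant lines in the environment configuration file might look like the sketch below. Only /etc/transformer and /var/lib/transformer come from the examples in this section. Every other path is a placeholder assumption that you replace with the directory your previous version used.

```shell
# Sketch of entries for $TRANSFORMER_DIST/libexec/transformer-env.sh.

# New configuration directory, distinct from the renamed backup.
export TRANSFORMER_CONF=/etc/transformer

# Same data directory as the previous version, so existing pipelines
# remain available after the upgrade.
export TRANSFORMER_DATA=/var/lib/transformer

# Placeholder paths (assumptions): reuse the previous version's values.
export TRANSFORMER_LOG=/var/log/transformer
export TRANSFORMER_RESOURCES=/var/lib/transformer-resources
export SPARK_HOME=/opt/spark
export JAVA_HOME=/usr/lib/jvm/java

# Cluster only, for Hadoop YARN and Spark standalone clusters
# (placeholder path, uncomment if it applies):
# export HADOOP_CONF_DIR=/etc/hadoop/conf
```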
Step 5. Update the Configuration Files
A new Transformer version can include new properties and configuration files required for Transformer to start or function properly. In the previous step, you updated the environment configuration file so that the new version of Transformer uses the new configuration files stored in the $TRANSFORMER_CONF directory. In this step, you’ll compare the previous and new versions of the configuration files, and update the new files as needed with the same customized property values.
For example, when upgrading from version 3.12.0, you compare the files in your backup directory, /etc/transformer3120, with the files in the /etc/transformer directory. You update the new files in the /etc/transformer directory with any customizations made in the previous files in the /etc/transformer3120 directory.
- Compare the previous and new versions of the transformer.properties file, and update the new file as needed with the same customized property values.
- If the previous Transformer was registered with StreamSets Control Hub, complete the following steps to update the configuration files used by Control Hub:
- Compare the previous and new versions of the remaining files, and update the new files as needed with the same customized property values:
  - The appropriate realm.properties file, based on the authentication type that you use.
  - Credential stores properties file
  - email-password.txt
  - Keystore files
  - LDAP files
  - Log4j2 properties file
    Important: Transformer versions 5.x and later use the Apache Log4j 2.17.2 library. Earlier versions use the Log4j 1.x library, which is now end-of-life. If you customized the transformer-log4j.properties file in a previous version, you must update the new transformer-log4j2.properties file with the same customized property values using the Log4j 2.x syntax. For more information, see Upgrade Impact.
  - Security policy file
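One way to find the customizations to carry over is to run a recursive diff between the backup and new configuration directories. The following is a minimal sketch: the helper name compare_conf is hypothetical, and the paths in the usage comment follow the 3.12.0 example above.

```shell
#!/bin/sh
# Hypothetical helper: print a unified, recursive diff of two
# configuration directories. diff exits with 1 when the directories
# differ, which is expected here, so only exit codes above 1 are
# treated as errors.
compare_conf() {
    old=$1
    new=$2
    rc=0
    diff -ru "$old" "$new" || rc=$?
    if [ "$rc" -gt 1 ]; then
        echo "diff failed with exit code $rc" >&2
        return "$rc"
    fi
    return 0
}

# Usage:
#   compare_conf /etc/transformer3120 /etc/transformer
```

Lines prefixed with a minus sign in the output come from the previous version's files, and lines prefixed with a plus sign come from the new files. Review each difference before copying a value, because a new version can change defaults or rename properties.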
Step 6. Start the New Version of Transformer
Start the new version of Transformer, as described in Starting Transformer Manually.
If the previous version of Transformer was registered with StreamSets Control Hub and you correctly updated the configuration files during the upgrade, then the new version of Transformer is automatically registered and enabled to work with Control Hub.