Upgrade an Installation from the Tarball

When you upgrade an installation from the tarball, you configure the new version to use a new configuration directory outside of the base Transformer runtime directory. You then configure the new version to use the same data, log, and resource directories as the previous version. As a result, the new version has access to the files created in the previous version.
Note: If you installed a driver or other library as an external library, verify that those libraries are stored in a local directory external to the Transformer runtime directory before you upgrade. That way, Transformer can still use the libraries after the upgrade.
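
One way to run this check from a shell is sketched below. The helper name is_outside_dir is hypothetical and not part of Transformer, and the sketch assumes only a POSIX shell and that both directories exist.

```shell
# Hypothetical helper: succeeds when directory $1 is NOT located inside directory $2.
is_outside_dir() {
  # Resolve both paths to physical absolute paths so that symlinks do not hide nesting.
  dir=$(cd "$1" && pwd -P) || return 2
  base=$(cd "$2" && pwd -P) || return 2
  case "$dir/" in
    "$base"/*) return 1 ;;   # inside (or the same as) the runtime directory
    *) return 0 ;;           # outside the runtime directory
  esac
}

# Example usage, assuming both variables are set in your shell:
# is_outside_dir "$STREAMSETS_LIBRARIES_EXTRA_DIR" "$TRANSFORMER_DIST" \
#   && echo "External libraries are outside the runtime directory"
```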

Use the same procedure to upgrade an installation from the tarball when Spark runs locally on the Transformer machine or when Spark runs on a cluster.

Step 1. Shut Down the Previous Version

Stop all running pipelines and then shut down the previous version of Transformer.

  1. Use one of the following methods to stop all running pipelines:
    • If Transformer is not registered to work with StreamSets Control Hub, stop the pipelines using the Transformer UI.

      From the Transformer Home page, select all running pipelines in the list and then click the Stop icon.

    • If Transformer is registered to work with StreamSets Control Hub, stop all jobs running on Transformers using the Control Hub UI.

      From the Control Hub Jobs page, filter the jobs by engine and by engine label. Select all active jobs in the list and then click the Stop Jobs icon.

  2. In the Transformer UI, click Administration > Shut Down.

    When the confirmation dialog box appears, click Yes.

Step 2. Back Up the Previous Version

Before you install the new version, create a backup of the files in the previous version by copying and renaming the configuration, data, and resource directories. That way, you can continue to run the previous version if needed.

Back up the following directories:
  • TRANSFORMER_EXTERNAL_RESOURCES - The Transformer directory for external resources.
  • TRANSFORMER_RESOURCES - The Transformer directory for runtime resource files.
  • STREAMSETS_LIBRARIES_EXTRA_DIR - The Transformer directory for external libraries.

For example, if you are upgrading from version 3.12.0, copy the Transformer configuration directory and name the copy /etc/transformer3120.

Important: These directories should be outside of the $TRANSFORMER_DIST runtime directory.
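
The copy-and-rename backup could be scripted roughly as follows. This is a minimal sketch: the helper name backup_dir is hypothetical, and the paths in the usage comments are illustrative, so substitute the directories that your installation actually uses.

```shell
# Hypothetical helper: copy directory $1 to a new backup directory $2.
backup_dir() {
  src=${1%/}    # drop a trailing slash, if any
  dest=$2
  # Refuse to overwrite or merge into an existing backup.
  if [ -e "$dest" ]; then
    echo "Backup target already exists: $dest" >&2
    return 1
  fi
  # -R copies the directory tree; -p preserves permissions and timestamps.
  cp -Rp "$src" "$dest"
}

# Example usage (illustrative paths):
# backup_dir "$TRANSFORMER_CONF" /etc/transformer3120
# backup_dir "$TRANSFORMER_RESOURCES" "${TRANSFORMER_RESOURCES}3120"
```

Because the helper copies rather than moves, the previous version keeps its original directories and can still be started if needed.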

Step 3. Install the New Version

Install the new version of the tarball on the same machine as the previous version.

  1. Download the Transformer tarball from one of the following locations:
    • StreamSets Support portal if you have an enterprise account.
    • StreamSets website if you do not have an enterprise account.
  2. Extract the tarball to a directory different from the one that contains the previous version.
  3. Use the following command to set the TRANSFORMER_DIST environment variable to the location where you extracted the tarball:
    export TRANSFORMER_DIST=<extraction directory>
    For example:
    export TRANSFORMER_DIST=/transformer/streamsets-transformer-5.9.0
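
As a rough sketch, the extraction in step 2 might look like the following. The helper name extract_transformer and the tarball file name in the usage comment are assumptions; use the name of the file that you actually downloaded.

```shell
# Hypothetical helper: extract gzip-compressed tarball $1 into directory $2.
extract_transformer() {
  tarball=$1
  target=$2
  mkdir -p "$target" || return 1
  # -x extract, -z decompress with gzip, -f archive file, -C change to the target directory first
  tar -xzf "$tarball" -C "$target"
}

# Example usage (file name and target directory are illustrative):
# extract_transformer streamsets-transformer-5.9.0.tgz /transformer
# export TRANSFORMER_DIST=/transformer/streamsets-transformer-5.9.0
```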

Step 4. Update Environment Variables

Update the Transformer environment configuration file so that the new version of Transformer uses a new configuration directory but the same data, log, resource, Java, and Spark directories as the previous version.

For example, let's say your previous Transformer version used the directory /var/lib/transformer to store the data files for pipeline configuration and run details. When you upgrade, you configure the new version of Transformer to use the same working directory /var/lib/transformer for the data files. As a result, the new version has access to the pipelines created in the previous version.

For more information about modifying Transformer environment variables, see Modifying Environment Variables.

  1. Use a text editor to open $TRANSFORMER_DIST/libexec/transformer-env.sh, the environment configuration file used by a tarball installation.
  2. Update the directory environment variables to use the following values:
    • TRANSFORMER_CONF - New location outside of the base Transformer runtime directory and different from the renamed directory of the previous version. For example, if you renamed the previous configuration directory to /etc/transformer3120, use the value /etc/transformer.
    • TRANSFORMER_DATA - Same directory that the previous version used.
    • TRANSFORMER_LOG - Same directory that the previous version used.
    • TRANSFORMER_RESOURCES - Same directory that the previous version used.
  3. Add the following environment variables to the file based on whether Spark runs locally or on a cluster, and set them to use the following values:
    • SPARK_HOME
      Spark installation: Local, or Cluster (required for Hadoop YARN and Spark standalone clusters only).
      Value: Same directory that the previous version used.
    • JAVA_HOME
      Spark installation: Cluster.
      Value: Same directory that the previous version used.
    • HADOOP_CONF_DIR or YARN_CONF_DIR
      Spark installation: Cluster (required for Hadoop YARN and Spark standalone clusters only).
      Value: Same directory that the previous version used.
  4. If you use any of the following environment variables, add them to the file and set them to the same directories used in the previous version:
    • TRANSFORMER_EXTERNAL_RESOURCES
    • STREAMSETS_LIBRARIES_EXTRA_DIR
  5. Update the file with any other customized environment variable values that you defined in the previous version.
  6. Use the following command to create the Transformer configuration directory at /etc/transformer:
    mkdir /etc/transformer
  7. Use the following command from the directory where you extracted the tarball to copy all files from etc into the Transformer configuration directory that you just created:
    cp -R etc/* /etc/transformer
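
After these edits, the relevant part of transformer-env.sh might resemble the fragment below. Treat it as an illustration only: apart from /etc/transformer and /var/lib/transformer, which come from the examples in this section, every path is an assumed placeholder that you must replace with the directory your previous version used.

```shell
# Illustrative fragment of $TRANSFORMER_DIST/libexec/transformer-env.sh.
export TRANSFORMER_CONF=/etc/transformer                      # new configuration directory
export TRANSFORMER_DATA=/var/lib/transformer                  # same as the previous version
export TRANSFORMER_LOG=/var/log/transformer                   # assumed path, same as the previous version
export TRANSFORMER_RESOURCES=/var/lib/transformer-resources   # assumed path, same as the previous version

# Spark and Java locations (assumed paths, same as the previous version).
export SPARK_HOME=/opt/spark
export JAVA_HOME=/usr/lib/jvm/java          # listed above for cluster installations

# Cluster only, for Hadoop YARN and Spark standalone clusters (assumed path):
# export HADOOP_CONF_DIR=/etc/hadoop/conf
```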

Step 5. Update the Configuration Files

A new Transformer version can include new properties and configuration files required for Transformer to start or function properly. In the previous step, you updated the environment configuration file so that the new version of Transformer uses the new configuration files stored in the $TRANSFORMER_CONF directory. In this step, you’ll compare the previous and new versions of the configuration files, and update the new files as needed with the same customized property values.

For example, when upgrading from version 3.12.0, you compare the files in your backup directory, /etc/transformer3120, with the files in the /etc/transformer directory. You then update the new files in the /etc/transformer directory with any customizations made to the previous files in the /etc/transformer3120 directory.

  1. Compare the previous and new versions of the transformer.properties file, and update the new file as needed with the same customized property values.
  2. If the previous Transformer was registered with StreamSets Control Hub, complete the following steps to update the configuration files used by Control Hub:
    1. Compare the previous and new versions of the dpm.properties file, and update the new file as needed with the same customized property values.
    2. Replace the new version of the application-token.txt file with the previous version of the file.
      The previous version of the file includes the authentication token that this Transformer instance requires to issue authenticated requests to Control Hub. As a result, you'll need Transformer to use the previous version of the file.
  3. Compare the previous and new versions of the remaining files, and update the new files as needed with the same customized property values:
    • The appropriate realm.properties file, based on the authentication type that you use.
    • credential stores properties file
    • email-password.txt
    • keystore files
    • LDAP files
    • Log4j2 properties file
      Important: Transformer versions 5.x and later use the Apache Log4j 2.17.2 library. Earlier versions use the Log4j 1.x library, which has reached end of life. If you customized the transformer-log4j.properties file in a previous version, you must apply the same customizations to the new transformer-log4j2.properties file using the Log4j 2.x syntax. For more information, see Upgrade Impact.
    • security policy file
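
The comparisons in this step can be done with any diff tool. The sketch below uses the standard diff utility and assumes the example directories used earlier in this section; the helper name compare_conf is hypothetical.

```shell
# Hypothetical helper: show how file $3 differs between the previous
# configuration directory $1 and the new configuration directory $2.
compare_conf() {
  diff -u "$1/$3" "$2/$3"
  status=$?
  # diff exits with 1 when the files differ, which is expected here,
  # so only an exit status greater than 1 is treated as an error.
  [ "$status" -le 1 ]
}

# Example usage (directories from the example above):
# compare_conf /etc/transformer3120 /etc/transformer transformer.properties
# compare_conf /etc/transformer3120 /etc/transformer dpm.properties
#
# To compare every file that exists in both directories at once:
# diff -r /etc/transformer3120 /etc/transformer
```

In the unified output, lines that start with - come from the previous file and lines that start with + come from the new file, so customized values from the previous version show up as - lines.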

Step 6. Start the New Version of Transformer

Start the new version of Transformer, as described in Starting Transformer Manually.

If the previous version of Transformer was registered with StreamSets Control Hub and you correctly updated the configuration files during the upgrade, then the new version of Transformer is automatically registered and enabled to work with Control Hub.