Upgrade an Installation from the RPM Package

When you upgrade an installation from the RPM package, the new version uses the default Transformer configuration, data, log, and resource directories. If the previous version used the default directories, the new version has access to the files created in the previous version.

If the previous version used customized values for the directory environment variables, you must make the same customizations in the new version so that the new version can access the same files.

Note: If you installed a driver or other library as an external library, verify that those libraries are stored in a local directory external to the Transformer runtime directory before you upgrade. That way, Transformer can still use the libraries after the upgrade.

Use the same procedure to upgrade an installation from the RPM package when Spark runs locally on the Transformer machine or when Spark runs on a cluster.

Step 1. Shut Down the Previous Version

Stop all running pipelines and then shut down the previous version of Transformer.

  1. From the Control Hub Jobs page, filter the jobs by engine and by engine label. Select all active jobs in the list and then click the Stop Jobs icon.
  2. Use the command line to shut down Transformer. Use the required command for your operating system.
    For CentOS 6, Oracle Linux 6, or Red Hat Enterprise Linux 6, use:
    service transformer stop
    For CentOS 7, Oracle Linux 7, or Red Hat Enterprise Linux 7, use:
    systemctl stop transformer
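
The two shutdown commands differ only in the service manager. The following is a minimal sketch that picks one, assuming that a machine with systemctl on the PATH is a version 7 (systemd) system and that one without it is a version 6 system. The helper name transformer_service_cmd is hypothetical, and it only prints the command so that you can review it before running it with the required privileges:

```shell
# transformer_service_cmd ACTION
# Prints the service command for ACTION (for example, stop) without running it.
# Assumption: systemctl on the PATH means a systemd-based (version 7) system.
transformer_service_cmd() {
  local action="$1"
  if command -v systemctl >/dev/null 2>&1; then
    echo "systemctl ${action} transformer"
  else
    echo "service transformer ${action}"
  fi
}

# Show the shutdown command for this machine.
transformer_service_cmd stop
```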

Step 2. Back Up the Previous Version

Before you install the new version, create a backup of the files in the data and resource directories in the previous version. You’ll also need to create a backup of the environment configuration file so that the file is not overwritten when you install the new version. That way, you can continue to run the previous version if needed.

Back up the following files and directories:
  • File that defines environment variables, based on the operating system:
    • CentOS 6, Oracle Linux 6, or Red Hat Enterprise Linux 6 - the $TRANSFORMER_DIST/libexec/transformerd-env.sh file.
    • CentOS 7, Oracle Linux 7, or Red Hat Enterprise Linux 7 - the /usr/lib/systemd/system/transformer.service file.
  • Data directory defined in the TRANSFORMER_DATA environment variable. Default is /var/lib/transformer.
  • If used, copy and rename the following directories as well:
    • TRANSFORMER_EXTERNAL_RESOURCES - The Transformer directory for external resources.
    • TRANSFORMER_RESOURCES - The Transformer directory for runtime resource files.
    • STREAMSETS_LIBRARIES_EXTRA_DIR - The Transformer directory for external libraries.

For example, if you are upgrading version 3.12.0 on CentOS 6, Oracle Linux 6, or Red Hat Enterprise Linux 6, back up the Transformer data directory and name it as follows: /var/lib/transformer3120. Create a backup of the environment configuration file and name the backup file as follows: transformerd-env-3120.sh.
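
The naming convention in the example above can be sketched as a small shell helper. This is an illustration, not part of the product: the function name backup_transformer and its three arguments are hypothetical, and the caller supplies the actual data directory, environment configuration file, and version suffix.

```shell
# backup_transformer DATA_DIR ENV_FILE SUFFIX
# Copies DATA_DIR to DATA_DIR+SUFFIX and ENV_FILE to a sibling file that
# carries the suffix before the file extension.
backup_transformer() {
  local data_dir="${1%/}" env_file="$2" suffix="$3"
  local env_name env_backup
  env_name="$(basename "$env_file")"
  # transformerd-env.sh -> transformerd-env-3120.sh
  env_backup="$(dirname "$env_file")/${env_name%.*}-${suffix}.${env_name##*.}"

  # Refuse to overwrite an existing backup.
  if [ -e "${data_dir}${suffix}" ] || [ -e "$env_backup" ]; then
    echo "backup target already exists" >&2
    return 1
  fi

  # cp -a copies recursively and preserves permissions and timestamps.
  cp -a "$data_dir" "${data_dir}${suffix}" || return 1
  cp -a "$env_file" "$env_backup"
}
```

For the version 3.12.0 example, the call would be backup_transformer /var/lib/transformer "$TRANSFORMER_DIST/libexec/transformerd-env.sh" 3120. Any customized resource or external library directories would need the same treatment.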

Step 3. Install the New Version

Install the new version of the RPM package on the same machine as the previous version.

  1. Access the Transformer RPM package from one of the following locations:
    • StreamSets Support portal if you have an enterprise account.
    • StreamSets website if you do not have an enterprise account.
  2. Download the appropriate Transformer RPM package for your operating system:
    • For CentOS 6, Oracle Linux 6, or Red Hat Enterprise Linux 6, download the RPM EL6 package.
    • For CentOS 7, Oracle Linux 7, or Red Hat Enterprise Linux 7, download the RPM EL7 package.
  3. Use the following command to extract the file to a different directory than the previous version:
    tar xf streamsets-transformer-<transformer version>-<operating system>-all-rpms.tar
    For example, to extract Transformer version 6.0.0 on CentOS 7, use the following command:
    tar xf streamsets-transformer-6.0.0-el7-all-rpms.tar
  4. To install the package, use the following command from the directory where you extracted the package:
    yum localinstall streamsets*.rpm
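
The extraction command above unpacks into the current directory. One way to keep the new packages separate from the previous version is to extract into a dedicated directory with the tar -C option. The following is a sketch only: the function name extract_rpms and the destination directory are hypothetical.

```shell
# extract_rpms TARBALL DEST_DIR
# Creates DEST_DIR if needed and extracts the RPM tarball into it.
extract_rpms() {
  local tarball="$1" dest="$2"
  mkdir -p "$dest" || return 1
  # -C makes tar change to DEST_DIR before extracting.
  tar xf "$tarball" -C "$dest"
}
```

After extracting, change to the destination directory and run the yum localinstall command shown in step 4.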

Step 4. Update Environment Variables

Update the Transformer environment configuration file so that the new version of Transformer uses the same Java and Spark installation directories as the previous version.

Note: Each RPM installation uses the same default values as the previous version for all of the Transformer directories. If the previous version used the default values, you do not need to update the environment variables that define Transformer directories.

Update the environment variables in the required file based on your installation type. For more information about the required file to edit, see Modifying Environment Variables.

  1. Add the following environment variables to the file based on whether Spark runs locally or on a cluster, and set them to use the following values:
    • SPARK_HOME
      Spark installation: Local, or Cluster. For a cluster, required for Hadoop YARN and Spark standalone clusters only.
      Value: Same directory that the previous version used.
    • JAVA_HOME
      Spark installation: Cluster.
      Value: Same directory that the previous version used.
    • HADOOP_CONF_DIR or YARN_CONF_DIR
      Spark installation: Cluster. Required for Hadoop YARN and Spark standalone clusters only.
      Value: Same directory that the previous version used.
  2. If you use any of the following environment variables, add them to the required file, and set them to the same directory used in the previous version:
    • TRANSFORMER_EXTERNAL_RESOURCES
    • STREAMSETS_LIBRARIES_EXTRA_DIR
  3. Update the file with any other customized environment variable values that you defined in the previous version.
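
After editing the file, it can be useful to confirm that each variable resolves to a directory that exists on the machine. The following is a minimal sketch of such a check, run in a shell where the variables are set. The function name check_dir_vars is hypothetical and is not part of Transformer.

```shell
# check_dir_vars NAME...
# Prints NAME=value for each variable that points to an existing directory.
# Reports unset variables and missing directories on stderr and returns 1.
check_dir_vars() {
  local name value status=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name; pass trusted names only.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name is not set" >&2
      status=1
    elif [ ! -d "$value" ]; then
      echo "$name points to a missing directory: $value" >&2
      status=1
    else
      echo "$name=$value"
    fi
  done
  return "$status"
}
```

For example, check_dir_vars SPARK_HOME JAVA_HOME HADOOP_CONF_DIR lists the directories in use and returns a non-zero status if any of the three is unset or missing. Include only the variables that apply to your Spark installation.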

Step 5. Update the Configuration Files

A new Transformer version can include new properties and configuration files required for Transformer to start or function properly.

When you install the new RPM package, the configuration files are written to the same default directory as the previous version, /etc/transformer. The new versions of the configuration files are renamed with the following extension: .rpmnew. For example, the new version of the Transformer configuration file is renamed to transformer.properties.rpmnew.

To update the configuration files, you must rename the previous and new versions of the files and then update the new files with any customized property values defined in the previous version.

Note: If the previous version used a customized value for the TRANSFORMER_CONF environment variable, the new configuration files are written to a different directory than the previous version and so do not have the .rpmnew file extension. In this case, you do not rename the configuration files, but you must still update the new files with any customized values defined in the previous version.

  1. In the $TRANSFORMER_CONF directory, /etc/transformer by default, rename all previous configuration files except for the application-token.txt file with the following extension: .old.

    If the previous Transformer was registered with StreamSets Control Hub, the previous version of the application-token.txt file includes the authentication token that this Transformer instance requires to issue authenticated requests to Control Hub. As a result, you'll need Transformer to use the previous version of the file.

  2. Remove the following extension from all new configuration files except for the application-token.txt file: .rpmnew.
  3. Compare the previous and new versions of the transformer.properties file, and update the new file as needed with the same customized property values.
  4. If the previous Transformer was registered with Control Hub, compare the previous and new versions of the dpm.properties file, and update the new file as needed with the same customized property values.
  5. Compare the previous and new versions of the remaining files, and update the new files as needed with the same customized property values:
    • The appropriate realm.properties file, based on the authentication type that you use.
    • Credential stores properties file
    • email-password.txt
    • Keystore files
    • LDAP files
    • Log4j2 properties file
      Important: Transformer versions 5.x and later use the Apache Log4j 2.17.2 library. Earlier versions use the Log4j 1.x library, which is now end-of-life. If you customized the transformer-log4j.properties file in a previous version, you must update the new transformer-log4j2.properties file with the same customized property values using the Log4j 2.x syntax. For more information, see Upgrade Impact.
    • Security policy file
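
The renaming in steps 1 and 2 can be sketched as a shell helper. This is a conservative simplification, not the documented procedure verbatim: it renames a previous file to .old only when a matching .rpmnew file exists, and it leaves both versions of application-token.txt untouched. The function name promote_rpmnew is hypothetical.

```shell
# promote_rpmnew CONF_DIR
# For each FILE.rpmnew in CONF_DIR: rename FILE to FILE.old, then rename
# FILE.rpmnew to FILE. Skips application-token.txt so that the previous
# token file stays in use.
promote_rpmnew() {
  local conf_dir="$1" new_file current
  for new_file in "$conf_dir"/*.rpmnew; do
    # When nothing matches, the glob stays literal; skip it.
    [ -e "$new_file" ] || continue
    current="${new_file%.rpmnew}"
    if [ "$(basename "$current")" = "application-token.txt" ]; then
      continue
    fi
    if [ -e "$current" ]; then
      mv "$current" "${current}.old" || return 1
    fi
    mv "$new_file" "$current" || return 1
  done
  return 0
}
```

With the default directory, the call would be promote_rpmnew /etc/transformer. Afterward, a command such as diff /etc/transformer/transformer.properties.old /etc/transformer/transformer.properties shows which customized values still need to be carried over by hand, as described in steps 3 through 5.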

Step 6. Start the New Version of Transformer

Start the new version of Transformer, as described in Starting Transformer as a Service.

If the previous version of Transformer was registered with StreamSets Control Hub and you correctly updated the configuration files during the upgrade, then the new version of Transformer is automatically registered and enabled to work with Control Hub.