Upgrade an Installation from the RPM Package

When you upgrade an installation from the RPM package, the new version uses the default configuration, data, log, and resource directories. If the previous version used the default directories, the new version has access to the files created in the previous version.

If the previous version used customized values for the directory environment variables, you must make the same customizations in the new version so that the new version can access the same files.

Note: If you installed external libraries or developed custom stages, verify that those libraries are stored in a local directory external to the Data Collector installation directory before you upgrade. That way, Data Collector can still use the libraries after the upgrade.

To upgrade an installation from the RPM package, perform the following steps:

Step 1. Shut Down the Previous Version

Step 2. Back Up the Previous Version

Step 3. Install the New Version

Step 4. Update Environment Variables

Step 5. Update the Configuration Files

Step 6. Install Additional Libraries for the Core Installation

Step 7. Uninstall Previous Libraries

Step 8. Start the New Version of Data Collector

Step 1. Shut Down the Previous Version

Stop all pipelines and then shut down the previous version of Data Collector.

  1. Use one of the following methods to stop all running pipelines:
    • If the Data Collector is not registered to work with StreamSets Control Hub, stop the pipelines using the Data Collector UI.

      From the Data Collector Home page, select all running pipelines in the list and then click the Stop icon.

    • If the Data Collector is registered to work with StreamSets Control Hub, stop all jobs running on the Data Collector using the Control Hub UI.

      From the Control Hub Jobs page, filter the jobs by engine and by engine label. Select all active jobs in the list and then click the Stop Jobs icon.

  2. Use one of the following methods to shut down Data Collector:
    • To use the command line for shutdown, use the required command for your operating system.

      For CentOS 6, Oracle Linux 6, or Red Hat Enterprise Linux 6, use: service sdc stop

      For CentOS 7, Oracle Linux 7, or Red Hat Enterprise Linux 7, use: systemctl stop sdc

    • To use the Data Collector UI, click Administration > Shut Down. When the confirmation dialog box appears, click Yes.
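As a minimal sketch of the command-line option above, the following helper prints the service command that matches the operating system major version. The function name sdc_service_command and its arguments are hypothetical and are not part of Data Collector. The helper only prints the command so that you can review it before running it.

```shell
#!/bin/sh
# Hypothetical helper: print the Data Collector service command for an
# action ("stop" or "start") and an OS major version (6 or 7).
sdc_service_command() {
    action=$1
    major=$2
    case $major in
        6) echo "service sdc $action" ;;
        7) echo "systemctl $action sdc" ;;
        *) echo "unsupported OS major version: $major" >&2
           return 1 ;;
    esac
}
```

For example, sdc_service_command stop 7 prints the systemctl form of the shutdown command.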

Step 2. Back Up the Previous Version

Before you install the new version, create a backup of the files in the data and resource directories in the previous version. You’ll also need to create a backup of the environment configuration file so that the file is not overwritten when you install the new version. That way, you can continue to run the previous version if needed.

Copy and rename the following file and directory:
  • File that defines environment variables, based on the operating system:
    • CentOS 6, Oracle Linux 6, or Red Hat Enterprise Linux 6 - the $SDC_DIST/libexec/sdcd-env.sh file.
    • CentOS 7, Oracle Linux 7, or Red Hat Enterprise Linux 7 - the /usr/lib/systemd/system/sdc.service file.
  • SDC_DATA - The Data Collector directory for pipeline state and configuration information.
If used, copy and rename the following directories as well:
  • SDC_EXTERNAL_RESOURCES - The Data Collector directory for external resources.
  • SDC_RESOURCES - The Data Collector directory for runtime resource files.
  • STREAMSETS_LIBRARIES_EXTRA_DIR - The Data Collector directory for external libraries.
  • USER_LIBRARIES_DIR - The Data Collector directory for custom stages.

For example, if you are upgrading version 3.0.0.0 on CentOS 6, Oracle Linux 6, or Red Hat Enterprise Linux 6, back up the Data Collector data directory and name it as follows: /var/lib/sdc3000. Create a backup of the environment configuration file and name the backup file as follows: sdcd-env-3000.sh.

Important: These directories should be outside of the $SDC_DIST runtime directory.

For more information about these directories, see Data Collector Directories.
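The copy-and-rename convention from the example above can be sketched with two small helpers. The function names backup_dir and backup_file are hypothetical, and the suffix 3000 simply follows the 3.0.0.0 example. Adjust the paths to match the directories that your installation actually uses.

```shell
#!/bin/sh
# Hypothetical helpers for the backup naming shown in the example above.

# Copy a directory to a sibling with the version suffix appended,
# for example /var/lib/sdc -> /var/lib/sdc3000.
backup_dir() {
    src=${1%/}          # drop a trailing slash, if any
    suffix=$2
    cp -a "$src" "${src}${suffix}"
}

# Copy a file, inserting "-<suffix>" before the extension,
# for example sdcd-env.sh -> sdcd-env-3000.sh.
backup_file() {
    src=$1
    suffix=$2
    ext=${src##*.}      # text after the last dot
    base=${src%.*}      # text before the last dot
    cp -p "$src" "${base}-${suffix}.${ext}"
}
```

With these helpers, the example above corresponds to backup_dir /var/lib/sdc 3000 and backup_file "$SDC_DIST/libexec/sdcd-env.sh" 3000.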

Step 3. Install the New Version

Install the new version of the RPM package. Installing the full Data Collector as a service requires root privileges.

  1. Access the Data Collector RPM package from one of the following locations:
  2. Download the RPM package for your operating system:
    • For CentOS 6, Oracle Linux 6, or Red Hat Enterprise Linux 6, download the RPM EL6 package.
    • For CentOS 7, Oracle Linux 7, or Red Hat Enterprise Linux 7, download the RPM EL7 package.
    • For Oracle Linux 8 or Red Hat Enterprise Linux 8, download the RPM EL8 package.
  3. Use the following command to extract the file to a different directory than the previous version:
    tar xf streamsets-datacollector-<version>-<operating_system>-all-rpms.tar
    For example, to extract version 5.11.0 on CentOS 7, use the following command:
    tar xf streamsets-datacollector-5.11.0-el7-all-rpms.tar
  4. To install the full RPM package and all available stage libraries, use the following command:
    yum localinstall streamsets*
  5. Or, to install the core RPM package and then install individual stage libraries as needed, use the following command:
    yum localinstall streamsets-datacollector-<version>-1.noarch.rpm
    For example, to install version 5.11.0, use the following command:
    yum localinstall streamsets-datacollector-5.11.0-1.noarch.rpm
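If you script the installation, the file name patterns used in the commands above can be built from the version and operating system identifier. This is only a sketch. The function names are hypothetical, and the patterns are taken directly from the commands in this step.

```shell
#!/bin/sh
# Hypothetical helpers that build the file names used in this step.

# Tarball that contains all RPMs, for example
# streamsets-datacollector-5.11.0-el7-all-rpms.tar
sdc_tarball_name() {
    printf 'streamsets-datacollector-%s-%s-all-rpms.tar\n' "$1" "$2"
}

# Core RPM package, for example
# streamsets-datacollector-5.11.0-1.noarch.rpm
sdc_core_rpm_name() {
    printf 'streamsets-datacollector-%s-1.noarch.rpm\n' "$1"
}
```

For example, tar xf "$(sdc_tarball_name 5.11.0 el7)" extracts the same file as the tar command shown above.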

Step 4. Update Environment Variables

By default, each RPM installation uses the same values for all of the environment variables as the previous version. If the previous version used the default values, the new version is already configured with the same values.

If the previous version used customized values for the environment variables, you must make the same customizations in the new version. The new version must use the same data, log, and resource directories as the previous version.

  1. Open the backup of the environment configuration file that you created for the previous version.
    For example, on CentOS 6, Oracle Linux 6, or Red Hat Enterprise Linux 6, open the $SDC_DIST/libexec/sdcd-env-3000.sh file.
  2. In the new version of Data Collector, open the environment configuration file.
    For example, on CentOS 6, Oracle Linux 6, or Red Hat Enterprise Linux 6, open the $SDC_DIST/libexec/sdcd-env.sh file.
  3. Compare the previous and new versions of the environment configuration file, and update the new file as needed with the same customized environment variables.
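One way to make the comparison in step 3 is with the standard diff utility. The following sketch wraps it in a hypothetical function named compare_env_files. It prints a unified diff of the backup and the new file and treats "files differ" as a normal result rather than an error.

```shell
#!/bin/sh
# Hypothetical helper: show how the previous environment configuration
# file (first argument) differs from the new one (second argument).
compare_env_files() {
    status=0
    # diff exits with 1 when the files differ, which is expected here.
    diff -u "$1" "$2" || status=$?
    # Only an exit status above 1 signals a real problem, such as a
    # missing file.
    [ "$status" -le 1 ]
}
```

For example, on CentOS 6 you might run compare_env_files "$SDC_DIST/libexec/sdcd-env-3000.sh" "$SDC_DIST/libexec/sdcd-env.sh". Lines that start with a minus sign come from the backup, and lines that start with a plus sign come from the new file.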

Step 5. Update the Configuration Files

A new Data Collector version can include new properties and configuration files required for Data Collector to start or function properly.

When you install the new RPM package, the configuration files are written to the same default directory as the previous version, /etc/sdc. The new versions of the configuration files are renamed with the following extension: .rpmnew. For example, the new version of the Data Collector configuration file is renamed to sdc.properties.rpmnew.

To update the configuration files, you must rename the previous and new versions of the files and then update the new files with any customized property values defined in the previous version.

Note: If the previous version used a customized value for $SDC_CONF, the new configuration files are written to a different directory than the previous version, and so do not require the .rpmnew file extension. In this case, you do not rename the configuration files, but must update the new files with any customized values defined in the previous version.
  1. In the working $SDC_CONF directory, /etc/sdc by default, rename all previous configuration files except for the application-token.txt file with the following extension: .old.
    The previous version of the application-token.txt file includes the authentication token that this Data Collector instance requires to issue authenticated requests to Control Hub. As a result, you'll need Data Collector to use the previous version of the file.
  2. Remove the following extension from all new configuration files except for the application-token.txt file: .rpmnew.
  3. Compare the previous and new versions of the sdc.properties file, and update the new file as needed with the same customized property values.
  4. Compare the previous and new versions of the remaining files, and update the new files as needed with the same customized property values:
    • The appropriate realm.properties file, based on the authentication type that you use.
    • credential stores properties file
    • email-password.txt
    • keystore files
    • LDAP files
    • Log4j2 properties file
      Important: Data Collector versions 5.x and later use the Apache Log4j 2.17.2 library. Earlier versions use the Log4j 1.x library, which is now end-of-life. If you customized the sdc-log4j.properties file in a previous version, you must update the new sdc-log4j2.properties file with the same customized property values using the Log4j 2.x syntax. For more information, see Upgrade Impact.
    • security policy file
    • Vault properties file

      As of version 2.7.0.0, most of the Vault configuration properties have been moved to the new credential stores properties file. The properties use the same name, with an added "credentialStore.vault.config" prefix. If you are upgrading from a version earlier than 2.7.0.0, copy any values that you customized in the previous Vault properties file into the same property names in the credential stores properties file.
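The renaming in steps 1 and 2 above can be sketched as a small shell function. The function name swap_rpmnew is hypothetical. As a cautious assumption, the sketch renames a previous file to .old only when a matching .rpmnew file exists, which is narrower than renaming every previous file. It always leaves the application-token.txt file untouched.

```shell
#!/bin/sh
# Hypothetical helper: in the given configuration directory, keep each
# previous file as <name>.old and put the <name>.rpmnew file in its place.
swap_rpmnew() {
    conf_dir=$1
    for new in "$conf_dir"/*.rpmnew; do
        [ -e "$new" ] || continue       # the glob matched nothing
        current=${new%.rpmnew}
        case ${current##*/} in
            # Data Collector must keep using the previous token file.
            application-token.txt) continue ;;
        esac
        if [ -e "$current" ]; then
            mv "$current" "$current.old"
        fi
        mv "$new" "$current"
    done
}
```

For the default location, you would call swap_rpmnew /etc/sdc and then compare each .old file with its replacement as described in steps 3 and 4.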

Step 6. Install Additional Libraries for the Core Installation

If you installed the core RPM package, install the individual stage libraries that the upgraded pipelines require.

For instructions on installing additional stage libraries, see Installing for RPM.

Step 7. Uninstall Previous Libraries

Uninstall all stage libraries used by the previous Data Collector version.

  1. Run the following command to list all stage libraries used by the previous Data Collector version:
    rpm -qa | grep streamsets | grep "<version>"
    For example, to list all stage libraries used by Data Collector version 3.0.0.0, run the following command:
    rpm -qa | grep streamsets | grep "3.0.0.0"
  2. Run the following command to uninstall all stage libraries used by the previous version:
    yum remove <library package name> <library package name> ...

    Where <library package name> is the full name of each library package that you want to uninstall. Separate the names with spaces, as shown in the command.
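The listing command above can be wrapped in a small filter, sketched here under the assumption that every package of the previous version has the version string in its name. The function name filter_sdc_packages is hypothetical. It uses the grep -F option so that the dots in a version such as 3.0.0.0 are matched literally rather than as wildcards.

```shell
#!/bin/sh
# Hypothetical helper: read package names on standard input and keep
# only the StreamSets packages that contain the given version string.
filter_sdc_packages() {
    # -F treats the version as a fixed string, not a regular expression.
    grep streamsets | grep -F -- "$1"
}
```

For example, rpm -qa | filter_sdc_packages 3.0.0.0 prints the candidate packages. After reviewing the list, you could pass the names to yum remove, separated by spaces.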

Step 8. Start the New Version of Data Collector

Use the required command for your operating system to start the new version of Data Collector:
  • For CentOS 6, Oracle Linux 6, or Red Hat Enterprise Linux 6, use:
    service sdc start
  • For CentOS 7, Oracle Linux 7, or Red Hat Enterprise Linux 7, use:
    systemctl start sdc