Upgrade an Installation from the RPM Package
When you upgrade an installation from the RPM package, the new version uses the default configuration, data, log, and resource directories. If the previous version used the default directories, the new version has access to the files created in the previous version.
If the previous version used customized values for the directory environment variables, you must make the same customizations in the new version so that the new version can access the same files.
To upgrade an installation from the RPM package, perform the following steps:
Step 1. Shut Down the Previous Version
Step 2. Back Up the Previous Version
Step 3. Install the New Version
Step 4. Update Environment Variables
Step 5. Update the Configuration Files
Step 6. Install Additional Libraries for the Core Installation
Step 7. Uninstall Previous Libraries
Step 8. Start the New Version of Data Collector
Step 1. Shut Down the Previous Version
Stop all pipelines and then shut down the previous version of Data Collector.
- Use one of the following methods to stop all running pipelines:
  - If the Data Collector is not registered to work with StreamSets Control Hub, stop the pipelines using the Data Collector UI.
    From the Data Collector Home page, select all running pipelines in the list and then click the Stop icon.
  - If the Data Collector is registered to work with StreamSets Control Hub, stop all jobs running on the Data Collector using the Control Hub UI.
    From the Control Hub Jobs page, filter the jobs by engine and by engine label. Select all active jobs in the list and then click the Stop Jobs icon.
- Use one of the following methods to shut down Data Collector:
  - To use the command line for shutdown, use the required command for your operating system.
    For CentOS 6, Oracle Linux 6, or Red Hat Enterprise Linux 6, use:
    service sdc stop
    For CentOS 7, Oracle Linux 7, or Red Hat Enterprise Linux 7, use:
    systemctl stop sdc
  - To use the Data Collector UI, click . When the confirmation dialog box appears, click Yes.
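The command-line shutdown commands listed for each operating system could be wrapped in a small helper. The following is a minimal sketch, not part of the product: the function name sdc_service_cmd is hypothetical, and it only covers the major versions 6 and 7 named on this page.

```shell
#!/bin/sh
# Hypothetical helper: print the service command for an action ("stop" or
# "start") and an OS major version, using the commands listed on this page.
sdc_service_cmd() {
    action="$1"
    os_major="$2"
    case "$os_major" in
        6) echo "service sdc $action" ;;
        7) echo "systemctl $action sdc" ;;
        *) echo "unsupported OS major version: $os_major" >&2
           return 1 ;;
    esac
}

# Example (run the printed command as root):
#   sdc_service_cmd stop 7
```

The helper only prints the command, so you can review it before running it yourself.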
Step 2. Back Up the Previous Version
Before you install the new version, create a backup of the files in the data and resource directories in the previous version. You’ll also need to create a backup of the environment configuration file so that the file is not overwritten when you install the new version. That way, you can continue to run the previous version if needed.
- File that defines environment variables, based on the operating system:
  - CentOS 6, Oracle Linux 6, or Red Hat Enterprise Linux 6 - the $SDC_DIST/libexec/sdcd-env.sh file.
  - CentOS 7, Oracle Linux 7, or Red Hat Enterprise Linux 7 - the /usr/lib/systemd/system/sdc.service file.
- SDC_DATA - The Data Collector directory for pipeline state and configuration information.
- SDC_EXTERNAL_RESOURCES - The Data Collector directory for external resources.
- SDC_RESOURCES - The Data Collector directory for runtime resource files.
- STREAMSETS_LIBRARIES_EXTRA_DIR - The Data Collector directory for external libraries.
- USER_LIBRARIES_DIR - The Data Collector directory for custom stages.
For example, if you are upgrading version 3.0.0.0 on CentOS 6, Oracle Linux 6, or Red Hat Enterprise Linux 6, back up the Data Collector data directory and name it as follows: /var/lib/sdc3000. Create a backup of the environment configuration file and name the backup file as follows: sdcd-env-3000.sh.
For more information about these directories, see Data Collector Directories.
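The backup naming in the example (3.0.0.0 becomes the suffix 3000) can be scripted. This is a sketch under assumptions: the function names version_suffix and sdc_backup are hypothetical, the environment file naming follows the sdcd-env-3000.sh example only, and you pass in the actual paths used by your installation.

```shell
#!/bin/sh
# Hypothetical helper: turn a version such as 3.0.0.0 into the suffix 3000.
version_suffix() {
    printf '%s\n' "$1" | tr -d '.'
}

# Hypothetical helper: copy a data directory to "<dir><suffix>" and an
# environment file "<name>.sh" to "<name>-<suffix>.sh".
# cp -a preserves permissions, ownership, and timestamps.
sdc_backup() {
    version="$1"
    data_dir="${2%/}"      # drop a trailing slash, if any
    env_file="$3"
    suffix=$(version_suffix "$version")

    if [ -e "${data_dir}${suffix}" ]; then
        echo "backup already exists: ${data_dir}${suffix}" >&2
        return 1
    fi
    cp -a "$data_dir" "${data_dir}${suffix}"
    cp -a "$env_file" "${env_file%.sh}-${suffix}.sh"
}

# Example (assumes the default data directory /var/lib/sdc):
#   sdc_backup 3.0.0.0 /var/lib/sdc "$SDC_DIST/libexec/sdcd-env.sh"
```

The same pattern can be repeated for the resource, external library, and custom stage directories listed above.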
Step 3. Install the New Version
Install the new version of the RPM package. Installing the full Data Collector as a service requires root privileges.
- Access the Data Collector RPM package from one of the following locations:
  - StreamSets Support portal if you have an enterprise account.
  - StreamSets archives page if you do not have an enterprise account.
- Download the RPM package for your operating system:
  - For CentOS 6, Oracle Linux 6, or Red Hat Enterprise Linux 6, download the RPM EL6 package.
  - For CentOS 7, Oracle Linux 7, or Red Hat Enterprise Linux 7, download the RPM EL7 package.
  - For Oracle Linux 8 or Red Hat Enterprise Linux 8, download the RPM EL8 package.
- Use the following command to extract the file to a different directory than the previous version:
  tar xf streamsets-datacollector-<version>-<operating_system>-all-rpms.tar
  For example, to extract version 5.11.0 on CentOS 7, use the following command:
  tar xf streamsets-datacollector-5.11.0-el7-all-rpms.tar
- To install the full RPM package and all available stage libraries, use the following command:
  yum localinstall streamsets*
- Or, to install the core RPM package and then install individual stage libraries as needed, use the following command:
  yum localinstall streamsets-datacollector-<version>-1.noarch.rpm
  For example, to install version 5.11.0, use the following command:
  yum localinstall streamsets-datacollector-5.11.0-1.noarch.rpm
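The file names in the commands above follow a fixed pattern, so they can be built from the version and operating system label. A minimal sketch, assuming the hypothetical function names sdc_tarball_name and sdc_core_rpm_name and a hypothetical extraction directory:

```shell
#!/bin/sh
# Hypothetical helper: build the tar file name from a version and an
# operating system label such as el7.
sdc_tarball_name() {
    printf 'streamsets-datacollector-%s-%s-all-rpms.tar\n' "$1" "$2"
}

# Hypothetical helper: build the core RPM file name from a version.
sdc_core_rpm_name() {
    printf 'streamsets-datacollector-%s-1.noarch.rpm\n' "$1"
}

# Example: extract into a directory that the previous version did not use.
# The directory /tmp/sdc-5.11.0 is an assumption for illustration.
#   mkdir -p /tmp/sdc-5.11.0
#   tar xf "$(sdc_tarball_name 5.11.0 el7)" -C /tmp/sdc-5.11.0
```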
Step 4. Update Environment Variables
Each RPM installation uses the same default values as the previous version for all of the environment variables. If the previous version used the default values, the new version is configured to use the same environment variables.
If the previous version used customized values for the environment variables, you must make the same customizations in the new version. The new version must use the same data, log, and resource directories as the previous version.
- Open the environment configuration file that you backed up in the previous version.
  For example, on CentOS 6, Oracle Linux 6, or Red Hat Enterprise Linux 6, open the $SDC_DIST/libexec/sdcd-env-3000.sh file.
- In the new version of Data Collector, open the environment configuration file.
  For example, on CentOS 6, Oracle Linux 6, or Red Hat Enterprise Linux 6, open the $SDC_DIST/libexec/sdcd-env.sh file.
- Compare the previous and new versions of the environment configuration file, and update the new file as needed with the same customized environment variables.
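One way to do the comparison is with diff. The following is a sketch only; the function name sdc_env_diff is hypothetical, and the file paths are whatever you used for the backup and the new installation.

```shell
#!/bin/sh
# Hypothetical helper: show a unified diff between the backed-up and the new
# environment configuration files.
sdc_env_diff() {
    status=0
    diff -u "$1" "$2" || status=$?
    # diff exit status: 0 = identical, 1 = differences found, 2 or more = error.
    # Differences are expected here, so only a real error is reported as failure.
    [ "$status" -le 1 ]
}

# Example, using the CentOS 6 file names from this page:
#   sdc_env_diff "$SDC_DIST/libexec/sdcd-env-3000.sh" "$SDC_DIST/libexec/sdcd-env.sh"
```

Lines prefixed with - appear only in the previous file, and lines prefixed with + appear only in the new file.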
Step 5. Update the Configuration Files
A new Data Collector version can include new properties and configuration files required for Data Collector to start or function properly.
When you install the new RPM package, the configuration files are written to the same default directory as the previous version, /etc/sdc. The new versions of the configuration files are renamed with the following extension: .rpmnew. For example, the new version of the Data Collector configuration file is renamed to sdc.properties.rpmnew.
To update the configuration files, you must rename the previous and new versions of the files and then update the new files with any customized property values defined in the previous version.
Note: If the previous version used a customized value for $SDC_CONF, the new configuration files are written to a different directory than the previous version, and so do not require the .rpmnew file extension. In this case, you do not rename the configuration files, but must update the new files with any customized values defined in the previous version.
- In the working $SDC_CONF directory, /etc/sdc by default, rename all previous configuration files except for the application-token.txt file with the following extension: .old.
  The previous version of the application-token.txt file includes the authentication token that this Data Collector instance requires to issue authenticated requests to Control Hub. As a result, you'll need Data Collector to use the previous version of the file.
- Remove the following extension from all new configuration files except for the application-token.txt file: .rpmnew.
- Compare the previous and new versions of the sdc.properties file, and update the new file as needed with the same customized property values.
- Compare the previous and new versions of the remaining files, and update the new files as needed with the same customized property values:
  - The appropriate realm.properties file, based on the authentication type that you use.
  - credential stores properties file
  - email-password.txt
  - keystore files
  - LDAP files
  - Log4j2 properties file
    Important: Data Collector versions 5.x and later use the Apache Log4j 2.17.2 library. Earlier versions use the Log4j 1.x library which is now end-of-life. If you customized the sdc-log4j.properties file in a previous version, you must update the new sdc-log4j2.properties file with the same customized property values using the Log4j 2.x syntax. For more information, see Upgrade Impact.
  - security policy file
  - Vault properties file
    As of version 2.7.0.0, most of the Vault configuration properties have been moved to the new credential stores properties file. The properties use the same name, with an added "credentialStore.vault.config" prefix. If you are upgrading from a version earlier than 2.7.0.0, copy any values that you customized in the previous Vault properties file into the same property names in the credential stores properties file.
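The renaming described in this step (.old for previous files, removing .rpmnew from new files, leaving application-token.txt alone) could be scripted roughly as follows. This is a sketch under assumptions: the function name sdc_promote_rpmnew is hypothetical, and it only renames a previous file when a matching .rpmnew file exists, which is narrower than renaming every previous configuration file. Comparing and merging customized values remains a manual task.

```shell
#!/bin/sh
# Hypothetical helper: for each "<name>.rpmnew" file in the configuration
# directory, keep the previous file as "<name>.old" and promote the new file
# to "<name>". application-token.txt is skipped so the previous token file
# stays in use.
sdc_promote_rpmnew() {
    conf_dir="${1:-/etc/sdc}"
    for new_file in "$conf_dir"/*.rpmnew; do
        # When nothing matches, the pattern is left as-is; skip it.
        if [ ! -e "$new_file" ]; then
            continue
        fi
        current="${new_file%.rpmnew}"
        if [ "$(basename "$current")" = "application-token.txt" ]; then
            continue
        fi
        if [ -e "$current" ]; then
            mv "$current" "$current.old"
        fi
        mv "$new_file" "$current"
    done
}

# Example (run as a user with write access to the directory):
#   sdc_promote_rpmnew /etc/sdc
```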
Step 6. Install Additional Libraries for the Core Installation
If you installed the core RPM package, install the individual stage libraries that the upgraded pipelines require.
For instructions on installing additional stage libraries, see Installing for RPM.
Step 7. Uninstall Previous Libraries
Uninstall all stage libraries used by the previous Data Collector version.
- Run the following command to list all stage libraries used by the previous Data Collector version:
  rpm -qa | grep streamsets | grep "<version>"
  For example, to list all stage libraries used by Data Collector version 3.0.0.0, run the following command:
  rpm -qa | grep streamsets | grep "3.0.0.0"
- Run the following command to uninstall all stage libraries used by the previous version:
  yum remove <library package name> <library package name> ...
  Where library package name is the full name of each library that you want to uninstall. Separate each name with a space.
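The listing and removal commands can be combined so that the package names do not have to be retyped. A minimal sketch, assuming the hypothetical function name sdc_packages_for_version; it uses grep -F so that the dots in a version such as 3.0.0.0 are matched literally rather than as regular-expression wildcards.

```shell
#!/bin/sh
# Hypothetical helper: read package names on stdin and keep the StreamSets
# packages whose name contains the given version string.
sdc_packages_for_version() {
    grep streamsets | grep -F -- "$1"
}

# Example (run as root). The unquoted command substitution passes the
# package names to yum as separate, space-separated arguments, and yum
# still asks for confirmation before removing anything:
#   yum remove $(rpm -qa | sdc_packages_for_version 3.0.0.0)
```

Review the output of the listing on its own before passing it to yum remove.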
Step 8. Start the New Version of Data Collector
Start the new version of Data Collector using the required command for your operating system:
- For CentOS 6, Oracle Linux 6, or Red Hat Enterprise Linux 6, use:
  service sdc start
- For CentOS 7, Oracle Linux 7, or Red Hat Enterprise Linux 7, use:
  systemctl start sdc