Upgrade an Installation from the Tarball
Users with an enterprise account can upgrade to a full, common, or core tarball installation. Other users can upgrade to the common tarball installation and install additional stage libraries as needed.
To upgrade a full, common, or core installation from the tarball, perform the following steps:
Step 1. Shut Down the Previous Version
Step 2. Back Up the Previous Version
Step 3. Install the New Version
Step 4. Update Environment Variables
Step 5. Update the Configuration Files
Step 6. Install Additional Libraries for the Core Installation
Step 7. Start the New Version of Data Collector
Step 1. Shut Down the Previous Version
Stop all running pipelines and then shut down the previous version of Data Collector.
- From the Control Hub Jobs page, filter the jobs by engine and by engine label. Select all active jobs in the list and then click the Stop Jobs icon.
- Use the command line to shut down Data Collector. Use the required command for your operating system.
For CentOS 6, Oracle Linux 6, Red Hat Enterprise Linux 6, or Ubuntu 14.04 LTS, use:
service sdc stop
For CentOS 7, Oracle Linux 7, Red Hat Enterprise Linux 7, or Ubuntu 16.04 LTS, use:
systemctl stop sdc
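The choice between the two commands can be scripted. The following is a minimal sketch, not part of Data Collector: `detect_init` and `stop_command` are hypothetical helper names, and the check for the `/run/systemd/system` directory is the conventional way to tell whether systemd is the running init system.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.

# Print "systemd" if systemd is the running init system, otherwise "sysv".
# The optional argument is a root prefix, useful only for testing.
detect_init() {
    if [ -d "${1:-}/run/systemd/system" ]; then
        echo systemd
    else
        echo sysv
    fi
}

# Print the stop command from the steps above for the given init system.
stop_command() {
    case "$1" in
        systemd) echo "systemctl stop sdc" ;;
        sysv)    echo "service sdc stop" ;;
        *)       echo "unknown init system: $1" >&2; return 1 ;;
    esac
}

# Usage (as root), after all pipelines have been stopped:
#   cmd=$(stop_command "$(detect_init)") && $cmd
```

Printing the command instead of running it inside the helper keeps the sketch safe to try on a machine where Data Collector is not installed.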
Step 2. Back Up the Previous Version
Before you install the new version, create a backup of the files in the previous version by copying and renaming the configuration, data, and resource directories. That way, you can continue to run the previous version if needed.
- SDC_CONF - The Data Collector configuration directory.
- SDC_DATA - The Data Collector directory for pipeline state and configuration information.
- SDC_EXTERNAL_RESOURCES - The Data Collector directory for external resources.
- SDC_RESOURCES - The Data Collector directory for runtime resource files.
- STREAMSETS_LIBRARIES_EXTRA_DIR - The Data Collector directory for external libraries.
- USER_LIBRARIES_DIR - The Data Collector directory for custom stages.
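For example, when upgrading from version 3.0.0.0, you might name the backup of the configuration directory /etc/sdc3000.
For more information about these directories, see Data Collector Directories.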
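A backup along these lines could be scripted as follows. This is a sketch under assumptions: `backup_dir` is a hypothetical helper, the `3000` suffix mirrors the /etc/sdc3000 naming used in this procedure, and the variables in the usage comment stand for your own directory locations.

```shell
#!/bin/sh
# Hypothetical helper: copy a directory to a sibling named <dir><suffix>,
# preserving ownership, permissions, and timestamps.
# Note: cp -a is supported by GNU and BSD cp but is not strict POSIX.
backup_dir() {
    src=${1%/}      # strip a trailing slash, if any
    suffix=$2
    if [ ! -d "$src" ]; then
        echo "skipping $src: not a directory" >&2
        return 1
    fi
    if [ -e "${src}${suffix}" ]; then
        echo "refusing to overwrite ${src}${suffix}" >&2
        return 1
    fi
    cp -a "$src" "${src}${suffix}"
}

# Usage: back up each directory that your installation defines.
#   for dir in "$SDC_CONF" "$SDC_DATA" "$SDC_RESOURCES"; do
#       backup_dir "$dir" 3000
#   done
```

The helper refuses to overwrite an existing backup so that a second run cannot silently replace the copy you may need for a rollback.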
Step 3. Install the New Version
The instructions that you use to install the new version depend on whether you start Data Collector manually or as a service and on your operating system.
Installing from the Tarball (Manual Start)
Install the new version of the tarball.
- Download the tarball from one of the following locations:
  - StreamSets Support portal if you have a StreamSets enterprise account.
  - StreamSets website if you do not have an enterprise account.
- Extract the tarball to a different directory than the previous version.
- Use the following command to set the $SDC_DIST environment variable to the location where you extracted the tarball:
export SDC_DIST=<extraction directory>
For example:
export SDC_DIST=/sdc/streamsets-datacollector-6.0.0
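The extraction and the `SDC_DIST` assignment can be combined. The sketch below assumes that the tarball is gzip-compressed and contains a single top-level directory; `extract_tarball` is a hypothetical helper, and the tarball name in the usage comment is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: extract a tarball into a destination directory and
# print the path of the top-level directory that the archive created.
extract_tarball() {
    tarball=$1
    dest=$2
    mkdir -p "$dest" || return 1
    tar -xzf "$tarball" -C "$dest" || return 1
    # Take the first archive entry and drop everything after the first slash.
    top=$(tar -tzf "$tarball" | sed -n '1{s|/.*||;p;}')
    echo "${dest%/}/$top"
}

# Usage (the tarball file name is a placeholder):
#   SDC_DIST=$(extract_tarball <downloaded tarball> /sdc)
#   export SDC_DIST
```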
Installing from the Tarball for Systems Using SysV (Service Start)
Operating systems that use the SysV init system include CentOS 6, Oracle Linux 6, Red Hat Enterprise Linux 6, and Ubuntu 14.04 LTS.
Install the new version of the tarball. Installing the full Data Collector as a service requires root privileges.
- Download the tarball from one of the following locations:
  - StreamSets Support portal if you have a StreamSets enterprise account.
  - StreamSets website if you do not have an enterprise account.
- Extract the tarball to a different directory than the previous version.
- Create a backup of the /etc/init.d/sdc file that was used in the previous version.
- Use the following commands from the directory where you extracted the tarball to copy the initd/_sdcinitd_prototype file to the /etc/init.d directory and then change ownership of the file to sdc:
cp initd/_sdcinitd_prototype /etc/init.d/sdc
chown sdc:sdc /etc/init.d/sdc
- Edit the /etc/init.d/sdc file and set the $SDC_DIST and $SDC_HOME environment variables to the location where you extracted the tarball.
- Use the following command to make the sdc file executable:
chmod 755 /etc/init.d/sdc
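The edit of the init script can be done by hand or scripted. The sketch below assumes that the script assigns each variable on a line of the form `export SDC_DIST=...`; check your copy of the file before relying on it. `set_export_var` is a hypothetical helper, and the backup path in the usage comment is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the line "export NAME=..." in a file so that
# NAME is set to the given value.
# Assumption: the value contains no "|" or "&" characters, which would be
# interpreted by sed.
set_export_var() {
    file=$1
    name=$2
    value=$3
    scratch=$(mktemp) || return 1
    sed "s|^\(export ${name}=\).*|\1\"${value}\"|" "$file" > "$scratch" || return 1
    # Write back in place so the file keeps its owner and permissions.
    cat "$scratch" > "$file" || return 1
    rm -f "$scratch"
}

# Usage (as root):
#   cp -p /etc/init.d/sdc "$HOME/sdc.init.previous"   # backup; placeholder path
#   set_export_var /etc/init.d/sdc SDC_DIST /sdc/streamsets-datacollector-6.0.0
#   set_export_var /etc/init.d/sdc SDC_HOME /sdc/streamsets-datacollector-6.0.0
```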
Installing from the Tarball for Systems Using Systemd (Service Start)
Operating systems that use the systemd init system include CentOS 7, Oracle Linux 7, Red Hat Enterprise Linux 7, and Ubuntu 16.04 LTS.
Install the new version of the tarball. Installing the full Data Collector as a service requires root privileges.
- Download the tarball from one of the following locations:
  - StreamSets Support portal if you have a StreamSets enterprise account.
  - StreamSets website if you do not have an enterprise account.
- Extract the tarball to a different directory than the previous version.
- Use the following command from the directory where you extracted the tarball to copy systemd/sdc.service to the /etc/systemd/system directory:
cp systemd/sdc.service /etc/systemd/system/sdc.service
- Optionally, override the /etc/systemd/system/sdc.service file to modify the environment variables that define the directories and the system user and group.
  For example, if you modified the default system user and group used by the previous version of Data Collector, configure the new version to use the same system user and group.
  Override the default values in the sdc.service file using the same procedure that you use to override unit configuration files on a systemd init system. For an example, see "Example 2. Overriding vendor settings" in the systemd.unit manpage.
- Use the following command from the directory where you extracted the tarball to copy systemd/sdc.socket to the /etc/systemd/system directory:
cp systemd/sdc.socket /etc/systemd/system/sdc.socket
- Edit the /etc/systemd/system/sdc.socket file to modify the Data Collector port number to match the previous version. The port must match the one defined in sdc.properties. The default is 18630.
- Use the following command to reload the systemd manager configuration:
systemctl daemon-reload
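The optional override of the system user and group can be written as a systemd drop-in file, which is the mechanism that the systemd.unit manpage describes. The following is a sketch only: `write_sdc_override` is a hypothetical helper, and the user and group names in the usage comment are placeholders for the values used by your previous installation.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in file that overrides the user and
# group of sdc.service. systemd reads *.conf files from the directory
# <unit name>.d next to the unit file.
write_sdc_override() {
    dropin_dir=$1
    user=$2
    group=$3
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/override.conf" <<EOF
[Service]
User=$user
Group=$group
EOF
}

# Usage (as root); the user and group names are placeholders:
#   write_sdc_override /etc/systemd/system/sdc.service.d myuser mygroup
#   systemctl daemon-reload
```

A drop-in leaves the copied sdc.service file untouched, so a later upgrade can replace the unit file without losing the customization.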
Step 4. Update Environment Variables
Update the Data Collector environment variables so that the new version of Data Collector uses a new configuration directory but the same working data, log, and resource directories as the previous version.
For example, suppose your previous Data Collector version used the directory /var/lib/sdc to store the data files for pipeline configuration and run details. When you upgrade, you configure the new version of Data Collector to use the same working directory /var/lib/sdc for the data files. As a result, the new version has access to the pipelines created in the previous version.
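In a shell-style environment file, the example above might look like the following sketch. Only /var/lib/sdc and the new configuration directory /etc/sdc come from this procedure; the log and resources paths are placeholders that stand for whatever the previous version used.

```shell
# New configuration directory for the upgraded version.
export SDC_CONF=/etc/sdc
# Same data directory as the previous version, so existing pipelines are visible.
export SDC_DATA=/var/lib/sdc
# Placeholders: reuse the paths from your previous installation.
export SDC_LOG=/var/log/sdc
export SDC_RESOURCES=/var/lib/sdc-resources
```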
Update the environment variables in the required file based on your installation type. For more information about the required file to edit, see Modifying Environment Variables.
- Update the directory environment variables to use the following values:
  - SDC_CONF - New location outside of the base Data Collector runtime directory and distinct from the renamed previous directory. For example, if you renamed the previous configuration directory to /etc/sdc3000, use the value /etc/sdc.
  - SDC_DATA - Same directory that the previous version used.
  - SDC_LOG - Same directory that the previous version used.
  - SDC_RESOURCES - Same directory that the previous version used.
- If you use any of the following environment variables, add them to the required file, and set them to the same directory used in the previous version:
  - SDC_EXTERNAL_RESOURCES
  - STREAMSETS_LIBRARIES_EXTRA_DIR
  - USER_LIBRARIES_DIR
- Manually update the required file with any other customized environment variable values that you defined in the previous version.
- Use the following command to create the Data Collector configuration directory at /etc/sdc:
mkdir /etc/sdc
- Use the following command from the directory where you extracted the tarball to copy all files from etc into the Data Collector configuration directory that you just created:
cp -R etc/* /etc/sdc
- To run Data Collector as a service, change the owner of the /etc/sdc directory and all files in the directory to the system user and group that starts Data Collector. By default, Data Collector uses a system user and group named sdc.
- Use the following command to set owner-only permission on the form-realm.properties file in the /etc/sdc directory:
chmod go-rwx /etc/sdc/form-realm.properties
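The last four steps can be gathered into one helper. This is a sketch, not an official script: `prepare_conf_dir` is a hypothetical name, and the `chown -R` call is one possible way to change the owner, since the procedure does not prescribe a command for that step.

```shell
#!/bin/sh
# Hypothetical helper: create the configuration directory and populate it
# from the etc directory of the extracted tarball.
# Arguments: <extraction directory> <configuration directory> [owner:group]
prepare_conf_dir() {
    dist=$1
    conf=$2
    owner=${3:-}
    mkdir -p "$conf" || return 1
    cp -R "$dist"/etc/* "$conf" || return 1
    # Needed only when Data Collector runs as a service.
    if [ -n "$owner" ]; then
        chown -R "$owner" "$conf" || return 1
    fi
    # Restrict form-realm.properties to its owner.
    chmod go-rwx "$conf/form-realm.properties"
}

# Usage (as root), with the default sdc user and group:
#   prepare_conf_dir "$SDC_DIST" /etc/sdc sdc:sdc
```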
Step 5. Update the Configuration Files
A new Data Collector version can include new properties and configuration files required for Data Collector to start or function properly. In the previous step, you updated the environment configuration file so that the new version of Data Collector uses the new configuration files stored in the $SDC_CONF directory.
In this step, you compare the previous and new versions of the configuration files, and update the new files as needed with the same customized property values.
For example, when upgrading from version 3.0.0.0, you'd compare the files in your backup directory, /etc/sdc3000, with the files in the /etc/sdc directory. Then update the new files in the /etc/sdc directory with any customizations made in the previous files in the /etc/sdc3000 directory.
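One way to make this comparison is with `diff`. The helpers below are a hypothetical sketch; the directory names in the usage comment follow the /etc/sdc3000 and /etc/sdc example.

```shell
#!/bin/sh
# List files that differ between two configuration directories, or that
# exist in only one of them. diff exits with status 1 when it finds
# differences, which is the expected outcome here.
compare_conf_dirs() {
    diff -rq "$1" "$2"
}

# Show line-level differences for one file present in both directories.
# Arguments: <previous directory> <new directory> <file name>
compare_conf_file() {
    diff -u "$1/$3" "$2/$3"
}

# Usage:
#   compare_conf_dirs /etc/sdc3000 /etc/sdc
#   compare_conf_file /etc/sdc3000 /etc/sdc sdc.properties
```

In the unified output, lines that start with `-` come from the previous file and lines that start with `+` come from the new file, so customized values from the previous version show up as `-` lines.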
- Compare the previous and new versions of the sdc.properties file, and update the new file as needed with the same customized property values.
- If you registered the previous Data Collector to work with StreamSets Control Hub, complete the following steps to update the configuration files used by Control Hub:
- Compare the previous and new versions of the remaining files, and update the new files as needed with the same customized property values:
  - The appropriate realm.properties file, based on the authentication type that you use.
  - credential stores properties file
  - email-password.txt
  - keystore files
  - LDAP files
  - Log4j2 properties file
    Important: Data Collector versions 5.x and later use the Apache Log4j 2.17.2 library. Earlier versions use the Log4j 1.x library, which is now end-of-life. If you customized the sdc-log4j.properties file in a previous version, you must update the new sdc-log4j2.properties file with the same customized property values using the Log4j 2.x syntax. For more information, see Upgrade Impact.
  - security policy file
  - Vault properties file
    As of version 2.7.0.0, most of the Vault configuration properties have been moved to the new credential stores properties file. The properties use the same name, with an added "credentialStore.vault.config" prefix. If you are upgrading from a version earlier than 2.7.0.0, copy any values that you customized in the previous Vault properties file into the same property names in the credential stores properties file.
Step 6. Install Additional Libraries for the Core Installation
If you upgraded a common or core installation of Data Collector, install the individual stage libraries that the upgraded pipelines require.
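Stage libraries can typically be installed from the command line. The sketch below assumes that the `streamsets` script in the extracted tarball provides a `stagelibs` command with an `-install` option that takes a comma-separated list, as described in the Data Collector installation documentation; verify this against your version. `join_libs` is a hypothetical helper, and the library names are examples only.

```shell
#!/bin/sh
# Hypothetical helper: join library names with commas.
join_libs() {
    # Run in a subshell so the IFS change does not leak to the caller.
    ( IFS=,; echo "$*" )
}

# Usage (assumes the stagelibs command described above; names are examples):
#   libs=$(join_libs streamsets-datacollector-aws-lib streamsets-datacollector-jdbc-lib)
#   "$SDC_DIST/bin/streamsets" stagelibs -install="$libs"
```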
Step 7. Start the New Version of Data Collector
Start the new version of Data Collector.
- To start Data Collector manually
  Use the following command from the $SDC_DIST directory to run Data Collector as the system user account logged into the command prompt:
bin/streamsets dc
- To start Data Collector as a service
  Use the required command for your operating system:
  - For CentOS 6, Oracle Linux 6, Red Hat Enterprise Linux 6, or Ubuntu 14.04 LTS, use:
service sdc start
  - For CentOS 7, Oracle Linux 7, Red Hat Enterprise Linux 7, or Ubuntu 16.04 LTS, use:
systemctl start sdc