Upgrading the Initial Control Hub Instance
If you are upgrading a development environment, follow these instructions to upgrade the single Control Hub instance.
If you are upgrading a highly available production environment, follow these instructions to upgrade the initial Control Hub instance. When you upgrade additional Control Hub instances on separate machines, use the shortened upgrade process described in Upgrade a Highly Available Environment.
Upgrading the initial Control Hub instance involves upgrading the system Data Collector, installing and setting up the new version, updating the Control Hub schemas, and generating authentication tokens for the Control Hub applications.
Step 1. Upgrade the System Data Collector
When upgrading a Control Hub installation, you might also upgrade the system Data Collector.
As of version 3.16.0, administrators can enable or disable the system Data Collector for use as the default authoring Data Collector in Control Hub. In upgrades from earlier versions of Control Hub, the system Data Collector is enabled as an authoring Data Collector by default.
For more information about how Pipeline Designer uses the system Data Collector, see System Data Collector.
When you upgrade a Control Hub installation, you can use the same system Data Collector in the new Control Hub version or you can upgrade the system Data Collector to a newer version. For best results, the system Data Collector should be the same version as the earliest execution Data Collector in use.
For upgrade instructions, see Upgrade in the Data Collector documentation.
Step 2. Install the New Version
Install the new version of Control Hub from the tarball or RPM package.
You can install the new version of Control Hub on the same machine as the previous version or on a separate machine.
- Use one of the following installation methods to install the new version of Control Hub:
- If you installed the RPM package on the same machine as the previous version, rename the previous and new versions of the configuration files.
When you install the new RPM package on the same machine as the previous version, the configuration files are written to the same default directory as the previous version, /etc/dpm. The new versions of the configuration files are renamed with the following extension: .rpmnew. For example, the new version of the Control Hub configuration file is renamed to dpm.properties.rpmnew.
- In the working $DPM_CONF directory, /etc/dpm by default, rename all previous configuration files with the following extension: .old.
- Remove the following extension from all new configuration files: .rpmnew.
- Download the JDBC driver for the relational database instance that you are using:
- MariaDB or MySQL when Control Hub uses Java 8 - Download the MySQL JDBC driver version 5 (5.1.44 or later) from the following location: https://dev.mysql.com/downloads/connector/j/5.1.html
- MariaDB or MySQL when Control Hub uses Java 11 - Download the MySQL JDBC driver version 8 (8.0.19 or later) from the following location: https://dev.mysql.com/downloads/connector/j/8.0.html
- PostgreSQL - Download the PostgreSQL JDBC driver version 42.1.4 or later from the following location: https://jdbc.postgresql.org/download.html
- Copy the driver to the following directory:
$DPM_HOME/extra-lib
For example, copy the driver to the following directory in an RPM installation:
/opt/streamsets-dpm/extra-lib
- Set the DPM_HOME and DPM_CONF environment variables.
- Use the following command to set the DPM_HOME environment variable:
export DPM_HOME=<home directory>
For example:
export DPM_HOME=/opt/streamsets-dpm
- Use the following command to set the DPM_CONF environment variable:
export DPM_CONF=<configuration directory>
For example:
export DPM_CONF=/etc/dpm
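The renaming and environment tasks in this step can be scripted. The following Bash sketch is illustrative only and is not shipped with Control Hub: the helper names rename_rpmnew_configs and check_dpm_env are hypothetical, rename_rpmnew_configs only handles files that have a matching .rpmnew file, and check_dpm_env assumes that the new Control Hub home contains dev/setup.sh and that the JDBC driver is a .jar file in $DPM_HOME/extra-lib.

```bash
#!/usr/bin/env bash
# Sketch only: these helper names are hypothetical, not part of Control Hub.

# RPM upgrade on the same machine: keep each previous file as <name>.old
# and promote the matching <name>.rpmnew file to <name>.
rename_rpmnew_configs() {
  local conf_dir="${1:-${DPM_CONF:-/etc/dpm}}" new orig
  for new in "$conf_dir"/*.rpmnew; do
    [ -e "$new" ] || continue   # the glob matched nothing
    orig="${new%.rpmnew}"
    if [ -e "$orig" ]; then
      mv "$orig" "$orig.old"    # previous version -> <name>.old
    fi
    mv "$new" "$orig"           # new version -> <name>
  done
}

# Sanity-check DPM_HOME, DPM_CONF, and the JDBC driver before running setup.
check_dpm_env() {
  local status=0 var
  for var in DPM_HOME DPM_CONF; do
    # ${!var} is Bash indirection: the value of the variable named by $var.
    if [ -z "${!var:-}" ] || [ ! -d "${!var}" ]; then
      echo "error: $var is unset or is not a directory" >&2
      status=1
    fi
  done
  [ "$status" -eq 0 ] || return "$status"
  if [ ! -x "$DPM_HOME/dev/setup.sh" ]; then
    echo "error: $DPM_HOME/dev/setup.sh not found or not executable" >&2
    status=1
  fi
  if ! compgen -G "$DPM_HOME/extra-lib/*.jar" > /dev/null; then
    echo "error: no JDBC driver .jar file in $DPM_HOME/extra-lib" >&2
    status=1
  fi
  return "$status"
}
```

For example, run rename_rpmnew_configs "$DPM_CONF" once after installing the RPM package, and run check_dpm_env after exporting both variables. Keep the .old files until you have finished comparing configurations.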
Step 3. Set Up the New Version
Run the Control Hub setup script in the new Control Hub version to configure Control Hub properties and database connection details. Use the same values that the previous version used to ensure that you connect to the same databases.
The setup script uses the dialog command line utility to display the configuration properties using dialog boxes.
- If you installed the new Control Hub version on a separate machine from the previous version, install the dialog command line utility. For CentOS, Oracle Linux, or Red Hat Enterprise Linux, use the following command:
yum install dialog
For Ubuntu, use the following commands:
apt-get update
apt-get install dialog
- If you use PuTTY as the SSH client to install Control Hub on a remote machine, configure PuTTY to use linux as the terminal emulation mode.
By default, PuTTY uses xterm emulation, which does not correctly display the dialog command line utility.
In the PuTTY Configuration dialog box, click Connection > Data, and then set Terminal-type string to linux.
- Use the following command to run the Control Hub setup script from the $DPM_HOME directory:
dev/setup.sh
- Enter the same configuration values that the previous Control Hub version used. For a description of each property, see the following sections:
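If you prepare several machines, a small Bash sketch can install dialog only when it is missing. It assumes the machine provides /etc/os-release, that the ID values centos, ol, rhel, and ubuntu identify CentOS, Oracle Linux, Red Hat Enterprise Linux, and Ubuntu, and that it runs as root; ensure_dialog and dialog_install_cmd are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical.

# Map an /etc/os-release ID to the install command listed in this step.
dialog_install_cmd() {
  case "$1" in
    centos|ol|rhel) echo "yum install dialog" ;;
    ubuntu)         echo "apt-get update && apt-get install dialog" ;;
    *)              return 1 ;;   # unknown distribution
  esac
}

# Install dialog if it is not already on the PATH. Must run as root.
ensure_dialog() {
  local id cmd
  command -v dialog > /dev/null && return 0
  id="$(. /etc/os-release && echo "$ID")" || return 1
  if ! cmd="$(dialog_install_cmd "$id")"; then
    echo "error: no install command known for OS ID '$id'" >&2
    return 1
  fi
  sh -c "$cmd"
}
```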
Step 4. Update the Configuration Files
If you enabled LDAP authentication or HTTPS, or if you made other customizations to the Control Hub configuration files in the previous version, you'll need to compare the previous and new versions of the files, and update the new files as needed with the same customized property values.
For example, when upgrading from version 3.25.0, you'd compare the files in your backup directory, /etc/dpm3250, with the files in the /etc/dpm directory. Then update the new files in the /etc/dpm directory with any customizations made in the previous files in the /etc/dpm3250 directory.
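To find the files that need attention, you can list the files whose contents differ between the backup directory and the new directory. This Bash sketch assumes a flat configuration directory, as in the /etc/dpm3250 example; compare_conf_dirs is a hypothetical helper name. It does not report files that exist only in the new directory. Inspect each reported file line by line, for example with diff -u, before copying values.

```bash
#!/usr/bin/env bash
# Sketch only: compare_conf_dirs is a hypothetical helper.
# Report files in OLD_DIR that are missing from, or differ from, NEW_DIR.
compare_conf_dirs() {
  local old_dir="$1" new_dir="$2" f name
  for f in "$old_dir"/*; do
    [ -f "$f" ] || continue          # skip subdirectories and empty globs
    name="${f##*/}"                  # file name without the directory
    if [ ! -e "$new_dir/$name" ]; then
      echo "only in $old_dir: $name"
    elif ! cmp -s "$f" "$new_dir/$name"; then
      echo "differs: $name"
    fi
  done
}

# Example:
#   compare_conf_dirs /etc/dpm3250 /etc/dpm
#   diff -u /etc/dpm3250/dpm.properties /etc/dpm/dpm.properties
```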
- If you customized the Control Hub log configuration file, dpm-log4j.properties, in version 3.25.x or earlier, you must update the new dpm-log4j2.properties file with the same customized property values using the Log4j 2.x syntax.
Important: As of version 3.51.0, Control Hub uses the Apache Log4j 2.17.2 library. Control Hub version 3.50.1 uses the Apache Log4j 2.17.1 library. For either of these Control Hub versions, you can customize the log format by modifying the log configuration file, dpm-log4j2.properties, using the Log4j 2.x syntax. Earlier versions used the Log4j 1.x library, which is now end-of-life. In those versions, you customized the log format by modifying the dpm-log4j.properties file using the Log4j 1.x syntax.
- If you enabled LDAP authentication in the previous version, compare the previous and new versions of the Control Hub security configuration file, $DPM_CONF/security-app.properties, and update the new file as needed with the same property values.
Important: As of version 3.9.0, the security-app.properties file no longer includes the following unused properties: sdc.minimum.version, sdc.maximum.version, and sdc.minimum.build.date. Do not add these properties to the new version of the file. You configure the Data Collector version range in the Control Hub UI, not in this configuration file.
- If you enabled Control Hub to use HTTPS in the previous version, compare the previous and new versions of the following Control Hub configuration files and update the new files as needed with the same property values:
- $DPM_CONF/dpm.properties
- $DPM_CONF/common-to-all-apps.properties
Important: As of version 3.6.0, the common-to-all-apps.properties file no longer includes the http.load.balancer.url property. Do not add the property to the new version of the file. If you configured the load balancer URL in the previous version of the file, enter the load balancer URL for the dpm.base.url property instead.
- Compare the previous and new versions of the remaining configuration files in the Control Hub configuration directory, $DPM_CONF, and update the new versions of the files as needed with the same customized property values.
Important: As of version 3.19.0, the $DPM_CONF/jobrunner-app.properties file no longer includes the following unused property: failover.time.secs. Do not add this property to the new version of the file. You configure the execution engine heartbeat interval in the Control Hub UI, not in this configuration file.
Across multiple versions, the jobrunner-app.properties file has changed the default values of the following properties, which control the purging of deleted jobs and the automatic deletion of job history. StreamSets recommends that you always keep the new default values for these properties:
purge.job.immediate
enable.job.purge
enable.active.job.purge
job.purge.init.delay.minutes
job.purge.freq.minutes
job.purge.age.days
job.purge.batch.size
job.status.history.purge.batch.size
job.purge.batch.pause.millis
system.limit.job.status.history.records
enable.job.status.purge
enable.job.status.history.purge
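Several of the Important notes in this step list properties that must not be carried forward. After you merge your customizations, a quick check for them can help. This Bash sketch looks only for the property names listed in those notes, as uncommented entries in the file that each note names; check_removed_props is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch only: check_removed_props is a hypothetical helper.
# Fails if any property removed in 3.6.0, 3.9.0, or 3.19.0 is still set.
check_removed_props() {
  local conf_dir="${1:-$DPM_CONF}" status=0 entry file key
  for entry in \
      security-app.properties:sdc.minimum.version \
      security-app.properties:sdc.maximum.version \
      security-app.properties:sdc.minimum.build.date \
      common-to-all-apps.properties:http.load.balancer.url \
      jobrunner-app.properties:failover.time.secs; do
    file="$conf_dir/${entry%%:*}"   # text before the first ":"
    key="${entry#*:}"               # text after the first ":"
    [ -f "$file" ] || continue
    # Uncommented "key = value" or "key: value" lines only. Dots in the key
    # act as regex wildcards, which is harmless for these names.
    if grep -Eq "^[[:space:]]*${key}[[:space:]]*[=:]" "$file"; then
      echo "remove $key from $file" >&2
      status=1
    fi
  done
  return "$status"
}
```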
Step 5. Enable PostgreSQL for the Scheduler Application
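If Control Hub uses a PostgreSQL database, configure the Scheduler application to use the Quartz PostgreSQL delegate by setting the following property:
org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.PostgreSQLDelegate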
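One way to set the property above from the command line is sketched below. The sketch assumes the property belongs in the Scheduler application's configuration file, $DPM_CONF/scheduler-app.properties, which is one of the files listed in Step 6; confirm the location in your installation. set_prop is a hypothetical helper: it rewrites the first line that defines the key, whether or not that line is commented out, and appends the key if no such line exists.

```bash
#!/usr/bin/env bash
# Sketch only: set_prop is a hypothetical helper.
# Set KEY = VALUE in a .properties file. Rewrites the first line that defines
# KEY (including a commented-out "# KEY = ..." line); appends it otherwise.
set_prop() {
  local file="$1" key="$2" value="$3" tmp rc
  tmp="$(mktemp)" || return 1
  awk -v k="$key" -v v="$value" '
    {
      probe = $0
      sub(/^[ \t]*#?[ \t]*/, "", probe)   # ignore indentation and a leading "#"
      if (!done && index(probe, k) == 1 && substr(probe, length(k) + 1) ~ /^[ \t]*[=:]/) {
        print k " = " v
        done = 1
        next
      }
      print
    }
    END { if (!done) print k " = " v }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps the file owner and mode
  rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Example (the file location is an assumption):
#   set_prop "$DPM_CONF/scheduler-app.properties" \
#     org.quartz.jobStore.driverDelegateClass \
#     org.quartz.impl.jdbcjobstore.PostgreSQLDelegate
```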
Step 6. Configure Component IDs (Optional)
The default component ID for each Control Hub application is <application name>000. For example: notification000 and jobrunner000. If you changed the component ID in the previous version, we recommend configuring the new version to use the same component ID.
You can configure the new version to use a different component ID. For example, if you install the new version on a different machine and your policy is to set the component ID to the IP address of the machine, configure a different component ID for the new version, such as <application name>199.57.90.24.
To change the component ID, edit the following configuration files in the $DPM_CONF directory:
- connection-app.properties
- dpm.properties
- dynamic_preview-app.properties
- jobrunner-app.properties
- messaging-app.properties
- notification-app.properties
- pipelinestore-app.properties
- policy-app.properties
- provisioning-app.properties
- reporting-app.properties
- scheduler-app.properties
- sdp_classification-app.properties
- security-app.properties
- sla-app.properties
- timeseries-app.properties
- topology-app.properties
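Before editing, it can help to find where the current component ID appears in these files. Because no property name is given here, this Bash sketch does not rely on one: it prints every line in the listed files whose value is an application name followed by the given suffix (000 by default). find_component_id_lines is a hypothetical helper, and the suffix is treated as an extended regular expression.

```bash
#!/usr/bin/env bash
# Sketch only: find_component_id_lines is a hypothetical helper.
# Print file:line:text for values that look like "<application name><suffix>".
find_component_id_lines() {
  local conf_dir="${1:-$DPM_CONF}" suffix="${2:-000}" f
  for f in connection-app dpm dynamic_preview-app jobrunner-app messaging-app \
           notification-app pipelinestore-app policy-app provisioning-app \
           reporting-app scheduler-app sdp_classification-app security-app \
           sla-app timeseries-app topology-app; do
    [ -f "$conf_dir/$f.properties" ] || continue
    # Match "<key> = <letters or underscores><suffix>" at the end of a line.
    grep -HnE "[=:][[:space:]]*[A-Za-z_]+${suffix}[[:space:]]*\$" "$conf_dir/$f.properties"
  done
  return 0   # grep finding nothing is not an error here
}

# Example:
#   find_component_id_lines "$DPM_CONF" 000
```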
Step 7. Update Schemas in the Relational Databases
In the new Control Hub version, run the database initialization script to create and upgrade the required tables in the relational databases.
The time that the script takes to run depends on the number of pipeline versions in the pipelinestore database:
- With 10,000 pipeline versions, the script can take from a few minutes to half an hour to run.
- With 100,000 pipeline versions, the script can take one to two hours to run.
Use the following command to run the database initialization script from the $DPM_HOME directory:
dev/01-initdb.sh
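Because the script can run for an hour or more on a large pipelinestore database, you might want to keep a copy of its output and exit status. This Bash sketch assumes that the initialization script does not need an interactive terminal; run_logged is a hypothetical helper and the log path is only an example.

```bash
#!/usr/bin/env bash
# Sketch only: run_logged is a hypothetical helper.
# Run a command, append its output to LOG, and return the command's status.
run_logged() {
  local log="$1" rc start
  shift
  start=$SECONDS
  "$@" 2>&1 | tee -a "$log"
  rc=${PIPESTATUS[0]}   # status of the command, not of tee
  echo "exit status $rc after $(( SECONDS - start ))s" | tee -a "$log"
  return "$rc"
}

# Example:
#   cd "$DPM_HOME" && run_logged /tmp/dpm-initdb.log dev/01-initdb.sh
```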
Step 8. Generate Authentication Tokens
Modify and then run the security script to generate a unique authentication token for each Control Hub application. You must run the security script as the Control Hub system administrator. Define the system administrator username and password in environment variables before running the script.
- In the command prompt, set the DPM_ADMIN_USER and DPM_ADMIN_PASSWORD environment variables.
- Use the following command to set the DPM_ADMIN_USER environment variable:
export DPM_ADMIN_USER=<user name>
For example:
export DPM_ADMIN_USER=admin@admin
- Use the following command to set the DPM_ADMIN_PASSWORD environment variable:
export DPM_ADMIN_PASSWORD=<password>
For example:
export DPM_ADMIN_PASSWORD=mypassword
- Use a text editor to modify the $DPM_HOME/dev/02-initsecurity.sh script to comment out the last line in the script as follows:
# create SAML configuration
# "${DPM_DIST}/bin/streamsets" dpmcli security createSamlConfig -u "${DPM_ADMIN_USER}" -p "${DPM_ADMIN_PASSWORD}"
- Save and close the script.
- Use the following command to run the security script from the $DPM_HOME directory:
dev/02-initsecurity.sh <component ID>
For example, if you defined the component ID for this installation as <application name>002, use the following command:
dev/02-initsecurity.sh 002
You do not need to specify the default component ID of 000.
The script creates a new authentication token for each application.
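The environment variable and script edits in this step can also be done from the shell. In this Bash sketch, prompt_admin_creds reads the password without echoing it, so the password does not end up in your shell history, and comment_out_saml_config comments out any uncommented line that calls createSamlConfig after saving a .bak copy. Both function names are hypothetical. Define them in your current shell (for example, by sourcing a file) so that the exported variables persist, and review the edited script before running it.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical.

# Prompt for the system administrator credentials and export them.
prompt_admin_creds() {
  local user pass
  read -rp "Control Hub admin user: " user || return 1
  read -rsp "Password: " pass || return 1   # -s: do not echo the password
  echo >&2                                  # end the hidden input line
  export DPM_ADMIN_USER="$user" DPM_ADMIN_PASSWORD="$pass"
}

# Comment out the createSamlConfig call in the security script.
comment_out_saml_config() {
  local script="${1:-$DPM_HOME/dev/02-initsecurity.sh}" tmp rc
  [ -f "$script" ] || { echo "error: $script not found" >&2; return 1; }
  cp -p "$script" "$script.bak" || return 1
  tmp="$(mktemp)" || return 1
  # Prefix "# " only to lines that call createSamlConfig and are not comments.
  awk '/createSamlConfig/ && !/^[ \t]*#/ { print "# " $0; next } { print }' \
    "$script" > "$tmp" && cat "$tmp" > "$script"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Example, from $DPM_HOME:
#   prompt_admin_creds
#   comment_out_saml_config dev/02-initsecurity.sh
#   dev/02-initsecurity.sh 002
```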
Step 9. Activate the Control Hub License
As of version 3.2.0, each Control Hub system requires an active license before you can start Control Hub.
Each license is activated for a specific Control Hub system ID. If you install multiple Control Hub instances for a highly available system, you only need to activate the license once.
- Retrieve the Control Hub system ID by running the following command from the $DPM_HOME directory:
bin/streamsets dpmcli security systemId -c
The command returns the system ID and temporarily activates the Control Hub license for seven days. This way, you can start and log in to Control Hub while you wait for your permanent activation key from StreamSets.
- Open a StreamSets support ticket or contact your StreamSets sales representative to request the permanent activation key for your Control Hub system ID.
- After you receive the activation key, run the following command from the $DPM_HOME directory to activate the license:
bin/streamsets dpmcli security activationKey -i activationKey.txt
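The two license commands can be wrapped as shown in this Bash sketch. It assumes, as the command above implies, that activationKey.txt is a file in the $DPM_HOME directory that contains the key you received. record_system_id and activate_license are hypothetical helper names, and saving the system ID to systemId.txt is only a convenience for your support request.

```bash
#!/usr/bin/env bash
# Sketch only: helper names and systemId.txt are hypothetical.

# Print the system ID (this also starts the seven-day temporary license)
# and save a copy to systemId.txt in the Control Hub home directory.
record_system_id() {
  local home="${1:-$DPM_HOME}"
  ( set -o pipefail
    cd "$home" && bin/streamsets dpmcli security systemId -c | tee systemId.txt )
}

# Activate the permanent license, refusing to run without a key file.
activate_license() {
  local home="${1:-$DPM_HOME}" key_file="${2:-activationKey.txt}"
  ( cd "$home" || exit 1
    if [ ! -s "$key_file" ]; then
      echo "error: $home/$key_file is missing or empty" >&2
      exit 1
    fi
    bin/streamsets dpmcli security activationKey -i "$key_file" )
}
```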