Upgrading the Initial Control Hub Instance

If you are upgrading a development environment, follow these instructions to upgrade the single Control Hub instance.

If you are upgrading a highly available production environment, follow these instructions to upgrade the initial Control Hub instance. When you upgrade additional Control Hub instances on separate machines, use the shortened upgrade process described in Upgrade a Highly Available Environment.

Upgrading the initial Control Hub instance involves upgrading the system Data Collector, installing and setting up the new version, updating the Control Hub schemas, and generating authentication tokens for the Control Hub applications.

Step 1. Upgrade the System Data Collector

When upgrading a Control Hub installation, you might also upgrade the system Data Collector.

As of version 3.16.0, administrators can enable or disable the system Data Collector for use as the default authoring Data Collector in Control Hub. In upgrades from earlier versions of Control Hub, the system Data Collector is enabled as an authoring Data Collector by default.

For more information about how Pipeline Designer uses the system Data Collector, see System Data Collector.

When you upgrade a Control Hub installation, you can use the same system Data Collector in the new Control Hub version or you can upgrade the system Data Collector to a newer version. For best results, the system Data Collector should be the same version as the earliest execution Data Collector in use.

For upgrade instructions, see Upgrade in the Data Collector documentation.

Step 2. Install the New Version

Install the new version of Control Hub from the tarball or RPM package.

Important: The minimum requirements for Control Hub can change with each version. Before you upgrade to a new Control Hub version, verify that the machine meets the latest minimum requirements as described in Installation Requirements.

You can install the new version of Control Hub on the same machine as the previous version or on a separate machine.

  1. Use one of the following installation methods to install the new version of Control Hub:
  2. If you installed the RPM package on the same machine as the previous version, rename the previous and new versions of the configuration files.

    When you install the new RPM package on the same machine as the previous version, the configuration files are written to the same default directory as the previous version, /etc/dpm. The new versions of the configuration files are renamed with the following extension: .rpmnew. For example, the new version of the Control Hub configuration file is renamed to dpm.properties.rpmnew.

    1. In the working $DPM_CONF directory, /etc/dpm by default, rename all previous configuration files with the following extension: .old.
    2. Remove the following extension from all new configuration files: .rpmnew.
  3. Download the JDBC driver for the relational database instance that you are using:
  4. Copy the driver to the following directory:

    $DPM_HOME/extra-lib

    For example, copy the driver to the following directory in an RPM installation:

    /opt/streamsets-dpm/extra-lib

  5. Set the DPM_HOME and DPM_CONF environment variables.
    • Use the following command to set the DPM_HOME environment variable:
      export DPM_HOME=<home directory>

      For example:

      export DPM_HOME=/opt/streamsets-dpm 
    • Use the following command to set the DPM_CONF environment variable:
      export DPM_CONF=<configuration directory>

      For example:

      export DPM_CONF=/etc/dpm
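
The configuration file rename described for RPM installations on the same machine can be scripted. The following is a minimal sketch rather than an official tool: rotate_rpm_configs is a hypothetical helper name, and the sketch assumes that only files with a .rpmnew counterpart should be rotated, which is a narrower reading than renaming every previous file. Review the directory contents before and after running anything like it.

```shell
# Sketch only: rotate_rpm_configs is a hypothetical helper, not a Control Hub tool.
# Assumption: only files that have a .rpmnew counterpart are rotated.
rotate_rpm_configs() {
  local conf_dir new_file current
  # Use the first argument, then $DPM_CONF, then the default RPM location.
  conf_dir="${1:-${DPM_CONF:-/etc/dpm}}"
  for new_file in "$conf_dir"/*.rpmnew; do
    # If the glob matched nothing, the literal pattern is returned; skip it.
    [ -e "$new_file" ] || continue
    current="${new_file%.rpmnew}"
    # Keep the previous version under a .old extension.
    if [ -e "$current" ]; then
      mv -- "$current" "$current.old"
    fi
    # Strip the .rpmnew extension from the new version.
    mv -- "$new_file" "$current"
  done
}

# Example (run as a user with write access to the directory):
#   rotate_rpm_configs /etc/dpm
```

Files that have no .rpmnew counterpart are left untouched, so any file that the package did not replace stays active under its original name.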

Step 3. Set Up the New Version

Run the Control Hub setup script in the new Control Hub version to configure Control Hub properties and database connection details. Use the same values that the previous version used to ensure that you connect to the same databases.

The setup script uses the dialog command line utility to display the configuration properties using dialog boxes.

  1. If you installed the new Control Hub version on a separate machine from the previous version, install the dialog command line utility.
    For CentOS, Oracle Linux, or Red Hat Enterprise Linux, use the following command:
    yum install dialog
    For Ubuntu, use the following commands:
    apt-get update
    apt-get install dialog
  2. If using PuTTY as the SSH client to install Control Hub on a remote machine, configure PuTTY to use linux as the terminal emulation mode.

    By default, PuTTY uses xterm emulation, which does not correctly display the dialog command line utility.

    In the PuTTY Configuration dialog box, click Connection > Data and then set Terminal-type string to linux.

  3. Use the following command to run the Control Hub setup script from the $DPM_HOME directory:
    dev/setup.sh
  4. Enter the same configuration values that the previous Control Hub version used. For a description of each property, see the following sections:

Step 4. Update the Configuration Files

If you enabled LDAP authentication or HTTPS, or if you made other customizations to the Control Hub configuration files in the previous version, you'll need to compare the previous and new versions of the files, and update the new files as needed with the same customized property values.

For example, when upgrading from version 3.25.0, compare the files in your backup directory, /etc/dpm3250, with the files in the /etc/dpm directory. Then update the new files in the /etc/dpm directory with any customizations made in the previous files in the /etc/dpm3250 directory.

  1. If you customized the Control Hub log configuration file, dpm-log4j.properties, in version 3.25.x or earlier, you must update the new dpm-log4j2.properties file with the same customized property values using the Log4j 2.x syntax.
    Important: As of version 3.51.0, Control Hub uses the Apache Log4j 2.17.2 library. Control Hub version 3.50.1 uses the Apache Log4j 2.17.1 library. For either of these Control Hub versions, you customize the log format by modifying the log configuration file, dpm-log4j2.properties, using the Log4j 2.x syntax. Earlier versions used the Log4j 1.x library, which is now end-of-life. In those versions, you customized the log format by modifying the dpm-log4j.properties file using the Log4j 1.x syntax.
  2. If you enabled LDAP authentication in the previous version, compare the previous and new versions of the Control Hub security configuration file, $DPM_CONF/security-app.properties, and update the new file as needed with the same property values.
    Important: As of version 3.9.0, the security-app.properties file no longer includes the following unused properties: sdc.minimum.version, sdc.maximum.version, and sdc.minimum.build.date. Do not add these properties to the new version of the file. You configure the Data Collector version range in the Control Hub UI, not in this configuration file.
  3. If you enabled Control Hub to use HTTPS in the previous version, compare the previous and new versions of the following Control Hub configuration files and update the new files as needed with the same property values:
    • $DPM_CONF/dpm.properties
    • $DPM_CONF/common-to-all-apps.properties
    Important: As of version 3.6.0, the common-to-all-apps.properties file no longer includes the http.load.balancer.url property. Do not add the property to the new version of the file. If you configured the load balancer URL in the previous version of the file, simply enter the load balancer URL for the dpm.base.url property.
  4. Compare the previous and new versions of the remaining configuration files in the Control Hub configuration directory, $DPM_CONF, and update the new versions of the files as needed with the same customized property values.
    Important: As of version 3.19.0, the $DPM_CONF/jobrunner-app.properties file no longer includes the following unused property: failover.time.secs. Do not add this property to the new version of the file. You configure the execution engine heartbeat interval in the Control Hub UI, not in this configuration file.
    Across multiple versions, the default values in the jobrunner-app.properties file have changed for the following properties, which control the purging of deleted jobs and the automatic deletion of job history. StreamSets recommends that you always keep the new default values for these properties:
    • purge.job.immediate
    • enable.job.purge
    • enable.active.job.purge
    • job.purge.init.delay.minutes
    • job.purge.freq.minutes
    • job.purge.age.days
    • job.purge.batch.size
    • job.status.history.purge.batch.size
    • job.purge.batch.pause.millis
    • system.limit.job.status.history.records
    • enable.job.status.purge
    • enable.job.status.history.purge
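
One way to carry out the file comparison in this step is to diff only the active property lines of each file pair. The following is a minimal sketch under stated assumptions: active_properties and diff_properties are hypothetical helper names, comments are assumed to start with #, and each property is assumed to fit on one line.

```shell
# Sketch only: both helpers are hypothetical names, not Control Hub tools.

# Print the non-blank, non-comment lines of a properties file, sorted.
active_properties() {
  grep -v -E '^[[:space:]]*(#|$)' "$1" | sort
}

# Show property lines that differ between a previous and a new file.
# Lines prefixed with "<" come from the previous file, ">" from the new one.
# Returns 0 when the active properties match and 1 when they differ.
diff_properties() {
  local old_tmp new_tmp status
  old_tmp=$(mktemp) && new_tmp=$(mktemp) || return 2
  active_properties "$1" > "$old_tmp"
  active_properties "$2" > "$new_tmp"
  status=0
  diff "$old_tmp" "$new_tmp" || status=$?
  rm -f "$old_tmp" "$new_tmp"
  return "$status"
}

# Example, using the backup directory named earlier in this step:
#   diff_properties /etc/dpm3250/dpm.properties /etc/dpm/dpm.properties
```

Sorting hides pure reordering between versions, so the output shows only properties that were added, removed, or changed. Differences that reflect new defaults, such as the job purge properties listed above, should generally be left at the new values.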

Step 5. Enable PostgreSQL for the Scheduler Application

If using PostgreSQL for the relational database instance, configure the Scheduler application to use the driver delegate class for PostgreSQL.
Note: If using MariaDB or MySQL for the relational database instance, skip this step.
Uncomment the following line in the $DPM_CONF/scheduler-app.properties file:
org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.PostgreSQLDelegate
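
If you prefer to script the change instead of using a text editor, the line can be uncommented with sed. The following is a minimal sketch: enable_postgres_delegate is a hypothetical helper name, and the sketch assumes that the line is commented out with a single leading # character, optionally followed by whitespace. Check the file afterward to confirm the result.

```shell
# Sketch only: enable_postgres_delegate is a hypothetical helper name.
# Assumption: the property line is commented out with a leading "#".
enable_postgres_delegate() {
  local file tmp status
  file="${1:-${DPM_CONF:-/etc/dpm}/scheduler-app.properties}"
  tmp=$(mktemp) || return 1
  status=0
  # Write to a temporary file and copy the content back, which avoids the
  # differences between GNU and BSD "sed -i" and keeps file permissions.
  sed -E 's|^[[:space:]]*#[[:space:]]*(org\.quartz\.jobStore\.driverDelegateClass[[:space:]]*=[[:space:]]*org\.quartz\.impl\.jdbcjobstore\.PostgreSQLDelegate)|\1|' \
    "$file" > "$tmp" && cat "$tmp" > "$file" || status=$?
  rm -f "$tmp"
  return "$status"
}

# Example:
#   enable_postgres_delegate "$DPM_CONF/scheduler-app.properties"
```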

Step 6. Configure Component IDs (Optional)

The default component ID for each Control Hub application is <application name>000. For example: notification000 and jobrunner000. If you changed the component ID in the previous version, we recommend configuring the new version to use the same component ID.

You can configure the new version to use a different component ID. For example, if you install the new version on a different machine and your policy is to set the component ID to the IP address of the machine, you'd want to configure a different component ID for the new version, such as <application name>199.57.90.24.

If needed, modify the value of the dpm.componentId property in each of these files located in the Control Hub configuration directory, $DPM_CONF:
  • connection-app.properties
  • dpm.properties
  • dynamic_preview-app.properties
  • jobrunner-app.properties
  • messaging-app.properties
  • notification-app.properties
  • pipelinestore-app.properties
  • policy-app.properties
  • provisioning-app.properties
  • reporting-app.properties
  • scheduler-app.properties
  • sdp_classification-app.properties
  • security-app.properties
  • sla-app.properties
  • timeseries-app.properties
  • topology-app.properties
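
Editing the same property in sixteen files by hand is error-prone, so the change can be scripted. The following is a minimal sketch: set_component_suffix is a hypothetical helper name, and the sketch assumes that each file still contains the default value in the form dpm.componentId=<application name>000 and that the new suffix contains no | or & characters. Files whose component ID was already customized are left unchanged and need a manual edit.

```shell
# Sketch only: set_component_suffix is a hypothetical helper name.
# Assumption: each file still holds the default "<application name>000" value.
set_component_suffix() {
  local conf_dir suffix file tmp
  conf_dir="$1"
  suffix="$2"
  for file in "$conf_dir"/*.properties; do
    [ -f "$file" ] || continue
    tmp=$(mktemp) || return 1
    # Replace only a trailing 000 on the dpm.componentId line.
    if sed -E "s|^(dpm\.componentId[[:space:]]*=[[:space:]]*.*)000[[:space:]]*\$|\1${suffix}|" \
        "$file" > "$tmp"; then
      cat "$tmp" > "$file"
    fi
    rm -f "$tmp"
  done
}

# Example: change every default component ID from <application name>000
# to <application name>002.
#   set_component_suffix "$DPM_CONF" 002
```

Afterward, a command such as grep dpm.componentId "$DPM_CONF"/*.properties can confirm that every file carries the intended value.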

Step 7. Update Schemas in the Relational Databases

In the new Control Hub version, run the database initialization script to create and upgrade the required tables in the relational databases.

When you upgrade from version 3.17.x or earlier, the database initialization script can take a longer time to run than in previous upgrades. The amount of time depends on the total number of pipeline versions or commits in the pipelinestore database:
  • With 10,000 pipeline versions, the script can take a few minutes to half an hour to run.
  • With 100,000 pipeline versions, the script can take one to two hours to run.
Important: Before you run the database initialization script, be sure to prepare for the upgrade and create new databases required for the version you are upgrading from.
Use the following command to run the database initialization script from the $DPM_HOME directory:
dev/01-initdb.sh

Step 8. Generate Authentication Tokens

Modify and then run the security script to generate a unique authentication token for each Control Hub application. You must run the security script as the Control Hub system administrator. Define the system administrator username and password in environment variables before running the script.

  1. At the command prompt, set the DPM_ADMIN_USER and DPM_ADMIN_PASSWORD environment variables.
    • Use the following command to set the DPM_ADMIN_USER environment variable:
      export DPM_ADMIN_USER=<user name>

      For example:

      export DPM_ADMIN_USER=admin@admin 
    • Use the following command to set the DPM_ADMIN_PASSWORD environment variable:
      export DPM_ADMIN_PASSWORD=<password>

      For example:

      export DPM_ADMIN_PASSWORD=mypassword 
  2. Use a text editor to modify the $DPM_HOME/dev/02-initsecurity.sh script to comment out the last line in the script as follows:
    # create SAML configuration
    # "${DPM_DIST}/bin/streamsets" dpmcli security createSamlConfig -u "${DPM_ADMIN_USER}" -p "${DPM_ADMIN_PASSWORD}"
  3. Save and close the script.
  4. Use the following command to run the security script from the $DPM_HOME directory:
    dev/02-initsecurity.sh <component ID>
    For example, if you defined the component ID for this installation as <application name>002, use the following command:
    dev/02-initsecurity.sh 002

    You do not need to specify the default component ID of 000.

The script creates a new authentication token for each application.
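
The script edit in this step can also be made non-interactively. The following is a minimal sketch: disable_saml_step is a hypothetical helper name, and the sketch assumes that the createSamlConfig call sits on a single line, as shown in the step. It comments out any uncommented line that contains that call and leaves lines that are already commented unchanged, so running it twice is harmless.

```shell
# Sketch only: disable_saml_step is a hypothetical helper name.
# Assumption: the createSamlConfig call occupies a single line of the script.
disable_saml_step() {
  local script tmp status
  script="${1:-${DPM_HOME:-/opt/streamsets-dpm}/dev/02-initsecurity.sh}"
  tmp=$(mktemp) || return 1
  status=0
  # Skip lines that already start with "#", then prefix the matching line.
  sed -E '/^[[:space:]]*#/!s|^(.*dpmcli security createSamlConfig.*)$|# \1|' \
    "$script" > "$tmp" && cat "$tmp" > "$script" || status=$?
  rm -f "$tmp"
  return "$status"
}

# Example:
#   disable_saml_step "$DPM_HOME/dev/02-initsecurity.sh"
```

Copying the content back with cat rather than replacing the file keeps the script's executable permission intact.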

Step 9. Activate the Control Hub License

As of version 3.2.0, each Control Hub system requires an active license before you can start Control Hub.

Important: If you are upgrading from version 3.2.0 or later, skip this step. An upgraded Control Hub system continues to use the valid license activated in the previous version.

Each license is activated for a specific Control Hub system ID. If you install multiple Control Hub instances for a highly available system, you only need to activate the license once.

  1. Retrieve the Control Hub system ID by running the following command from the $DPM_HOME directory:
    bin/streamsets dpmcli security systemId -c

    The command returns the system ID and temporarily activates the Control Hub license for seven days. This way, you can start and log in to Control Hub while you wait for your permanent activation key from StreamSets.

  2. Open a StreamSets support ticket or contact your StreamSets sales representative to request the permanent activation key for your Control Hub system ID.
  3. After you receive the activation key, run the following command from the $DPM_HOME directory to activate the license:
    bin/streamsets dpmcli security activationKey -i activationKey.txt