Setting Up a Highly Available Environment

In a production environment, we recommend using multiple Control Hub instances and a load balancer to ensure that Control Hub is highly available.

Note: You can run a single Control Hub instance in a development environment. If you are setting up a development environment, you can skip this section.

When you plan a highly available environment, consider the following requirements:
  • Use highly available database clusters dedicated to the production environment.

    Before you set up a highly available environment, ensure that you have installed database software that supports high availability as described in Install the Database Software. Then, create relational and time series databases dedicated to the production environment. If you created development databases for an initial Control Hub development environment as described in Creating the Databases, then follow the same steps to create the production databases.

  • Install Control Hub on at least two machines to support Control Hub failover. Install Control Hub on additional machines if you need more instances to handle the volume of requests to Control Hub.
  • Use unique component IDs for each Control Hub instance.

    Define a unique application component ID for each instance. The default component ID is <application name>000. For example: notification000 and jobrunner000. You might want to incrementally update the component ID for each Control Hub instance. For example: <application name>001 for the first instance and <application name>002 for the second instance. Or, you can set the component ID to the IP address of each machine, for example: <application name>199.57.90.24.

Step 1. Set Up Time Synchronization

Set up time synchronization for each additional Control Hub machine using the same NTP server that you configured for the initial Control Hub instance.

See Set Up Time Synchronization.
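As one hedged illustration, if the machines use chrony for time synchronization, pointing every Control Hub machine at the same NTP server might look like the following fragment (the server name is a placeholder, not part of the product documentation):

```
# /etc/chrony.conf fragment — use the same NTP server on every Control Hub machine
server ntp.example.com iburst
```

After editing the file, restart the chronyd service so the change takes effect.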

Step 2. Set Up a Load Balancer for Control Hub

Set up a load balancer to distribute user and registered Data Collector, Transformer, and Provisioning Agent requests across the Control Hub system. These Control Hub clients communicate with the Control Hub system through the load balancer URL. We recommend using a Layer 7 load balancer such as HAProxy, NGINX, or F5.

Configure the following information to set up the load balancer:
HTTP headers to identify the client
Configure the load balancer to add the X-Forwarded-For and the Client-IP headers to all inbound requests. These HTTP headers identify the originating IP address of the client sending the request to Control Hub.
When you include these headers, Control Hub records the IP address of the client sending the request, rather than the IP address of the load balancer. For example, the Login Audit includes the client IP address rather than the load balancer IP address.
HTTP security headers
Configure the load balancer to add the following response headers required for security:
  • X-Frame-Options: SAMEORIGIN
  • X-Content-Type-Options: nosniff
  • Strict-Transport-Security: max-age=31536000

    Alternatively, to also require subdomains of the load balancer domain to use HTTPS, set this header to max-age=31536000; includeSubDomains

  • X-XSS-Protection: 1; mode=block
  • Content-Security-Policy: frame-ancestors 'none'; object-src 'none'; script-src 'self' cdn.cookielaw.org privacyportal.onetrust.com geolocation.onetrust.com 'sha256-AxMiKbGCpteW7H2AwmMGGAzdmzw7tm8HCJOHBEcSaNQ='; style-src 'self' fonts.googleapis.com 'unsafe-inline'; base-uri 'self';
HTTPS protocol
Configure the load balancer to use the HTTPS protocol.
As a best practice, configure the Control Hub instances to also use the HTTPS protocol. If you need the Control Hub instances to use the HTTP protocol instead, then you must configure the load balancer to add the X-Forwarded-Proto header to all inbound requests.
IP address and port numbers for each Control Hub instance
Configure the IP address and both of the following port numbers for each Control Hub instance:
  • Control Hub port number. Default is 18631.
  • Control Hub Admin tool port number. Default is 18632.
Control Hub backend definitions
Define the following backend definitions for Control Hub:
  • /connection
  • /dynamic_preview
  • /jobrunner
  • /messaging
  • /pipelinestore
  • /policy
  • /provisioning
  • /reporting
  • /scheduler
  • /sdp_classification
  • /security
  • /timeseries
  • /topology
  • /notification/rest/v1
  • /sla/rest/v1
Define the following backend definition for the Control Hub Admin tool:
  • /rest/v1/system
Health check URL
Optionally, configure the load balancer to use the following health check URL for Control Hub:
/public-rest/v1/health

The load balancer can use the health check URL to periodically check on the health of each Control Hub instance.
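To make the pieces above concrete, the following is a minimal HAProxy sketch, not a definitive configuration: the hostnames, IP addresses, and certificate path are assumptions, only a subset of the security headers and backends is shown, and the Control Hub Admin tool backend (/rest/v1/system on port 18632) is omitted for brevity:

```
# Hedged HAProxy sketch — terminates HTTPS, adds client and security headers,
# health-checks each instance, and forwards requests to the Control Hub port.
frontend control_hub
    bind *:443 ssl crt /etc/haproxy/certs/chub.pem   # certificate path is an assumption
    http-request set-header X-Forwarded-For %[src]
    http-request set-header Client-IP %[src]
    http-response set-header X-Frame-Options SAMEORIGIN
    http-response set-header X-Content-Type-Options nosniff
    http-response set-header Strict-Transport-Security max-age=31536000
    default_backend control_hub_instances

backend control_hub_instances
    # Periodically probe the Control Hub health check URL on each instance
    option httpchk GET /public-rest/v1/health
    server chub1 10.0.0.11:18631 check   # instance addresses are placeholders
    server chub2 10.0.0.12:18631 check
```

A production configuration would also route the backend definitions listed above and add the remaining security headers.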

After setting up the load balancer, configure network routes and firewalls so that the Control Hub instances, web UI clients, and registered Data Collectors and Provisioning Agents can reach the load balancer.

Step 3. Install the Initial Control Hub Instance

When you install the initial Control Hub instance for a highly available environment, follow the complete installation process as described in Installing Control Hub.

Then, make sure that the Control Hub instance can access the front end of the load balancer to communicate with the other Control Hub instances.

Step 4. Install Additional Control Hub Instances

When you install additional Control Hub instances on separate machines for a highly available environment, use the shortened installation process described here.

  1. Install Control Hub on a separate machine that meets the installation requirements, using one of the installation methods described in Installing Control Hub.
  2. Download the JDBC driver for the relational database instance that you are using.
  3. Copy the driver to the following directory:

    $DPM_HOME/extra-lib

    For example, copy the driver to the following directory in an RPM installation:

    /opt/streamsets-dpm/extra-lib

  4. Set the DPM_HOME and DPM_CONF environment variables.
    • Use the following command to set the DPM_HOME environment variable:
      export DPM_HOME=<home directory>

      For example:

      export DPM_HOME=/opt/streamsets-dpm 
    • Use the following command to set the DPM_CONF environment variable:
      export DPM_CONF=<configuration directory>

      For example:

      export DPM_CONF=/etc/dpm
  5. Copy all files from the $DPM_CONF directory in the initial Control Hub instance to the $DPM_CONF directory in this additional Control Hub instance.
    By copying all configuration files, you ensure that this additional Control Hub instance connects to the same load balancer, databases, and SMTP account as the initial Control Hub instance.
  6. Update the configuration file for each Control Hub application to define a unique component ID for this Control Hub instance.
    Modify the value of the dpm.componentId property in these files located in the Control Hub configuration directory, $DPM_CONF:
    • connection-app.properties
    • dpm.properties
    • dynamic_preview-app.properties
    • jobrunner-app.properties
    • messaging-app.properties
    • notification-app.properties
    • pipelinestore-app.properties
    • policy-app.properties
    • provisioning-app.properties
    • reporting-app.properties
    • scheduler-app.properties
    • sdp_classification-app.properties
    • security-app.properties
    • sla-app.properties
    • timeseries-app.properties
    • topology-app.properties

    The default component ID for each instance is <application name>000. For example: notification000 and jobrunner000.

    You might want to incrementally update the component ID for each Control Hub instance. For example: <application name>001 for the first instance and <application name>002 for the second instance. Or, you can set the component ID to the IP address of each machine, for example: <application name>199.57.90.24.
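Because the same property must change in many files, the edit in this step can be scripted. The following is a minimal POSIX shell sketch, not part of the official procedure: the function name is illustrative, and it assumes each dpm.componentId value ends in a numeric suffix that you want to replace.

```shell
#!/bin/sh
# Hedged sketch: replace the numeric suffix of dpm.componentId in every
# properties file under a configuration directory.
# Usage: set_component_suffix <conf-dir> <new-suffix>
set_component_suffix() {
    conf_dir="$1"
    suffix="$2"
    for f in "$conf_dir"/*.properties; do
        [ -f "$f" ] || continue
        # e.g. dpm.componentId=notification000 -> dpm.componentId=notification002
        sed -i.bak "s/^\(dpm\.componentId=[a-zA-Z_]*\)[0-9]*$/\1${suffix}/" "$f"
    done
}
```

For example, `set_component_suffix "$DPM_CONF" 002` would set every application's component ID suffix on this instance to 002, leaving a .bak copy of each original file.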

  7. Run the security script to generate a unique authentication token for each Control Hub application.
    Use the following command to run the security script from the $DPM_HOME directory:
    dev/02-initsecurity.sh <component ID>
    For example, if you defined the component ID for this installation as <application name>002, use the following command:
    dev/02-initsecurity.sh 002
  8. Make sure that the Control Hub instance can access the front end of the load balancer to communicate with the other Control Hub instances.
Repeat these steps for each additional Control Hub instance.

Step 5. Start Each Control Hub Instance

Start each Control Hub instance from the command prompt, and then use the load balancer URL to log in to Control Hub.

  1. Start the load balancer.
  2. Start each Control Hub instance from the command prompt, using the required command for your installation type:
    • Tarball installation for a manual start
      Use the following command from the $DPM_HOME directory to start Control Hub so that it runs as the system user account logged into the command prompt:
      bin/streamsets dpm
      Or, use the following command to start Control Hub and run it in the background:
      nohup bin/streamsets dpm &
      Use the following command to start Control Hub so that it runs as another system user account:
      sudo -u <user> bin/streamsets dpm
    • Tarball or RPM installation for a service start
      Use the required command for your operating system to start Control Hub as a service:
      • For CentOS 6.x, Oracle Linux 6.x, Red Hat Enterprise Linux 6.x, or Ubuntu 14.04, use:
        service dpm start
      • For CentOS 7.x, Oracle Linux 7.x - 8.x, or Red Hat Enterprise Linux 7.x - 8.x, use:
        systemctl start dpm

      For more details about starting each instance, see Start DPM.

  3. To log in to Control Hub, enter the load balancer URL in the address bar of your browser.