Setting Up a Highly Available Environment
In a production environment, we recommend using multiple Control Hub instances and a load balancer to ensure that Control Hub is highly available.
- Use highly available database clusters dedicated to the production environment.
Before you set up a highly available environment, ensure that you have installed database software that supports high availability as described in Install the Database Software. Then, create relational and time series databases dedicated to the production environment. If you created development databases for an initial Control Hub development environment as described in Creating the Databases, then follow the same steps to create the production databases.
- Install Control Hub on at least two machines to support Control Hub failover. Install Control Hub on additional machines if you need more instances to handle the volume of requests to Control Hub.
- Use unique component IDs for each Control Hub instance.
Define a unique application component ID for each instance. The default component ID is <application name>000. For example: notification000 and jobrunner000. You might want to incrementally update the component ID for each Control Hub instance. For example: <application name>001 for the first instance and <application name>002 for the second instance. Or, you can set the component ID to the IP address of each machine, for example: <application name>199.57.90.24.
Step 1. Set Up Time Synchronization
Set up time synchronization for each additional Control Hub machine using the same NTP server that you configured for the initial Control Hub instance.
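For example, on machines that use chrony for time synchronization, each additional machine's configuration could list the same server as the initial instance. This is a sketch only; ntp.example.com is a placeholder, not a StreamSets requirement:

```
# /etc/chrony.conf - same NTP source as the initial Control Hub machine
server ntp.example.com iburst
```

After editing, restart the chrony service and confirm synchronization with chronyc tracking.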
Step 2. Set Up a Load Balancer for Control Hub
Set up a load balancer to distribute user and registered Data Collector, Transformer, and Provisioning Agent requests across the Control Hub system. These Control Hub clients communicate with the Control Hub system through the load balancer URL. We recommend using a Layer 7 load balancer such as HAProxy, NGINX, or F5.
- HTTP headers to identify the client
- Configure the load balancer to add the X-Forwarded-For and Client-IP headers to all inbound requests. These HTTP headers identify the originating IP address of the client sending the request to Control Hub.
- HTTP security headers
- Configure the load balancer to add the following response headers required for security:
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
Strict-Transport-Security: max-age=31536000
Alternatively, to force other subdomains of the load balancer URL to use HTTPS, set this header to
max-age=31536000;includeSubDomains
X-XSS-Protection: 1; mode=block
Content-Security-Policy: frame-ancestors 'none'; object-src 'none'; script-src 'self' cdn.cookielaw.org privacyportal.onetrust.com geolocation.onetrust.com 'sha256-AxMiKbGCpteW7H2AwmMGGAzdmzw7tm8HCJOHBEcSaNQ='; style-src 'self' fonts.googleapis.com 'unsafe-inline'; base-uri 'self';
- HTTPS protocol
- Configure the load balancer to use the HTTPS protocol.
- IP address and port numbers for each Control Hub instance
- Configure the IP address and both of the following port numbers for each Control Hub instance:
- Control Hub port number. Default is 18631.
- Control Hub Admin tool port number. Default is 18632.
- Control Hub backend definitions
- Define the following backend definitions for Control Hub:
/connection
/dynamic_preview
/jobrunner
/messaging
/pipelinestore
/policy
/provisioning
/reporting
/scheduler
/sdp_classification
/security
/timeseries
/topology
/notification/rest/v1
/sla/rest/v1
- Health check URL
- Optionally, configure the load balancer to use the following health check URL for Control Hub:
/public-rest/v1/health
The load balancer can use the health check URL to periodically check on the health of each Control Hub instance.
After setting up the load balancer, configure network routes and firewalls so that the Control Hub instances, web UI clients, and registered Data Collectors and Provisioning Agents can reach the load balancer.
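As a sketch only, the settings above might translate into an NGINX configuration like the following. The instance addresses, server name, and file name are hypothetical, and a real deployment would also configure TLS certificates and the remaining security headers:

```shell
# Hedged sketch: generate a minimal NGINX reverse-proxy configuration for
# two Control Hub instances. All host names and addresses are placeholders,
# not values mandated by Control Hub.
cat > controlhub-lb.conf <<'EOF'
upstream controlhub {
    server 10.0.0.11:18631;    # first Control Hub instance
    server 10.0.0.12:18631;    # second Control Hub instance
}
server {
    listen 443 ssl;
    server_name controlhub-lb.example.com;

    # Identify the originating client to Control Hub
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Client-IP $remote_addr;

    # Security response headers required by Control Hub
    add_header X-Frame-Options SAMEORIGIN;
    add_header X-Content-Type-Options nosniff;
    add_header Strict-Transport-Security "max-age=31536000";

    # Route all backend paths (for example /security, /jobrunner,
    # /messaging) to the upstream pool
    location / {
        proxy_pass http://controlhub;
    }
}
EOF
grep -c 'proxy_set_header' controlhub-lb.conf
```

An active health check against /public-rest/v1/health can be added with whatever mechanism your load balancer provides (for example, NGINX Plus health_check or an external monitor).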
Step 3. Install the Initial Control Hub Instance
When you install the initial Control Hub instance for a highly available environment, follow the complete installation process as described in Installing Control Hub.
Then, make sure that the Control Hub instance can access the front end of the load balancer to communicate with the other Control Hub instances.
Step 4. Install Additional Control Hub Instances
When you install additional Control Hub instances on separate machines for a highly available environment, use the shortened installation process described here.
- Install Control Hub on a separate machine that meets the installation requirements, using a supported installation method (tarball or RPM).
- Download the JDBC driver for the relational database instance that you are using:
- MariaDB or MySQL when Control Hub uses Java 8 - Download the MySQL JDBC driver version 5 (5.1.44 or later) from the following location: https://dev.mysql.com/downloads/connector/j/5.1.html
- MariaDB or MySQL when Control Hub uses Java 11 - Download the MySQL JDBC driver version 8 (8.0.19 or later) from the following location: https://dev.mysql.com/downloads/connector/j/8.0.html
- PostgreSQL - Download the PostgreSQL JDBC driver version 42.1.4 or later from the following location: https://jdbc.postgresql.org/download.html
- Copy the driver to the following directory:
$DPM_HOME/extra-lib
For example, copy the driver to the following directory in an RPM installation:
/opt/streamsets-dpm/extra-lib
- Set the DPM_HOME and DPM_CONF environment variables.
- Use the following command to set the DPM_HOME environment variable:
export DPM_HOME=<home directory>
For example:
export DPM_HOME=/opt/streamsets-dpm
- Use the following command to set the DPM_CONF environment variable:
export DPM_CONF=<configuration directory>
For example:
export DPM_CONF=/etc/dpm
- Copy all files from the $DPM_CONF directory in the initial Control Hub instance to the $DPM_CONF directory in this additional Control Hub instance.
By copying all configuration files, you ensure that this additional Control Hub instance connects to the same load balancer, databases, and SMTP account as the initial Control Hub instance.
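The copy itself can be sketched as follows; here a local directory stands in for the initial host's configuration directory, and in practice you would pull the files over SSH, for example with scp -r or rsync:

```shell
# Hedged sketch: replicate the initial instance's configuration directory.
# "initial-conf" stands in for the initial host's $DPM_CONF.
SRC=initial-conf
DPM_CONF=new-conf
mkdir -p "$SRC" "$DPM_CONF"
printf 'dpm.componentId=security000\n' > "$SRC/dpm.properties"   # sample file
cp -a "$SRC"/. "$DPM_CONF"/
ls "$DPM_CONF"
```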
- Update the configuration file for each Control Hub application to define a unique component ID for this Control Hub instance.
Modify the value of the dpm.componentId property in these files located in the Control Hub configuration directory, $DPM_CONF:
- connection-app.properties
- dpm.properties
- dynamic_preview-app.properties
- jobrunner-app.properties
- messaging-app.properties
- notification-app.properties
- pipelinestore-app.properties
- policy-app.properties
- provisioning-app.properties
- reporting-app.properties
- scheduler-app.properties
- sdp_classification-app.properties
- security-app.properties
- sla-app.properties
- timeseries-app.properties
- topology-app.properties
The default component ID for each instance is <application name>000. For example: notification000 and jobrunner000.
You might want to incrementally update the component ID for each Control Hub instance. For example: <application name>001 for the first instance and <application name>002 for the second instance. Or, you can set the component ID to the IP address of each machine, for example: <application name>199.57.90.24.
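The edits above can be scripted. The following sketch (the directory name and "002" suffix are examples; on a real instance you would edit the files under $DPM_CONF) rewrites dpm.componentId in each application properties file:

```shell
# Hedged sketch: give every application a unique component ID by replacing
# the default "000" suffix with this instance's suffix.
DPM_CONF=demo-conf
SUFFIX=002
mkdir -p "$DPM_CONF"
printf 'dpm.componentId=jobrunner000\n'    > "$DPM_CONF/jobrunner-app.properties"
printf 'dpm.componentId=notification000\n' > "$DPM_CONF/notification-app.properties"
for f in "$DPM_CONF"/*-app.properties; do
  app=$(basename "$f" -app.properties)     # e.g. jobrunner
  sed -i "s/^dpm\.componentId=.*/dpm.componentId=${app}${SUFFIX}/" "$f"
done
grep -h componentId "$DPM_CONF"/*-app.properties
```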
- Run the security script to generate a unique authentication token for each Control Hub application.
Use the following command to run the security script from the $DPM_HOME directory:
dev/02-initsecurity.sh <component ID>
For example, if you defined the component ID for this installation as <application name>002, use the following command:
dev/02-initsecurity.sh 002
- Make sure that the Control Hub instance can access the front end of the load balancer to communicate with the other Control Hub instances.
Step 5. Start Each Control Hub Instance
Start each Control Hub instance from the command prompt, and then use the load balancer URL to log in to Control Hub.
- Start the load balancer.
- Start each Control Hub instance from the command prompt, using the required command for your installation type:
- Tarball installation for a manual start
Use the following command from the $DPM_HOME directory to start Control Hub so that it runs as the system user account logged into the command prompt:
bin/streamsets dpm
Or, use the following command to start Control Hub and run it in the background:
nohup bin/streamsets dpm &
Use the following command to start Control Hub so that it runs as another system user account:
sudo -u <user> bin/streamsets dpm
- Tarball or RPM installation for a service start
Use the required command for your operating system to start Control Hub as a service:
- For CentOS 6.x, Oracle Linux 6.x, Red Hat Enterprise Linux 6.x, or Ubuntu 14.04, use:
service dpm start
- For CentOS 7.x, Oracle Linux 7.x - 8.x, or Red Hat Enterprise Linux 7.x - 8.x, use:
systemctl start dpm
For more details about starting each instance, see Start DPM.
- To log in to Control Hub, enter the load balancer URL in the address bar of your browser.