Engine Communication

Control Hub runs on a public cloud service hosted by StreamSets, so you need only an account to get started. You set up and deploy Data Collector and Transformer engines in your corporate network, which can be on-premises or on a protected cloud computing platform.

Control Hub works with the engines when you design pipelines and when you run pipelines from jobs.

Engines communicate with the following components:
Control Hub
Engines use encrypted REST APIs to communicate with Control Hub. Engines initiate outbound connections to Control Hub over HTTPS on port number 443.

Engines send requests and information to Control Hub. Control Hub does not directly send requests to engines. Instead, Control Hub sends requests using encrypted REST APIs to a messaging queue managed by Control Hub. Engines periodically check with the queue to retrieve Control Hub requests. For more information, see Engine Requests to Control Hub.

Web browser
The web browser also uses encrypted REST APIs to communicate with Control Hub, initiating outbound connections to Control Hub over HTTPS on port number 443.
For some user actions, including when you design a pipeline, install additional stage libraries on engines, or monitor a job, the browser requests must reach the engines. By default for these actions, the browser initiates outbound connections to Control Hub over HTTPS. Control Hub then forwards the requests to the engines using an encrypted WebSocket tunnel.
WebSocket tunnel communication is sufficient for most use cases and does not require additional setup. However, you can configure the web browser to connect to the engines directly through the direct engine REST APIs instead.
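
The polling pattern described above, where engines retrieve queued Control Hub requests over outbound connections, can be illustrated with a minimal sketch. This is a conceptual model only, not the engine implementation: an in-memory queue.Queue stands in for the messaging queue managed by Control Hub, and the function names, batch size, and polling interval are hypothetical.

```python
import queue
import time
from typing import Callable


def drain_requests(pending: "queue.Queue[str]",
                   handle: Callable[[str], None],
                   max_batch: int = 10) -> int:
    """Retrieve up to max_batch queued requests without blocking."""
    handled = 0
    while handled < max_batch:
        try:
            request = pending.get_nowait()
        except queue.Empty:
            break  # nothing left to retrieve on this poll
        handle(request)
        handled += 1
    return handled


def poll_for_requests(pending: "queue.Queue[str]",
                      handle: Callable[[str], None],
                      polls: int,
                      interval_seconds: float = 0.0) -> int:
    """Check the queue a fixed number of times, as an engine would on a timer.

    The engine always initiates the check; nothing is pushed to the engine.
    """
    total = 0
    for _ in range(polls):
        total += drain_requests(pending, handle)
        time.sleep(interval_seconds)
    return total
```

Because the engine initiates every check, this pattern needs no inbound connection to the engine, which matches the outbound-only communication with Control Hub described above.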

WebSocket Tunneling

By default, the web browser uses WebSocket tunneling to communicate with engines.

When an engine starts up, the engine uses the WebSocket Secure (wss) protocol to establish a WebSocket tunnel with Control Hub over an encrypted SSL/TLS connection. Control Hub serves as the WebSocket server, and acts as an intermediary between the browser and the engine.

When you design pipelines or monitor jobs with WebSocket tunneling enabled, the web browser initiates outbound connections to Control Hub over HTTPS on port number 443. Control Hub then uses the encrypted WebSocket tunnel to communicate with the engine. The engine securely passes the requested data back through the WebSocket tunnel to Control Hub, and then the browser receives the data from Control Hub over HTTPS. Control Hub decrypts and then re-encrypts the data as it passes through. Control Hub does not use or inspect the data.

Each engine uses a single WebSocket tunnel connection that remains active until the engine restarts. Multiple users can use the same connection to securely request data from the engine. WebSocket tunneling ensures that your data is secure and does not require additional setup.
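
The relay role of Control Hub can be pictured with a toy model. The following sketch is an assumption-laden illustration, not the actual tunnel protocol: the ControlHubRelay class, its method names, and the string requests are all hypothetical, and encryption is omitted. It shows only that the engine registers one tunnel at startup and that requests from multiple users are forwarded through that same tunnel.

```python
from typing import Callable, Dict

# A hypothetical engine-side handler: takes a request, returns the response.
EngineHandler = Callable[[str], str]


class ControlHubRelay:
    """Toy stand-in for Control Hub acting as the tunnel intermediary."""

    def __init__(self) -> None:
        self._tunnels: Dict[str, EngineHandler] = {}

    def register_tunnel(self, engine_id: str, handler: EngineHandler) -> None:
        # Engine side, at startup: the engine opens its single tunnel outbound.
        self._tunnels[engine_id] = handler

    def forward(self, engine_id: str, request: str) -> str:
        # Browser side: relay the request through the engine's existing tunnel
        # and return the engine's response unchanged.
        if engine_id not in self._tunnels:
            raise LookupError(f"no open tunnel for engine {engine_id!r}")
        return self._tunnels[engine_id](request)
```

For example, after relay.register_tunnel("engine-1", handler), two different users can each call relay.forward("engine-1", ...) and both requests travel through the one registered tunnel.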

However, when you preview a pipeline or capture a snapshot of an active job, your source data does pass through encrypted connections beyond your corporate network into Control Hub, and then back to your web browser. If your data must remain behind a firewall due to corporate regulations, you can configure the browser to use direct engine REST APIs to directly communicate with the engines behind the firewall.

Note: Depending on your account agreement, WebSocket tunneling might be disabled for your organization. For more information, contact your StreamSets account team.

The following image shows how the web browser uses a WebSocket tunnel to communicate with engines:

Direct Engine REST APIs

When your source data must remain behind a firewall due to corporate regulations, you can configure the web browser to use direct engine REST APIs to communicate with engines behind the firewall.

When using direct engine REST APIs, the browser connects directly to the engines over HTTPS on the engine port number when you design pipelines or monitor jobs. From the perspective of the engines, these are inbound connections. When you preview a pipeline or capture a snapshot of an active job, your source data does not pass through Control Hub. Instead, the web browser makes a direct connection to the engines within your corporate network.
Note: Engines that belong to a Control Hub Kubernetes deployment must use the default WebSocket tunneling communication method. You cannot enable the direct engine REST API communication method.

To use direct engine REST APIs, complete the following tasks:

  1. Enable engines to use the HTTPS protocol.
  2. Ensure browser access to the engines.
  3. Choose the direct engine REST APIs communication method in your browser settings.
  4. Optionally, require all users to use direct engine REST APIs.

The following image shows how the web browser can use direct engine REST APIs to communicate with engines:

Enabling HTTPS for Engines

To use direct engine REST APIs, you first must enable all engines to use the HTTPS protocol.

Enable HTTPS for all engines by creating keystore files for the engines and then modifying engine advanced configuration properties in the deployment so that the engines use a secure port and keystore file.

For instructions, see Enabling HTTPS for Data Collector or Enabling HTTPS for Transformer.

Ensuring Browser Access to Engines

To use direct engine REST APIs, you must ensure that the browser can reach the URLs of the engines.

Configure network routes and firewalls so that the web browsers used to access Control Hub can reach all engines on the configured HTTPS port number. For more information about inbound traffic to engines, see Firewall Configuration Overview.

To verify that the browser can access the engines, view the engines from the Engines view or from the deployment details on the Deployments view. When the engine is accessible, the Last Reported Time value is listed in green. When the engine cannot be reached, the Last Reported Time value is red.
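
In addition to the check in the Control Hub UI, you can test basic network reachability from the machine that runs the browser. The following is a minimal sketch, not a StreamSets tool: the function name is hypothetical, and it confirms only that a TCP connection to the engine port can be opened. It does not perform a TLS handshake or validate the engine certificate.

```python
import socket


def engine_port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    A False result suggests that a route or firewall rule blocks the
    browser machine from reaching the engine on that port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, engine_port_reachable("engine.example.com", 18636) checks a placeholder host name and port. Substitute the host name and HTTPS port number configured for your engine.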

Choosing the Communication Method

After you enable HTTPS for all engines and ensure that the browser can access all engines, you can choose the communication method that the browser uses.

By default, the browser uses WebSocket tunneling. You might choose direct engine REST APIs because the REST APIs can offer faster communication with the engines.

  1. In the top Control Hub toolbar, click the My Account icon, and then click your user name.
  2. Click the Browser Settings tab.
  3. For the Browser to Engine Communication property, select one of the following options:
    • Using WebSocket Tunneling
    • Using Direct Engine REST APIs
    Note: The property is saved in the configured web browser only. It does not apply if you log in from another browser.
  4. Click Save.

Requiring Direct Engine REST APIs

An organization administrator can optionally require that all web browsers use direct engine REST APIs to communicate with the engines.

  1. In the Navigation panel, click Manage > My Organization.
  2. In the organization details, click Advanced.
  3. Clear Enable WebSocket Tunneling for UI Communication.
  4. Click Save.

    After you save the change, the web browsers of all users in your organization always use direct engine REST APIs to communicate with engines, regardless of the communication method that each user selected from the My Account menu.
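
The way the organization setting overrides the per-browser setting can be summarized in a short sketch. The function and enum names are hypothetical and serve only to restate the rule above. The sketch does not model the Kubernetes deployment restriction.

```python
from enum import Enum


class Method(Enum):
    WEBSOCKET_TUNNELING = "Using WebSocket Tunneling"
    DIRECT_ENGINE_REST_APIS = "Using Direct Engine REST APIs"


def effective_method(org_tunneling_enabled: bool,
                     browser_choice: Method = Method.WEBSOCKET_TUNNELING) -> Method:
    """Resolve which communication method a browser ends up using."""
    if not org_tunneling_enabled:
        # Organization cleared "Enable WebSocket Tunneling for UI Communication":
        # direct engine REST APIs are used regardless of the browser setting.
        return Method.DIRECT_ENGINE_REST_APIS
    # Otherwise the per-browser choice applies; tunneling is the default.
    return browser_choice
```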