Data Collector Communication

StreamSets Control Hub works with Data Collector to design pipelines and to execute standalone and cluster pipelines.

After you install StreamSets Control Hub, you install Data Collectors on-premises or on a protected cloud computing platform, and then register them to work with Control Hub.

Each registered Data Collector serves one of the following purposes:
Authoring Data Collector
Use an authoring Data Collector to design pipelines and to create connections. You can design pipelines in Control Hub after selecting an available authoring Data Collector. The selected authoring Data Collector determines the stages, stage libraries, and functionality that appear in Pipeline Designer.
When you create connections, the selected authoring Data Collector determines the connection types that you can create.
Execution Data Collector
Use an execution Data Collector to execute standalone and cluster pipelines run from Control Hub jobs.

A single Data Collector can serve both purposes. However, StreamSets recommends dedicating each Data Collector to a single purpose, either authoring or execution.

Registered Data Collectors use encrypted REST APIs to communicate with Control Hub. Data Collectors initiate outbound connections to Control Hub on the port number configured in the Control Hub system.

The web browser that accesses Control Hub Pipeline Designer uses encrypted REST APIs to communicate with Control Hub. The web browser initiates outbound connections to Control Hub on the port number configured in the Control Hub system.

The authoring Data Collector selected for Pipeline Designer or for connection creation accepts inbound connections from the web browser on the port number configured for the Data Collector. Similarly, the execution Data Collector accepts inbound connections from the web browser when you monitor real-time summary statistics, error information, and snapshots for active jobs.
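The connection directions described above can be summarized in a small lookup structure, which may help when planning firewall rules. The following is a minimal sketch only: the Connection type, the CONNECTIONS list, and the initiators_for function are hypothetical names chosen for illustration and are not part of any StreamSets API.

```python
from typing import NamedTuple


class Connection(NamedTuple):
    """One communication path (hypothetical model, not a StreamSets API)."""
    initiator: str           # component that opens the connection
    target: str              # component that accepts the connection
    port_configured_on: str  # where the listening port is configured


# The four paths described in the text above.
CONNECTIONS = [
    Connection("Data Collector", "Control Hub", "Control Hub"),
    Connection("web browser", "Control Hub", "Control Hub"),
    Connection("web browser", "authoring Data Collector", "Data Collector"),
    Connection("web browser", "execution Data Collector", "Data Collector"),
]


def initiators_for(target: str) -> list[str]:
    """Return the components that open connections to the given target."""
    return sorted({c.initiator for c in CONNECTIONS if c.target == target})


if __name__ == "__main__":
    for name in ("Control Hub", "authoring Data Collector"):
        print(name, "accepts connections from:", initiators_for(name))
```

In this model, no path has Control Hub as the initiator, which reflects the text: Data Collectors and the web browser open connections to Control Hub, and the web browser opens connections to the Data Collectors.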

Each outbound and inbound connection must use the same protocol, HTTP or HTTPS, as the Control Hub system.
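As an illustration of the protocol requirement, the following sketch compares the URL scheme of each Data Collector with that of Control Hub and reports any mismatch. The function name mismatched_schemes, the host names, and the port numbers are assumptions made for this example; substitute the URLs used in your own deployment.

```python
from urllib.parse import urlsplit


def mismatched_schemes(control_hub_url: str, data_collector_urls: list[str]) -> list[str]:
    """Return the Data Collector URLs whose protocol differs from Control Hub's.

    Hypothetical helper for illustration; it only inspects URL strings and
    does not contact any system.
    """
    expected = urlsplit(control_hub_url).scheme.lower()
    return [
        url for url in data_collector_urls
        if urlsplit(url).scheme.lower() != expected
    ]


if __name__ == "__main__":
    # Example values only; these hosts and ports are placeholders.
    control_hub = "https://controlhub.example.com:18631"
    collectors = [
        "https://sdc-authoring.example.com:18630",
        "http://sdc-execution.example.com:18630",
    ]
    for url in mismatched_schemes(control_hub, collectors):
        print("Protocol differs from Control Hub:", url)
```

An empty result means every listed Data Collector uses the same protocol as Control Hub.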

The following image shows how authoring and execution Data Collectors communicate with Control Hub: