System Data Collector

The system Data Collector provides a lightweight option for Control Hub users to explore and design pipelines and fragments.

Before you can design pipelines and pipeline fragments directly in Control Hub, you must select an authoring Data Collector. The selected authoring Data Collector determines the available stages, stage libraries, and functionality.

A Control Hub installation also includes a system Data Collector. Administrators can enable or disable the system Data Collector for use as the default authoring Data Collector in Control Hub. Any user can design pipelines and fragments with an installed and enabled system Data Collector. However, the system Data Collector cannot be used to preview data, explicitly validate pipelines, or configure a pipeline that uses connections. Furthermore, designing pipelines with a system Data Collector that is a newer version than the execution Data Collector can cause errors.
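
The restrictions above can be expressed as a simple check. The following Python sketch is illustrative only and is not part of any StreamSets API: the operation names, the `system_dc_issues` function, and the dotted numeric version format are all assumptions made for this example.

```python
from typing import Iterable, List, Tuple

# Operations that the text says the system Data Collector cannot perform.
# The string labels are hypothetical names chosen for this sketch.
UNSUPPORTED_ON_SYSTEM = {"data_preview", "explicit_validation", "connections"}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted numeric version such as '3.22.0' into a comparable tuple.

    Assumes versions contain only dot-separated integers.
    """
    return tuple(int(part) for part in version.split("."))


def system_dc_issues(
    needed_operations: Iterable[str],
    system_version: str,
    execution_version: str,
) -> List[str]:
    """Return the reasons the system Data Collector may be a poor fit.

    An empty list means none of the documented restrictions apply.
    """
    blocked = sorted(set(needed_operations) & UNSUPPORTED_ON_SYSTEM)
    issues = [
        f"'{op}' is not supported by the system Data Collector" for op in blocked
    ]
    # A newer authoring version than the execution version can cause errors.
    if parse_version(system_version) > parse_version(execution_version):
        issues.append(
            f"system Data Collector {system_version} is newer than "
            f"execution Data Collector {execution_version}"
        )
    return issues
```

If the returned list is not empty, a registered Data Collector is the safer choice of authoring Data Collector for that pipeline.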

Important: StreamSets recommends using registered Data Collectors as authoring Data Collectors for Pipeline Designer. After the Control Hub installation, organization administrators register Data Collectors for their organization. A registered Data Collector owned by an organization can be used to design, preview, and explicitly validate pipelines and fragments.

The web browser that accesses Control Hub Pipeline Designer uses encrypted REST APIs to communicate with Control Hub and the system Data Collector. The web browser initiates outbound connections to Control Hub on the port number configured in the Control Hub system. The connection must use the same protocol, HTTP or HTTPS, as the Control Hub system.

As you design pipelines and fragments, the web browser sends requests to the Pipeline Store application. The Pipeline Store application saves and retrieves pipeline definitions in the Pipeline Store relational database. The system Data Collector is stateless, meaning that pipeline and fragment definitions are not saved with the Data Collector.

The Pipeline Store application sends additional requests to the system Data Collector to retrieve stage definitions and perform implicit validation.
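
The division of responsibilities described above can be sketched as a small in-memory model. This is a conceptual illustration, not the actual Control Hub implementation: the class names, method names, stage names, and the validation rule are hypothetical, and a dictionary stands in for the Pipeline Store relational database.

```python
from typing import Dict, Iterable, List


class SystemDataCollector:
    """Stateless helper: answers questions about pipelines but stores none."""

    def __init__(self, stage_library: Iterable[str]) -> None:
        # The available stages are configuration, not saved pipeline state.
        self._stage_library = frozenset(stage_library)

    def stage_definitions(self) -> List[str]:
        """Return the stages available for pipeline design."""
        return sorted(self._stage_library)

    def validate_implicitly(self, definition: Dict) -> List[str]:
        """Placeholder rule for this sketch: flag stages that are not available."""
        return [
            f"unknown stage: {stage}"
            for stage in definition["stages"]
            if stage not in self._stage_library
        ]


class PipelineStore:
    """Saves and retrieves definitions and delegates validation."""

    def __init__(self, data_collector: SystemDataCollector) -> None:
        self._database: Dict[str, Dict] = {}  # stand-in for the relational database
        self._data_collector = data_collector

    def save(self, pipeline_id: str, definition: Dict) -> List[str]:
        """Persist the definition and return any implicit validation issues."""
        issues = self._data_collector.validate_implicitly(definition)
        self._database[pipeline_id] = definition
        return issues

    def load(self, pipeline_id: str) -> Dict:
        """Retrieve a saved definition from the store, not the Data Collector."""
        return self._database[pipeline_id]
```

In this model, only `PipelineStore` holds pipeline definitions. The `SystemDataCollector` object can be replaced at any time without losing saved work, which is the practical consequence of it being stateless.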

The following image shows how Pipeline Designer interacts with the system Data Collector when you design pipelines and fragments: