Edge Data Collectors Overview
Control Hub uses StreamSets Data Collector Edge (SDC Edge) to execute edge pipelines. SDC Edge is an execution engine for pipelines that either read data from an edge device, or receive data from another pipeline and then act on that data to control an edge device.
You install Edge Data Collectors on edge devices and then register them to work with Control Hub. When you register an SDC Edge, you assign labels to the SDC Edge. The labels determine which jobs are run on that SDC Edge.
Control Hub monitors the resources that each SDC Edge uses. Control Hub starts jobs only on an SDC Edge that has not reached any of its resource thresholds.
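The way labels and resource thresholds combine can be illustrated with a minimal sketch. It is not Control Hub code: the `EdgeEngine` class, the `eligible_engines` function, the CPU and memory fields, and the threshold values are all hypothetical, and the sketch assumes that a job can run on an SDC Edge only when that SDC Edge carries every label assigned to the job.

```python
from dataclasses import dataclass


@dataclass
class EdgeEngine:
    """Hypothetical model of a registered SDC Edge (not a Control Hub API)."""
    name: str
    labels: frozenset
    cpu_load: float                 # current CPU load, in percent
    memory_used: float              # current memory use, in percent
    max_cpu_load: float = 80.0      # assumed threshold, for illustration only
    max_memory_used: float = 90.0   # assumed threshold, for illustration only

    def below_thresholds(self) -> bool:
        # "Has not reached any threshold" means every metric is strictly below its limit.
        return (self.cpu_load < self.max_cpu_load
                and self.memory_used < self.max_memory_used)


def eligible_engines(job_labels, engines):
    """Return names of engines that carry every job label and are under all thresholds."""
    wanted = frozenset(job_labels)
    return [e.name for e in engines
            if wanted <= e.labels and e.below_thresholds()]


engines = [
    EdgeEngine("edge-a", frozenset({"factory", "sensors"}), cpu_load=35.0, memory_used=40.0),
    EdgeEngine("edge-b", frozenset({"factory", "sensors"}), cpu_load=92.0, memory_used=40.0),
    EdgeEngine("edge-c", frozenset({"warehouse"}), cpu_load=10.0, memory_used=20.0),
]

print(eligible_engines({"factory", "sensors"}, engines))  # ['edge-a']
```

In this sketch, `edge-b` has the right labels but is skipped because its CPU load is over the assumed limit, and `edge-c` is skipped because its labels do not match.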
You use an authoring Data Collector to design edge pipelines. You can design edge pipelines in the Control Hub Pipeline Designer after selecting an available authoring Data Collector. Alternatively, you can log in directly to an authoring Data Collector and design edge pipelines there.
Working with edge devices involves the following types of pipelines:
- Edge sending pipeline
  An edge sending pipeline runs on SDC Edge. It uses an origin specific to the edge device to read local data residing on the device. The pipeline can perform minimal processing on the data before sending it to a Data Collector receiving pipeline.
- Data Collector receiving pipeline
  A Data Collector receiving pipeline runs on Data Collector. It reads the data written by the destination of the edge sending pipeline. Some systems require an intermediary message broker between the two pipelines. The Data Collector receiving pipeline performs more complex processing on the data as needed, and then writes the data to the final destinations.
- Edge receiving pipeline
  An edge receiving pipeline runs on SDC Edge. It listens for data sent by another pipeline running on Data Collector or on SDC Edge, and then acts on that data to control the edge device.
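The division of work among the three pipeline types can be pictured with a small simulation. This is only a sketch: the function names, the record fields, the temperature rule, and the in-memory `deque` objects standing in for a message broker are all hypothetical, and real pipelines are built from stages in Data Collector rather than written as Python functions. The sketch also assumes that the Data Collector pipeline is the one sending data to the edge receiving pipeline.

```python
from collections import deque


def edge_sending_pipeline(device_readings, outbox):
    """Stands in for a pipeline on SDC Edge: read local data, process minimally, send on."""
    for reading in device_readings:
        if reading.get("temp_c") is not None:  # minimal processing: drop empty readings
            outbox.append(reading)


def data_collector_receiving_pipeline(inbox, destination, commands):
    """Stands in for a pipeline on Data Collector: heavier processing, then write out."""
    while inbox:
        record = inbox.popleft()
        record = {**record, "temp_f": record["temp_c"] * 9 / 5 + 32}  # enrich the record
        destination.append(record)                                    # final destination
        if record["temp_c"] > 30:  # assumed rule that triggers a device command
            commands.append({"device": record["device"], "action": "fan_on"})


def edge_receiving_pipeline(commands, device_state):
    """Stands in for a pipeline on SDC Edge: act on incoming data to control the device."""
    while commands:
        command = commands.popleft()
        device_state[command["device"]] = command["action"]


readings = [
    {"device": "d1", "temp_c": 25.0},
    {"device": "d2", "temp_c": None},
    {"device": "d3", "temp_c": 35.0},
]
broker, commands = deque(), deque()   # in-memory stand-ins for a message broker
destination, device_state = [], {}

edge_sending_pipeline(readings, broker)
data_collector_receiving_pipeline(broker, destination, commands)
edge_receiving_pipeline(commands, device_state)

print(len(destination))  # 2
print(device_state)      # {'d3': 'fan_on'}
```

Keeping the edge functions small mirrors the point made above: the edge side does only light work, while the heavier processing happens on Data Collector.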
For more information about designing edge pipelines, including the supported stages, see Edge Pipelines.
After designing edge pipelines, you publish them to Control Hub. Within Control Hub, you add edge pipelines to jobs that run on an SDC Edge. You add the Data Collector receiving pipelines to jobs that run on an execution Data Collector.
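The rule for where each kind of job runs can be summarized in a few lines. The `PipelineType` enum and the `engine_for` function below are hypothetical names used only for illustration and do not correspond to a Control Hub API.

```python
from enum import Enum


class PipelineType(Enum):
    """Hypothetical labels for the pipeline types described in this overview."""
    EDGE_SENDING = "edge sending"
    EDGE_RECEIVING = "edge receiving"
    DATA_COLLECTOR_RECEIVING = "Data Collector receiving"


def engine_for(pipeline_type: PipelineType) -> str:
    """Return the kind of engine that runs a job containing this pipeline type."""
    if pipeline_type is PipelineType.DATA_COLLECTOR_RECEIVING:
        return "execution Data Collector"
    return "SDC Edge"  # both kinds of edge pipeline run on SDC Edge


for pipeline_type in PipelineType:
    print(f"{pipeline_type.value} pipeline -> {engine_for(pipeline_type)}")
```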
For more details about how Control Hub and SDC Edge work together, see SDC Edge Communication.