Authoring Data Collectors
You use authoring Data Collectors to design pipelines and to create connections. You install and register authoring Data Collectors just as you do execution Data Collectors.
You can design pipelines in Control Hub after selecting an available authoring Data Collector. The selected authoring Data Collector determines the stages, stage libraries, and functionality that display in Pipeline Designer. When you create connections, the selected authoring Data Collector determines the connection types that you can create.
Use an authoring Data Collector that is the same version as the execution Data Collectors that you intend to use to run the pipeline. Using a different Data Collector version can result in pipelines that are invalid for the execution Data Collectors.
For example, if the authoring Data Collector is a more recent version than the execution Data Collector, pipelines might include a stage, stage library, or stage functionality that does not exist in the execution Data Collector.
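The version mismatch described above can be caught with a simple comparison before a pipeline is designed. The following is a minimal sketch, not part of Control Hub: the function names are hypothetical, and it assumes version strings are purely numeric and dot-separated (for example `3.19.0` or `3.0.0.0`).

```python
def parse_version(text):
    """Turn a dotted version string such as '3.19.0' into a tuple of ints.

    Assumption: the string contains only numeric, dot-separated parts.
    """
    return tuple(int(part) for part in text.strip().split("."))


def compare_versions(authoring, execution):
    """Compare an authoring version with an execution version.

    Returns 'match', 'authoring-newer', or 'authoring-older'.
    Shorter versions are padded with zeros, so '3.19.0' equals '3.19.0.0'.
    """
    a = parse_version(authoring)
    e = parse_version(execution)
    width = max(len(a), len(e))
    a = a + (0,) * (width - len(a))
    e = e + (0,) * (width - len(e))
    if a == e:
        return "match"
    # A newer authoring version may expose stages or stage functionality
    # that the execution Data Collector does not have.
    return "authoring-newer" if a > e else "authoring-older"


if __name__ == "__main__":
    print(compare_versions("3.20.0", "3.19.0"))
```

A result other than `match` suggests that pipelines designed with that authoring Data Collector might be invalid for the execution Data Collector.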
- System Data Collector
- Control Hub can include a system Data Collector for exploration and light development. Administrators can enable or disable the system Data Collector for use as the default authoring Data Collector in Control Hub.
- Registered Data Collector
- You can select a registered Data Collector that meets all of the following requirements:
- StreamSets recommends using the latest version of Data Collector. The minimum supported Data Collector version is 3.0.0.0. To design pipeline fragments, the minimum supported Data Collector version is 3.2.0.0. To create and use connections, the minimum supported Data Collector version is 3.19.0.
- The Data Collector uses the same protocol, HTTP or HTTPS, as Control Hub. For example, if Control Hub uses the HTTPS protocol, the Data Collector must also use the HTTPS protocol. Note: StreamSets recommends using a certificate signed by a certifying authority for a Data Collector that uses the HTTPS protocol. If you use a self-signed certificate, you must first open the Data Collector URL in a browser and accept the browser warning about the self-signed certificate. Only then can users select the Data Collector as the authoring Data Collector.
- The Data Collector URL is reachable from the Control Hub web browser.
Consider assigning an Authoring label to the authoring Data Collectors. That way, data engineers can easily determine which Data Collectors are authoring Data Collectors when they use Control Hub.