Authoring Data Collectors

You use authoring Data Collectors to design pipelines and to create connections. You install and register authoring Data Collectors just as you do execution Data Collectors.

You can design pipelines in Control Hub after selecting an available authoring Data Collector. The selected authoring Data Collector determines the stages, stage libraries, and functionality that display in Pipeline Designer. When you create connections, the selected authoring Data Collector determines the connection types that you can create.

Use an authoring Data Collector that is the same version as the execution Data Collectors that you intend to use to run the pipeline. Using a different Data Collector version can result in pipelines that are invalid for the execution Data Collectors.

For example, if the authoring Data Collector is a more recent version than the execution Data Collector, pipelines might include a stage, stage library, or stage functionality that does not exist in the execution Data Collector.
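
As a minimal sketch of this version check, the following Python compares an authoring Data Collector version against the versions of the execution Data Collectors. The function names are hypothetical and are not part of any Control Hub API. The sketch also assumes that versions are plain dotted numbers and that a three-part version such as 3.19.0 is equivalent to 3.19.0.0.

```python
def normalize(version, width=4):
    """Turn a dotted version string into a tuple of ints, padded with zeros.

    Assumption: '3.19.0' and '3.19.0.0' refer to the same version.
    """
    parts = tuple(int(part) for part in version.split("."))
    return parts + (0,) * (width - len(parts))


def mismatched_executors(authoring_version, execution_versions):
    """Return the execution versions that differ from the authoring version."""
    authoring = normalize(authoring_version)
    return [v for v in execution_versions if normalize(v) != authoring]


# Hypothetical usage: one execution Data Collector runs an older version.
print(mismatched_executors("3.19.0", ["3.19.0", "3.18.1", "3.19.0.0"]))
```

An empty list means every execution Data Collector matches the authoring Data Collector version.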

When using Pipeline Designer, select one of the following types of Data Collectors to use as the authoring Data Collector:
System Data Collector

Control Hub can include a system Data Collector for exploration and light development. Administrators can enable or disable the system Data Collector for use as the default authoring Data Collector in Control Hub.

When you select the system Data Collector, Control Hub displays the stage libraries installed in the configured system Data Collector. Depending on your Control Hub deployment, these can be the latest version of all stage libraries available with the latest version of Data Collector.
Use the system Data Collector to design pipelines only. It cannot be used for data preview or explicit pipeline validation, and it cannot be used to configure a pipeline that uses connections.
Registered Data Collector
You can select a registered Data Collector that meets all of the following requirements:
  • The Data Collector is a supported version. StreamSets recommends using the latest version of Data Collector.

    The minimum supported Data Collector version is 3.0.0.0. To design pipeline fragments, the minimum supported Data Collector version is 3.2.0.0. To create and use connections, the minimum supported Data Collector version is 3.19.0.

  • The Data Collector uses the same protocol, HTTP or HTTPS, as the Control Hub system. If Control Hub uses the HTTPS protocol, the Data Collector must also use the HTTPS protocol.
    Note: StreamSets recommends using a certificate signed by a certifying authority for a Data Collector that uses the HTTPS protocol. If you use a self-signed certificate, you must first use a browser to access the Data Collector URL and accept the web browser warning message about the self-signed certificate before users can select the Data Collector as the authoring Data Collector.
  • The Data Collector URL is reachable from the Control Hub web browser.
When you select a registered Data Collector, Control Hub displays the stage libraries and custom stage libraries installed in the registered Data Collector. Use a registered Data Collector to design, preview, and explicitly validate pipelines.
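
The version and protocol requirements listed above can be sketched as a small pre-check. This is an illustration only: the task names, the function names, and the example URLs are assumptions and are not part of Control Hub. The sketch does not test whether the Data Collector URL is reachable from the web browser, which must be verified separately.

```python
from urllib.parse import urlparse

# Minimum versions stated above, keyed by a hypothetical task name.
MINIMUM_VERSIONS = {
    "pipelines": (3, 0, 0, 0),
    "fragments": (3, 2, 0, 0),
    "connections": (3, 19, 0, 0),
}


def version_tuple(version, width=4):
    """Parse a dotted version string into a zero-padded tuple of ints."""
    parts = tuple(int(p) for p in version.split("."))
    return parts + (0,) * (width - len(parts))


def unmet_requirements(sdc_version, sdc_url, control_hub_url, task="pipelines"):
    """Return a list of requirements that the Data Collector does not meet."""
    problems = []
    if version_tuple(sdc_version) < MINIMUM_VERSIONS[task]:
        problems.append("version below minimum for " + task)
    # The Data Collector must use the same protocol as Control Hub.
    if urlparse(sdc_url).scheme != urlparse(control_hub_url).scheme:
        problems.append("protocol differs from Control Hub")
    return problems


# Hypothetical usage with placeholder URLs.
print(unmet_requirements("3.18.0", "http://sdc.example.com:18630",
                         "https://hub.example.com", task="connections"))
```

An empty list indicates that the version and protocol requirements are met for the chosen task.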
Tip: Use labels to clearly designate which Data Collectors are dedicated to pipeline design. For example, assign an Authoring label to the authoring Data Collectors. That way, data engineers can easily determine which Data Collectors are authoring Data Collectors when they use Control Hub.
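
To illustrate the labeling convention, the following sketch filters a list of Data Collector records by label. The record layout (a dictionary with "url" and "labels" keys) and the function name are hypothetical and do not reflect the Control Hub data model.

```python
def authoring_collectors(collectors, label="Authoring"):
    """Return the URLs of Data Collectors that carry the given label."""
    return [c["url"] for c in collectors if label in c.get("labels", [])]


# Hypothetical records with placeholder URLs.
collectors = [
    {"url": "https://sdc1.example.com:18630", "labels": ["Authoring"]},
    {"url": "https://sdc2.example.com:18630", "labels": ["Production"]},
    {"url": "https://sdc3.example.com:18630", "labels": ["Authoring", "Dev"]},
]
print(authoring_collectors(collectors))
```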