Registered Data Collectors (Control Hub)#


Once a Data Collector instance is started, it can be registered with an organization on a Control Hub instance. Those instances are referred to as Registered Data Collectors, and are only accessible from within the organization they were registered with.

Updating Data Collector Resource Thresholds#

Control Hub can limit the amount of resources consumed on a Data Collector instance, regardless of the workload, to ensure that no Data Collector is ever overloaded. To check the resource thresholds configured for a given streamsets.sdk.sch_models.DataCollector instance, you can reference the max_cpu_load, max_memory_used, and max_pipelines_running attributes:

data_collector = sch.data_collectors[0]
data_collector.max_cpu_load
data_collector.max_memory_used
data_collector.max_pipelines_running

Output:

# data_collector.max_cpu_load
100.0

# data_collector.max_memory_used
1000000000000

# data_collector.max_pipelines_running
1000000000000

To set new values for the resource thresholds, you can use the streamsets.sdk.ControlHub.update_data_collector_resource_thresholds() method to pass in the streamsets.sdk.sch_models.DataCollector instance and values you wish to set:

sch.update_data_collector_resource_thresholds(data_collector,
                                              max_cpu_load=51.5,
                                              max_memory_used=550,
                                              max_pipelines_running=25)
data_collector.max_cpu_load

Output:

51.5