StreamSets Control Hub

Main interface

The main entry point for interacting with SCH instances.

class streamsets.sdk.ControlHub(server_url, username, password)[source]

Class to interact with StreamSets Control Hub.

Parameters
  • server_url (str) – SCH server base URL.

  • username (str) – SCH username.

  • password (str) – SCH password.
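
A minimal sketch of building the client from environment variables, assuming the three-argument constructor shown above. The helper name and the variable names SCH_URL, SCH_USERNAME, and SCH_PASSWORD are hypothetical and not part of the SDK; the class is passed in as `factory` so the helper can be imported without the SDK installed.

```python
import os


def control_hub_from_env(factory, env=None):
    """Build a client by calling factory(server_url, username, password).

    `factory` is expected to be streamsets.sdk.ControlHub. The environment
    variable names used here are hypothetical.
    """
    env = os.environ if env is None else env
    required = ("SCH_URL", "SCH_USERNAME", "SCH_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    # Positional order matches the documented signature.
    return factory(env["SCH_URL"], env["SCH_USERNAME"], env["SCH_PASSWORD"])
```

With the SDK installed, a call would look like `control_hub_from_env(ControlHub)` after `from streamsets.sdk import ControlHub`.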

acknowledge_deployment_error(*deployments)[source]

Acknowledge errors for one or more deployments.

Parameters

*deployments – One or more instances of streamsets.sdk.sch_models.Deployment.

acknowledge_event_subscription_error(subscription)[source]

Acknowledge an error on given Event Subscription.

Parameters

subscription (streamsets.sdk.sch_models.Subscription) – A Subscription instance.

Returns

An instance of streamsets.sdk.sch_api.Command.

acknowledge_job_error(*jobs)[source]

Acknowledge errors for one or more jobs.

Parameters

*jobs – One or more instances of streamsets.sdk.sch_models.Job.

property action_audits

Action Audits.

Returns

An instance of streamsets.sdk.sch_models.ActionAudits.

activate_datacollector(data_collector)[source]

Activate data collector.

Parameters

data_collector (streamsets.sdk.sch_models.DataCollector) – Data Collector object.

activate_provisioning_agent(provisioning_agent)[source]

Activate provisioning agent.

Parameters

provisioning_agent (streamsets.sdk.sch_models.ProvisioningAgent) – Provisioning Agent object.

Returns

An instance of streamsets.sdk.sch_api.Command.

add_classification_rule(classification_rule, commit=False)[source]

Add a classification rule.

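Parameters
  • classification_rule – The classification rule to add.

  • commit (bool, optional) – Default: False.

add_connection(connection)[source]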

Add a connection.

Parameters

connection (streamsets.sdk.sch_models.Connection) – Connection object.

Returns

An instance of streamsets.sdk.sch_api.Command.

add_deployment(deployment)[source]

Add a deployment.

Parameters

deployment (streamsets.sdk.sch_models.Deployment) – Deployment object.

Returns

An instance of streamsets.sdk.sch_api.Command.

add_group(group)[source]

Add a group.

Parameters

group (streamsets.sdk.sch_models.Group) – Group object.

Returns

An instance of streamsets.sdk.sch_api.Command.

add_job(job)[source]

Add a job.

Parameters

job (streamsets.sdk.sch_models.Job) – Job object.

Returns

An instance of streamsets.sdk.sch_models.Job.

add_organization(organization)[source]

Add an organization.

Parameters

organization (streamsets.sdk.sch_models.Organization) – Organization object.

Returns

An instance of streamsets.sdk.sch_api.Command.

add_protection_policy(protection_policy)[source]

Add a protection policy.

Parameters

protection_policy (streamsets.sdk.sch_models.ProtectionPolicy) – Protection Policy object.

add_report_definition(report_definition)[source]

Add Report Definition to Control Hub.

Parameters

report_definition (streamsets.sdk.sch_models.ReportDefinition) – Report Definition instance.

Returns

An instance of streamsets.sdk.sch_api.Command.

add_subscription(subscription)[source]

Add Subscription to Control Hub.

Parameters

subscription (streamsets.sdk.sch_models.Subscription) – A Subscription instance.

Returns

An instance of streamsets.sdk.sch_api.Command.

add_user(user)[source]

Add a user. Some user attributes are updated by SCH, such as created_by, created_on, last_modified_by, last_modified_on, password_expires_on, and password_system_generated.

Parameters

user (streamsets.sdk.sch_models.User) – User object.

Returns

An instance of streamsets.sdk.sch_api.Command.

property alerts

Alerts.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Alert.

balance_data_collectors(*data_collectors)[source]

Balance all jobs running on given Data Collectors.

Parameters

*data_collectors – One or more instances of streamsets.sdk.sch_models.DataCollector.

balance_job(*jobs)[source]

Balance one or more jobs.

Parameters

*jobs – One or more instances of streamsets.sdk.sch_models.Job.

property connection_audits

Connection Audits.

Returns

An instance of streamsets.sdk.sch_models.ConnectionAudits.

property connection_tags

Connection Tags.

Returns

An instance of streamsets.sdk.sch_models.ConnectionTags.

property connections

Connections.

Returns

An instance of streamsets.sdk.sch_models.Connections.

create_components(component_type, number_of_components=1, active=True)[source]

Create components.

Parameters
  • component_type (str) – Component type.

  • number_of_components (int, optional) – Default: 1.

  • active (bool, optional) – Default: True.

Returns

An instance of streamsets.sdk.sch_api.CreateComponentsCommand.

property data_collectors

Data Collectors registered to the Control Hub instance.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.DataCollector instances.

property data_protector_enabled

Whether Data Protector is enabled for the current organization.

Type

bool

property data_protector_version

The StreamSets Data Protector version string configured in the system Data Collector. If Data Protector is not enabled, None is returned.

Type

str

deactivate_datacollector(data_collector)[source]

Deactivate data collector.

Parameters

data_collector (streamsets.sdk.sch_models.DataCollector) – Data Collector object.

deactivate_provisioning_agent(provisioning_agent)[source]

Deactivate provisioning agent.

Parameters

provisioning_agent (streamsets.sdk.sch_models.ProvisioningAgent) – Provisioning Agent object.

Returns

An instance of streamsets.sdk.sch_api.Command.

deactivate_user(*users, organization=None)[source]

Deactivate one or more users.

Parameters
  • *users – One or more instances of streamsets.sdk.sch_models.User.

  • organization (str, optional) – Default: None. If not specified, current organization will be used.

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_and_unregister_data_collector(data_collector)[source]

Delete and Unregister data collector.

Parameters

data_collector (streamsets.sdk.sch_models.DataCollector) – Data Collector object.

delete_connection(*connections)[source]

Delete connections.

Parameters

*connections – One or more instances of streamsets.sdk.sch_models.Connection.

delete_data_collector(data_collector)[source]

Delete data collector.

Parameters

data_collector (streamsets.sdk.sch_models.DataCollector) – Data Collector object.

delete_deployment(*deployments)[source]

Delete deployments.

Parameters

*deployments – One or more instances of streamsets.sdk.sch_models.Deployment.

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_group(*groups)[source]

Delete groups.

Parameters

*groups – One or more instances of streamsets.sdk.sch_models.Group.

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_job(*jobs)[source]

Delete one or more jobs.

Parameters

*jobs – One or more instances of streamsets.sdk.sch_models.Job.

delete_pipeline(pipeline, only_selected_version=False)[source]

Delete a pipeline.

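Parameters
  • pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline object.

  • only_selected_version (bool, optional) – Default: False.

Returns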

An instance of streamsets.sdk.sch_api.Command.

delete_pipeline_labels(*pipeline_labels)[source]

Delete pipeline labels.

Parameters

*pipeline_labels – One or more instances of streamsets.sdk.sch_models.PipelineLabel.

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_provisioning_agent(provisioning_agent)[source]

Delete provisioning agent.

Parameters

provisioning_agent (streamsets.sdk.sch_models.ProvisioningAgent) – Provisioning Agent object.

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_provisioning_agent_token(provisioning_agent)[source]

Delete provisioning agent token.

Parameters

provisioning_agent (streamsets.sdk.sch_models.ProvisioningAgent) – Provisioning Agent object.

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_report_definition(report_definition)[source]

Delete an existing Report Definition.

Parameters

report_definition (streamsets.sdk.sch_models.ReportDefinition) – Report Definition instance.

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_subscription(subscription)[source]

Delete an existing Subscription.

Parameters

subscription (streamsets.sdk.sch_models.Subscription) – A Subscription instance.

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_topology(topology, only_selected_version=False)[source]

Delete a topology.

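Parameters
  • topology (streamsets.sdk.sch_models.Topology) – Topology object.

  • only_selected_version (bool, optional) – Default: False.

Returns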

An instance of streamsets.sdk.sch_api.Command.

delete_user(*users, deactivate=False)[source]

Delete users. Deactivate users before deleting if configured.

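Parameters
  • *users – One or more instances of streamsets.sdk.sch_models.User.

  • deactivate (bool, optional) – Deactivate the users before deleting them. Default: False.

Returns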

An instance of streamsets.sdk.sch_api.Command.

property deployments

Deployments.

Returns

An instance of streamsets.sdk.sch_models.Deployments.

duplicate_job(job, name=None, description=None, number_of_copies=1)[source]

Duplicate an existing job.

Parameters
  • job (streamsets.sdk.sch_models.Job) – Job object.

  • name (str, optional) – Name of the new job(s). Default: None. If not specified, name of the job with ' copy' appended to the end will be used.

  • description (str, optional) – Description for new job(s). Default: None.

  • number_of_copies (int, optional) – Number of copies. Default: 1.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Job.
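
As an illustration of the name, description, and number_of_copies parameters, here is a small sketch that makes one named copy of a job per environment. It assumes `control_hub` exposes duplicate_job() as documented above; the helper name, `base_name`, and `environments` are hypothetical inputs rather than SDK concepts.

```python
def duplicate_per_environment(control_hub, job, base_name, environments):
    """Create one copy of `job` for each environment name."""
    copies = []
    for environment in environments:
        # duplicate_job returns a list-like collection of the new Job objects.
        new_jobs = control_hub.duplicate_job(
            job,
            name=f"{base_name} ({environment})",
            description=f"Copy of {base_name} for {environment}",
            number_of_copies=1,
        )
        copies.extend(new_jobs)
    return copies
```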

duplicate_pipeline(pipeline, name=None, description='New Pipeline', number_of_copies=1)[source]

Duplicate an existing pipeline.

Parameters
  • pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline object.

  • name (str, optional) – Name of the new pipeline(s). Default: None.

  • description (str, optional) – Description for new pipeline(s). Default: 'New Pipeline'.

  • number_of_copies (int, optional) – Number of copies. Default: 1.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Pipeline.

edit_job(job)[source]

Edit a job.

Parameters

job (streamsets.sdk.sch_models.Job) – Job object.

Returns

An instance of streamsets.sdk.sch_models.Job.

export_jobs(jobs)[source]

Export jobs to a compressed archive.

Parameters

jobs (list) – A list of streamsets.sdk.sch_models.Job instances.

Returns

A bytes object containing the content of a zip file of job JSON files.
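
Because export_jobs returns raw bytes rather than writing a file, the caller has to persist the archive. A minimal sketch, assuming `control_hub` behaves as documented above; the helper name and `destination` argument are hypothetical.

```python
from pathlib import Path


def export_jobs_to_zip(control_hub, jobs, destination):
    """Export `jobs` and write the returned archive bytes to `destination`."""
    # export_jobs is documented to take a list and return bytes (zip content).
    archive_bytes = control_hub.export_jobs(list(jobs))
    path = Path(destination)
    path.write_bytes(archive_bytes)
    return path
```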

export_pipelines(pipelines, fragments=False, include_plain_text_credentials=False)[source]

Export pipelines.

Parameters
  • pipelines (list) – A list of streamsets.sdk.sch_models.Pipeline instances.

  • fragments (bool) – Indicates if exporting fragments is needed.

  • include_plain_text_credentials (bool) – Indicates if plain text credentials should be included.

Returns

A bytes object containing the content of a zip file of pipeline JSON files.

export_protection_policies(protection_policies)[source]

Export protection policies to a compressed archive.

Parameters

protection_policies (list) – A list of streamsets.sdk.sch_models.ProtectionPolicy instances.

Returns

A bytes object containing the content of a zip file of protection policy JSON files.

export_topologies(topologies)[source]

Export topologies.

Parameters

topologies (list) – A list of streamsets.sdk.sch_models.Topology instances.

Returns

A bytes object containing the content of a zip file of topology JSON files.

get_admin_tool(base_url, username, password)[source]

Get SCH admin tool.

Returns

An instance of streamsets.sdk.sch_models.AdminTool.

get_classification_rule_builder()[source]

Get a classification rule builder instance with which a classification rule can be created.

Returns

An instance of streamsets.sdk.sch_models.ClassificationRuleBuilder.

get_components(component_type_id, offset=None, len_=None, order_by='LAST_VALIDATED_ON', order='ASC')[source]

Get components.

Parameters
  • component_type_id (str) – Component type id.

  • offset (str, optional) – Default: None.

  • len_ (str, optional) – Default: None.

  • order_by (str, optional) – Default: 'LAST_VALIDATED_ON'.

  • order (str, optional) – Default: 'ASC'.

get_connection_builder()[source]

Get a connection builder instance with which a connection can be created.

Returns

An instance of streamsets.sdk.sch_models.ConnectionBuilder.

get_current_job_status(job)[source]

Returns the current status of the given job.

Parameters

job (streamsets.sdk.sch_models.Job) – Job object.

get_data_collector_labels(data_collector)[source]

Returns all labels assigned to data collector.

Parameters

data_collector (streamsets.sdk.sch_models.DataCollector) – Data Collector object.

Returns

A list of data collector assigned labels.

get_data_sla_builder()[source]

Get a Data SLA builder instance with which a Data SLA can be created.

Returns

An instance of streamsets.sdk.sch_models.DataSlaBuilder.

get_deployment_builder()[source]

Get a deployment builder instance with which a deployment can be created.

Returns

An instance of streamsets.sdk.sch_models.DeploymentBuilder.

get_group_builder()[source]

Get a group builder instance with which a group can be created.

Returns

An instance of streamsets.sdk.sch_models.GroupBuilder.

get_job_builder()[source]

Get a job builder instance with which a job can be created.

Returns

An instance of streamsets.sdk.sch_models.JobBuilder.

get_organization_builder()[source]

Get an organization builder instance with which an organization can be created.

Returns

An instance of streamsets.sdk.sch_models.OrganizationBuilder.

get_pipeline_builder(data_collector=None, transformer=None, fragment=False)[source]

Get a pipeline builder instance with which a pipeline can be created.

Parameters
  • data_collector (streamsets.sdk.sch_models.DataCollector, optional) – The Data Collector in which to author the pipeline. If omitted, Control Hub’s system SDC will be used. Default: None.

  • transformer (streamsets.sdk.sch_models.Transformer, optional) – The Transformer in which to author the pipeline.

  • fragment (boolean, optional) – Whether to return a fragment builder. Default: False.

Returns

An instance of streamsets.sdk.sch_models.PipelineBuilder or streamsets.sdk.sch_models.StPipelineBuilder.

get_protection_method_builder()[source]

Get a protection method builder instance with which a protection method can be created.

Returns

An instance of streamsets.sdk.sch_models.ProtectionMethodBuilder.

get_protection_policy_builder()[source]

Get a protection policy builder instance with which a protection policy can be created.

Returns

An instance of streamsets.sdk.sch_models.ProtectionPolicyBuilder.

get_report_definition_builder()[source]

Get a Report Definition Builder instance with which a Report Definition can be created.

Returns

An instance of streamsets.sdk.sch_models.ReportDefinitionBuilder.

get_scheduled_task_builder()[source]

Get a scheduled task builder instance with which a scheduled task can be created.

Returns

An instance of streamsets.sdk.sch_models.ScheduledTaskBuilder.

get_subscription_builder()[source]

Get Event Subscription Builder.

Returns

An instance of streamsets.sdk.sch_models.SubscriptionBuilder.

get_topology_builder()[source]

Get a topology builder instance with which a topology can be created.

Returns

An instance of streamsets.sdk.sch_models.TopologyBuilder.

get_user_builder()[source]

Get a user builder instance with which a user can be created.

Returns

An instance of streamsets.sdk.sch_models.UserBuilder.

property groups

Groups.

Returns

An instance of streamsets.sdk.sch_models.Groups.

import_jobs(archive, pipeline=True, number_of_instances=False, labels=False, runtime_parameters=False, **kwargs)[source]

Import jobs from archived zip directory.

Parameters
  • archive (file) – file containing the jobs.

  • pipeline (boolean, optional) – Indicate if pipeline should be imported. Default: True.

  • number_of_instances (boolean, optional) – Indicate if number of instances should be imported. Default: False.

  • labels (boolean, optional) – Indicate if labels should be imported. Default: False.

  • runtime_parameters (boolean, optional) – Indicate if runtime parameters should be imported. Default: False.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Job.
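
The archive parameter is a file object, so the archive should be opened in binary mode and kept open for the duration of the call. A minimal sketch under that assumption; the helper name is hypothetical, and `options` simply forwards the documented flags (pipeline, number_of_instances, labels, runtime_parameters).

```python
def import_jobs_from_zip(control_hub, archive_path, **options):
    """Open `archive_path` in binary mode and hand the file to import_jobs."""
    with open(archive_path, "rb") as archive:
        # The file must stay open while import_jobs reads it.
        return control_hub.import_jobs(archive, **options)
```

For example, `import_jobs_from_zip(control_hub, "jobs.zip", labels=True)` would also import labels, which are skipped by default.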

import_pipeline(pipeline, commit_message, name=None, data_collector_instance=None)[source]

Import pipeline from json file.

Parameters
  • pipeline (dict) – A python dict representation of ControlHub Pipeline.

  • commit_message (str) – Commit message.

  • name (str, optional) – Name of the pipeline. If left out, the pipeline name from the JSON object will be used. Default: None.

  • data_collector_instance (streamsets.sdk.sch_models.DataCollector, optional) – If excluded, the system SDC will be used. Default: None.

Returns

An instance of streamsets.sdk.sch_models.Pipeline.

import_pipelines_from_archive(archive, commit_message, fragments=False)[source]

Import pipelines from archived zip directory.

Parameters
  • archive (file) – file containing the pipelines.

  • commit_message (str) – Commit message.

  • fragments (bool, optional) – Indicates if pipeline contains fragments.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Pipeline.

import_protection_policies(policies_archive)[source]

Import protection policies from a compressed archive.

Parameters

policies_archive (file) – file containing the protection policies.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.ProtectionPolicy.

import_topologies(archive, import_number_of_instances=False, import_labels=False, import_runtime_parameters=False, **kwargs)[source]

Import topologies from archived zip directory.

Parameters
  • archive (file) – file containing the topologies.

  • import_number_of_instances (boolean, optional) – Indicate if number of instances should be imported. Default: False.

  • import_labels (boolean, optional) – Indicate if labels should be imported. Default: False.

  • import_runtime_parameters (boolean, optional) – Indicate if runtime parameters should be imported. Default: False.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Topology.

property jobs

Jobs.

Returns

An instance of streamsets.sdk.sch_models.Jobs.

property ldap_enabled

Whether LDAP is enabled.

Returns

A bool.

property login_audits

Login Audits.

Returns

An instance of streamsets.sdk.sch_models.LoginAudits.

property organizations

Organizations.

Returns

An instance of streamsets.sdk.sch_models.Organizations.

property pipeline_labels

Pipeline labels.

Returns

An instance of streamsets.sdk.sch_models.PipelineLabels.

property pipelines

Pipelines.

Returns

An instance of streamsets.sdk.sch_models.Pipelines.

preview_classification_rule(classification_rule, parameter_data, data_collector=None)[source]

Dynamic preview of a classification rule.

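Parameters
  • classification_rule – The classification rule to preview.

  • parameter_data – Parameter data for the preview.

  • data_collector (streamsets.sdk.sch_models.DataCollector, optional) – Default: None.

Returns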

An instance of streamsets.sdk.sdc_api.PreviewCommand.

property protection_policies

Protection policies.

Returns

An instance of streamsets.sdk.sch_models.ProtectionPolicies.

property provisioning_agents

Provisioning Agents registered to the Control Hub instance.

Returns

An instance of streamsets.sdk.sch_models.ProvisioningAgents.

publish_pipeline(pipeline, commit_message='New pipeline', draft=False)[source]

Publish a pipeline.

Parameters
  • pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline object.

  • commit_message (str, optional) – Default: 'New pipeline'.

  • draft (boolean, optional) – Default: False.

publish_scheduled_task(task)[source]

Send the scheduled task to Control Hub.

Parameters

task (streamsets.sdk.sch_models.ScheduledTask) – Scheduled task object.

Returns

An instance of streamsets.sdk.sch_api.Command.

publish_topology(topology, commit_message=None)[source]

Publish a topology.

Parameters
  • topology (streamsets.sdk.sch_models.Topology) – Topology object to publish.

  • commit_message (str, optional) – Commit message to supply with the Topology. Default: None

Returns

An instance of streamsets.sdk.sch_api.Command.

property report_definitions

Report Definitions.

Returns

An instance of streamsets.sdk.sch_models.ReportDefinitions.

reset_origin(*jobs)[source]

Reset all pipeline offsets for given jobs.

Parameters

*jobs – One or more instances of streamsets.sdk.sch_models.Job.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Job.

run_pipeline_preview(pipeline, batches=1, batch_size=10, skip_targets=True, skip_lifecycle_events=True, timeout=120000, test_origin=False, read_policy=None, write_policy=None, executor=None, **kwargs)[source]

Run pipeline preview.

Parameters
  • pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline object.

  • batches (int, optional) – Number of batches. Default: 1.

  • batch_size (int, optional) – Batch size. Default: 10.

  • skip_targets (bool, optional) – Skip targets. Default: True.

  • skip_lifecycle_events (bool, optional) – Skip life cycle events. Default: True.

  • timeout (int, optional) – Timeout. Default: 120000.

  • test_origin (bool, optional) – Test origin. Default: False.

  • read_policy (streamsets.sdk.sch_models.ProtectionPolicy, optional) – Read policy for preview. If not provided, the default read policy is used if one is available. Default: None.

  • write_policy (streamsets.sdk.sch_models.ProtectionPolicy, optional) – Write policy for preview. If not provided, the default write policy is used if one is available. Default: None.

  • executor (streamsets.sdk.sch_models.DataCollector, optional) – The Data Collector in which to preview the pipeline. If omitted, Control Hub’s first executor SDC will be used. Default: None.

Returns

An instance of streamsets.sdk.sdc_api.PreviewCommand.

scale_deployment(deployment, num_instances)[source]

Scale up/down active deployment.

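Parameters
  • deployment (streamsets.sdk.sch_models.Deployment) – Deployment object.

  • num_instances (int) – Number of instances.

Returns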

An instance of streamsets.sdk.sch_api.Command.

property scheduled_tasks

Scheduled Tasks.

Returns

An instance of streamsets.sdk.sch_models.ScheduledTasks.

set_default_read_protection_policy(protection_policy)[source]

Set a default read protection policy.

Parameters

protection_policy (streamsets.sdk.sch_models.ProtectionPolicy) – Protection Policy object to be set as the default read policy.

Returns

An updated instance of streamsets.sdk.sch_models.ProtectionPolicy.

Raises

UnsupportedMethodError – Only supported on Control Hub version 3.14.0+.

set_default_write_protection_policy(protection_policy)[source]

Set a default write protection policy.

Parameters

protection_policy (streamsets.sdk.sch_models.ProtectionPolicy) – Protection Policy object to be set as the default write policy.

Returns

An updated instance of streamsets.sdk.sch_models.ProtectionPolicy.

Raises

UnsupportedMethodError – Only supported on Control Hub version 3.14.0+.

set_user(username, password)[source]

Set the user by which subsequent actions will be run.

Parameters
  • username (str) – SCH username.

  • password (str) – SCH password.

Returns

An instance of streamsets.sdk.sch_api.Command.

start_deployment(deployment, **kwargs)[source]

Start Deployment.

Parameters
  • deployment (streamsets.sdk.sch_models.Deployment) – Deployment instance.

  • wait (bool, optional) – Wait for deployment to start. Default: True.

  • wait_for_statuses (list, optional) – Deployment statuses to wait on. Default: ['ACTIVE'].

Returns

An instance of streamsets.sdk.sch_api.DeploymentStartStopCommand.

start_job(*jobs, wait=True, **kwargs)[source]

Start one or more jobs.

Parameters
  • *jobs – One or more instances of streamsets.sdk.sch_models.Job.

  • wait (bool, optional) – Wait for pipelines to reach RUNNING status before returning. Default: True.

  • **kwargs – Arbitrary keyword arguments.

Returns

An instance of streamsets.sdk.sch_api.StartJobsCommand.

start_job_template(job_template, instance_name_suffix='COUNTER', parameter_name=None, runtime_parameters=None, number_of_instances=1, wait_for_data_collectors=False)[source]

Start Job instances from a Job Template.

Parameters
  • job_template (streamsets.sdk.sch_models.Job) – A Job instance with the property job_template set to True.

  • instance_name_suffix (str, optional) – Suffix to be used for Job names in {‘COUNTER’, ‘TIME_STAMP’, ‘PARAM_VALUE’}. Default: COUNTER.

  • parameter_name (str, optional) – Specified when instance_name_suffix is ‘PARAM_VALUE’. Default: None.

  • runtime_parameters (dict or list, optional) – Runtime Parameters to be used in the jobs. If a dict is specified, number_of_instances jobs will be started, all using that dict. If a list is specified, number_of_instances is ignored and one job instance is started per list element, using that element as its Runtime Parameters. If left out, Runtime Parameters from the Job Template will be used. Default: None.

  • number_of_instances (int, optional) – Number of instances to be started using given parameters. Default: 1.

  • wait_for_data_collectors (bool, optional) – Default: False.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Job instances.
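
The list form of runtime_parameters can be sketched as follows: one job instance is started per list element, each named after the value of a chosen parameter. This assumes `control_hub` behaves as documented above; the helper name is hypothetical, and `parameter_name` must match a runtime parameter that actually exists on the Job Template.

```python
def start_instance_per_value(control_hub, job_template, parameter_name, values):
    """Start one job instance per value of a single runtime parameter."""
    # A list of dicts: one job instance is started per element, and
    # number_of_instances is ignored (per the parameter description above).
    runtime_parameters = [{parameter_name: value} for value in values]
    return control_hub.start_job_template(
        job_template,
        instance_name_suffix="PARAM_VALUE",
        parameter_name=parameter_name,
        runtime_parameters=runtime_parameters,
    )
```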

stop_deployment(deployment, wait_for_statuses=['INACTIVE'])[source]

Stop Deployment.

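Parameters
  • deployment (streamsets.sdk.sch_models.Deployment) – Deployment instance.

  • wait_for_statuses (list, optional) – Deployment statuses to wait on. Default: ['INACTIVE'].

Returns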

An instance of streamsets.sdk.sch_api.DeploymentStartStopCommand.

stop_job(*jobs, force=False, timeout_sec=300)[source]

Stop one or more jobs.

Parameters
  • *jobs – One or more instances of streamsets.sdk.sch_models.Job.

  • force (bool, optional) – Force job to stop. Default: False.

  • timeout_sec (int, optional) – Timeout in seconds. Default: 300.

stop_test_pipeline_run(start_pipeline_command)[source]

Stop the test run of pipeline.

Parameters

start_pipeline_command (streamsets.sdk.sdc_api.StartPipelineCommand) – The command returned by test_pipeline_run.

Returns

An instance of streamsets.sdk.sdc_api.StopPipelineCommand.

property subscription_audits

Subscription audits.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.SubscriptionAudit.

property subscriptions

Event Subscriptions.

Returns

An instance of streamsets.sdk.sch_models.Subscriptions.

sync_job(*jobs)[source]

Sync one or more jobs.

Parameters

*jobs – One or more instances of streamsets.sdk.sch_models.Job.

test_pipeline_run(pipeline, reset_origin=False, parameters=None)[source]

Test run a pipeline.

Parameters
  • pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline object.

  • reset_origin (boolean, optional) – Default: False.

  • parameters (dict, optional) – Pipeline parameters. Default: None.

Returns

An instance of streamsets.sdk.sdc_api.StartPipelineCommand.

property topologies

Topologies.

Returns

An instance of streamsets.sdk.sch_models.Topologies.

property transformers

Transformers registered to the Control Hub instance.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Transformer instances.

update_connection(connection)[source]

Update a connection.

Parameters

connection (streamsets.sdk.sch_models.Connection) – Connection object.

Returns

An instance of streamsets.sdk.sch_api.Command.

update_data_collector_labels(data_collector)[source]

Update data collector labels.

Parameters

data_collector (streamsets.sdk.sch_models.DataCollector) – Data Collector object.

Returns

An instance of streamsets.sdk.sch_models.DataCollector.

update_data_collector_resource_thresholds(data_collector, max_cpu_load=None, max_memory_used=None, max_pipelines_running=None)[source]

Updates data collector resource thresholds.

Parameters
  • data_collector (streamsets.sdk.sch_models.DataCollector) – Data Collector object.

  • max_cpu_load (float, optional) – Max CPU load in percentage. Default: None.

  • max_memory_used (int, optional) – Max memory used in MB. Default: None.

  • max_pipelines_running (int, optional) – Max pipelines running. Default: None.

Returns

An instance of streamsets.sdk.sch_api.Command.

update_deployment(deployment)[source]

Update a deployment.

Parameters

deployment (streamsets.sdk.sch_models.Deployment) – Deployment object.

Returns

An instance of streamsets.sdk.sch_api.Command.

update_group(group)[source]

Update a group.

Parameters

group (streamsets.sdk.sch_models.Group) – Group object.

Returns

An instance of streamsets.sdk.sch_api.Command.

update_job(job)[source]

Update a job.

Parameters

job (streamsets.sdk.sch_models.Job) – Job object.

Returns

An instance of streamsets.sdk.sch_models.Job.

update_organization(organization)[source]

Update an organization.

Parameters

organization (streamsets.sdk.sch_models.Organization) – Organization instance.

Returns

An instance of streamsets.sdk.sch_api.Command.

update_pipelines_with_different_fragment_version(pipelines, from_fragment_version, to_fragment_version)[source]

Update pipelines with latest pipeline fragment commit version.

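Parameters
  • pipelines – The pipelines to update.

  • from_fragment_version – The fragment version currently in use.

  • to_fragment_version – The fragment version to update to.

Returns

An instance of streamsets.sdk.sch_api.Command.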

update_report_definition(report_definition)[source]

Update an existing Report Definition.

Parameters

report_definition (streamsets.sdk.sch_models.ReportDefinition) – Report Definition instance.

Returns

An instance of streamsets.sdk.sch_api.Command.

update_subscription(subscription)[source]

Update an existing Subscription.

Parameters

subscription (streamsets.sdk.sch_models.Subscription) – A Subscription instance.

Returns

An instance of streamsets.sdk.sch_api.Command.

update_user(user)[source]

Update a user. Some user attributes are updated by SCH, such as last_modified_by and last_modified_on.

Parameters

user (streamsets.sdk.sch_models.User) – User object.

Returns

An instance of streamsets.sdk.sch_api.Command.

upgrade_job(*jobs)[source]

Upgrade job(s) to latest pipeline version.

Parameters

*jobs – One or more instances of streamsets.sdk.sch_models.Job.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Job.

upload_offset(job, offset_file=None, offset_json=None)[source]

Upload offset for given job.

Parameters
  • job (streamsets.sdk.sch_models.Job) – Job object.

  • offset_file (file, optional) – File containing the offsets. Default: None. Exactly one of offset_file and offset_json must be specified.

  • offset_json (dict, optional) – Contents of the offset. Default: None. Exactly one of offset_file and offset_json must be specified.

Returns

An instance of streamsets.sdk.sch_models.Job.
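
The "exactly one of" rule above can be checked on the caller's side before any request is made. A minimal sketch; the wrapper name is hypothetical, and how the SDK itself reacts to an invalid combination is not stated here.

```python
def upload_offset_checked(control_hub, job, offset_file=None, offset_json=None):
    """Call upload_offset only when exactly one offset source is given."""
    # Both None, or both set, violates the documented contract.
    if (offset_file is None) == (offset_json is None):
        raise ValueError("provide exactly one of offset_file or offset_json")
    return control_hub.upload_offset(
        job, offset_file=offset_file, offset_json=offset_json
    )
```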

property users

Users.

Returns

An instance of streamsets.sdk.sch_models.Users.

verify_connection(connection)[source]

Verify connection.

Parameters

connection (streamsets.sdk.sch_models.Connection) – Connection object.

Returns

An instance of streamsets.sdk.sch_models.ConnectionVerificationResult.

wait_for_job_metrics_record_count(job, count, timeout_sec=200)[source]

Block until a job’s metrics reach the desired count.

Parameters
  • job (streamsets.sdk.sch_models.Job) – The job instance.

  • count (int) – The desired value to wait for.

  • timeout_sec (int, optional) – Timeout to wait for metric to reach count, in seconds. Default: streamsets.sdk.sch.DEFAULT_WAIT_FOR_METRIC_TIMEOUT.

Raises

TimeoutError – If timeout_sec passes without metric reaching value.

wait_for_job_pipeline_metric(job, metric, value, timeout_sec=200)[source]

Block until a job’s pipeline metric reaches the desired value. Note: this method currently works only for jobs with a Data Collector executor.

Parameters
  • job (streamsets.sdk.sch_models.Job) – The job instance.

  • metric (str) – The desired metric (e.g. 'output_record_count' or 'data_batch_count').

  • value (int) – The desired value to wait for.

  • timeout_sec (int, optional) – Timeout to wait for metric to reach value, in seconds. Default: streamsets.sdk.sch.DEFAULT_WAIT_FOR_METRIC_TIMEOUT.

Raises

TimeoutError – If timeout_sec passes without metric reaching value.

wait_for_job_status(job, status, timeout_sec=200)[source]

Block until a job reaches the desired status.

Parameters
  • job (streamsets.sdk.sch_models.Job) – The job instance.

  • status (str) – The desired status to wait for.

  • timeout_sec (int, optional) – Timeout to wait for job to reach status, in seconds. Default: streamsets.sdk.sch.DEFAULT_WAIT_FOR_STATUS_TIMEOUT.

Raises

TimeoutError – If timeout_sec passes without job reaching status.
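
A sketch of combining start_job with wait_for_job_status: the job is started without blocking, then the caller waits for a chosen status with an explicit timeout. It assumes the TimeoutError named above is Python's built-in exception; the helper name is hypothetical, and the status string to wait for is left to the caller.

```python
def start_and_wait(control_hub, job, status, timeout_sec=200):
    """Start `job`, then block until it reports `status` or the wait times out.

    Returns True if the status was reached and False on timeout.
    """
    # wait=False hands control back immediately so the timeout is ours to set.
    control_hub.start_job(job, wait=False)
    try:
        control_hub.wait_for_job_status(job, status, timeout_sec=timeout_sec)
    except TimeoutError:  # assumed to be the built-in exception
        return False
    return True
```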

Models

These models wrap and provide useful functionality for interacting with common SCH abstractions.

ACLs

class streamsets.sdk.sch_models.ACL(acl, control_hub)[source]

Represents an ACL.

Parameters
permissions

A Collection of Permissions.

Type

streamsets.sdk.sch_models.Permissions

add_permission(permission)[source]

Add new permission to the ACL.

Parameters

permission (streamsets.sdk.sch_models.Permission) – A permission object.

Returns

An instance of streamsets.sdk.sch_api.Command

property permission_builder

Get a permission builder instance with which a permission can be created.

Returns

An instance of streamsets.sdk.sch_models.ACLPermissionBuilder.

remove_permission(permission)[source]

Remove a permission from ACL.

Parameters

permission (streamsets.sdk.sch_models.Permission) – A permission object.

Returns

An instance of streamsets.sdk.sch_api.Command

class streamsets.sdk.sch_models.ACLPermissionBuilder(permission, acl)[source]

Class to help build the ACL permission.

Parameters
build(subject_id, subject_type, actions)[source]

Method to help build the ACL permission.

Parameters
  • subject_id (str) – Id of the subject e.g. ‘test@test’.

  • subject_type (str) – Type of the subject e.g. ‘USER’.

  • actions (list) – A list of actions of type str e.g. [‘READ’, ‘WRITE’, ‘EXECUTE’].

Returns

An instance of streamsets.sdk.sch_models.Permission.
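The ACL and ACLPermissionBuilder entries above suggest a two-step flow: build a Permission, then add it to the ACL. A minimal sketch of that flow follows; the helper name grant_permission is hypothetical, and resource stands for any model that exposes an acl property (for example a Job or Pipeline).

```python
def grant_permission(resource, subject_id, actions, subject_type='USER'):
    """Build a Permission with the ACL's builder and add it to that ACL."""
    acl = resource.acl  # read the property once and reuse the same ACL object
    permission = acl.permission_builder.build(subject_id=subject_id,
                                              subject_type=subject_type,
                                              actions=list(actions))
    # Per the reference above, add_permission returns a sch_api.Command instance.
    return acl.add_permission(permission)
```

For instance, grant_permission(job, 'test@test', ['READ', 'EXECUTE']) would give that user read and execute access to the job, assuming the subject exists in the organization.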

class streamsets.sdk.sch_models.Permission(permission, resource_type, api_client)[source]

A container for a permission.

Parameters
  • permission (dict) – A Python object representation of a permission.

  • resource_type (str) – String representing the type of resource e.g. ‘JOB’, ‘PIPELINE’.

  • api_client (streamsets.sdk.sch_api.ApiClient) – An instance of ApiClient.

resource_id

Id of the resource e.g. Pipeline or Job.

Type

str

subject_id

Id of the subject e.g. user id 'admin@admin'.

Type

str

subject_type

Type of the subject e.g. 'USER'.

Type

str

last_modified_by

User who last modified this permission e.g. 'admin@admin'.

Type

str

last_modified_on

Timestamp at which this permission was last modified e.g. 1550785079811.

Type

int

Alerts

class streamsets.sdk.sch_models.Alert(alert, control_hub)[source]

Model for Alerts.

message

The Alert’s message text.

Type

str

alert_status

The status of the Alert.

Type

str

Parameters
acknowledge()[source]

Acknowledge an active Alert.

delete()[source]

Delete an acknowledged Alert.
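Because delete() is documented for acknowledged alerts, clearing an alert is naturally a two-call sequence. The sketch below illustrates this under stated assumptions: the helper name clear_alerts is hypothetical, and alerts is any iterable of Alert instances, however it was obtained.

```python
def clear_alerts(alerts, message_contains):
    """Acknowledge, then delete, every alert whose message contains the given text."""
    cleared = []
    for alert in list(alerts):  # copy first, in case the collection changes as alerts are deleted
        if message_contains in alert.message:
            alert.acknowledge()  # delete() is documented for acknowledged alerts
            alert.delete()
            cleared.append(alert.message)
    return cleared
```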

Classifiers

class streamsets.sdk.sch_models.Classifier(classifier)[source]

Classifier model.

Parameters

classifier (dict) – A Python dict representation of classifier.

Classification Rules

class streamsets.sdk.sch_models.ClassificationRule(classification_rule, classifiers)[source]

Classification Rule Model.

Parameters
class streamsets.sdk.sch_models.ClassificationRuleBuilder(classification_rule, classifier)[source]

Class with which to build instances of streamsets.sdk.sch_models.ClassificationRule.

Instead of instantiating this class directly, most users should use streamsets.sdk.sch.ControlHub.get_classification_rule_builder().

Parameters
  • classification_rule (dict) – Python object defining a classification rule.

  • classifier (dict) – Python object defining a classifier.

add_classifier(patterns=None, match_with=None, regular_expression_type='RE2/J', case_sensitive=False)[source]

Add classifier to the classification rule.

Parameters
  • patterns (list, optional) – List of strings of patterns. Default: None.

  • match_with (str, optional) – Default: None.

  • regular_expression_type (str, optional) – Default: 'RE2/J'.

  • case_sensitive (bool, optional) – Default: False.

Returns

An instance of streamsets.sdk.sch_models.Classifier.

build(name, category, score)[source]

Build the classification rule.

Parameters
  • name (str) – Classification Rule name.

  • category (str) – Classification Rule category.

  • score (float) – Classification Rule score.

Returns

An instance of streamsets.sdk.sch_models.ClassificationRule.
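A sketch of how the builder methods above might be combined is shown here. It is an illustration only: the helper name create_classification_rule is hypothetical, and calling add_classifier before build is an assumption about the intended order, not something this reference states.

```python
def create_classification_rule(control_hub, name, category, score, patterns, commit=False):
    """Build a classification rule with one classifier and add it to Control Hub."""
    builder = control_hub.get_classification_rule_builder()
    # Assumed order: attach classifiers first, then build the rule.
    builder.add_classifier(patterns=list(patterns), case_sensitive=False)
    rule = builder.build(name=name, category=category, score=score)
    control_hub.add_classification_rule(rule, commit=commit)
    return rule
```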

Connections

class streamsets.sdk.sch_models.Connection(connection, control_hub)[source]

Model for connection.

Parameters
property acl

Get Connection ACL.

Returns

An instance of streamsets.sdk.sch_models.ACL.

add_tag(*tags)[source]

Add a tag

Parameters

*tags – One or more instances of str

property pipeline_commits

Get the pipeline commits using this connection.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.PipelineCommit instances.

remove_tag(*tags)[source]

Remove a tag

Parameters

*tags – One or more instances of str

property tags

Get the connection tags.

Returns

A streamsets.sdk.utils.SeekableList of instances of streamsets.sdk.sch_models.Tag.

class streamsets.sdk.sch_models.Connections(control_hub, organization)[source]

Collection of streamsets.sdk.sch_models.Connection instances.

Parameters
class streamsets.sdk.sch_models.ConnectionBuilder(connection, control_hub)[source]

Class with which to build instances of streamsets.sdk.sch_models.Connection.

Instead of instantiating this class directly, most users should use streamsets.sdk.sch.ControlHub.get_connection_builder().

Parameters
  • connection (dict) – Python object built from our Swagger ConnectionJson definition.

  • control_hub (streamsets.sdk.sch.ControlHub) – Control Hub object.

build(title, connection_type, authoring_data_collector, tags=None)[source]

Define the connection.

Parameters
  • title (str) – Connection title.

  • connection_type (str) – Type of connection.

  • authoring_data_collector (streamsets.sdk.sch_models.DataCollector) – Authoring Data Collector.

  • tags (list, optional) – List of tags (strings). Default: None.

Returns

An instance of streamsets.sdk.sch_models.Connection.

class streamsets.sdk.sch_models.ConnectionVerificationResult(connection_preview_json)[source]

Model for connection verification result.

Parameters

connection_preview_json (dict) – dynamic preview API response JSON

property issue_count

The count of the number of issues for the connection verification result.

Returns

An int that represents the number of issues.

property issue_message

The message provided for the connection verification result.

Returns

A str message detailing the response for the connection verification result.
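The connection builder, add_connection and verify_connection can be chained into one helper, sketched below. The helper name create_and_verify_connection is hypothetical, the valid values for connection_type are not listed in this reference and must be supplied by the caller, and verifying after adding (rather than before) is an assumption.

```python
def create_and_verify_connection(control_hub, title, connection_type, authoring_sdc, tags=None):
    """Build a connection, add it to Control Hub, and fail loudly if verification reports issues."""
    builder = control_hub.get_connection_builder()
    connection = builder.build(title=title,
                               connection_type=connection_type,
                               authoring_data_collector=authoring_sdc,
                               tags=tags)
    control_hub.add_connection(connection)
    result = control_hub.verify_connection(connection)
    if result.issue_count:
        raise RuntimeError('Connection {!r} failed verification: {}'.format(
            title, result.issue_message))
    return connection
```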

DataCollectors

class streamsets.sdk.sch_models.DataCollector(data_collector, control_hub)[source]

Model for Data Collector.

execution_mode

True for Edge and False for SDC.

Type

bool

id

Data Collector id.

Type

str

labels

Labels for Data Collector.

Type

list

last_validated_on

Last validated time for Data Collector.

Type

str

reported_labels

Reported labels for Data Collector.

Type

list

url

Data Collector’s url.

Type

str

version

Data Collector’s version.

Type

str

property accessible

Returns a bool for whether the Data Collector instance is accessible.

property acl

Get DataCollector ACL.

Returns

An instance of streamsets.sdk.sch_models.ACL.

property attributes

Returns a dict of Data Collector attributes.

property pipelines_committed

Control Hub Job IDs that are about to be started but have no corresponding pipeline status yet.

Returns

A list of Job IDs (str objects).

property responding

Returns a bool for whether the Data Collector instance is responding.
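The accessible, responding and labels attributes can be combined to pick Data Collectors that are safe to target. The sketch below assumes data_collectors is any iterable of DataCollector instances; the helper name usable_collectors is hypothetical.

```python
def usable_collectors(data_collectors, required_label=None):
    """Return the Data Collectors that are responding, accessible, and carry the given label."""
    return [sdc for sdc in data_collectors
            if sdc.responding
            and sdc.accessible
            and (required_label is None or required_label in sdc.labels)]
```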

Transformers

class streamsets.sdk.sch_models.Transformer(transformer, control_hub)[source]

Model for Transformer.

execution_mode
Type

str

id

Transformer id.

Type

str

labels

Labels for Transformer.

Type

list

last_validated_on

Last validated time for Transformer.

Type

str

reported_labels

Reported labels for Transformer.

Type

list

url

Transformer’s url.

Type

str

version

Transformer’s version.

Type

str

property accessible

Returns a bool for whether the Transformer instance is accessible.

property acl

Get Transformer ACL.

Returns

An instance of streamsets.sdk.sch_models.ACL.

property attributes

Returns a dict of Transformer attributes.

Group

class streamsets.sdk.sch_models.Group(group, roles)[source]

Model for Group.

Parameters
  • group (dict) – A Python object representation of Group.

  • roles (dict) – A mapping of role IDs to role labels.

class streamsets.sdk.sch_models.Groups(control_hub, roles, organization)[source]

Collection of streamsets.sdk.sch_models.Group instances.

Parameters
  • control_hub – An instance of streamsets.sdk.sch.ControlHub.

  • roles (dict) – A mapping of role IDs to role labels.

  • organization (str) – Organization ID.

class streamsets.sdk.sch_models.GroupBuilder(group, roles)[source]

Class with which to build instances of streamsets.sdk.sch_models.Group.

Instead of instantiating this class directly, most users should use streamsets.sdk.sch.ControlHub.get_group_builder().

Parameters
  • group (dict) – Python object built from our Swagger GroupJson definition.

  • roles (dict) – A mapping of role IDs to role labels.

build(id, display_name, ldap_groups=None)[source]

Build the group.

Parameters
  • id (str) – Group ID.

  • display_name (str) – Group display name.

  • ldap_groups (list, optional) – List of LDAP groups (strings). Default: None.

Returns

An instance of streamsets.sdk.sch_models.Group.
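A minimal sketch of creating a group with the builder and registering it through ControlHub.add_group follows. The helper name create_group is hypothetical, and the expected format of the group ID is not specified in this reference.

```python
def create_group(control_hub, group_id, display_name, ldap_groups=None):
    """Build a Group with the group builder and add it to Control Hub."""
    builder = control_hub.get_group_builder()
    group = builder.build(id=group_id,
                          display_name=display_name,
                          ldap_groups=ldap_groups)
    control_hub.add_group(group)
    return group
```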

Jobs

class streamsets.sdk.sch_models.Job(job, control_hub=None)[source]

Model for Job.

commit_id

Pipeline commit id.

Type

str

commit_label

Pipeline commit label.

Type

str

created_by

User that created this job.

Type

str

created_on

Time at which this job was created.

Type

int

data_collector_labels

Labels of the data collectors.

Type

list

description

Job description.

Type

str

destroyer

Job destroyer.

Type

str

enable_failover

Flag that indicates if failover is enabled.

Type

bool

enable_time_series_analysis

Flag that indicates if time series is enabled.

Type

bool

execution_mode

True for Edge and False for SDC.

Type

bool

job_deleted

Flag that indicates if this job is deleted.

Type

bool

job_id

Id of the job.

Type

str

job_name

Name of the job.

Type

str

last_modified_by

User that last modified this job.

Type

str

last_modified_on

Time at which this job was last modified.

Type

int

number_of_instances

Number of instances.

Type

int

pipeline_force_stop_timeout

Timeout for Pipeline force stop.

Type

int

pipeline_id

Id of the pipeline that is running the job.

Type

str

pipeline_name

Name of the pipeline that is running the job.

Type

str

pipeline_rule_id

Rule Id of the pipeline that is running the job.

Type

str

read_policy

Read Policy of the job.

Type

streamsets.sdk.sch_models.ProtectionPolicy

require_job_error_acknowledgement

Flag that indicates whether job errors require acknowledgement.

Type

bool

runtime_parameters

Run-time parameters of the job.

Type

str

statistics_refresh_interval_in_millisecs

Refresh interval for statistics in milliseconds.

Type

int

status

Status of the job.

Type

str

write_policy

Write Policy of the job.

Type

streamsets.sdk.sch_models.ProtectionPolicy

property acl

Get job ACL.

Returns

An instance of streamsets.sdk.sch_models.ACL.

add_tag(*tags)[source]

Add a tag

Parameters

*tags – One or more instances of str

property commit

Get pipeline commit of the job.

Returns

An instance of streamsets.sdk.sch_models.PipelineCommit.

property committed_offsets

Get the committed offsets for a given job id.

Returns

An instance of streamsets.sdk.sch_models.JobCommittedOffset.

property metrics

The metrics from all runs of a Job.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.JobMetrics instances.

property pipeline

Get the pipeline object corresponding to this job.

remove_tag(*tags)[source]

Remove a tag

Parameters

*tags – One or more instances of str

property system_job

Get the system Job for this job, if one exists.

Returns

An instance of streamsets.sdk.sch_models.Job.

property tag

Get pipeline tag of the job.

Returns

An instance of streamsets.sdk.sch_models.PipelineTag.

property tags

Get the job tags.

Returns

A streamsets.sdk.utils.SeekableList of instances of streamsets.sdk.sch_models.Tag.

time_series_metrics(metric_type, time_filter_condition='LAST_5M', **kwargs)[source]

Get historic time series metrics for the job.

Parameters
  • metric_type (str) – metric type in {‘Record Count Time Series’, ‘Record Throughput Time Series’, ‘Batch Throughput Time Series’, ‘Stage Batch Processing Timer seconds’}.

  • time_filter_condition (str, optional) – Default: 'LAST_5M'.

Returns

An instance of streamsets.sdk.sch_models.JobTimeSeriesMetrics.
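As an illustration of reading the result, the sketch below requests the 'Record Count Time Series' metric and returns the most recent input-record sample. The helper name latest_input_record_count is hypothetical, and the sketch assumes the keys of time_series are mutually comparable timestamps so that max() picks the newest one.

```python
def latest_input_record_count(job, time_filter_condition='LAST_5M'):
    """Return (timestamp, value) for the newest input-record sample, or None if there is none."""
    metrics = job.time_series_metrics('Record Count Time Series',
                                      time_filter_condition=time_filter_condition)
    series = metrics.input_records.time_series  # dict: timestamp -> metric value
    if not series:
        return None
    latest_timestamp = max(series)  # assumes timestamps sort chronologically
    return latest_timestamp, series[latest_timestamp]
```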

class streamsets.sdk.sch_models.Jobs(control_hub)[source]

Collection of streamsets.sdk.sch_models.Job instances.

Parameters

control_hub (streamsets.sdk.sch.ControlHub) – Control Hub object.

count(status)[source]

Get job counts by status.

Parameters

status (str) – Status of the jobs in {‘ACTIVE’, ‘INACTIVE’, ‘ACTIVATING’, ‘DEACTIVATING’, ‘INACTIVE_ERROR’, ‘ACTIVE_GREEN’, ‘ACTIVE_RED’, ‘’}

Returns

An instance of int indicating the count of jobs with specified status.

class streamsets.sdk.sch_models.JobBuilder(job, control_hub)[source]

Class with which to build instances of streamsets.sdk.sch_models.Job.

Instead of instantiating this class directly, most users should use streamsets.sdk.sch.ControlHub.get_job_builder().

Parameters

job (dict) – Python object built from our Swagger JobJson definition.

build(job_name, pipeline, job_template=False, runtime_parameters=None, pipeline_commit=None, pipeline_tag=None, pipeline_commit_or_tag=None, tags=None)[source]

Build the job.

Parameters
  • job_name (str) – Name of the job.

  • pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline object.

  • job_template (bool, optional) – Indicate if it is a Job Template. Default: False.

  • runtime_parameters (dict, optional) – Runtime Parameters for the Job or Job Template. Default: None.

  • pipeline_commit (streamsets.sdk.sch_models.PipelineCommit, optional) – Default: None, which resolves to the latest pipeline commit.

  • pipeline_tag (streamsets.sdk.sch_models.PipelineTag, optional) – Default: None, which resolves to the latest pipeline tag.

  • pipeline_commit_or_tag (str, optional) – Default: None, which resolves to the latest pipeline commit.

  • tags (list, optional) – Job tags. Default: None.

Returns

An instance of streamsets.sdk.sch_models.Job.
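A sketch of building a job from a pipeline is shown below. The helper name create_job is hypothetical. Note that add_job is not documented in this section; the sketch assumes ControlHub offers it alongside the other add_* methods.

```python
def create_job(control_hub, pipeline, job_name, runtime_parameters=None, tags=None):
    """Build a Job for the given pipeline and register it with Control Hub."""
    builder = control_hub.get_job_builder()
    job = builder.build(job_name=job_name,
                        pipeline=pipeline,
                        runtime_parameters=runtime_parameters,
                        tags=tags)
    # Assumption: ControlHub.add_job exists; it is not listed in this section.
    control_hub.add_job(job)
    return job
```

Leaving pipeline_commit and pipeline_tag unset relies on the documented default, which resolves to the latest pipeline commit.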

class streamsets.sdk.sch_models.JobMetrics(metrics)[source]

Model for job metrics.

error_count

The number of error records generated by this run of the Job.

Type

int

error_records_per_sec

The number of error records per second generated by this run of the Job.

Type

float

input_count

The number of records ingested by this run of the Job.

Type

int

input_records_per_sec

The number of records per second ingested by this run of the Job.

Type

float

output_count

The number of records output by this run of the Job.

Type

int

output_records_per_sec

The number of records output per second by the run of the Job.

Type

float

pipeline_version

The version of the pipeline that was used in this Job run.

Type

str

run_count

The count corresponding to this Job run.

Type

int

sdc_id

The ID of the SDC instance on which this Job run was executed.

Type

str

stage_errors_count

The number of stage error records generated by this run of the Job.

Type

int

stage_error_records_per_sec

The number of stage error records generated per second by this run of the job.

Type

float

total_error_count

The total number of both error records and stage errors generated by this run of the job.

Type

int

Parameters

metrics (dict) – Metrics counts in JSON format.
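The per-run counters above can be combined into a simple health indicator. The sketch below computes the share of input records that ended in error for the latest run; the helper name latest_run_error_ratio is hypothetical, and treating the highest run_count as the most recent run is an assumption.

```python
def latest_run_error_ratio(metrics):
    """Return total_error_count / input_count for the run with the highest run_count."""
    runs = list(metrics)  # e.g. the list-like value returned by Job.metrics
    if not runs:
        return None
    latest = max(runs, key=lambda run: run.run_count)  # assumed to be the most recent run
    if latest.input_count == 0:
        return 0.0
    return latest.total_error_count / latest.input_count
```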

class streamsets.sdk.sch_models.JobOffset(offset)[source]

Model for offset.

Parameters

offset (dict) – Offset in JSON format.

class streamsets.sdk.sch_models.JobRunEvent(event)[source]

Model for an event in a Job Run.

Parameters

event (dict) – Job Run Event in JSON format.

class streamsets.sdk.sch_models.JobStatus(status, control_hub, **kwargs)[source]

Model for Job Status.

run_history

(streamsets.sdk.utils.JobRunHistoryEvent): History of a particular job run.

Type

streamsets.sdk.utils.SeekableList

offsets

(streamsets.sdk.utils.JobPipelineOffset): Offsets after the job run.

Type

streamsets.sdk.utils.SeekableList

Parameters
class streamsets.sdk.sch_models.JobTimeSeriesMetric(metric, metric_type)[source]

Model for job metrics.

name

Name of measurement.

Type

str

values

Timeseries data.

Type

list

time_series

Timeseries data with timestamp as key and metric value as value.

Type

dict

Parameters

metric (dict) – Metrics in JSON format.

class streamsets.sdk.sch_models.JobTimeSeriesMetrics(metrics, metric_type)[source]

Model for job metrics.

input_records

Appears when queried for ‘Record Count Time Series’ or ‘Record Throughput Time Series’.

Type

streamsets.sdk.sch_models.JobTimeSeriesMetric

output_records

Appears when queried for ‘Record Count Time Series’ or ‘Record Throughput Time Series’.

Type

streamsets.sdk.sch_models.JobTimeSeriesMetric

error_records

Appears when queried for ‘Record Count Time Series’ or ‘Record Throughput Time Series’.

Type

streamsets.sdk.sch_models.JobTimeSeriesMetric

batch_counter

Appears when queried for ‘Batch Throughput Time Series’.

Type

streamsets.sdk.sch_models.JobTimeSeriesMetric

batch_processing_timer

Appears when queried for ‘Batch Processing Timer seconds’.

Type

streamsets.sdk.sch_models.JobTimeSeriesMetric

Parameters

metrics (dict) – Metrics in JSON format.

class streamsets.sdk.sch_models.RuntimeParameters(runtime_parameters, job)[source]

Wrapper for Control Hub job runtime parameters.

Parameters

Organizations

class streamsets.sdk.sch_models.Organization(organization, organization_admin_user=None, api_client=None)[source]

Model for Organization.

Parameters
  • organization (str) – Organization Id.

  • organization_admin_user (str, optional) – Default: None.

  • api_client (streamsets.sdk.sch_api.ApiClient, optional) – Default: None.

class streamsets.sdk.sch_models.Organizations(control_hub)[source]

Collection of streamsets.sdk.sch_models.Organization instances.

class streamsets.sdk.sch_models.OrganizationBuilder(organization, organization_admin_user)[source]

Class with which to build instances of streamsets.sdk.sch_models.Organization.

Instead of instantiating this class directly, most users should use streamsets.sdk.sch.ControlHub.get_organization_builder().

Parameters

organization (dict) – Python object built from our Swagger UserJson definition.

build(id, name, admin_user_id, admin_user_display_name, admin_user_email_address, admin_user_ldap_user_name=None)[source]

Build the organization.

Parameters
  • id (str) – Organization ID.

  • name (str) – Organization name.

  • admin_user_id (str) – User Id of the admin of this organization.

  • admin_user_display_name (str) – User display name of admin of this organization.

  • admin_user_email_address (str) – User email address of admin of this organization.

  • admin_user_ldap_user_name (str, optional) – LDAP username. Default: None.

Returns

An instance of streamsets.sdk.sch_models.Organization.

Pipelines

class streamsets.sdk.sch_models.Pipeline(pipeline, builder, pipeline_definition, rules_definition, control_hub=None, library_definitions=None)[source]

Model for Pipeline.

Parameters
  • pipeline (dict) – Pipeline in JSON format.

  • builder (streamsets.sdk.sch_models.PipelineBuilder) – Pipeline Builder object.

  • pipeline_definition (dict) – Pipeline Definition in JSON format.

  • rules_definition (dict) – Rules Definition in JSON format.

  • control_hub (streamsets.sdk.sch.ControlHub, optional) – ControlHub object. Default: None.

  • library_definitions (dict, optional) – Library Definition in JSON format. Default: None.

property Labels

Get the pipeline labels. This attribute will be deprecated in a future release. Please use labels instead.

Returns

A list of dict

property acl

Get pipeline ACL.

Returns

An instance of streamsets.sdk.sch_models.ACL.

add_label(*labels)[source]

Add a label

Parameters

*labels – One or more instances of str

property commits

Get commits for this pipeline.

Returns

A streamsets.sdk.utils.SeekableList of instances of streamsets.sdk.sch_models.PipelineCommit.

property configuration

Get pipeline’s configuration.

Returns

An instance of streamsets.sdk.sch_models.Configuration.

property labels

Get the pipeline labels.

Returns

A streamsets.sdk.utils.SeekableList of instances of streamsets.sdk.sch_models.PipelineLabel.

property parameters

Get the pipeline parameters.

Returns

A dict-like streamsets.sdk.sch_models.PipelineParameters object of parameter key-value pairs.

remove_label(*labels)[source]

Remove a label

Parameters

*labels – One or more instances of str

property tags

Get tags for this pipeline.

Returns

A streamsets.sdk.utils.SeekableList of instances of streamsets.sdk.sch_models.PipelineTag.

class streamsets.sdk.sch_models.Pipelines(control_hub, organization)[source]

Collection of streamsets.sdk.sch_models.Pipeline instances.

Parameters
class streamsets.sdk.sch_models.PipelineBuilder(pipeline, data_collector_pipeline_builder, control_hub=None, fragment=False)[source]

Class with which to build instances of streamsets.sdk.sch_models.Pipeline.

Instead of instantiating this class directly, most users should use streamsets.sdk.sch.ControlHub.get_pipeline_builder().

Parameters
add_stage(label=None, name=None, type=None, library=None)[source]

Add a stage to the pipeline.

When specifying a stage, either label or name must be used. type and library may also be used to select a particular stage if ambiguities exist. If type and/or library are omitted, the first stage definition matching the given label or name will be used.

Parameters
  • label (str, optional) – Stage label to use when selecting stage from definitions. Default: None.

  • name (str, optional) – Stage name to use when selecting stage from definitions. Default: None.

  • type (str, optional) – Stage type to use when selecting stage from definitions (e.g. origin, destination, processor, executor). Default: None.

  • library (str, optional) – Stage library to use when selecting stage from definitions. Default: None.

Returns

An instance of streamsets.sdk.sch_models.SchSdcStage.

build(title='Pipeline', labels=None, **kwargs)[source]

Build the pipeline.

Parameters
  • title (str) – title of the pipeline.

  • labels (list, optional) – List of pipeline labels of type str. Default: None.

Returns

An instance of streamsets.sdk.sch_models.Pipeline.
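To illustrate add_stage and build together, the sketch below assembles a two-stage pipeline from an already obtained PipelineBuilder. The helper name build_two_stage_pipeline is hypothetical, the stage labels must match stage definitions available to the builder, and connecting stages with the >> operator is an assumption that is not documented in this section.

```python
def build_two_stage_pipeline(builder, origin_label, destination_label, title, labels=None):
    """Add an origin and a destination by label, connect them, and build the pipeline."""
    origin = builder.add_stage(label=origin_label)
    destination = builder.add_stage(label=destination_label)
    # Assumption: stage objects overload >> to connect an output to the next stage.
    origin >> destination
    return builder.build(title=title, labels=labels)
```

If two stage definitions share a label, passing type or library to add_stage, as described above, would disambiguate them.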

class streamsets.sdk.sch_models.PipelineCommit(pipeline_commit, control_hub=None)[source]

Model for pipeline commit.

Parameters
class streamsets.sdk.sch_models.PipelineLabel(pipeline_label)[source]

Model for pipeline label.

Parameters

pipeline_label (dict) – Pipeline label in JSON format.

class streamsets.sdk.sch_models.PipelineLabels(control_hub, organization)[source]

Collection of streamsets.sdk.sch_models.PipelineLabel instances.

Parameters
class streamsets.sdk.sch_models.PipelineParameters(pipeline)[source]

Parameters for pipelines.

Parameters

pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline Instance.

update(parameters_dict)[source]

Update existing parameters. Works similarly to the Python dictionary update method.

Parameters

parameters_dict (dict) – Dictionary of key-value pairs to be used as parameters.

class streamsets.sdk.sch_models.StPipelineBuilder(pipeline, transformer_pipeline_builder, control_hub=None, fragment=False)[source]

Class with which to build instances of streamsets.sdk.sch_models.Pipeline.

Instead of instantiating this class directly, most users should use streamsets.sdk.sch.ControlHub.get_pipeline_builder().

Parameters
add_stage(label=None, name=None, type=None, library=None)[source]

Add a stage to the pipeline.

When specifying a stage, either label or name must be used. type and library may also be used to select a particular stage if ambiguities exist. If type and/or library are omitted, the first stage definition matching the given label or name will be used.

Parameters
  • label (str, optional) – Transformer stage label to use when selecting stage from definitions. Default: None.

  • name (str, optional) – Transformer stage name to use when selecting stage from definitions. Default: None.

  • type (str, optional) – Transformer stage type to use when selecting stage from definitions (e.g. origin, destination, processor, executor). Default: None.

  • library (str, optional) – Transformer stage library to use when selecting stage from definitions. Default: None.

Returns

An instance of streamsets.sdk.sch_models.SchStStage.

build(title='Pipeline', **kwargs)[source]

Build the pipeline.

Parameters

title (str) – title of the pipeline.

Returns

An instance of streamsets.sdk.sch_models.Pipeline.

Protection Methods

class streamsets.sdk.sch_models.ProtectionMethod(stage)[source]

Protection Method Model.

Parameters

stage (dict) – JSON representation of a stage.

class streamsets.sdk.sch_models.ProtectionMethodBuilder(pipeline_builder)[source]

Class with which to build instances of streamsets.sdk.sch_models.ProtectionMethod.

Instead of instantiating this class directly, most users should use streamsets.sdk.sch.ControlHub.get_protection_method_builder().

Parameters

pipeline_builder (streamsets.sdk.sch_models.PipelineBuilder) – Pipeline Builder object.

Protection Policies

class streamsets.sdk.sch_models.ProtectionPolicy(protection_policy, procedures=None)[source]

Model for Protection Policy.

Parameters
class streamsets.sdk.sch_models.ProtectionPolicies(control_hub)[source]

Collection of streamsets.sdk.sch_models.ProtectionPolicy instances.

class streamsets.sdk.sch_models.ProtectionPolicyBuilder(control_hub, protection_policy, policy_procedure)[source]

Class with which to build instances of streamsets.sdk.sch_models.ProtectionPolicy.

Instead of instantiating this class directly, most users should use streamsets.sdk.sch.ControlHub.get_protection_policy_builder().

Parameters
  • protection_policy (dict) – Python object defining a protection policy.

  • policy_procedure (dict) – Python object defining a policy procedure.

class streamsets.sdk.sch_models.PolicyProcedure(policy_procedure)[source]

Model for Policy Procedure.

Parameters

policy_procedure (dict) – JSON representation of Policy Procedure.

ProvisioningAgents

class streamsets.sdk.sch_models.Deployment(deployment, control_hub=None)[source]

Model for Deployment.

Parameters
property acl

Get the ACL of a Deployment.

Returns

An instance of streamsets.sdk.sch_models.ACL.

class streamsets.sdk.sch_models.Deployments(control_hub, organization)[source]

Collection of streamsets.sdk.sch_models.Deployment instances.

Parameters
class streamsets.sdk.sch_models.DeploymentBuilder(deployment)[source]

Class with which to build instances of streamsets.sdk.sch_models.Deployment.

Instead of instantiating this class directly, most users should use streamsets.sdk.sch.ControlHub.get_deployment_builder().

Parameters

deployment (dict) – Python object that represents Deployment JSON.

build(name, provisioning_agent, number_of_data_collector_instances, spec=None, description=None, data_collector_labels=None)[source]

Build the deployment.

Parameters
  • name (str) – Deployment Name.

  • provisioning_agent (streamsets.sdk.sch_models.ProvisioningAgent) – Agent to use.

  • number_of_data_collector_instances (int) – Number of SDC instances.

  • spec (dict, optional) – Deployment YAML in dictionary format. If omitted, the default YAML used by the UI is applied.

  • description (str, optional) – Default: None.

  • data_collector_labels (list, optional) – Default: ['all'].

Returns

An instance of streamsets.sdk.sch_models.Deployment.
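A minimal sketch of building a deployment and registering it through ControlHub.add_deployment follows. The helper name create_deployment is hypothetical; spec is left unset so that the default described above applies.

```python
def create_deployment(control_hub, name, provisioning_agent, instances,
                      labels=None, description=None):
    """Build a Deployment for a provisioning agent and add it to Control Hub."""
    builder = control_hub.get_deployment_builder()
    deployment = builder.build(name=name,
                               provisioning_agent=provisioning_agent,
                               number_of_data_collector_instances=instances,
                               description=description,
                               data_collector_labels=labels)
    control_hub.add_deployment(deployment)
    return deployment
```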

class streamsets.sdk.sch_models.ProvisioningAgent(provisioning_agent, control_hub)[source]

Model for Provisioning Agent.

Parameters
  • provisioning_agent (dict) – A Python object representation of Provisioning Agent.

  • control_hub – An instance of streamsets.sdk.sch.ControlHub.

property acl

Get the ACL of a Provisioning Agent.

Returns

An instance of streamsets.sdk.sch_models.ACL.

property deployments

Get the deployments associated with the Provisioning Agent.

Returns

A list of streamsets.sdk.sch_models.Deployment instances.

class streamsets.sdk.sch_models.ProvisioningAgents(control_hub, organization)[source]

Collection of streamsets.sdk.sch_models.ProvisioningAgent instances.

Parameters

Reports

class streamsets.sdk.sch_models.GenerateReportCommand(control_hub, report_defintion, response)[source]

Command to interact with the response from generate_report.

Parameters
  • control_hub (streamsets.sdk.ControlHub) – Control Hub instance.

  • report_definition (dict) – JSON representation of Report Definition.

  • response (dict) – Api response from generating the report.

class streamsets.sdk.sch_models.Report(report, control_hub, report_definition_id)[source]

Model for Report.

Parameters

report (dict) – JSON representation of Report.

download()[source]

Download the Report in PDF format.

Returns

An instance of bytes.
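Since download() returns bytes, persisting a report is a plain binary write. The sketch below shows one way to do it; the helper name save_report is hypothetical.

```python
from pathlib import Path


def save_report(report, path):
    """Write the PDF bytes returned by Report.download() to the given path."""
    data = report.download()
    target = Path(path)
    target.write_bytes(data)  # binary write; the payload is a PDF document
    return target
```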

class streamsets.sdk.sch_models.Reports(control_hub, report_definition_id)[source]

Collection of streamsets.sdk.sch_models.Report instances.

Parameters
class streamsets.sdk.sch_models.ReportDefinition(report_definition, control_hub)[source]

Model for Report Definition.

Parameters
  • report_definition (dict) – JSON representation of Report Definition.

  • control_hub (streamsets.sdk.ControlHub) – Control Hub instance.

property acl

Get Report Definition ACL.

Returns

An instance of streamsets.sdk.sch_models.ACL.

generate_report()[source]

Generate a Report for Report Definition.

Returns

An instance of streamsets.sdk.sch_models.Report.

property report_resources

Get Report Resources of the Report Definition.

Returns

An instance of streamsets.sdk.sch_models.ReportResources.

property reports

Get Reports of the Report Definition.

Returns

An instance of streamsets.sdk.sch_models.Reports.

class streamsets.sdk.sch_models.ReportDefinitions(control_hub)[source]

Collection of streamsets.sdk.sch_models.ReportDefinition instances.

class streamsets.sdk.sch_models.ReportDefinitionBuilder(report_definition, control_hub)[source]

Class with which to build instances of streamsets.sdk.sch_models.ReportDefinition.

Instead of instantiating this class directly, most users should use streamsets.sdk.sch.ControlHub.get_report_definition_builder().

Parameters
  • report_definition (dict) – JSON representation of Report Definition.

  • control_hub (streamsets.sdk.ControlHub) – Control Hub instance.

add_report_resource(resource)[source]

Add a given resource to Report Definition resources.

Parameters

resource (streamsets.sdk.sch_models.Job or streamsets.sdk.sch_models.Topology) – Job or Topology to add to the Report Definition resources.

build(name, description=None)[source]

Build the report definition.

Parameters
  • name (str) – Name of the Report Definition.

  • description (str, optional) – Description of the Report Definition. Default: None.

Returns

An instance of streamsets.sdk.sch_models.ReportDefinition.

import_report_definition(report_definition)[source]

Import an existing Report Definition to update it.

Parameters

report_definition (streamsets.sdk.sch_models.ReportDefinition) – Report Definition object.

remove_report_resource(resource)[source]

Remove a resource from Report Definition Resources.

Parameters

resource (streamsets.sdk.sch_models.Job or streamsets.sdk.sch_models.Topology) – Job or Topology to remove from the Report Definition resources.

Returns

A resource of type dict that is removed from Report Definition Resources.

set_data_retrieval_period(start_time, end_time)[source]

Set Time range over which the report will be generated.

Parameters
  • start_time (str or int) – Absolute or relative start time for the Report.

  • end_time (str or int) – Absolute or relative end time for the Report.
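The ReportDefinitionBuilder methods above can be chained as in the following sketch. The helper name build_report_definition is hypothetical, the builder is assumed to come from get_report_definition_builder(), and the accepted formats for start_time and end_time are not spelled out in this reference, so the caller supplies them unchanged.

```python
def build_report_definition(builder, name, resources, start_time, end_time, description=None):
    """Attach resources, set the reporting window, and build the Report Definition."""
    for resource in resources:  # each resource is a Job or a Topology
        builder.add_report_resource(resource)
    builder.set_data_retrieval_period(start_time, end_time)
    return builder.build(name=name, description=description)
```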

class streamsets.sdk.sch_models.ReportResource(report_resource)[source]

Model for Report Resource.

Parameters

report_resource (dict) – JSON representation of Report Resource.

class streamsets.sdk.sch_models.ReportResources(report_resources, report_definition)[source]

Model for the collection of Report Resources.

Parameters

Scheduler

class streamsets.sdk.sch_models.ScheduledTask(task, control_hub=None)[source]

Model for Scheduled Task.

Parameters
property acl

Get the ACL of a Scheduled Task.

Returns

An instance of streamsets.sdk.sch_models.ACL.

property audits

Get Scheduled Task Audits.

Returns

A streamsets.sdk.utils.SeekableList of inherited instances of streamsets.sdk.sch_models.ScheduledTaskAudit.

property runs

Get Scheduled Task Runs.

Returns

A streamsets.sdk.utils.SeekableList of inherited instances of streamsets.sdk.sch_models.ScheduledTaskRun.

class streamsets.sdk.sch_models.ScheduledTaskAudit(audit)[source]

Scheduled Task Audit.

Parameters

audit (dict) – JSON representation of scheduled task audit.

class streamsets.sdk.sch_models.ScheduledTaskBaseModel(data, attributes_to_ignore=None, attributes_to_remap=None, repr_metadata=None)[source]

Base Model for Scheduled Task related classes.

class streamsets.sdk.sch_models.ScheduledTaskBuilder(job_selection_types, control_hub)[source]

Builder for Scheduled Task.

Instead of instantiating this class directly, most users should use streamsets.sdk.sch.ControlHub.get_scheduled_task_builder().

Parameters
  • job_selection_types (dict) – JSON representation of job selection types.

  • control_hub (streamsets.sdk.ControlHub) – Control Hub instance.

build(task_object, action='START', name=None, description=None, cron_expression='0 0 1/1 * ? *', time_zone='UTC', status='RUNNING', start_time=None, end_time=None, missed_execution_handling='IGNORE')[source]

Build the Scheduled Task.

Parameters
  • task_object (streamsets.sdk.sch_models.Job or streamsets.sdk.sch_models.ReportDefinition) – Job or ReportDefinition object.

  • action (str, optional) – One of the {‘START’, ‘STOP’, ‘UPGRADE’} actions. Default: START.

  • name (str, optional) – Name of the task. Default: None.

  • description (str, optional) – Description of the task. Default: None.

  • cron_expression (str, optional) – Schedule in cron syntax. Default: "0 0 1/1 * ? *". (Daily at 12).

  • time_zone (str, optional) – Time zone. Default: "UTC".

  • status (str, optional) – One of the {‘RUNNING’, ‘PAUSED’} statuses. Default: RUNNING.

  • start_time (str, optional) – Start time of task. Default: None.

  • end_time (str, optional) – End time of task. Default: None.

  • missed_execution_handling (str, optional) – One of {‘IGNORE’, ‘RUN ALL’, ‘RUN ONCE’}. Default: IGNORE.

Returns

An instance of streamsets.sdk.sch_models.ScheduledTask.
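
A scheduled task could be built along these lines. This is a sketch under assumptions: the helper schedule_job, the task name, and the early validation are illustrative additions rather than SDK behavior, the keyword names follow the build() signature, and the MagicMock objects stand in for a real ControlHub and Job.

```python
from unittest.mock import MagicMock

# Accepted values as listed in the parameter descriptions above.
VALID_ACTIONS = {'START', 'STOP', 'UPGRADE'}
VALID_STATUSES = {'RUNNING', 'PAUSED'}


def schedule_job(control_hub, job, action='START', status='RUNNING'):
    """Build a scheduled task for a job, rejecting undocumented values early."""
    if action not in VALID_ACTIONS:
        raise ValueError('Unsupported action: {}'.format(action))
    if status not in VALID_STATUSES:
        raise ValueError('Unsupported status: {}'.format(status))
    builder = control_hub.get_scheduled_task_builder()
    return builder.build(task_object=job,
                         action=action,
                         name='Nightly task',
                         cron_expression='0 0 1/1 * ? *',  # documented default
                         time_zone='UTC',
                         status=status)


# Dry run with stand-ins so the sketch executes without a Control Hub instance.
fake_hub, fake_job = MagicMock(), MagicMock()
task = schedule_job(fake_hub, fake_job, action='STOP')

# An undocumented action is rejected before the builder is touched.
try:
    schedule_job(fake_hub, fake_job, action='RESTART')
    rejected_message = None
except ValueError as error:
    rejected_message = str(error)
```

Validating in the helper keeps a typo in an action or status from reaching the server; whether the SDK performs the same check itself is not stated in this reference.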

class streamsets.sdk.sch_models.ScheduledTaskRun(run)[source]

Scheduled Task Run.

Parameters

run (dict) – JSON representation of scheduled task run.

class streamsets.sdk.sch_models.ScheduledTasks(control_hub)[source]

Collection of streamsets.sdk.sch_models.ScheduledTask instances.

Subscriptions

class streamsets.sdk.sch_models.Subscription(subscription, control_hub)[source]

Subscription.

Parameters
  • subscription (dict) – JSON representation of Subscription.

  • control_hub (streamsets.sdk.ControlHub) – Control Hub instance.

property acl

Get the ACL of an Event Subscription.

Returns

An instance of streamsets.sdk.sch_models.ACL.

property action

Action of the Subscription.

property events

Events of the Subscription.

class streamsets.sdk.sch_models.Subscriptions(control_hub)[source]

Collection of streamsets.sdk.sch_models.Subscription instances.

Parameters

control_hub (streamsets.sdk.sch.ControlHub) – Control Hub object.

class streamsets.sdk.sch_models.SubscriptionAction(action)[source]

Action to take when the Subscription is triggered.

Parameters

action (dict) – JSON representation of an Action for a Subscription.

class streamsets.sdk.sch_models.SubscriptionAudit(audit)[source]

Model for subscription audit.

Parameters

audit (dict) – JSON representation of a subscription audit.

class streamsets.sdk.sch_models.SubscriptionBuilder(subscription, control_hub)[source]

Builder for Subscription.

Instead of instantiating this class directly, most users should use streamsets.sdk.sch.ControlHub.get_subscription_builder().

Parameters
  • subscription (dict) – JSON representation of event subscription.

  • control_hub (streamsets.sdk.ControlHub) – Control Hub instance.

add_event(event_type, filter=None)[source]

Add event to the Subscription.

Parameters
  • event_type (str) – Type of event in {‘Job Status Change’, ‘Data SLA Triggered’, ‘Pipeline Committed’, ‘Pipeline Status Change’, ‘Report Generated’, ‘Data Collector not Responding’}.

  • filter (str, optional) – Filter to be applied on event. Default: None.

build(name, description=None)[source]

Build the Subscription.

Parameters
  • name (str) – Name of Subscription.

  • description (str, optional) – Description of subscription. Default: None.

Returns

An instance of streamsets.sdk.sch_models.Subscription.

import_subscription(subscription)[source]

Import an existing Subscription into the builder to update it.

Parameters

subscription (streamsets.sdk.sch_models.Subscription) – Subscription instance.

remove_event(event_type)[source]

Remove event from the subscription.

Parameters

event_type (str) – Type of event in {‘Job Status Change’, ‘Data SLA Triggered’, ‘Pipeline Committed’, ‘Pipeline Status Change’, ‘Report Generated’, ‘Data Collector not Responding’}.

Returns

An instance of streamsets.sdk.sch_models.SubscriptionEvent.

set_email_action(recipients, subject=None, body=None)[source]

Set the Email action.

Parameters
  • recipients (list) – List of email addresses.

  • subject (str, optional) – Subject of the email. Default: None.

  • body (str, optional) – Body of the email. Default: None.

set_webhook_action(uri, method='GET', content_type=None, payload=None, auth_type=None, username=None, password=None, timeout=30000, headers=None)[source]

Set the Webhook action.

Parameters
  • uri (str) – URI for the Webhook.

  • method (str, optional) – HTTP method to use. Default: 'GET'.

  • content_type (str, optional) – Content Type of the request. Default: None.

  • payload (str, optional) – Payload of the request. Default: None.

  • auth_type (str, optional) – 'basic' or None. Default: None.

  • username (str, optional) – username for the authentication. Default: None.

  • password (str, optional) – password for the authentication. Default: None.

  • timeout (int, optional) – timeout for the Webhook action. Default: 30000.

  • headers (dict, optional) – Headers to be sent to the Webhook call. Default: None.
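
The SubscriptionBuilder methods might be chained as in the sketch below. The helper build_webhook_subscription, the URI, the subscription name, and the payload contents are hypothetical; the event type strings are taken from the add_event() description, and MagicMock stands in for a real ControlHub.

```python
import json
from unittest.mock import MagicMock


def build_webhook_subscription(control_hub, uri):
    """Build a subscription that posts to a webhook on two event types."""
    builder = control_hub.get_subscription_builder()
    builder.add_event(event_type='Job Status Change')
    builder.add_event(event_type='Pipeline Committed')
    builder.set_webhook_action(uri=uri,
                               method='POST',
                               content_type='application/json',
                               # payload is documented as a str, so serialize it.
                               payload=json.dumps({'text': 'Control Hub event'}),
                               timeout=30000)
    return builder.build(name='Webhook alerts',
                         description='Posts selected events to a webhook.')


# Dry run with a stand-in so the sketch executes without a Control Hub instance.
fake_hub = MagicMock()
subscription = build_webhook_subscription(fake_hub, 'https://hooks.example.com/sch')
```

An email action would follow the same shape, with set_email_action(recipients=[...]) in place of the webhook call.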

class streamsets.sdk.sch_models.SubscriptionEvent(event, control_hub)[source]

An Event of a Subscription.

Parameters
  • event (dict) – JSON representation of Events of a Subscription.

  • control_hub (streamsets.sdk.ControlHub) – Control Hub instance.

Topologies

class streamsets.sdk.sch_models.DataSla(data_sla, control_hub)[source]

Model for DataSla.

Parameters
class streamsets.sdk.sch_models.DataSlaBuilder(data_sla, control_hub)[source]

Class with which to build instances of streamsets.sdk.sch_models.DataSla.

Instead of instantiating this class directly, most users should use streamsets.sdk.sch.ControlHub.get_data_sla_builder().

Parameters
  • data_sla (dict) – Python object built from our Swagger DataSlaJson definition.

  • control_hub (streamsets.sdk.sch.ControlHub) – An instance of the Control Hub.

build(topology, label, job, alert_text, qos_parameter='THROUGHPUT_RATE', function_type='Max', min_max_value=100, enabled=True)[source]

Build the Data SLA.

Parameters
  • topology (streamsets.sdk.sch_models.Topology) – Topology object.

  • label (str) – Label for the SLA.

  • job (list) – List of streamsets.sdk.sch_models.Job objects.

  • alert_text (str) – Alert text.

  • qos_parameter (str, optional) – Parameter in {‘THROUGHPUT_RATE’, ‘ERROR_RATE’}. Default: 'THROUGHPUT_RATE'.

  • function_type (str, optional) – Parameter in {‘Max’, ‘Min’}. Default: 'Max'.

  • min_max_value (str, optional) – Default: 100.

  • enabled (boolean, optional) – Default: True.

Returns

An instance of streamsets.sdk.sch_models.DataSla.
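
One plausible way to build a Data SLA and attach it to a topology is sketched below. The helper add_throughput_sla, the label, and the alert text are hypothetical, and the order of add_data_sla() followed by activate_data_sla() is an assumption rather than a documented requirement. MagicMock objects stand in for the real ControlHub, Topology, and Job.

```python
from unittest.mock import MagicMock


def add_throughput_sla(control_hub, topology, jobs):
    """Build a minimum-throughput SLA, add it to the topology, and activate it."""
    builder = control_hub.get_data_sla_builder()
    data_sla = builder.build(topology=topology,
                             label='min-throughput',
                             job=jobs,  # documented as a list of Job objects
                             alert_text='Throughput fell below the configured minimum.',
                             qos_parameter='THROUGHPUT_RATE',
                             function_type='Min',
                             min_max_value=100)
    topology.add_data_sla(data_sla)
    topology.activate_data_sla(data_sla)  # assumed to follow add_data_sla
    return data_sla


# Dry run with stand-ins so the sketch executes without a Control Hub instance.
fake_hub, fake_topology, fake_jobs = MagicMock(), MagicMock(), [MagicMock()]
data_sla = add_throughput_sla(fake_hub, fake_topology, fake_jobs)
```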

class streamsets.sdk.sch_models.Topology(topology, control_hub=None)[source]

Model for Topology.

Parameters

topology (dict) – JSON representation of Topology.

commit_id

Pipeline commit id.

Type

str

commit_message

Commit Message.

Type

str

commit_time

Time at which commit was made.

Type

int

committed_by

User that made the commit.

Type

str

default_topology

Default Topology.

Type

bool

description

Topology description.

Type

str

draft

Indicates whether this topology is a draft.

Type

bool

last_modified_by

User that last modified this topology.

Type

str

last_modified_on

Time at which this topology was last modified.

Type

int

new_pipeline_version_available

Whether any job in the topology has a new pipeline version to be updated to.

Type

bool

organization

Id of the organization.

Type

str

parent_version

Version of the parent topology.

Type

str

topology_definition

Definition of the topology.

Type

str

topology_id

Id of the topology.

Type

str

topology_name

Name of the topology.

Type

str

validation_issues

Any validation issues that exist for this Topology.

Type

dict

version

Version of this topology.

Type

str

acknowledge_job_errors()[source]

Acknowledge all errors for the jobs in a topology.

Returns

An instance of streamsets.sdk.sch_api.Command.

property acl

Get the ACL of a Topology.

Returns

An instance of streamsets.sdk.sch_models.ACL.

activate_data_sla(*data_slas)[source]

Activate Data SLAs.

Parameters

*data_slas – One or more instances of streamsets.sdk.sch_models.DataSla.

Returns

An instance of streamsets.sdk.sch_api.Command.

add_data_sla(data_sla)[source]

Add SLA.

Parameters

data_sla (streamsets.sdk.sch_models.DataSla) – Data SLA object.

Returns

An instance of streamsets.sdk.sch_api.Command.

auto_discover_connections()[source]

Auto discover connecting systems between nodes in a Topology.

Returns

An instance of streamsets.sdk.sch_api.Command.

auto_fix()[source]

Auto-fix a topology by rectifying invalid or removed jobs, outdated jobs, etc.

Returns

An instance of streamsets.sdk.sch_api.Command.

deactivate_data_sla(*data_slas)[source]

Deactivate Data SLAs.

Parameters

*data_slas – One or more instances of streamsets.sdk.sch_models.DataSla.

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_data_sla(*data_slas)[source]

Delete Data SLAs.

Parameters

*data_slas – One or more instances of streamsets.sdk.sch_models.DataSla.

Returns

An instance of streamsets.sdk.sch_api.Command.

property jobs

Get the jobs that are contained within the Topology.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Job instances.

property new_pipeline_version_available

Determine if a new pipeline version is available for any jobs in the Topology.

Returns

A (bool) value.

property nodes

Get the job and system nodes that make up the Topology.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.TopologyNode instances.

start_all_jobs()[source]

Start all jobs of a topology.

Returns

An instance of streamsets.sdk.sch_api.Command.

stop_all_jobs(force=False)[source]

Stop all jobs of a topology.

Parameters

force (bool, optional) – Force topology jobs to stop. Default: False.

Returns

An instance of streamsets.sdk.sch_api.Command.

update_jobs_to_latest_change()[source]

Upgrade a topology’s job(s) to the latest corresponding pipeline change.

Returns

An instance of streamsets.sdk.sch_api.Command.

property validation_issues

Get any validation issues that are detected for the Topology.

Returns

A (list) of validation issues in JSON format.
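
The Topology properties and methods above could be combined into a small maintenance routine. This is only a sketch: the helper refresh_topology and its returned action labels are hypothetical, and the MagicMock object stands in for a real Topology with the two documented properties preset.

```python
from unittest.mock import MagicMock


def refresh_topology(topology):
    """Update outdated jobs and auto-fix validation issues; report what ran."""
    actions = []
    if topology.new_pipeline_version_available:
        topology.update_jobs_to_latest_change()
        actions.append('updated')
    if topology.validation_issues:
        topology.auto_fix()
        actions.append('auto_fixed')
    return actions


# Stand-in topology: a newer pipeline version exists and nothing fails validation.
fake_topology = MagicMock(new_pipeline_version_available=True,
                          validation_issues=[])
actions = refresh_topology(fake_topology)
```

In this dry run only the update branch is taken, so auto_fix() is never called on the stand-in.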

class streamsets.sdk.sch_models.Topologies(control_hub)[source]

Collection of streamsets.sdk.sch_models.Topology instances.

Parameters

control_hub (streamsets.sdk.sch.ControlHub) – An instance of the Control Hub.

class streamsets.sdk.sch_models.TopologyBuilder(topology, control_hub=None)[source]

Class with which to build instances of streamsets.sdk.sch_models.Topology.

Instead of instantiating this class directly, most users should use streamsets.sdk.sch.ControlHub.get_topology_builder().

Parameters
  • topology (dict) – Python object built from our Swagger TopologyJson definition.

  • control_hub (streamsets.sdk.ControlHub, optional) – Control Hub instance. Default: None.

add_job(job)[source]

Add a job node to the Topology being built.

Parameters

job (streamsets.sdk.sch_models.Job) – An instance of a job to be added.

add_system(name)[source]

Add a system node to the Topology being built.

Parameters

name (str) – The name of the system to add to the topology.

build(topology_name=None, description=None)[source]

Build the topology.

Parameters
  • topology_name (str, optional) – Name of the topology. This parameter is required when building a new topology.

  • description (str, optional) – Description of the topology. Default: None.

Returns

An instance of streamsets.sdk.sch_models.Topology.

delete_node(topology_node)[source]

Delete a system or job node from the topology.

Parameters

topology_node (streamsets.sdk.sch_models.TopologyNode) – An instance of a TopologyNode to delete from the topology.

import_topology(topology)[source]

Import an existing topology to be used in the builder.

Parameters

topology (streamsets.sdk.sch_models.Topology) – An existing Topology instance to modify.

property topology_nodes

Get all of the nodes currently part of the topology held by the TopologyBuilder.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.TopologyNode instances.
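
Building a new topology might look like the sketch below. The helper build_topology, the topology name, and the system name 'Warehouse' are hypothetical, and MagicMock objects stand in for a real ControlHub and Job instances.

```python
from unittest.mock import MagicMock


def build_topology(control_hub, jobs, system_names):
    """Build a topology from job nodes and named system nodes."""
    builder = control_hub.get_topology_builder()
    for job in jobs:
        builder.add_job(job)
    for name in system_names:
        builder.add_system(name)
    # topology_name is required when building a new topology.
    return builder.build(topology_name='Orders flow',
                         description='Jobs that move order data.')


# Dry run with stand-ins so the sketch executes without a Control Hub instance.
fake_hub = MagicMock()
topology = build_topology(fake_hub, [MagicMock(), MagicMock()], ['Warehouse'])
```

To modify an existing topology instead, import_topology() would be called on the builder before adding or deleting nodes.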

class streamsets.sdk.sch_models.TopologyNode(topology_node_json)[source]

Model for a node within a Topology.

Parameters

topology_node_json (dict) – JSON representation of a Topology Node.

node_type

The type of this node, e.g. SYSTEM or JOB.

Type

str

instance_name

The name of this node instance.

Type

str

stage_name

The name of the stage in this node.

Type

str

stage_version

The version of the stage in this node.

Type

str

job_id

The ID of the job in this node.

Type

str

pipeline_id

The pipeline ID associated with the job in this node.

Type

str

pipeline_commit_id

The commit ID of the pipeline.

Type

str

pipeline_version

The version of the pipeline.

Type

str

input_lanes

A list of str representing the input lanes for this node.

Type

list

output_lanes

A list of str representing the output lanes for this node.

Type

list
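
The node attributes above make it straightforward to pick out the jobs in a topology. The sketch below uses SimpleNamespace objects as stand-ins for TopologyNode instances; the helper job_ids and the sample values are hypothetical, and a job_id of None on system nodes is an assumption.

```python
from types import SimpleNamespace


def job_ids(nodes):
    """Return the job IDs of all JOB-type nodes, preserving their order."""
    return [node.job_id for node in nodes if node.node_type == 'JOB']


# Stand-ins for TopologyNode instances, e.g. the items of Topology.nodes.
nodes = [
    SimpleNamespace(node_type='SYSTEM', instance_name='source', job_id=None),
    SimpleNamespace(node_type='JOB', instance_name='ingest', job_id='job-1'),
    SimpleNamespace(node_type='JOB', instance_name='load', job_id='job-2'),
]
found = job_ids(nodes)
```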

Users

class streamsets.sdk.sch_models.User(user, roles)[source]

Model for User.

Parameters
  • user (dict) – JSON representation of User.

  • roles (dict) – A mapping of role IDs to role labels.

active

Whether the user is active or not.

Type

bool

created_by

Creator of this user.

Type

str

created_on

Creation time of this user.

Type

str

display_name

Display name of this user.

Type

str

email_address

Email address of this user.

Type

str

id

Id of this user.

Type

str

groups

Groups this user belongs to.

Type

list

last_modified_by

User who last modified this user.

Type

str

last_modified_on

User last modification time.

Type

str

password_expires_on

User’s password expiration time.

Type

str

password_system_generated

Whether User’s password is system generated or not.

Type

bool

roles

A set of role labels.

Type

set

saml_user_name

SAML username of user.

Type

str

class streamsets.sdk.sch_models.Users(control_hub, roles, organization)[source]

Collection of streamsets.sdk.sch_models.User instances.

Parameters
  • control_hub – An instance of streamsets.sdk.sch.ControlHub.

  • roles (dict) – A mapping of role IDs to role labels.

  • organization (str) – Organization ID.

class streamsets.sdk.sch_models.UserBuilder(user, roles, control_hub=None)[source]

Class with which to build instances of streamsets.sdk.sch_models.User.

Instead of instantiating this class directly, most users should use streamsets.sdk.sch.ControlHub.get_user_builder().

Parameters
  • user (dict) – Python object built from our Swagger UserJson definition.

  • roles (dict) – A mapping of role IDs to role labels.

  • control_hub – An instance of streamsets.sdk.sch.ControlHub. Default: None.

build(id, display_name, email_address, saml_user_name=None, ldap_user_name=None)[source]

Build the user.

Parameters
  • id (str) – User Id.

  • display_name (str) – User display name.

  • email_address (str) – User Email Address.

  • saml_user_name (str, optional) – Default: None.

  • ldap_user_name (str, optional) – Default: None.

Returns

An instance of streamsets.sdk.sch_models.User.
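
A user could be built roughly as follows. The helper build_user and the sample ID, display name, and email address are placeholders (the required ID format is not stated in this reference), and MagicMock stands in for a real ControlHub.

```python
from unittest.mock import MagicMock


def build_user(control_hub, user_id, display_name, email_address):
    """Build a User through the builder obtained from Control Hub."""
    builder = control_hub.get_user_builder()
    return builder.build(id=user_id,
                         display_name=display_name,
                         email_address=email_address)


# Dry run with a stand-in so the sketch executes without a Control Hub instance.
fake_hub = MagicMock()
user = build_user(fake_hub, 'jdoe@example', 'Jane Doe', 'jdoe@example.com')
```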

Common

Models used by StreamSets Data Collector and StreamSets Control Hub:

class streamsets.sdk.models.Configuration(configuration=None, property_key='name', property_value='value', **kwargs)[source]

Abstraction for stage configurations.

This class enables easy access to and modification of data stored as a list of dictionaries. As an example, SDC’s pipeline configuration is stored in the form

[ {
  "name" : "executionMode",
  "value" : "STANDALONE"
}, {
  "name" : "deliveryGuarantee",
  "value" : "AT_LEAST_ONCE"
}, ... ]

By implementing simple __getitem__ and __setitem__ methods, this class allows items in this list to be accessed using

configuration['executionMode'] = 'CLUSTER_BATCH'

instead of the more verbose

for property in configuration:
    if property['name'] == 'executionMode':
        property['value'] = 'CLUSTER_BATCH'
        # Stop after the first match.
        break

Parameters
  • configuration (list) – List of dictionaries comprising the configuration.

  • property_key (str, optional) – The dictionary entry denoting the property key. Default: name

  • property_value (str, optional) – The dictionary entry denoting the property value. Default: value

get(key, default=None)[source]

Return the value of key or, if not in the configuration, the default value.

items()[source]

Gets the configuration’s items.

Returns

A new view of the configuration’s items ((key, value) pairs).

update(configs)[source]

Update instance with a collection of configurations.

Parameters

configs (dict) – Dictionary of configurations to use.
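
To make the access pattern concrete, here is a minimal stand-in that mimics the behavior described above. It is not the SDK's implementation: the class name ListBackedConfiguration and its internals are hypothetical, and details such as raising KeyError for a missing key are assumptions.

```python
class ListBackedConfiguration:
    """Hypothetical stand-in: dict-style access over a list of dictionaries."""

    def __init__(self, configuration, property_key='name', property_value='value'):
        self._data = configuration
        self.property_key = property_key
        self.property_value = property_value

    def _find(self, key):
        # Linear scan for the entry whose key field matches.
        for config_property in self._data:
            if config_property[self.property_key] == key:
                return config_property
        raise KeyError(key)

    def __getitem__(self, key):
        return self._find(key)[self.property_value]

    def __setitem__(self, key, value):
        self._find(key)[self.property_value] = value

    def get(self, key, default=None):
        try:
            return self[key]
        except KeyError:
            return default

    def items(self):
        return [(p[self.property_key], p[self.property_value]) for p in self._data]

    def update(self, configs):
        for key, value in configs.items():
            self[key] = value


raw = [{'name': 'executionMode', 'value': 'STANDALONE'},
       {'name': 'deliveryGuarantee', 'value': 'AT_LEAST_ONCE'}]
configuration = ListBackedConfiguration(raw)
configuration['executionMode'] = 'CLUSTER_BATCH'
```

Because the wrapper holds a reference to the original list, the assignment also changes the entry in raw, which is what lets the modified configuration be sent back unchanged in shape.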

Exceptions

Common exceptions.

exception streamsets.sdk.exceptions.ActivationError(reason=None)[source]

Activation error.

exception streamsets.sdk.exceptions.BadRequestError(response)[source]

Bad request error (HTTP 400).

exception streamsets.sdk.exceptions.ConnectionError(code, message)[source]

Connection Catalog errors.

exception streamsets.sdk.exceptions.DeploymentInactiveError(message)[source]

Deployment status changed into INACTIVE_ERROR.

exception streamsets.sdk.exceptions.InternalServerError(response)[source]

Internal server error.

exception streamsets.sdk.exceptions.InvalidCredentialsError(message)[source]

Invalid credentials error.

exception streamsets.sdk.exceptions.JobInactiveError(message)[source]

Job status changed into INACTIVE_ERROR.

exception streamsets.sdk.exceptions.JobRunnerError(code, message)[source]

JobRunner errors.

exception streamsets.sdk.exceptions.TopologyIssuesError(issues)[source]

Topology has some issues.

exception streamsets.sdk.exceptions.UnsupportedMethodError(message)[source]

An unsupported method was called.

exception streamsets.sdk.exceptions.ValidationError(issues)[source]

Validation issues.