StreamSets Control Hub

Main interface

This is the main entry point used by users when interacting with SCH instances.

class streamsets.sdk.ControlHub(credential_id, token, use_websocket_tunneling=True, **kwargs)[source]

Class to interact with StreamSets Control Hub.

Parameters
  • credential_id (str) – ControlHub credential ID.

  • token (str) – ControlHub token.

  • use_websocket_tunneling (bool, optional) – use websocket tunneling. Default: True.

acknowledge_event_subscription_error(subscription)[source]

Acknowledge an error on given Event Subscription.

Parameters

subscription (streamsets.sdk.sch_models.Subscription) – A Subscription instance.

Returns

An instance of streamsets.sdk.sch_api.Command.

acknowledge_job_error(*jobs)[source]

Acknowledge errors for one or more jobs.

Parameters

*jobs – One or more instances of streamsets.sdk.sch_models.Job.

acknowledge_legacy_deployment_error(*deployments)[source]

Acknowledge errors for one or more deployments.

Parameters

*deployments – One or more instances of streamsets.sdk.sch_models.LegacyDeployment.

property action_audits

Action Audits.

Returns

An instance of streamsets.sdk.sch_models.ActionAudits.

activate_api_credential(api_credential)[source]

Activate an api credential.

Parameters

api_credential (streamsets.sdk.sch_models.ApiCredential) – ApiCredential object.

Returns

An instance of streamsets.sdk.sch_api.Command.

activate_engine(engine)[source]

Activate an engine.

Parameters

engine (streamsets.sdk.sch_models.Engine) – Engine instance.

activate_environment(*environments, timeout_sec=300)[source]

Activate environments.

Parameters

*environments – One or more instances of streamsets.sdk.sch_models.Environment.

Returns

An instance of streamsets.sdk.sch_api.Command or None.

activate_provisioning_agent(provisioning_agent)[source]

Activate provisioning agent.

Parameters

provisioning_agent (streamets.sdk.sch_models.ProvisioningAgent) –

Returns

An instance of streamsets.sdk.sch_api.Command.

activate_user(*users)[source]

Activate users for all given User IDs.

Parameters

*users – One or more instance of streamsets.sdk.sch_models.User.

Returns

An instance of streamsets.sdk.aster_api.Command.

add_api_credential(api_credential)[source]
Add an api credential. Some api credential attributes are updated by ControlHub such as

created_by.

Parameters

api_credential (streamsets.sdk.sch_models.ApiCredential) – An API credential instance, built via the streamsets.sdk.sch_models.ApiCredentialBuilder.build() method.

Returns

An instance of streamsets.sdk.sch_api.Command.

add_classification_rule(classification_rule, commit=False)[source]

Add a classification rule.

Parameters
add_connection(connection)[source]

Add a connection.

Parameters

connection (streamsets.sdk.sch_models.Connection) – Connection object.

Returns

An instance of streamsets.sdk.sch_api.Command.

add_deployment(deployment)[source]

Add a deployment.

Parameters

deployment (streamsets.sdk.sch_models.Deployment) – Deployment object.

Returns

An instance of streamsets.sdk.sch_api.Command.

add_environment(environment)[source]

Add an environment.

Parameters

environment (streamsets.sdk.sch_models.Environment) – Environment object.

Returns

An instance of streamsets.sdk.sch_api.Command.

add_group(group)[source]

Add a group.

Parameters

group (streamsets.sdk.sch_models.Group) – Group object.

Returns

An instance of streamsets.sdk.sch_api.Command.

add_job(job)[source]

Add a job.

Parameters

job (streamsets.sdk.sch_models.Job) – Job object.

Returns

An instance of streamsets.sdk.sch_models.Job.

add_legacy_deployment(deployment)[source]

Add a legacy deployment.

Parameters

deployment (streamsets.sdk.sch_models.LegacyDeployment) – Deployment object.

Returns

An instance of streamsets.sdk.sch_api.Command.

add_organization(organization)[source]

Add an organization.

Parameters

organization (streamsets.sdk.sch_models.Organization) – Organization object.

Returns

An instance of streamsets.sdk.sch_api.Command.

add_protection_policy(protection_policy)[source]

Add a protection policy.

Parameters

protection_policy (streamsets.sdk.sch_models.ProtectionPolicy) – Protection Policy object.

add_report_definition(report_definition)[source]

Add Report Definition to ControlHub.

Parameters

report_definition (streamsets.sdk.sch_models.ReportDefinition) – Report Definition instance.

Returns

An instance of streamsets.sdk.sch_api.Command.

add_subscription(subscription)[source]

Add Subscription to ControlHub.

Parameters

subscription (streamsets.sdk.sch_models.Subscription) – A Subscription instance.

Returns

An instance of streamsets.sdk.sch_api.Command.

property alerts

Alerts.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Alert.

property api_credentials

Api Credentials.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.ApiCredential.

assign_administrator_role(user)[source]

Designate the specified user as an Organization Administrator.

Parameters

user (streamsets.sdk.sch_models.User) – User object.

Returns

An instance of streamsets.sdk.aster_api.Command.

balance_engines(*engines)[source]

Balance all jobs running on given Engine.

Parameters

*engines – One or more instances of streamsets.sdk.sch_models.Engine.

balance_job(*jobs)[source]

Balance one or more jobs.

Parameters

*jobs – One or more instances of streamsets.sdk.sch_models.Job.

property connection_audits

Connection Audits.

Returns

An instance of streamsets.sdk.sch_models.ConnectionAudits.

property connection_tags

Connection Tags.

Returns

An instance of streamsets.sdk.sch_models.ConnectionTags.

property connections

Connections.

Returns

An instance of streamsets.sdk.sch_models.Connections.

create_components(component_type, number_of_components=1, active=True)[source]

Create components.

Parameters
  • component_type (str) – Component type.

  • number_of_components (int, optional) – Default: 1.

  • active (bool, optional) – Default: True.

Returns

An instance of streamsets.sdk.sch_api.CreateComponentsCommand.

property data_collectors

Data Collectors registered to the ControlHub instance.

Returns

Returns a streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.DataCollector instances.

property data_protector_enabled

Whether Data Protector is enabled for the current organization.

Type

bool

deactivate_api_credential(api_credential)[source]

Deactivate an api credential.

Parameters

api_credential (streamsets.sdk.sch_models.ApiCredential) – ApiCredential object.

Returns

An instance of streamsets.sdk.sch_api.Command.

deactivate_engine(engine)[source]

Deactivate an engine.

Parameters

engine (streamsets.sdk.sch_models.Engine) – Engine instance.

deactivate_environment(*environments)[source]

Deactivate environments.

Parameters

*environments – One or more instances of streamsets.sdk.sch_models.Environment.

Returns

An instance of streamsets.sdk.sch_api.Command.

deactivate_provisioning_agent(provisioning_agent)[source]

Deactivate provisioning agent.

Parameters

provisioning_agent (streamets.sdk.sch_models.ProvisioningAgent) –

Returns

An instance of streamsets.sdk.sch_api.Command.

deactivate_user(*users)[source]

Deactivate Users for all given User IDs.

Parameters

*users – One or more instances of streamsets.sdk.sch_models.User.

Returns

An instance of streamsets.sdk.aster_api.Command.

delete_and_unregister_engine(engine)[source]

Delete and Unregister an engine.

Parameters

engine (streamsets.sdk.sch_models.Engine) – Engine instance.

delete_api_credentials(*api_credentials)[source]

Delete api_credentials.

Parameters

*api_credentials – One or more instances of streamsets.sdk.sch_models.ApiCredential.

delete_connection(*connections)[source]

Delete connections.

Parameters

*connections – One or more instances of streamsets.sdk.sch_models.Connection.

delete_deployment(*deployments)[source]

Delete deployments.

Parameters

*deployments – One or more instances of streamsets.sdk.sch_models.Deployment.

delete_engine(engine)[source]

Delete an engine.

Parameters

engine (streamsets.sdk.sch_models.Engine) – Engine instance.

delete_environment(*environments)[source]

Delete environments.

Parameters

*environments – One or more instances of streamsets.sdk.sch_models.Environment.

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_group(*groups)[source]

Delete groups.

Parameters

*groups – One or more instances of streamsets.sdk.sch_models.Group.

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_job(*jobs)[source]

Delete one or more jobs.

Parameters

*jobs – One or more instances of streamsets.sdk.sch_models.Job.

delete_legacy_deployment(*deployments)[source]

Delete deployments.

Parameters

*deployments – One or more instances of streamsets.sdk.sch_models.LegacyDeployment.

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_pipeline(pipeline, only_selected_version=False)[source]

Delete a pipeline.

Parameters
Returns

An instance of streamsets.sdk.sch_api.Command.

delete_pipeline_labels(*pipeline_labels)[source]

Delete pipeline labels.

Parameters

*pipeline_labels – One or more instances of streamsets.sdk.sch_models.PipelineLabel.

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_provisioning_agent(provisioning_agent)[source]

Delete provisioning agent.

Parameters

provisioning_agent (streamets.sdk.sch_models.ProvisioningAgent) –

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_provisioning_agent_token(provisioning_agent)[source]

Delete provisioning agent token.

Parameters

provisioning_agent (streamets.sdk.sch_models.ProvisioningAgent) –

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_report_definition(report_definition)[source]

Delete an existing Report Definition.

Parameters

report_definition (streamsets.sdk.sch_models.ReportDefinition) – Report Definition instance.

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_snowflake_pipeline_defaults()[source]

Delete the Snowflake pipeline defaults for this user.

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_snowflake_user_credentials()[source]

Delete the Snowflake user credentials.

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_subscription(subscription)[source]

Delete an exisiting Subscription.

Parameters

subscription (streamsets.sdk.sch_models.Subscription) – A Subscription instance.

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_topology(topology, only_selected_version=False)[source]

Delete a topology.

Parameters
Returns

An instance of streamsets.sdk.sch_api.Command.

delete_unregistered_auth_tokens(executor_type)[source]

Delete auth tokens for engines that have been unregistered.

Parameters

executor_type (str) – The executor type. Acceptable values are ‘DATACOLLECTOR’ or ‘TRANSFORMER’.

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_user(*users, deactivate=False)[source]

Delete users. Deactivate users before deleting if configured.

Parameters
Returns

An instance of streamsets.sdk.aster_api.Command.

property deployments

Deployments.

Returns

An instance of streamsets.sdk.sch_models.Deployments.

duplicate_job(job, name=None, description=None, number_of_copies=1)[source]

Duplicate an existing job.

Parameters
  • job (streamsets.sdk.sch_models.Job) – Job object.

  • name (str, optional) – Name of the new job(s). Default: None. If not specified, name of the job with ' copy' appended to the end will be used.

  • description (str, optional) – Description for new job(s). Default: None.

  • number_of_copies (int, optional) – Number of copies. Default: 1.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Job.

duplicate_pipeline(pipeline, name=None, description='New Pipeline', number_of_copies=1)[source]

Duplicate an existing pipeline.

Parameters
  • pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline object.

  • name (str, optional) – Name of the new pipeline(s). Default: None.

  • description (str, optional) – Description for new pipeline(s). Default: 'New Pipeline'.

  • number_of_copies (int, optional) – Number of copies. Default: 1.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Pipeline.

edit_job(job)[source]

Edit a job.

Parameters

job (streamsets.sdk.sch_models.Job) – Job object.

Returns

An instance of streamsets.sdk.sch_models.Job.

property engine_configurations

Deployment engine configurations.

Returns

An instance of streamsets.sdk.sch_models.DeploymentEngineConfiguration.

property environments

environments.

Returns

An instance of streamsets.sdk.sch_models.Environments.

export_jobs(jobs)[source]

Export jobs to a compressed archive.

Parameters

jobs (list) – A list of streamsets.sdk.sch_models.Job instances.

Returns

An instance of type bytes indicating the content of zip file with job json files.

export_pipelines(pipelines, fragments=False, include_plain_text_credentials=False)[source]

Export pipelines.

Parameters
  • pipelines (list) – A list of streamsets.sdk.sch_models.Pipeline instances.

  • fragments (bool) – Indicates if exporting fragments is needed.

  • include_plain_text_credentials (bool) – Indicates if plain text credentials should be included.

Returns

An instance of type bytes indicating the content of zip file with pipeline json files.

export_protection_policies(protection_policies)[source]

Export protection policies to a compressed archive.

Parameters
Returns

An instance of type bytes indicating the content of zip file with protection policy json files.

export_topologies(topologies)[source]

Export topologies.

Parameters

topologies (list) – A list of streamsets.sdk.sch_models.Topology instances.

Returns

An instance of type bytes indicating the content of zip file with pipeline json files.

get_admin_tool(base_url, username, password)[source]

Get ControlHub admin tool.

Returns

An instance of streamsets.sdk.sch_models.AdminTool.

get_api_credential_builder()[source]

Get api credential Builder.

Returns

An instance of streamsets.sdk.sch_models.ApiCredentialBuilder.

get_classification_rule_builder()[source]

Get a classification rule builder instance with which a classification rule can be created.

Returns

An instance of streamsets.sdk.sch_models.ClassificationRuleBuilder.

get_components(component_type_id, offset=None, len_=None, order_by='LAST_VALIDATED_ON', order='ASC')[source]

Get components.

Parameters
  • component_type_id (str) – Component type id.

  • offset (str, optional) – Default: None.

  • len (str, optional) – Default: None.

  • order_by (str, optional) – Default: 'LAST_VALIDATED_ON'.

  • order (str, optional) – Default: 'ASC'.

get_connection_builder()[source]

Get a connection builder instance with which a connection can be created.

Returns

An instance of streamsets.sdk.sch_models.ConnectionBuilder.

get_current_job_status(job)[source]

Returns the current job status for given job id.

Parameters

job (streamsets.sdk.sch_models.Job) – Job object.

get_data_sla_builder()[source]

Get a Data SLA builder instance with which a Data SLA can be created.

Returns

An instance of streamsets.sdk.sch_models.DataSlaBuilder.

get_deployment_builder(deployment_type='SELF')[source]

Get a deployment builder instance with which a deployment can be created.

Parameters

( (deployment_type) – obj: str, optional) Valid values are ‘AWS’, ‘GCP’ and ‘SELF’. Default 'SELF'

Returns

An instance of streamsets.sdk.sch_models.DeploymentBuilder.

get_engine_labels(engine)[source]

Returns all labels assigned to an engine.

Parameters

engine (streamsets.sdk.sch_models.Engine) – Engine instance.

Returns

A list of engine assigned labels.

get_environment_builder(environment_type='SELF')[source]

Get an environment builder instance with which an environment can be created.

Parameters

( (environment_type) – obj: str, optional) Valid values are ‘AWS’, ‘GCP’ and ‘SELF’. Default 'SELF'

Returns

An instance of streamsets.sdk.sch_models.EnvironmentBuilder.

get_group_builder()[source]

Get a group builder instance with which a group can be created.

Returns

An instance of streamsets.sdk.sch_models.GroupBuilder.

get_job_builder()[source]

Get a job builder instance with which a job can be created.

Returns

An instance of streamsets.sdk.sch_models.JobBuilder.

get_legacy_deployment_builder()[source]

Get a deployment builder instance with which a legacy deployment can be created.

Returns

An instance of streamsets.sdk.sch_models.LegacyDeploymentBuilder.

get_organization_builder()[source]

Get an organization builder instance with which an organization can be created.

Returns

An instance of streamsets.sdk.sch_models.OrganizationBuilder.

get_pipeline_builder(engine_type, engine_id=None, fragment=False)[source]

Get a pipeline builder instance with which a pipeline can be created.

Parameters
  • engine_type (str) – The type of pipeline that will be created. The options are 'data_collector', 'snowflake' or 'transformer'.

  • engine_id (str) – The ID of the Data Collector or Transformer in which to author the pipeline if not using Transformer for Snowflake.

  • fragment (boolean, optional) – Specify if a fragment builder. Default: False.

Returns

An instance of streamsets.sdk.sch_models.PipelineBuilder or streamsets.sdk.sch_models.StPipelineBuilder.

get_protection_method_builder()[source]

Get a protection method builder instance with which a protection method can be created.

Returns

An instance of streamsets.sdk.sch_models.ProtectionMethodBuilder.

get_protection_policy_builder()[source]

Get a protection policy builder instance with which a protection policy can be created.

Returns

An instance of streamsets.sdk.sch_models.ProtectionPolicyBuilder.

get_scheduled_task_builder()[source]

Get a scheduled task builder instance with which a scheduled task can be created.

Returns

An instance of streamsets.sdk.sch_models.ScheduledTaskBuilder.

get_self_managed_deployment_install_script(deployment, install_mechanism='DEFAULT')[source]

Get install script for a Self Managed deployment.

Parameters
  • deployment (streamsets.sdk.sch_models.Deployment) – deployment object.

  • install_mechanism (str, optional) – Possible values for install are “DEFAULT”, “BACKGROUND” and “FOREGROUND”. Default: DEFAULT

Returns

An instance of streamsets.sdk.sch_api.Command.

get_snowflake_pipeline_defaults()[source]

Get the Snowflake pipeline defaults for this user (if it exists).

Returns

A dict of Snowflake pipeline defaults.

get_snowflake_user_credentials()[source]

Get the Snowflake user credentials (if they exist). They will be redacted.

Returns

A dict of Snowflake user credentials (redacted).

get_subscription_builder()[source]

Get Event Subscription Builder.

Returns

An instance of streamsets.sdk.sch_models.SubscriptionBuilder.

get_topology_builder()[source]

Get a topology builder instance with which a topology can be created.

Returns

An instance of streamsets.sdk.sch_models.TopologyBuilder.

get_user_builder()[source]

Get a user builder instance with which a user can be created.

Returns

An instance of streamsets.sdk.sch_models.UserBuilder.

property groups

Groups.

Returns

An instance of streamsets.sdk.sch_models.Groups.

import_jobs(archive, pipeline=True, number_of_instances=False, labels=False, runtime_parameters=False, **kwargs)[source]

Import jobs from archived zip directory.

Parameters
  • archive (file) – file containing the jobs.

  • pipeline (boolean, optional) – Indicate if pipeline should be imported. Default: True.

  • number_of_instances (boolean, optional) – Indicate if number of instances should be imported. Default: False.

  • labels (boolean, optional) – Indicate if labels should be imported. Default: False.

  • runtime_parameters (boolean, optional) – Indicate if runtime parameters should be imported. Default: False.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Job.

import_pipeline(pipeline, commit_message, name=None, data_collector_instance=None)[source]

Import pipeline from json file.

Parameters
  • pipeline (dict) – A python dict representation of ControlHub Pipeline.

  • commit_message (str) – Commit message.

  • name (str, optional) – Name of the pipeline. If left out, pipeline name from JSON object will be used. Default None.

  • data_collector_instance (streamsets.sdk.sch_models.DataCollector) – If excluded, system sdc will be used. Default None.

Returns

An instance of streamsets.sdk.sch_models.Pipeline.

import_pipelines_from_archive(archive, commit_message, fragments=False)[source]

Import pipelines from archived zip directory.

Parameters
  • archive (file) – file containing the pipelines.

  • commit_message (str) – Commit message.

  • fragments (bool, optional) – Indicates if pipeline contains fragments.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Pipeline.

import_protection_policies(policies_archive)[source]

Import protection policies from a compressed archive.

Parameters

policies_archive (file) – file containing the protection policies.

Returns

A py:class:streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.ProtectionPolicy.

import_topologies(archive, import_number_of_instances=False, import_labels=False, import_runtime_parameters=False, **kwargs)[source]

Import topologies from archived zip directory.

Parameters
  • archive (file) – file containing the topologies.

  • import_number_of_instances (boolean, optional) – Indicate if number of instances should be imported. Default: False.

  • import_labels (boolean, optional) – Indicate if labels should be imported. Default: False.

  • import_runtime_parameters (boolean, optional) – Indicate if runtime parameters should be imported. Default: False.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Topology.

invite_user(user)[source]

Invite a user to the DataOps Platform.

Parameters

user (streamsets.sdk.sch_models.User) – User object.

Returns

An instance of streamsets.sdk.sch_api.Command.

property jobs

Jobs.

Returns

An instance of streamsets.sdk.sch_models.Jobs.

property ldap_enabled

Indication if LDAP is enabled or not.

Returns

An instance of boolean.

property legacy_deployments

Deployments.

Returns

An instance of streamsets.sdk.sch_models.LegacyDeployments.

property login_audits

Login Audits.

Returns

An instance of streamsets.sdk.sch_models.LoginAudits.

property metering_and_usage

Metering and usage for the Organization. By default, this will return a report for the last 30 days of metering data. A report for a custom time window can be retrieved by indexing this object with a slice that contains a datetime object for the start (required) and stop (optional, defaults to datetime.now()).

ex. metering_and_usage[datetime(2022, 1, 1):datetime(2022, 1, 14)]

metering_and_usage[datetime.now() - timedelta(7):]

Returns

An instance of streamsets.sdk.sch_models.MeteringUsage

property organizations

Organizations.

Returns

An instance of streamsets.sdk.sch_models.Organizations.

property pipeline_labels

Pipeline labels.

Returns

An instance of streamsets.sdk.sch_models.PipelineLabels.

property pipelines

Pipelines.

Returns

An instance of streamsets.sdk.sch_models.Pipelines.

preview_classification_rule(classification_rule, parameter_data, data_collector=None)[source]

Dynamic preview of a classification rule.

Parameters
Returns

An instance of streamsets.sdk.sdc_api.PreviewCommand.

property protection_policies

Protection policies.

Returns

An instance of streamsets.sdk.sch_models.ProtectionPolicies.

property provisioning_agents

Provisioning Agents registered to the ControlHub instance.

Returns

An instance of streamsets.sdk.sch_models.ProvisioningAgents.

publish_pipeline(pipeline, commit_message='New pipeline', draft=False)[source]

Publish a pipeline.

Parameters
  • pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline object.

  • commit_message (str, optional) – Default: 'New pipeline'.

  • draft (boolean, optional) – Default: False.

publish_scheduled_task(task)[source]

Send the scheduled task to ControlHub.

Parameters

task (streamsets.sdk.sch_models.ScheduledTask) – Scheduled task object.

Returns

An instance of streamsets.sdk.sch_api.Command.

publish_topology(topology, commit_message=None)[source]

Publish a topology.

Parameters
  • topology (streamsets.sdk.sch_models.Topology) – Topology object to publish.

  • commit_message (str, optional) – Commit message to supply with the Topology. Default: None

Returns

An instance of streamsets.sdk.sch_api.Command.

regenerate_api_credential_auth_token(api_credential)[source]

Regenerate the auth token for an api credential.

Parameters

api_credential (streamsets.sdk.sch_models.ApiCredential) – ApiCredential object.

Returns

An instance of streamsets.sdk.sch_api.Command.

remove_administrator_role(user)[source]

Remove the Organization Administrator status from the specified user.

Parameters

user (streamsets.sdk.sch_models.User) – User object.

Returns

An instance of streamsets.sdk.aster_api.Command.

rename_api_credential(api_credential)[source]

Rename an api credential.

Parameters

api_credential (streamsets.sdk.sch_models.ApiCredential) – ApiCredential object.

Returns

An instance of streamsets.sdk.sch_api.Command.

property report_definitions

Report Definitions.

Returns

An instance of streamsets.sdk.sch_models.ReportDefinitions.

reset_origin(*jobs)[source]

Reset all pipeline offsets for given jobs.

Parameters

*jobs – One or more instances of streamsets.sdk.sch_models.Job.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Job.

restart_engines(*engines)[source]

Restart the given engine(s).

Parameters

*engines – One or more instances of streamsets.sdk.sch_models.DataCollector or streamsets.sdk.sch_models.Transformer.

Returns

An instance of streamsets.sdk.sch_api.Command.

run_pipeline_preview(pipeline, batches=1, batch_size=10, skip_targets=True, skip_lifecycle_events=True, timeout=120000, test_origin=False, read_policy=None, write_policy=None, executor=None, **kwargs)[source]

Run pipeline preview.

Parameters
  • pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline object.

  • batches (int, optional) – Number of batches. Default: 1.

  • batch_size (int, optional) – Batch size. Default: 10.

  • skip_targets (bool, optional) – Skip targets. Default: True.

  • skip_lifecycle_events (bool, optional) – Skip life cycle events. Default: True.

  • timeout (int, optional) – Timeout. Default: 120000.

  • test_origin (bool, optional) – Test origin. Default: False.

  • read_policy (streamsets.sdk.sch_models.ProtectionPolicy) – Read Policy for preview. If not provided, uses default read policy if one available. Default: None.

  • write_policy (streamsets.sdk.sch_models.ProtectionPolicy) – Write Policy for preview. If not provided, uses default write policy if one available. Default: None.

  • executor (streamsets.sdk.sch_models.DataCollector, optional) – The Data Collector in which to preview the pipeline. If omitted, ControlHub’s first executor SDC will be used. Default: None.

Returns

An instance of streamsets.sdk.sdc_api.PreviewCommand.

run_snowflake_pipeline_preview(pipeline_id, rev=0, batches=1, batch_size=10, skip_targets=True, end_stage=None, timeout=2000, test_origin=False, stage_outputs_to_override_json=None, **kwargs)[source]

Run Snowflake pipeline preview.

Parameters
  • pipeline_id (str) – The pipeline instance’s ID.

  • rev (int, optional) – Pipeline revision. Default: 0.

  • batches (int, optional) – Number of batches. Default: 1.

  • batch_size (int, optional) – Batch size. Default: 10.

  • skip_targets (bool, optional) – Skip targets. Default: True.

  • end_stage (str, optional) – End stage. Default: None.

  • timeout (int, optional) – Timeout. Default: 2000.

  • test_origin (bool, optional) – Test origin. Default: False

  • stage_outputs_to_override_json (str, optional) – Stage outputs to override. Default: None.

  • wait (bool, optional) – Wait for pipeline preview to finish. Default: True.

Returns

An instance of streamsets.sdk.sdc_api.PreviewCommand.

scale_legacy_deployment(deployment, num_instances)[source]

Scale up/down active deployment.

Parameters
  • deployment (streamsets.sdk.sch_models.LegacyDeployment) – Deployment object.

  • num_instances (int) – Number of sdc instances.

Returns

An instance of streamsets.sdk.sch_api.Command.

property scheduled_tasks

Scheduled Tasks.

Returns

An instance of streamsets.sdk.sch_models.ScheduledTasks.

set_default_read_protection_policy(protection_policy)[source]

Set a default read protection policy.

Parameters

protection_policy

:param (streamsets.sdk.sch_models.ProtectionPolicy): Protection :param Policy object to be set as the default read policy.:

Returns

An updated instance of streamsets.sdk.sch_models.ProtectionPolicy.

set_default_write_protection_policy(protection_policy)[source]

Set a default write protection policy.

Parameters

protection_policy

:param (streamsets.sdk.sch_models.ProtectionPolicy): Protection :param Policy object to be set as the default write policy.:

Returns

An updated instance of streamsets.sdk.sch_models.ProtectionPolicy.

shutdown_engines(*engines)[source]

Call shutdown on the given engine(s).

Parameters

*engines – One or more instances of streamsets.sdk.sch_models.DataCollector or streamsets.sdk.sch_models.Transformer.

Returns

An instance of streamsets.sdk.sch_api.Command.

start_deployment(*deployments)[source]

Start Deployments.

Parameters

*deployments – One or more instances of streamsets.sdk.sch_models.Deployment.

start_job(*jobs, wait=True, **kwargs)[source]

Start one or more jobs.

Parameters
  • *jobs – One or more instances of streamsets.sdk.sch_models.Job.

  • wait (bool, optional) – Wait for pipelines to reach RUNNING status before returning. Default: True.

  • **kwargs – Arbitrary keyword arguments.

Returns

An instance of streamsets.sdk.sch_api.StartJobsCommand.

start_job_template(job_template, name=None, description=None, attach_to_template=True, delete_after_completion=False, instance_name_suffix='COUNTER', number_of_instances=1, parameter_name=None, raw_job_tags=None, runtime_parameters=None, wait_for_data_collectors=False)[source]

Start Job instances from a Job Template.

Parameters
  • job_template (streamsets.sdk.sch_models.Job) – A Job instance with the property job_template set to True.

  • name (str, optional) – Name of the new job(s). Default: None. If not specified, name of the job template with ' copy' appended to the end will be used.

  • description (str, optional) – Description for new job(s). Default: None.

  • attach_to_template (bool, optional) – Default: True.

  • delete_after_completion (bool, optional) – Default: False.

  • instance_name_suffix (str, optional) – Suffix to be used for Job names in {‘COUNTER’, ‘TIME_STAMP’, ‘PARAM_VALUE’}. Default: COUNTER.

  • number_of_instances (int, optional) – Number of instances to be started using given parameters. Default: 1.

  • parameter_name (str, optional) – Specified when instance_name_suffix is ‘PARAM_VALUE’. Default: None.

  • raw_job_tags (list, optional) – Default: None.

  • runtime_parameters (dict) or (list) – Runtime Parameters to be used in the jobs. If a dict is specified, number_of_instances jobs will be started. If a list is specified, number_of_instances is ignored and job instances will be started using the elements of the list as Runtime Parameters for each job. If left out, Runtime Parameters from Job Template will be used. Default: None.

  • wait_for_data_collectors (bool, optional) – Default: False.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Job instances.

start_legacy_deployment(deployment, **kwargs)[source]

Start Deployment.

Parameters
  • deployment (streamsets.sdk.sch_models.LegacyDeployment) – Deployment instance.

  • wait (bool, optional) – Wait for deployment to start. Default: True.

  • wait_for_statuses (list, optional) – Deployment statuses to wait on. Default: ['ACTIVE'].

Returns

An instance of streamsets.sdk.sch_api.DeploymentStartStopCommand.

stop_deployment(*deployments)[source]

Stop Deployments.

Parameters

*deployments – One or more instances of streamsets.sdk.sch_models.Deployment.

stop_job(*jobs, force=False, timeout_sec=300)[source]

Stop one or more jobs.

Parameters
  • *jobs – One or more instances of streamsets.sdk.sch_models.Job.

  • force (bool, optional) – Force job to stop. Default: False.

  • timeout_sec (int, optional) – Timeout in secs. Default: 300.

stop_legacy_deployment(deployment, wait_for_statuses=['INACTIVE'])[source]

Stop Deployment.

Parameters
  • deployment (streamsets.sdk.sch_models.LegacyDeployment) – Deployment instance.

  • wait_for_statuses (list, optional) – List of statuses to wait for. Default: ['INACTIVE'].

Returns

An instance of streamsets.sdk.sch_api.DeploymentStartStopCommand.

stop_test_pipeline_run(start_pipeline_command)[source]

Stop the test run of pipeline.

Parameters

start_pipeline_command (streamsets.sdk.sdc_api.StartPipelineCommand) –

Returns

An instance of streamsets.sdk.sdc_api.StopPipelineCommand.

property subscription_audits

Subscription audits.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.SubscriptionAudit.

property subscriptions

Event Subscriptions.

Returns

An instance of streamsets.sdk.sch_models.Subscriptions.

sync_job(*jobs)[source]

Sync one or more jobs.

Parameters

*jobs – One or more instances of streamsets.sdk.sch_models.Job.

test_pipeline_run(pipeline, reset_origin=False, parameters=None)[source]

Test run a pipeline.

Parameters
  • pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline object.

  • reset_origin (boolean, optional) – Default: False.

  • parameters (dict, optional) – Pipeline parameters. Default: None.

Returns

An instance of streamsets.sdk.sdc_api.StartPipelineCommand.

property topologies

Topologies.

Returns

An instance of streamsets.sdk.sch_models.Topologies.

property transformers

Transformers registered to the ControlHub instance.

Returns

Returns a streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Transformer instances.

update_connection(connection)[source]

Update a connection.

Parameters

connection (streamsets.sdk.sch_models.Connection) – Connection object.

Returns

An instance of streamsets.sdk.sch_api.Command.

update_deployment(deployment)[source]

Update a deployment.

Parameters

deployment (streamsets.sdk.sch_models.Deployment) – deployment object.

Returns

An instance of streamsets.sdk.sch_api.Command.

update_engine_labels(engine)[source]

Update an engine’s labels.

Parameters

engine (streamsets.sdk.sch_models.Engine) – Engine instance.

Returns

An instance of streamsets.sdk.sch_models.Engine.

update_engine_resource_thresholds(engine, max_cpu_load=None, max_memory_used=None, max_pipelines_running=None)[source]

Updates engine resource thresholds.

Parameters
  • engine (streamsets.sdk.sch_models.Engine) – Engine instance.

  • max_cpu_load (float, optional) – Max CPU load in percentage. Default: None.

  • max_memory_used (int, optional) – Max memory used in MB. Default: None.

  • max_pipelines_running (int, optional) – Max pipelines running. Default: None.

Returns

An instance of streamsets.sdk.sch_api.Command.

update_environment(environment, timeout_sec=300)[source]

Update an environment.

Parameters

environment (streamsets.sdk.sch_models.Environment) – environment object.

Returns

An instance of streamsets.sdk.sch_api.Command.

update_group(group)[source]

Update a group.

Parameters

group (streamsets.sdk.sch_models.Group) – Group object.

Returns

An instance of streamsets.sdk.sch_api.Command.

update_job(job)[source]

Update a job.

Parameters

job (streamsets.sdk.sch_models.Job) – Job object.

Returns

An instance of streamsets.sdk.sch_models.Job.

update_legacy_deployment(deployment)[source]

Update a deployment.

Parameters

deployment (streamsets.sdk.sch_models.LegacyDeployment) – Deployment object.

Returns

An instance of streamsets.sdk.sch_api.Command.

update_organization(organization)[source]

Update an organization.

Parameters

organization (streamsets.sdk.sch_models.Organization) – Organization instance.

Returns

An instance of streamsets.sdk.sch_api.Command.

update_pipelines_with_different_fragment_version(pipelines, from_fragment_version, to_fragment_version)[source]

Update pipelines with latest pipeline fragment commit version.

Parameters
Returns

An instance of streamsets.sdk.sch_api.Command

update_report_definition(report_definition)[source]

Update an existing Report Definition.

Parameters

report_definition (streamsets.sdk.sch_models.ReportDefinition) – Report Definition instance.

Returns

An instance of streamsets.sdk.sch_api.Command.

update_snowflake_pipeline_defaults(account_url=None, database=None, warehouse=None, schema=None, role=None)[source]

Create or update the Snowflake pipeline defaults for this user.

Parameters
  • account_url (str, optional) – Snowflake account url. Default: None

  • database (str, optional) – Snowflake database to query against. Default: None

  • warehouse (str, optional) – Snowflake warehouse. Default: None

  • schema (str, optional) – Schema used. Default: None

  • role (str, optional) – Role used. Default: None

Returns

An instance of streamsets.sdk.sch_api.Command.

update_snowflake_user_credentials(username, snowflake_login_type, password=None, private_key=None, role=None)[source]

Create or update the Snowflake user credentials.

Parameters
  • username (str) – Snowflake account username.

  • snowflake_login_type (str) – Snowflake login type to use. Options are password and privateKey.

  • password (str, optional) – Snowflake account password. Default: None

  • private_key (str, optional) – Snowflake account private key. Default: None

  • role (str, optional) – Snowflake role of the account. Default: None

Returns

An instance of streamsets.sdk.sch_api.Command.

update_subscription(subscription)[source]

Update an existing Subscription.

Parameters

subscription (streamsets.sdk.sch_models.Subscription) – A Subscription instance.

Returns

An instance of streamsets.sdk.sch_api.Command.

update_user(user)[source]
Update a user. Some user attributes are updated by ControlHub such as

last_modified_by, last_modified_on.

Parameters

user (streamsets.sdk.sch_models.User) – User object.

Returns

An instance of streamsets.sdk.sch_api.Command.

upgrade_job(*jobs)[source]

Upgrade job(s) to latest pipeline version.

Parameters

*jobs – One or more instances of streamsets.sdk.sch_models.Job.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Job.

upload_offset(job, offset_file=None, offset_json=None)[source]

Upload offset for given job.

Parameters
  • job (streamsets.sdk.sch_models.Job) – Job object.

  • offset_file (file, optional) – File containing the offsets. Default: None. Exactly one of offset_file, offset_json should specified.

  • offset_json (dict, optional) – Contents of offset. Default: None. Exactly one of offset_file, offset_json should specified.

Returns

An instance of streamsets.sdk.sch_models.Job.

property users

Users.

Returns

An instance of streamsets.sdk.sch_models.Users.

validate_pipeline(pipeline)[source]

Validate pipeline. Only Transformer for Snowflake pipelines are supported at this time.

Parameters

pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline instance.

Raises

streamsets.sdk.exceptions.ValidationError – If validation fails.

verify_connection(connection)[source]

Verify connection.

Parameters

connection (streamsets.sdk.sch_models.Connection) – Connection object.

Returns

An instance of streamsets.sdk.sch_models.ConnectionVerificationResult.

wait_for_job_metric(job, metric, value, timeout_sec=200)[source]

Block until a job’s realtime summary reaches the desired value for the desired metric.

Parameters
  • job (streamsets.sdk.sch_models.Job) – The job instance.

  • metric (str) – The desired metric (e.g. 'output_record_count' or 'input_record_count').

  • value (int) – The desired value to wait for.

  • timeout (int, optional) – Timeout to wait for metric to reach value, in seconds. Default: streamsets.sdk.sch.DEFAULT_WAIT_FOR_METRIC_TIMEOUT.

Raises

TimeoutError – If timeout passes without metric reaching value.

wait_for_job_metrics_record_count(job, count, timeout_sec=200)[source]

Block until a job’s metrics reaches the desired count.

Parameters
  • job (streamsets.sdk.sch_models.Job) – The job instance.

  • count (int) – The desired value to wait for.

  • timeout_sec (int, optional) – Timeout to wait for metric to reach count, in seconds. Default: streamsets.sdk.sch.DEFAULT_WAIT_FOR_METRIC_TIMEOUT.

Raises

TimeoutError – If timeout_sec passes without metric reaching value.

wait_for_job_status(job, status, timeout_sec=300)[source]

Block until a job reaches the desired status.

Parameters
  • job (streamsets.sdk.sch_models.Job) – The job instance.

  • status (str) – The desired status to wait for.

  • timeout_sec (int, optional) – Timeout to wait for job to reach status, in seconds. Default: streamsets.sdk.sch.DEFAULT_WAIT_FOR_STATUS_TIMEOUT.

Raises

TimeoutError – If timeout_sec passes without job reaching status.

Models

These models wrap and provide useful functionality for interacting with common SCH abstractions.

ACLs

class streamsets.sdk.sch_models.ACL(acl, control_hub)[source]

Represents an ACL.

Parameters
  • acl (dict) – JSON representation of an ACL.

  • control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

permissions

A Collection of Permissions.

Type

streamsets.sdk.sch_models.Permissions

add_permission(permission)[source]

Add new permission to the ACL.

Parameters

permission (streamsets.sdk.sch_models.Permission) – A permission object.

Returns

An instance of streamsets.sdk.sch_api.Command

property permission_builder

Get a permission builder instance with which a pipeline can be created.

Returns

An instance of streamsets.sdk.sch_models.ACLPermissionBuilder.

remove_permission(permission)[source]

Remove a permission from ACL.

Parameters

permission (streamsets.sdk.sch_models.Permission) – A permission object.

Returns

An instance of streamsets.sdk.sch_api.Command

class streamsets.sdk.sch_models.ACLPermissionBuilder(permission, acl)[source]

Class to help build the ACL permission.

Parameters
build(subject_id, subject_type, actions)[source]

Method to help build the ACL permission.

Parameters
  • subject_id (str) – Id of the subject e.g. ‘test@test’.

  • subject_type (str) – Type of the subject e.g. ‘USER’.

  • actions (list) – A list of actions of type str e.g. [‘READ’, ‘WRITE’, ‘EXECUTE’].

Returns

An instance of streamsets.sdk.sch_models.Permission.

class streamsets.sdk.sch_models.Permission(permission, resource_type, api_client)[source]

A container for a permission.

Parameters
  • permission (dict) – A Python object representation of a permission.

  • resource_type (str) – String representing the type of resource e.g. ‘JOB’, ‘PIPELINE’.

  • api_client (streamsets.sdk.sch_api.ApiClient) – An instance of ApiClient.

resource_id

Id of the resource e.g. Pipeline or Job.

Type

str

subject_id

Id of the subject e.g. user id 'admin@admin'.

Type

str

subject_type

Type of the subject e.g. 'USER'.

Type

str

last_modified_by

User who last modified this permission e.g. 'admin@admin'.

Type

str

last_modified_on

Timestamp at which this permission was last modified e.g. 1550785079811.

Type

int

Alerts

class streamsets.sdk.sch_models.Alert(alert, control_hub)[source]

Model for Alerts.

message

The Alert’s message text.

Type

str

alert_status

The status of the Alert.

Type

str

Parameters
acknowledge()[source]

Acknowledge an active Alert.

delete()[source]

Delete an acknowledged Alert.

Api Credentials

class streamsets.sdk.sch_models.ApiCredential(api_credential, control_hub=None)[source]

Model for ApiCredential.

Parameters
  • api_credential (dict) – A Python object representation of ApiCredential.

  • control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object. Default: None

active
Type

bool

auth_token
Type

str

credential_id
Type

str

name
Type

str

created_by

User that created this ApiCredential.

Type

int

class streamsets.sdk.sch_models.ApiCredentials(control_hub)[source]

Collection of streamsets.sdk.sch_models.ApiCredential instances.

Parameters

control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

class streamsets.sdk.sch_models.ApiCredentialBuilder(api_credential, control_hub)[source]

Class with which to build instances of streamsets.sdk.sch_models.ApiCredential.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_api_credential_builder().

Parameters
  • api_credential (dict) – Python object built from our Swagger ApiCredentialJson definition.

  • control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

build(name)[source]

Build the ApiCredential.

Parameters

name (str) – ApiCredential name.

Returns

An instance of streamsets.sdk.sch_models.ApiCredential.

Classifiers

class streamsets.sdk.sch_models.Classifier(classifier)[source]

Classifier model.

Parameters

classifier (dict) – A Python dict representation of classifier.

Classification Rules

class streamsets.sdk.sch_models.ClassificationRule(classification_rule, classifiers)[source]

Classification Rule Model.

Parameters
class streamsets.sdk.sch_models.ClassificationRuleBuilder(classification_rule, classifier)[source]

Class with which to build instances of streamsets.sdk.sch_models.ClassificationRule.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_classification_rule_builder().

Parameters
  • classification_rule (dict) – Python object defining a classification rule.

  • classifier (dict) – Python object defining a classifier.

add_classifier(patterns=None, match_with=None, regular_expression_type='RE2/J', case_sensitive=False)[source]

Add classifier to the classification rule.

Parameters
  • patterns (list, optional) – List of strings of patterns. Default: None.

  • match_with (str, optional) – Default: None.

  • regular_expression_type (str, optional) – Default: 'RE2/J'.

  • case_sensitive (bool, optional) – Default: False.

Returns

An instance of streamsets.sdk.sch_models.Classifier.

build(name, category, score)[source]

Build the classification rule.

Parameters
  • name (str) – Classification Rule name.

  • category (str) – Classification Rule category.

  • score (float) – Classification Rule score.

Returns

An instance of streamsets.sdk.sch_models.ClassificationRule.

Connections

class streamsets.sdk.sch_models.Connection(connection, control_hub)[source]

Model for connection.

Parameters
  • connection (dict) – A Python object representation of Connection.

  • control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

property acl

Get Connection ACL.

Returns

An instance of streamsets.sdk.sch_models.ACL.

add_tag(*tags)[source]

Add a tag

Parameters

*tags – One or more instances of str

property pipeline_commits

Get the pipeline commits using this connection.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.PipelineCommit

instances.

remove_tag(*tags)[source]

Remove a tag

Parameters

*tags – One or more instances of str

property tags

Get the connection tags.

Returns

A streamsets.sdk.utils.SeekableList of instances of streamsets.sdk.sch_models.Tag.

class streamsets.sdk.sch_models.Connections(control_hub, organization)[source]

Collection of streamsets.sdk.sch_models.Connection instances.

Parameters
  • control_hub – An instance of streamsets.sdk.sch.ControlHub.

  • organization (str) – Organization Id.

class streamsets.sdk.sch_models.ConnectionBuilder(connection, control_hub)[source]

Class with which to build instances of streamsets.sdk.sch_models.Connection.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_connection_builder().

Parameters
  • connection (dict) – Python object built from our Swagger ConnectionJson definition.

  • control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

build(title, connection_type, authoring_data_collector, tags=None)[source]

Define the connection.

Parameters
  • title (str) – Connection title.

  • connecion_type (str) – Type of connection.

  • authoring_data_collector (streamsets.sdk.sch.DataCollector) – Authoring Data Collector.

  • tags (list, optional) – List of tags (strings). Default: None.

Returns

An instance of streamsets.sdk.sch_models.Connection.

class streamsets.sdk.sch_models.ConnectionVerificationResult(connection_preview_json)[source]

Model for connection verification result.

Parameters

connection_preview_json (dict) – dynamic preview API response JSON

property issue_count

The count of the number of issues for the connection verification result.

Returns

A int that represents the number of issues.

property issue_message

The message provided for the connection verification result.

Returns

A str message detailing the response for the connection verification result.

DataCollectors

class streamsets.sdk.sch_models.DataCollector(data_collector, control_hub)[source]

Model for Data Collector.

execution_mode

True for Edge and False for SDC.

Type

bool

id

Data Collectort id.

Type

str

labels

Labels for Data Collector.

Type

list

last_validated_on

Last validated time for Data Collector.

Type

str

reported_labels

Reported labels for Data Collector.

Type

list

url

Data Collector’s url.

Type

str

version

Data Collector’s version.

Type

str

property accessible

Returns a bool for whether the Data Collector instance is accessible.

property acl

Get DataCollector ACL.

Returns

An instance of streamsets.sdk.sch_models.ACL.

property attributes

Returns a dict of Data Collector attributes.

property pipelines_committed

ControlHub Job IDs that are about to be started but have no corresponding pipeline status yet.

Returns

A list of Job IDs (str objects).

property resource_thresholds

Return DataCollector resource thresholds.

Returns

A dict of DataCollector thresholds named as “max_memory_used”, “max_cpu_load”

and “max_pipelines_running”

property responding

Returns a bool for whether the Data Collector instance is responding.

Deployments

class streamsets.sdk.sch_models.AzureVMDeployment(deployment, control_hub=None, **kwargs)[source]

Model for Azure VM Deployment.

attach_public_ip
Type

bool

azure_tags

Azure tags to apply to all provisioned resources in this Azure deployment.

Type

dict

key_pair_name

SSH key pair.

Type

str

managed_identity
Type

str

public_ssh_key
Type

str

resource_group
Type

str

ssh_key_source

SSH key source.

Type

str

vm_size
Type

str

zones
Type

list

static get_ui_value_mappings()[source]

This method returns a map for the values shown in the UI and internal constants to be used.

is_complete()[source]

Checks if all required fields are set in this deployment.

class streamsets.sdk.sch_models.Deployment(deployment, control_hub=None, **kwargs)[source]

Model for Deployment. This is an abstract class.

created_by

User that created this deployment.

Type

str

created_on

Time at which this deployment was created.

Type

int

deployment_id

Id of the deployment.

Type

str

deployment_name

Name of the deployment.

Type

str

deployment_tags

Raw deployment tags of the deployment.

Type

list

deployment_type

Type of the deployment.

Type

str

desired_instances
Type

int

engine_configuration
Type

DeploymentEngineConfiguration

engine_type
Type

str

engine_version
Type

str

environment_id

Enabled environment where engine will be deployed.

Type

list

last_modified_by

User that last modified this deployment.

Type

str

last_modified_on

Time at which this deployment was last modified.

Type

int

network_tags
Type

list

scala_binary_version

scala Binary Version.

Type

str

organization
Type

str

json_state

State of the deployment - this is json field called state.

Type

str

state

State of the deployment - displayed as state on UI.

Type

str

status

Status of the deployment.

Type

str

status_detail

Status detail of the deployment.

Type

str

class ENGINE_TYPES(value)[source]

An enumeration.

class TYPES(value)[source]

An enumeration.

property acl

Get deployment ACL.

Returns

An instance of streamsets.sdk.sch_models.ACL.

property deployment_events

Get the events for a deployment.

Returns

An instance of streamsets.sdk.sch_models.DeploymentEvents

property engine_configuration

The engine configuration of deployment engine. :returns: An instance of streamsets.sdk.sch_models.DeploymentEngineConfiguration.

property registered_engines

Get the registered engines.

Returns

An instance of streamsets.sdk.sch_models.RegisteredEngines

property tags

Get the tags.

Returns

A streamsets.sdk.utils.SeekableList of str instances.

class streamsets.sdk.sch_models.DeploymentBuilder(deployment, control_hub)[source]

Class with which to build instances of streamsets.sdk.sch_models.Deployment.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_deployment_builder().

Parameters
  • deployment (dict) – Python object built from our Swagger CspDeploymentJson definition.

  • control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

build(deployment_name, engine_type, engine_version, environment, **kwargs)[source]

Define the deployment.

Parameters
  • deployment_name (str) – deployment name.

  • engine_type (str) – Type of engine to deploy.

  • engine_version (str) – Version of engine to deploy.

  • environment (str) – ID of the enabled environment where the engine will be deployed.

  • deployment_tags (list, optional) – List of tags (strings). Default: None.

  • engine_build (str, optional) – Build of engine to deploy. Default: None.

  • scala_binary_version (str, optional) – Scala binary version required in case of engine_type=’TF’ Default: None.

Returns

An instance of subclass of streamsets.sdk.sch_models.Deployment.

class streamsets.sdk.sch_models.DeploymentEngineAdvancedConfiguration(deployment_engine_advanced_config, deployment=None)[source]

Model for advanced configuration in Deployment engine configuration.

class streamsets.sdk.sch_models.DeploymentEngineConfiguration(deployment_engine_config, deployment=None, control_hub=None)[source]

Model for Deployment engine configuration.

advanced_configuration
Type

str

aws_image_id
Type

str

azure_image_id
Type

str

created_by

User that created this deployment.

Type

str

created_on

Time at which this deployment was created.

Type

int

download_url
Type

list

engine_type

engine type.

Type

list

engine_version

Engine version to deploy.

Type

list

id

User that created this deployment.

Type

str

gcp_image_id
Type

str

last_modified_by

User that last modified this deployment.

Type

str

last_modified_on

Time at which this deployment was last modified.

Type

int

scala_binary_version

scala Binary Version.

Type

str

class streamsets.sdk.sch_models.DeploymentEngineConfigurations(control_hub)[source]

Collection of streamsets.sdk.sch_models.DeploymentEngineConfiguration instances.

Parameters

control_hub – An instance of streamsets.sdk.sch.ControlHub.

class streamsets.sdk.sch_models.DeploymentEngineJavaConfiguration(deployment_engine_java_configuration, deployment=None)[source]

Model for Deployment engine Java configuration.

id
Type

str

java_options
Type

str

java_memory_strategy (

object:ABSOLUTE/PERCENTAGE)

maximum_java_heap_size_in_mb
Type

long

minimum_java_heap_size_in_mb
Type

long

maximum_java_heap_size_in_percent
Type

int

minimum_java_heap_size_in_percent
Type

int

class streamsets.sdk.sch_models.DeploymentStageLibraries(values, deployment_config)[source]

Wrapper class for the list of stage libraries in a deployment.

Parameters
append(object)None append object to end[source]
extend(iterable)None extend list by appending elements from the iterable[source]
remove(value)None remove first occurrence of value.[source]

Raises ValueError if the value is not present.

class streamsets.sdk.sch_models.EC2Deployment(deployment, control_hub=None, **kwargs)[source]

Model for EC2 Deployment.

ec2_instance_type

Type EC2 engine instance.

Type

str

engine_instances

No. of EC2 engine instances.

Type

int

instance_profile

arn of instance profile created as an environment prerequisite.

Type

str

key_pair_name

SSH key pair.

Type

str

aws_tags

AWS tags to apply to all provisioned resources in this AWS deployment.

Type

dict

ssh_key_source

SSH key source.

Type

str

tracking_url
Type

str

is_complete()[source]

Checks if all required fields are set in this deployment.

class streamsets.sdk.sch_models.GCEDeployment(deployment, control_hub=None, **kwargs)[source]

Model for GCE Deployment.

block_project_ssh_keys

Blocks the use of project-wide public SSH keys to access the provisioned VM instances.

Type

bool

engine_instances

Number of engine instances to deploy.

Type

int

gcp_labels

GCP labels to apply to all provisioned resources in your GCP project.

Type

dict

instance_service_account

Instance service account.

Type

str

machine_type

machine type to use for the provisioned VM instances.

Type

str

public_ssh_key

full contents of public SSH key associated with each VM instance.

Type

str

region

Region to provision VM instances in.

Type

str

tracking_url

Tracking URL.

Type

str

zone

GCP zone

Type

list

is_complete()[source]

Checks if all required fields are set in this deployment.

class streamsets.sdk.sch_models.SelfManagedDeployment(deployment, control_hub=None, **kwargs)[source]

Model for self managed Deployment.

install_script()[source]

Gets the installation script to be run for this deployment.

Returns

An str instance of the installation command.

is_complete()[source]

Checks if all required fields are set in this deployment.

Engines

class streamsets.sdk.sch_models.Engine(engine, control_hub=None)[source]

Model for DataOps Engines.

id

ID of the engine.

Type

str

organization

Organization that the engine belongs to.

Type

str

engine_url

URL registered for the engine.

Type

str

version

Version of the engine.

Type

str

labels

Labels for the engine.

Type

list of str instances

last_reported_time

The last time the engine contacted DataOps Platform.

Type

str

start_up_time

The time the engine was started.

Type

str

edge

Whether this is an Edge engine or not.

Type

bool

cpu_load

The percent utilization of the CPU by the engine.

Type

float

memory_used

The amount of memory used by the engine in MB.

Type

float

total_memory

The total amount of memory configured for the engine in MB.

Type

float

running_pipelines

The total number of pipelines running on the engine.

Type

int

responding

Whether the engine is responding or not.

Type

bool

engine_type

The type of engine.

Type

str

max_cpu_load

The percentage limit on CPU load configured for this engine.

Type

int

max_memory_used

The percentage limit on memory configured for this engine.

Type

int

max_pipelines_running

The limit on the number of concurrent pipelines for this engine.

Type

int

deployment_id

The ID of the deployment that the engine belongs to.

Type

str

add_external_libraries(stage_library, *libraries)[source]

Add external libraries to the engine.

Parameters
  • stage_library (str) – The stage library that includes the stage requiring the external library.

  • *libraries – One or more file instances to add to an engine, in binary format.

add_resources(*resources)[source]

Add resource files to the engine.

Parameters

*resources – One or more file instances to add to an engine, in binary format.

delete_external_libraries(*libraries)[source]

Delete external libraries from the engine.

Parameters

*libraries – One or more streamsets.sdk.sch_models.ExternalLibrary instances to delete from the engine.

delete_resources(*resources)[source]

Delete resource files on the engine.

Parameters

*resources – One or more streamsets.sdk.sch_models.ExternalResource instances to delete from the engine.

property external_libraries

Get the external libraries for the engine.

Returns

A list of streamsets.sdk.sch_models.ExternalLibrary instances.

get_logs(ending_offset=- 1)[source]

Retrieve the logs for the engine.

Parameters

ending_offset (int, optional) – The offset to capture logs up until. Default: -1

Returns

A list of dict instances, one dictionary per log line.

get_thread_dump()[source]

Generate a thread dump for the engine.

Returns

A list of dict instances, one dictionary per thread.

refresh()[source]

Retrieve the latest state of the Engine from DataOps Platform, and update the in-memory representation.

property resources

Get the external resources for the engine.

Returns

A list of streamsets.sdk.sch_models.ExternalResource instances.

class streamsets.sdk.sch_models.Engines(control_hub)[source]

Collection of streamsets.sdk.sch_models.Engine instances.

Parameters

control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

Environments

class streamsets.sdk.sch_models.AWSEnvironment(environment, control_hub=None, **kwargs)[source]

Model for AWS Environment.

access_key_id

AWS access key id to access AWS account. Required when credential_type is AWS_STATIC_KEYS.

Type

str

aws_tags

AWS tags to apply to all provisioned resources in this AWS deployment.

Type

dict

credentials

Credentials of the environment.

Type

dict

default_instance_profile

aarn of default instance profile created as an environment prerequisite.

Type

dict

region

AWS region where VPC is located in case of AWS environment.

Type

str

role_arn

Role AR of the cross-account role created as an environment prerequisite in user’s AWS account. Required when credential_type is AWS_CROSS_ACCOUNT_ROLE.

Type

str

secret_access_key_id

AWS secret access key to access AWS account. Required when credential_type is AWS_STATIC_KEYS.

Type

str

subnet_ids

Range of Subnet IDs in the VPC.

Type

str

security_group_id

Security group ID.

Type

str

vpc_id

Id of the Amazon VPC created as an environment prerequisite in user’s AWS account.

Type

str

static get_ui_value_mappings()[source]

This method returns a map for the values shown in the UI and internal constants to be used.

is_complete()[source]

Checks if all required fields are set in this environment.

class streamsets.sdk.sch_models.AzureEnvironment(environment, control_hub=None, **kwargs)[source]

Model for Azure Environment.

azure_tags

Azure tags for this environment.

Type

dict

credential_type

Credential type of the environment.

Type

str

credentials

Credentials of the environment.

Type

dict

default_managed_identity

default managed identity.

Type

str

default_resource_group

default resource group.

Type

str

region

Azure region where vnet is located.

Type

str

subnet_id

Subnet ID in the vnet.

Type

str

security_group_id

Security group ID.

Type

str

vnet_id

Id of the vnet.

Type

str

static get_ui_value_mappings()[source]

This method returns a map for the values shown in the UI and internal constants to be used.

is_complete()[source]

Checks if all required fields are set in this environment.

class streamsets.sdk.sch_models.Environment(environment, control_hub=None, **kwargs)[source]

Model for Environment. This is an abstract class.

allow_nightly_engine_builds
Type

bool

created_by

User that created this environment.

Type

str

created_on

Time at which this environment was created.

Type

int

environment_id

Id of the environment.

Type

str

environment_name

Name of the environment.

Type

str

environment_tags

Raw environment tags of the environment.

Type

list

environment_type

Type of the environment.

Type

str

last_modified_by

User that last modified this environment.

Type

str

last_modified_on

Time at which this environment was last modified.

Type

int

json_state

State of the environment - which is stored in json field called state.

Type

str

state

State of the environment stateDisplayLabel - shown on the UI as state.

Type

str

status

Status of the environment.

Type

str

status_detail

Status detail of the environment.

Type

str

class TYPES(value)[source]

An enumeration.

property acl

Get environment ACL.

Returns

An instance of streamsets.sdk.sch_models.ACL.

property tags

Get the tags.

Returns

A streamsets.sdk.utils.SeekableList of str instances.

class streamsets.sdk.sch_models.EnvironmentBuilder(environment, control_hub)[source]

Class with which to build instances of streamsets.sdk.sch_models.Environment.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_environment_builder().

Parameters
  • environment (dict) – Python object built from our Swagger CspEnvironmentJson definition.

  • control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

build(environment_name, **kwargs)[source]

Define the environment.

Parameters
  • environment_name (str) – environment name.

  • allow_nightly_engine_builds (bool, optional) – Default: False.

  • environment_tags (list, optional) – List of tags (strings). Default: None.

Returns

An instance of subclass of streamsets.sdk.sch_models.Environment.

class streamsets.sdk.sch_models.GCPEnvironment(environment, control_hub=None, **kwargs)[source]

Model for GCP Environment.

credentials

Credentials of the environment.

Type

dict

account_key_json

JSON file contents for the Service Account Key. Required when credential_type is GCP_SERVICE_ACCOUNT_KEY.

Type

str

project

Project where VPC is located.

Type

str

gcp_labels

GCP labels for this environment.

Type

dict

gcp_tags

GCP tags to apply to all resources provisioned in GCP account.

Type

str

service_account_email

Service account ID. Required when credential_type is GCP_SERVICE_ACCOUNT_IMPERSONATION.

Type

str

vpc_id

Id of the GCP VPC created as an environment prerequisite in user’s GCP account.

Type

str

fetch_available_machine_types(zone_id)[source]

Returns the available machine types for the given Environment and GCP Project and GCP Zone.

fetch_available_projects()[source]

Returns the available projects for the given Environment.

fetch_available_regions()[source]

Returns the available regions for the given Environment and GCP Project.

fetch_available_service_accounts()[source]

Returns the available service accounts for the given Environment and GCP Project.

fetch_available_vpcs()[source]

Returns the available Networks for the given Environment and GCP Project. Here it is called vpcs since on UI it is shown as VPC.

fetch_available_zones(region_id)[source]

Returns the available zones for the given environment and GCP project and GCP region.

static get_ui_value_mappings()[source]

This method returns a map for the values shown in the UI and internal constants to be used.

is_complete()[source]

Checks if all required fields are set in this environment.

class streamsets.sdk.sch_models.SelfManagedEnvironment(environment, control_hub=None, **kwargs)[source]

Model for Self Managed Environment.

Transformers

class streamsets.sdk.sch_models.Transformer(transformer, control_hub)[source]

Model for Transformer.

execution_mode
Type

str

id

Transformer id.

Type

str

labels

Labels for Transformer.

Type

list

last_validated_on

Last validated time for Transformer.

Type

str

reported_labels

Reported labels for Transformer.

Type

list

url

Transformer’s url.

Type

str

version

Transformer’s version.

Type

str

property accessible

Returns a bool for whether the Transformer instance is accessible.

property acl

Get Transformer ACL.

Returns

An instance of streamsets.sdk.sch_models.ACL.

property attributes

Returns a dict of Transformer attributes.

Groups

class streamsets.sdk.sch_models.Group(group, roles, control_hub=None)[source]

Model for Group.

Parameters
  • group (dict) – A Python object representation of Group.

  • roles (dict) – A mapping of role IDs to role labels.

  • control_hub (streamsets.sdk.ControlHub, optional) – An instance of Control Hub. Default: None

created_by

Creator of this group.

Type

str

created_on

Creation time of this group.

Type

str

display_name

Display name of this group.

Type

str

group_id

ID of this group.

Type

str

last_modified_by

Group last modified by.

Type

str

last_modified_on

Group last modification time.

Type

str

organization

Organization this group belongs to.

Type

str

roles

A set of role labels.

Type

set

users

Users that are a member of this group.

Type

streamsets.sdk.util.SeekableList of streamsets.sdk.sch_models.User instances

class streamsets.sdk.sch_models.Groups(control_hub, roles, organization)[source]

Collection of streamsets.sdk.sch_models.Group instances.

Parameters
  • control_hub – An instance of streamsets.sdk.sch.ControlHub.

  • roles (dict) – A mapping of role IDs to role labels.

  • organization (str) – Organization ID.

class streamsets.sdk.sch_models.GroupBuilder(group, roles, control_hub=None)[source]

Class with which to build instances of streamsets.sdk.sch_models.Group.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_group_builder().

Parameters
  • group (dict) – Python object built from our Swagger GroupJson definition.

  • roles (dict) – A mapping of role IDs to role labels.

  • control_hub (streamsets.sdk.sch.ControlHub, optional) – ControlHub object. Default: None

build(display_name, group_id=None, ldap_groups=None)[source]

Build the group.

Parameters
  • display_name (str) – Group display name.

  • group_id (str, optional) – Group ID. Can only include letters, numbers, underscores and periods. Default: None.

  • ldap_groups (list, optional) – List of LDAP groups (strings).

Returns

An instance of streamsets.sdk.sch_models.Group.

Jobs

class streamsets.sdk.sch_models.Job(job, control_hub=None)[source]

Model for Job.

archived

Flag that indicates if this job is archived.

Type

bool

commit_id

Pipeline commit id.

Type

str

commit_label

Pipeline commit label.

Type

str

created_by

User that created this job.

Type

str

created_on

Time at which this job was created.

Type

int

data_collector_labels

Labels of the data collectors.

Type

list

delete_after_completion

Flag that indicates if this job should be deleted after completion.

Type

bool

description

Job description.

Type

str

destroyer

Job destroyer.

Type

str

enable_failover

Flag that indicates if failover is enabled.

Type

bool

enable_time_series_analysis

Flag that indicates if time series is enabled.

Type

bool

execution_mode

True for Edge and False for SDC.

Type

bool

job_deleted

Flag that indicates if this job is deleted.

Type

bool

job_id

Id of the job.

Type

str

job_name

Name of the job.

Type

str

last_modified_by

User that last modified this job.

Type

str

last_modified_on

Time at which this job was last modified.

Type

int

number_of_instances

Number of instances.

Type

int

pipeline_force_stop_timeout

Timeout for Pipeline force stop.

Type

int

pipeline_id

Id of the pipeline that is running the job.

Type

str

pipeline_name

Name of the pipeline that is running the job.

Type

str

pipeline_rule_id

Rule Id of the pipeline that is running the job.

Type

str

read_policy

Read Policy of the job.

Type

streamsets.sdk.sch_models.ProtectionPolicy

runtime_parameters

Run-time parameters of the job.

Type

str

static_parameters

List of parameters wjat cannot be overriden.

Type

list

statistics_refresh_interval_in_millisecs

Refresh interval for statistics in milliseconds.

Type

int

status

Status of the job.

Type

string

template_run_history_list

List of job template run history.

Type

list

write_policy

Write Policy of the job.

Type

streamsets.sdk.sch_models.ProtectionPolicy

property acl

Get job ACL.

Returns

An instance of streamsets.sdk.sch_models.ACL.

add_tag(*tags)[source]

Add a tag

Parameters

*tags – One or more instances of str

property commit

Get pipeline commit of the job.

Returns

An instance of streamsets.sdk.sch_models.PipelineCommit.

property committed_offsets

Get the committed offsets for a given job id.

Returns

An instance of streamsets.sdk.sch_models.JobCommittedOffset.

get_run_logs()[source]

Retrieve the logs for the last run of the job.

Returns

A list of dict instances, one dictionary per log line.

get_snowflake_generated_queries()[source]

Retrieve the Snowflake generated queries of the last run of the job.

Returns

A list of dict instances, one dictionary per query.

property metrics

The metrics from all runs of a Job.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.JobMetrics instances.

property pipeline

Get the pipeline object corresponding to this job.

remove_tag(*tags)[source]

Remove a tag

Parameters

*tags – One or more instances of str

property system_job

Get the sytem Job for this job if exists.

Returns

An instance of streamsets.sdk.sch_models.Job.

property tag

Get pipeline tag of the job.

Returns

An instance of streamsets.sdk.sch_models.PipelineTag.

property tags

Get the job tags.

Returns

A streamsets.sdk.utils.SeekableList of instances of streamsets.sdk.sch_models.Tag.

time_series_metrics(metric_type, time_filter_condition='LAST_5M', **kwargs)[source]

Get historic time series metrics for the job.

Parameters
  • metric_type (str) – metric type in {‘Record Count Time Series’, ‘Record Throughput Time Series’, ‘Batch Throughput Time Series’, ‘Stage Batch Processing Timer seconds’}.

  • time_filter_condition (str, optional) – Default: 'LAST_5M'.

Returns

An instance of streamsets.sdk.sch_models.JobTimeSeriesMetrics.

class streamsets.sdk.sch_models.Jobs(control_hub)[source]

Collection of streamsets.sdk.sch_models.Job instances.

Parameters

control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

count(status)[source]

Get job counts by status.

Parameters

status (str) – Status of the jobs in {‘ACTIVE’, ‘INACTIVE’, ‘ACTIVATING’, ‘DEACTIVATING’, ‘INACTIVE_ERROR’, ‘ACTIVE_GREEN’, ‘ACTIVE_RED’, ‘’}

Returns

An instance of int indicating the count of jobs with specified status.

class streamsets.sdk.sch_models.JobBuilder(job, control_hub)[source]

Class with which to build instances of streamsets.sdk.sch_models.Job.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_job_builder().

Parameters

job (dict) – Python object built from our Swagger JobJson definition.

build(job_name, pipeline, job_template=False, runtime_parameters=None, pipeline_commit=None, pipeline_tag=None, pipeline_commit_or_tag=None, tags=None)[source]

Build the job.

Parameters
  • job_name (str) – Name of the job.

  • pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline object.

  • job_template (boolean, optional) – Indicate if it is a Job Template. Default: False.

  • runtime_parameters (dict, optional) – Runtime Parameters for the Job or Job Template. Default: None.

  • pipeline_commit (streamsets.sdk.sch_models.PipelineCommit) – Default: ``None`, which resolves to the latest pipeline commit.

  • pipeline_tag (streamsets.sdk.sch_models.PipelineTag) – Default: ``None`, which resolves to the latest pipeline tag.

  • pipeline_commit_or_tag (str, optional) – Default: None, which resolves to the latest pipeline commit.

  • tags (list, optional) – Job tags. Default: None.

Returns

An instance of streamsets.sdk.sch_models.Job.

class streamsets.sdk.sch_models.JobMetrics(metrics)[source]

Model for job metrics.

error_count

The number of error records generated by this run of the Job.

Type

int

error_records_per_sec

The number of error records per second generated by this run of the Job.

Type

float

input_count

The number of records ingested by this run of the Job.

Type

int

input_records_per_sec

The number of records per second ingested by this run of the Job.

Type

float

output_count

The number of records output by this run of the Job.

Type

int

output_records_per_sec

The number of records output per second by the run of the Job.

Type

float

pipeline_version

The version of the pipeline that was used in this Job run.

Type

str

run_count

The count corresponding to this Job run.

Type

int

sdc_id

The ID of the SDC instance on which this Job run was executed.

Type

str

stage_errors_count

The number of stage error records generated by this run of the Job.

Type

int

stage_error_records_per_sec

The number of stage error records generated per second by this run of the job.

Type

float

total_error_count

The total number of both error records and stage errors generated by this run of the job.

Type

int

Parameters

metrics (dict) – Metrics counts in JSON format.

class streamsets.sdk.sch_models.JobOffset(offset)[source]

Model for offset.

Parameters

offset (dict) – Offset in JSON format.

class streamsets.sdk.sch_models.JobRunEvent(event)[source]

Model for an event in a Job Run.

Parameters

event (dict) – Job Run Event in JSON format.

class streamsets.sdk.sch_models.JobStatus(status, control_hub, **kwargs)[source]

Model for Job Status.

run_history

(streamsets.sdk.utils.JobRunHistoryEvent): History of a particular job run.

Type

streamsets.sdk.utils.SeekableList

offsets

(streamsets.sdk.utils.JobPipelineOffset): Offsets after the job run.

Type

streamsets.sdk.utils.SeekableList

Parameters
  • status (dict) – Job status in JSON format.

  • control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

class streamsets.sdk.sch_models.JobTimeSeriesMetric(metric, metric_type)[source]

Model for job metrics.

name

Name of measurement.

Type

str

values

Timeseries data.

Type

list

time_series

Timeseries data with timestamp as key and metric value as value.

Type

dict

Parameters

metric (dict) – Metrics in JSON format.

class streamsets.sdk.sch_models.JobTimeSeriesMetrics(metrics, metric_type)[source]

Model for job metrics.

input_records

Appears when queried for ‘Record Count Time Series’ or ‘Record Throughput Time Series’.

Type

streamsets.sdk.sch_models.JobTimeSeriesMetric

output_records

Appears when queried for ‘Record Count Time Series’ or ‘Record Throughput Time Series’.

Type

streamsets.sdk.sch_models.JobTimeSeriesMetric

error_records

Appears when queried for ‘Record Count Time Series’ or ‘Record Throughput Time Series’.

Type

streamsets.sdk.sch_models.JobTimeSeriesMetric

batch_counter

Appears when queried for ‘Batch Throughput Time Series’.

Type

streamsets.sdk.sch_models.JobTimeSeriesMetric

batch_processing_timer

Appears when queried for ‘Batch Processing Timer seconds’.

Type

streamsets.sdk.sch_models.JobTimeSeriesMetric

Parameters

metrics (dict) – Metrics in JSON format.

class streamsets.sdk.sch_models.RuntimeParameters(runtime_parameters, job)[source]

Wrapper for ControlHub job runtime parameters.

Parameters

Organizations

class streamsets.sdk.sch_models.Organization(organization, organization_admin_user=None, api_client=None)[source]

Model for Organization.

Parameters
  • organization (str) – Organization Id.

  • organization_admin_user (str, optional) – Default: None.

  • api_client (streamsets.sdk.sch_api.ApiClient, optional) – Default: None.

class streamsets.sdk.sch_models.Organizations(control_hub)[source]

Collection of streamsets.sdk.sch_models.Organization instances.

class streamsets.sdk.sch_models.OrganizationBuilder(organization, organization_admin_user)[source]

Class with which to build instances of streamsets.sdk.sch_models.Organization.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_organization_builder().

Parameters

organization (dict) – Python object built from our Swagger UserJson definition.

build(id, name, admin_user_id, admin_user_display_name, admin_user_email_address, admin_user_ldap_user_name=None)[source]

Build the organization.

Parameters
  • id (str) – Organization ID.

  • name (str) – Organization name.

  • admin_user_id (str) – User Id of the admin of this organization.

  • admin_user_display_name (str) – User display name of admin of this organization.

  • admin_user_email_address (str) – User email address of admin of this organization.

  • admin_user_ldap_user_name (str, optional) – LDAP username. Default: None.

Returns

An instance of streamsets.sdk.sch_models.Organization.

Pipelines

class streamsets.sdk.sch_models.Pipeline(pipeline, builder, pipeline_definition, rules_definition, control_hub=None, library_definitions=None)[source]

Model for Pipeline.

Parameters
  • pipeline (dict) – Pipeline in JSON format.

  • builder (streamsets.sdk.sch_models.PipelineBuilder) – Pipeline Builder object.

  • pipeline_definition (dict) – Pipeline Definition in JSON format.

  • rules_definition (dict) – Rules Definition in JSON format.

  • control_hub (streamsets.sdk.sch.ControlHub, optional) – ControlHub object. Default: None.

  • library_definitions (dict, optional) – Library Definition in JSON format. Default: None.

property acl

Get pipeline ACL.

Returns

An instance of streamsets.sdk.sch_models.ACL.

add_label(*labels)[source]

Add a label

Parameters

*labels – One or more instances of str

property commits

Get commits for this pipeline.

Returns

A streamsets.sdk.utils.SeekableList of instances of streamsets.sdk.sch_models.PipelineCommit.

property configuration

Get pipeline’s configuration.

Returns

An instance of streamsets.sdk.sch_models.Configuration.

property labels

Get the pipeline labels.

Returns

A streamsets.sdk.utils.SeekableList of instances of streamsets.sdk.sch_models.PipelineLabel.

property parameters

Get the pipeline parameters.

Returns

A dict like, streamsets.sdk.sch_models.PipelineParameters object of parameter key-value pairs.

remove_label(*labels)[source]

Remove a label

Parameters

*labels – One or more instances of str

property tags

Get tags for this pipeline.

Returns

A streamsets.sdk.utils.SeekableList of instances of streamsets.sdk.sch_models.PipelineTag.

class streamsets.sdk.sch_models.Pipelines(control_hub, organization)[source]

Collection of streamsets.sdk.sch_models.Pipeline instances.

Parameters
  • control_hub – An instance of streamsets.sdk.sch.ControlHub.

  • organization (str) – Organization Id.

class streamsets.sdk.sch_models.PipelineBuilder(pipeline, data_collector_pipeline_builder, control_hub=None, fragment=False)[source]

Class with which to build instances of streamsets.sdk.sch_models.Pipeline.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_pipeline_builder().

Parameters
  • pipeline (dict) – Python object built from our Swagger PipelineJson definition.

  • data_collector_pipeline_builder (streamsets.sdk.sdc_models.PipelineBuilder) – Data Collector Pipeline Builder object.

  • control_hub (streamsets.sdk.sch.ControlHub) – Default: None.

  • fragment (boolean, optional) – Specify if a fragment builder. Default: False.

add_stage(label=None, name=None, type=None, library=None)[source]

Add a stage to the pipeline.

When specifying a stage, either label or name must be used. type and library may also be used to select a particular stage if ambiguities exist. If type and/or library are omitted, the first stage definition matching the given label or name will be used.

Parameters
  • label (str, optional) – Stage label to use when selecting stage from definitions. Default: None.

  • name (str, optional) – Stage name to use when selecting stage from definitions. Default: None.

  • type (str, optional) – Stage type to use when selecting stage from definitions (e.g. origin, destination, processor, executor). Default: None.

  • library (str, optional) – Stage library to use when selecting stage from definitions. Default: None.

Returns

An instance of streamsets.sdk.sch_models.SchSdcStage.

build(title='Pipeline', labels=None, **kwargs)[source]

Build the pipeline.

Parameters
  • title (str) – title of the pipeline.

  • labels (list, optional) – List of pipeline labels of type str. Default: None.

Returns

py:class`streamsets.sdk.sch_models.Pipeline`.

Return type

An instance of

class streamsets.sdk.sch_models.PipelineCommit(pipeline_commit, control_hub=None)[source]

Model for pipeline commit.

Parameters
  • pipeline_commit (dict) – Pipeline commit in JSON format.

  • control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

class streamsets.sdk.sch_models.PipelineLabel(pipeline_label)[source]

Model for pipeline label.

Parameters

pipeline_label (dict) – Pipeline label in JSON format.

class streamsets.sdk.sch_models.PipelineLabels(control_hub, organization)[source]

Collection of streamsets.sdk.sch_models.PipelineLabel instances.

Parameters
  • control_hub – An instance of streamsets.sdk.sch.ControlHub.

  • organization (str) – Organization Id.

class streamsets.sdk.sch_models.PipelineParameters(pipeline)[source]

Parameters for pipelines.

Parameters

pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline Instance.

update(parameters_dict)[source]

Update existing parameters. Works similar to Python dictionary update.

Parameters

parameters_dict (dict) – Dictionary of key-value pairs to be used as parameters.

class streamsets.sdk.sch_models.StPipelineBuilder(pipeline, transformer_pipeline_builder, control_hub=None, fragment=False)[source]

Class with which to build instances of streamsets.sdk.sch_models.Pipeline.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_pipeline_builder().

Parameters
  • pipeline (dict) – Python object built from our Swagger PipelineJson definition.

  • transformer_pipeline_builder (streamsets.sdk.sdc_models.PipelineBuilder) – Transformer Pipeline Builder object.

  • control_hub (streamsets.sdk.sch.ControlHub) – Default: None.

  • fragment (boolean, optional) – Specify if a fragment builder. Default: False.

add_stage(label=None, name=None, type=None, library=None)[source]

Add a stage to the pipeline.

When specifying a stage, either label or name must be used. type and library may also be used to select a particular stage if ambiguities exist. If type and/or library are omitted, the first stage definition matching the given label or name will be used.

Parameters
  • label (str, optional) – Transformer stage label to use when selecting stage from definitions. Default: None.

  • name (str, optional) – Transformer stage name to use when selecting stage from definitions. Default: None.

  • type (str, optional) – Transformer stage type to use when selecting stage from definitions (e.g. origin, destination, processor, executor). Default: None.

  • library (str, optional) – Transformer stage library to use when selecting stage from definitions. Default: None.

Returns

An instance of streamsets.sdk.sch_models.SchStStage.

build(title='Pipeline', **kwargs)[source]

Build the pipeline.

Parameters

title (str) – title of the pipeline.

Returns

py:class`streamsets.sdk.sch_models.Pipeline`.

Return type

An instance of

Protection Methods

class streamsets.sdk.sch_models.ProtectionMethod(stage)[source]

Protection Method Model.

Parameters

stage (dict) – JSON representation of a stage.

class streamsets.sdk.sch_models.ProtectionMethodBuilder(pipeline_builder)[source]

Class with which to build instances of streamsets.sdk.sch_models.ProtectionMethod.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_protection_method_builder().

Parameters

pipeline_builder (streamsets.sdk.sch_models.PipelineBuilder) – Pipeline Builder object.

Protection Policies

class streamsets.sdk.sch_models.ProtectionPolicy(protection_policy, procedures=None)[source]

Model for Protection Policy.

Parameters
class streamsets.sdk.sch_models.ProtectionPolicies(control_hub)[source]

Collection of streamsets.sdk.sch_models.ProtectionPolicy instances.

class streamsets.sdk.sch_models.ProtectionPolicyBuilder(control_hub, protection_policy, policy_procedure)[source]

Class with which to build instances of streamsets.sdk.sch_models.ProtectionPolicy.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_protection_policy_builder().

Parameters
  • protection_policy (dict) – Python object defining a protection policy.

  • policy_procedure (dict) – Python object defining a policy procedure.

class streamsets.sdk.sch_models.PolicyProcedure(policy_procedure)[source]

Model for Policy Procedure.

Parameters

policy_procedure (dict) – JSON representation of Policy Procedure.

ProvisioningAgents

class streamsets.sdk.sch_models.Deployment(deployment, control_hub=None, **kwargs)[source]

Model for Deployment. This is an abstract class.

created_by

User that created this deployment.

Type

str

created_on

Time at which this deployment was created.

Type

int

deployment_id

Id of the deployment.

Type

str

deployment_name

Name of the deployment.

Type

str

deployment_tags

Raw deployment tags of the deployment.

Type

list

deployment_type

Type of the deployment.

Type

str

desired_instances
Type

int

engine_configuration
Type

DeploymentEngineConfiguration

engine_type
Type

str

engine_version
Type

str

environment_id

Enabled environment where engine will be deployed.

Type

list

last_modified_by

User that last modified this deployment.

Type

str

last_modified_on

Time at which this deployment was last modified.

Type

int

network_tags
Type

list

scala_binary_version

scala Binary Version.

Type

str

organization
Type

str

json_state

State of the deployment - this is json field called state.

Type

str

state

State of the deployment - displayed as state on UI.

Type

str

status

Status of the deployment.

Type

str

status_detail

Status detail of the deployment.

Type

str

class ENGINE_TYPES(value)[source]

An enumeration.

class TYPES(value)[source]

An enumeration.

property acl

Get deployment ACL.

Returns

An instance of streamsets.sdk.sch_models.ACL.

property deployment_events

Get the events for a deployment.

Returns

An instance of streamsets.sdk.sch_models.DeploymentEvents

property engine_configuration

The engine configuration of deployment engine. :returns: An instance of streamsets.sdk.sch_models.DeploymentEngineConfiguration.

property registered_engines

Get the registered engines.

Returns

An instance of streamsets.sdk.sch_models.RegisteredEngines

property tags

Get the tags.

Returns

A streamsets.sdk.utils.SeekableList of str instances.

class streamsets.sdk.sch_models.Deployments(control_hub, organization)[source]

Collection of streamsets.sdk.sch_models.Deployment instances.

Parameters
  • control_hub – An instance of streamsets.sdk.sch.ControlHub.

  • organization (str) – Organization Id.

class streamsets.sdk.sch_models.DeploymentBuilder(deployment, control_hub)[source]

Class with which to build instances of streamsets.sdk.sch_models.Deployment.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_deployment_builder().

Parameters
  • deployment (dict) – Python object built from our Swagger CspDeploymentJson definition.

  • control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

build(deployment_name, engine_type, engine_version, environment, **kwargs)[source]

Define the deployment.

Parameters
  • deployment_name (str) – deployment name.

  • engine_type (str) – Type of engine to deploy.

  • engine_version (str) – Version of engine to deploy.

  • environment (str) – ID of the enabled environment where the engine will be deployed.

  • deployment_tags (list, optional) – List of tags (strings). Default: None.

  • engine_build (str, optional) – Build of engine to deploy. Default: None.

  • scala_binary_version (str, optional) – Scala binary version required in case of engine_type=’TF’ Default: None.

Returns

An instance of subclass of streamsets.sdk.sch_models.Deployment.

class streamsets.sdk.sch_models.ProvisioningAgent(provisioning_agent, control_hub)[source]

Model for Provisioning Agent.

Parameters
  • provisioning_agent (dict) – A Python object representation of Provisioning Agent.

  • control_hub – An instance of streamsets.sdk.sch.ControlHub.

property acl

Get the ACL of a Provisioning Agent.

Returns

An instance of streamsets.sdk.sch_models.ACL.

property deployments

Get the deployments associated with the Provisioning Agent.

Returns

A list of streamsets.sdk.sch_models.LegacyDeployment instances.

class streamsets.sdk.sch_models.ProvisioningAgents(control_hub, organization)[source]

Collection of streamsets.sdk.sch_models.ProvisioningAgent instances.

Parameters
  • control_hub – An instance of streamsets.sdk.sch.ControlHub.

  • organization (str) – Organization Id.

Reports

class streamsets.sdk.sch_models.GenerateReportCommand(control_hub, report_defintion, response)[source]

Command to interact with the response from generate_report.

Parameters
  • control_hub (streamsets.sdk.sch.ControlHub) – ControlHub instance.

  • report_definition (dict) – JSON representation of Report Definition.

  • response (dict) – Api response from generating the report.

class streamsets.sdk.sch_models.Report(report, control_hub, report_definition_id)[source]

Model for Report.

Parameters

report (dict) – JSON representation of Report.

download()[source]

Download the Report in PDF format

Returns

An instance of bytes.

class streamsets.sdk.sch_models.Reports(control_hub, report_definition_id)[source]

Collection of streamsets.sdk.sch_models.Report instances.

Parameters
  • control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

  • report_definition_id (str) – Report Definition Id.

class streamsets.sdk.sch_models.ReportDefinition(report_definition, control_hub)[source]

Model for Report Definition.

Parameters
  • report_definition (dict) – JSON representation of Report Definition.

  • control_hub (streamsets.sdk.sch.ControlHub) – ControlHub instance.

property acl

Get Report Definition ACL.

Returns

An instance of streamsets.sdk.sch_models.ACL.

generate_report()[source]

Generate a Report for Report Definition.

Returns

An instance of streamsets.sdk.sch_models.Report.

property report_resources

Get Report Resources of the Report Definition.

Returns

An instance of streamsets.sdk.sch_models.ReportResources.

property reports

Get Reports of the Report Definition.

Returns

An instance of streamsets.sdk.sch_models.Reports.

class streamsets.sdk.sch_models.ReportDefinitions(control_hub)[source]

Collection of streamsets.sdk.sch_models.ReportDefinition instances.

class streamsets.sdk.sch_models.ReportDefinitionBuilder(report_definition, control_hub)[source]

Class with which to build instances of streamsets.sdk.sch_models.ReportDefinition.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_report_definition_builder().

Parameters
  • report_definition (dict) – JSON representation of Report Definition.

  • control_hub (streamsets.sdk.sch.ControlHub) – ControlHub instance.

add_report_resource(resource)[source]

Add a given resource to Report Definition resources.

Parameters

resource (streamsets.sdk.sch_models.Job) or (streamsets.sdk.sch_models.Topology) –

build(name, description=None)[source]

Build the report definition.

Parameters
  • name (str) – Name of the Report Definition.

  • description (str, optional) – Description of the Report Definition. Default: None.

Returns

An instance of streamsets.sdk.sch_models.ReportDefinition.

import_report_definition(report_definition)[source]

Import an existing Report Definition to update it.

Parameters

report_definition (streamsets.sdk.sch_models.ReportDefinition) – Report Definition object.

remove_report_resource(resource)[source]

Remove a resource from Report Definition Resources.

Parameters

resource (streamsets.sdk.sch_models.Job) or (streamsets.sdk.sch_models.Topology) –

Returns

A resource of type dict that is removed from Report Definition Resources.

set_data_retrieval_period(start_time, end_time)[source]

Set Time range over which the report will be generated.

Parameters
  • start_time (str) or (int) – Absolute or relative start time for the Report.

  • end_time (str) or (int) – Absolute or relative end time for the Report.

class streamsets.sdk.sch_models.ReportResource(report_resource)[source]

Model for Report Resource.

Parameters

report_resource (dict) – JSON representation of Report Resource.

class streamsets.sdk.sch_models.ReportResources(report_resources, report_definition)[source]

Model for the collection of Report Resources.

Parameters

Roles

class streamsets.sdk.sch_models.Roles(values, entity, role_label_to_id)[source]

Wrapper class over the list of Roles.

Parameters
append(object)None append object to end[source]
remove(value)None remove first occurrence of value.[source]

Raises ValueError if the value is not present.

Scheduler

class streamsets.sdk.sch_models.ScheduledTask(task, control_hub=None)[source]

Model for Scheduled Task.

Parameters
  • task (dict) – JSON representation of task.

  • control_hub (streamsets.sdk.sch.ControlHub) – ControlHub instance.

property acl

Get the ACL of a Scheduled Task.

Returns

An instance of streamsets.sdk.sch_models.ACL.

property audits

Get Scheduled Task Audits.

Returns

A streamsets.sdk.utils.SeekableList of inherited instances of streamsets.sdk.sch_models.ScheduledTaskAudit.

property runs

Get Scheduled Task Runs.

Returns

A streamsets.sdk.utils.SeekableList of inherited instances of streamsets.sdk.sch_models.ScheduledTaskRun.

class streamsets.sdk.sch_models.ScheduledTaskAudit(audit)[source]

Scheduled Task Audit.

Parameters

run (dict) – JSON representation of scheduled task audit.

class streamsets.sdk.sch_models.ScheduledTaskBaseModel(data, attributes_to_ignore=None, attributes_to_remap=None, repr_metadata=None)[source]

Base Model for Scheduled Task related classes.

class streamsets.sdk.sch_models.ScheduledTaskBuilder(job_selection_types, control_hub)[source]

Builder for Scheduled Task.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_scheduled_task_builder().

Parameters
  • job_selection_types (dict) – JSON representation of job selection types.

  • control_hub (streamsets.sdk.sch.ControlHub) – ControlHub instance.

build(task_object, action='START', name=None, description=None, cron_expression='0 0 1/1 * ? *', time_zone='UTC', status='RUNNING', start_time=None, end_time=None, missed_execution_handling='IGNORE')[source]

Builder for Scheduled Task.

Parameters
  • task_object (streamsets.sdk.sch_models.Job) – (streamsets.sdk.sch_models.ReportDefinition): Job or ReportDefinition object.

  • action (str, optional) – One of the {‘START’, ‘STOP’, ‘UPGRADE’} actions. Default: START.

  • name (str, optional) – Name of the task. Default: None.

  • description (str, optional) – Description of the task. Default: None.

  • crontab_mask (str, optional) – Schedule in cron syntax. Default: "0 0 1/1 * ? *". (Daily at 12).

  • time_zone (str, optional) – Time zone. Default: "UTC".

  • status (str, optional) – One of the {‘RUNNING’, ‘PAUSED’} statuses. Default: RUNNING.

  • start_time (str, optional) – Start time of task. Default: None.

  • end_time (str, optional) – End time of task. Default: None.

  • missed_trigger_handling (str, optional) – One of {‘IGNORE’, ‘RUN ALL’, ‘RUN ONCE’}. Default: IGNORE.

Returns

An instance of streamsets.sdk.sch_models.ScheduledTask.

class streamsets.sdk.sch_models.ScheduledTaskRun(run)[source]

Scheduled Task Run.

Parameters

run (dict) – JSON representation if scheduled task run.

class streamsets.sdk.sch_models.ScheduledTasks(control_hub)[source]

Collection of streamsets.sdk.sch_models.ScheduledTask instances.

Subscriptions

class streamsets.sdk.sch_models.Subscription(subscription, control_hub)[source]

Subscription.

Parameters
  • subscription (dict) – JSON representation of Subscription.

  • control_hub (streamsets.sdk.sch.ControlHub) – ControlHub instance.

property acl

Get the ACL of an Event Subscription.

Returns

An instance of streamsets.sdk.sch_models.ACL.

property action

Action of the Subscription.

property events

Events of the Subscription.

class streamsets.sdk.sch_models.Subscriptions(control_hub)[source]

Collection of streamsets.sdk.sch_models.Subscription instances.

Parameters

control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

class streamsets.sdk.sch_models.SubscriptionAction(action)[source]

Action to take when the Subscription is triggered.

Parameters

action (dict) – JSON representation of an Action for a Subscription.

class streamsets.sdk.sch_models.SubscriptionAudit(audit)[source]

Model for subscription audit.

Parameters

audit (dict) – JSON representation of a subscription audit.

class streamsets.sdk.sch_models.SubscriptionBuilder(subscription, control_hub)[source]

Builder for Subscription.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_subscription_builder().

Parameters
  • subscription (dict) – JSON representation of event subscription.

  • control_hub (streamsets.sdk.sch.ControlHub) – ControlHub instance.

add_event(event_type, filter=None)[source]

Add event to the Subscription.

Parameters
  • event_type (str) – Type of event in {‘Job Status Change’, ‘Data SLA Triggered’, ‘Pipeline Committed’, ‘Pipeline Status Change’, ‘Report Generated’, ‘Data Collector not Responding’}.

  • filter (str, optional) – Filter to be applied on event. Default: None.

build(name, description=None)[source]

Builder for Scheduled Task.

Parameters
  • name (str) – Name of Subscription.

  • description (str, optional) – Description of subscription. Default: None.

Returns

An instance of streamsets.sdk.sch_models.Subscription.

import_subscription(subscription)[source]

Import an existing Subscription into the builder to update it.

Parameters

subscription (streamsets.sdk.sch_models.Subscription) – Subscription instance.

remove_event(event_type)[source]

Remove event from the subscription.

Parameters

event_type (str) – Type of event in {‘Job Status Change’, ‘Data SLA Triggered’, ‘Pipeline Committed’, ‘Pipeline Status Change’, ‘Report Generated’, ‘Data Collector not Responding’}.

Returns

An instance of streamsets.sdk.sch_models.SubscriptionEvent.

set_email_action(recipients, subject=None, body=None)[source]

Set the Email action.

Parameters
  • recipients (list) – List of email addresses.

  • subject (str, optional) – Subject of the email. Default: None.

  • body (str, optional) – Body of the email. Default: None.

set_webhook_action(uri, method='GET', content_type=None, payload=None, auth_type=None, username=None, password=None, timeout=30000, headers=None)[source]

Set the Webhook action.

Parameters
  • uri (str) – URI for the Webhook.

  • method (str, optional) – HTTP method to use. Default: 'GET'.

  • content_type (str, optional) – Content Type of the request. Default: None.

  • payload (str, optional) – Payload of the request. Default: None.

  • auth_type (str, optional) – 'basic' or None. Default: None.

  • username (str, optional) – username for the authentication. Default: None.

  • password (str, optional) – password for the authentication. Default: None.

  • timeout (int, optional) – timeout for the Webhook action. Default: 30000.

  • headers (dict, optional) – Headers to be sent to the Webhook call. Default: None.

class streamsets.sdk.sch_models.SubscriptionEvent(event, control_hub)[source]

An Event of a Subscription.

Parameters
  • event (dict) – JSON representation of Events of a Subscription.

  • control_hub (streamsets.sdk.sch.ControlHub) – ControlHub instance.

Topologies

class streamsets.sdk.sch_models.DataSla(data_sla, control_hub)[source]

Model for DataSla.

Parameters
  • data_sla (dict) – JSON representation of SLA.

  • control_hub (streamsets.sdk.sch.ControlHub) – An instance of the Control Hub.

class streamsets.sdk.sch_models.DataSlaBuilder(data_sla, control_hub)[source]

Class with which to build instances of streamsets.sdk.sch_models.DataSla.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_data_sla_builder().

Parameters
  • data_sla (dict) – Python object built from our Swagger DataSlaJson definition.

  • control_hub (streamsets.sdk.sch.ControlHub) – An instance of the Control Hub.

build(topology, label, job, alert_text, qos_parameter='THROUGHPUT_RATE', function_type='Max', min_max_value=100, enabled=True)[source]

Build the Data Sla.

Parameters
  • topology (streamsets.sdk.sch_models.Topology) – Topology object.

  • label (str) – Label for the SLA.

  • job (list) – List of streamsets.sdk.sch_models.Job objects.

  • alert_text (str) – Alert text.

  • qos_parameter (str, optional) – paramter in {‘THROUGHPUT_RATE’, ‘ERROR_RATE’}. Default: 'THROUGHPUT_RATE'.

  • function_type (str, optional) – paramter in {‘Max’, ‘Min’}. Default: 'Max'.

  • min_max_value (str, optional) – Default: 100.

  • enabled (boolean, optional) – Default: True.

Returns

An instance of streamsets.sdk.sch_models.DataSla.

class streamsets.sdk.sch_models.Topology(topology, control_hub=None)[source]

Model for Topology.

Parameters

topology (dict) – JSON representation of Topology.

commit_id

Pipeline commit id.

Type

str

commit_message

Commit Message.

Type

str

commit_time

Time at which commit was made.

Type

int

committed_by

User that made the commit.

Type

str

default_topology

Default Topology.

Type

bool

description

Topology description.

Type

str

draft

Indicates whether this topology is a draft.

Type

bool

last_modified_by

User that last modified this topology.

Type

str

last_modified_on

Time at which this topology was last modified.

Type

int

new_pipeline_version_available

Whether any job in the topology has a new pipeline version to be updated to.

Type

bool

organization

Id of the organization.

Type

str

parent_version

Version of the parent topology.

Type

str

topology_definition

Definition of the topology.

Type

str

topology_id

Id of the topology.

Type

str

topology_name

Name of the topology.

Type

str

validation_issues

Any validation issues that exist for this Topology.

Type

dict

version

Version of this topology.

Type

str

acknowledge_job_errors()[source]

Acknowledge all errors for the jobs in a topology.

Returns

An instance of streamsets.sdk.sch_api.Command.

property acl

Get the ACL of a Topology.

Returns

An instance of streamsets.sdk.sch_models.ACL.

activate_data_sla(*data_slas)[source]

Activate Data SLAs.

Parameters

*data_slas – One or more instances of streamsets.sdk.sch_models.DataSla.

Returns

An instance of streamsets.sdk.sch_api.Command.

add_data_sla(data_sla)[source]

Add SLA.

Parameters

data_sla (streamsets.sdk.sch_models.DataSla) – Data SLA object.

Returns

An instance of streamsets.sdk.sch_api.Command.

auto_discover_connections()[source]

Auto discover connecting systems between nodes in a Topology.

Returns

An instance of streamsets.sdk.sch_api.Command.

auto_fix()[source]

Auto-fix a topology by rectifying invalid or removed jobs, outdated jobs, etc.

Returns

An instance of streamsets.sdk.sch_api.Command.

deactivate_data_sla(*data_slas)[source]

Deactivate Data SLAs.

Parameters

*data_slas – One or more instances of streamsets.sdk.sch_models.DataSla.

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_data_sla(*data_slas)[source]

Delete Data SLAs.

Parameters

*data_slas – One or more instances of streamsets.sdk.sch_models.DataSla.

Returns

An instance of streamsets.sdk.sch_api.Command.

property jobs

Get the jobs that are contained within the Topology.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Job instances.

property new_pipeline_version_available

Determine if a new pipeline version is available for any jobs in the Topology.

Returns

A (bool) value.

property nodes

Get the job and system nodes that make up the Topology.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.TopologyNode

instances.

start_all_jobs()[source]

Start all jobs of a topology.

Returns

An instance of streamsets.sdk.sch_api.Command.

stop_all_jobs(force=False)[source]

Stop all jobs of a topology.

Parameters

force (bool, optional) – Force topology jobs to stop. Default: False.

Returns

An instance of streamsets.sdk.sch_api.Command.

update_jobs_to_latest_change()[source]

Upgrade a topology’s job(s) to the latest corresponding pipeline change.

Returns

An instance of streamsets.sdk.sch_api.Command.

property validation_issues

Get any validation issues that are detected for the Topology.

Returns

A (list) of validation issues in JSON format.

class streamsets.sdk.sch_models.Topologies(control_hub)[source]

Collection of streamsets.sdk.sch_models.Topology instances.

Parameters

control_hub (streamsets.sdk.sch.ControlHub) – An instance of the ControlHub.

class streamsets.sdk.sch_models.TopologyBuilder(topology, control_hub=None)[source]

Class with which to build instances of streamsets.sdk.sch_models.Topology.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_topology_builder().

Parameters
  • topology (dict) – Python object built from our Swagger TopologyJson definition.

  • control_hub (streamsets.sdk.ControlHub, optional) – Control Hub instance. Default: None

add_job(job)[source]

Add a job node to the Topology being built.

Parameters

job (streamsets.sdk.sch_models.Job) – An instance of a job to be added.

add_system(name)[source]

Add a system node to the Topology being built.

Parameters

name (str) – The name of the system to add to the topology.

build(topology_name=None, description=None)[source]

Build the topology.

Parameters
  • topology_name (str, optional) – Name of the topology. This parameter is required when building a new topology.

  • description (str, optional) – Description of the topology. Default: None.

Returns

An instance of streamsets.sdk.sch_models.Topology.

delete_node(topology_node)[source]

Delete a system or job node from the topology.

Parameters

topology_node (streamsets.sdk.sch_models.TopologyNode) – An instance of a TopologyNode to delete from the topology.

import_topology(topology)[source]

Import an existing topology to be used in the builder.

Parameters

topology (streamsets.sdk.sch_models.Topology) – An existing Topology instance to modify.

property topology_nodes

Get all of the nodes currently part of the topology held by the TopologyBuilder.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.TopologyNode

instances.

class streamsets.sdk.sch_models.TopologyNode(topology_node_json)[source]

Model for a node within a Topology.

Parameters

topology_node_json (dict) – JSON representation of a Topology Node.

node_type

The type of this node, i.e. SYSTEM, JOB, etc.

Type

str

instance_name

The name of this node instance.

Type

str

stage_name

The name of the stage in this node.

Type

str

stage_version

The version of the stage in this node.

Type

str

job_id

The ID of the job in this node.

Type

str

pipeline_id

The pipeline ID associated with the job in this node.

Type

str

pipeline_commit_id

The commit ID of the pipeline.

Type

str

pipeline_version

The version of the pipeline.

Type

str

input_lanes

A list of str representing the input lanes for this node.

Type

list

output_lanes

A list of str representing the output lanes for this node.

Type

list

Users

class streamsets.sdk.sch_models.User(user, roles, control_hub=None)[source]

Model for User.

Parameters
  • user (dict) – JSON representation of User.

  • roles (dict) – A mapping of role IDs to role labels.

  • control_hub (streamsets.sdk.ControlHub, optional) – An instance of Control Hub. Default: None

active

Whether the user is active or not.

Type

bool

created_by

Creator of this user.

Type

str

created_on

Creation time of this user.

Type

str

display_name

Display name of this user.

Type

str

email_address

Email address of this user.

Type

str

id

Id of this user.

Type

str

groups

Groups this user belongs to.

Type

streamsets.sdk.util.SeekableList of streamsets.sdk.sch_models.Group instances

last_modified_by

User last modified by.

Type

str

last_modified_on

User last modification time.

Type

str

password_expires_on

User’s password expiration time.

Type

str

password_system_generated

Whether User’s password is system generated or not.

Type

bool

roles

A set of role labels.

Type

set

saml_user_name

SAML username of user.

Type

str

class streamsets.sdk.sch_models.Users(control_hub, roles, organization)[source]

Collection of streamsets.sdk.sch_models.User instances.

Parameters
  • control_hub – An instance of streamsets.sdk.sch.ControlHub.

  • roles (dict) – A mapping of role IDs to role labels.

  • organization (str) – Organization ID.

class streamsets.sdk.sch_models.UserBuilder(user, roles, control_hub=None)[source]

Class with which to build instances of streamsets.sdk.sch_models.User.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_user_builder().

Parameters
  • user (dict) – Python object built from our Swagger UserJson definition.

  • roles (dict) – A mapping of role IDs to role labels.

  • control_hub – An instance of streamsets.sdk.sch.ControlHub. Default: None.

build(email_address)[source]

Build the user.

Parameters

email_address (str) – User Email Address.

Returns

An instance of streamsets.sdk.sch_models.User.

Common

Models used by StreamSets Data Collector and StreamSets Control Hub:

class streamsets.sdk.models.Configuration(configuration=None, property_key='name', property_value='value', **kwargs)[source]

Abstraction for stage configurations.

This class enables easy access to and modification of data stored as a list of dictionaries. As an example, SDC’s pipeline configuration is stored in the form

[ {
  "name" : "executionMode",
  "value" : "STANDALONE"
}, {
  "name" : "deliveryGuarantee",
  "value" : "AT_LEAST_ONCE"
}, ... ]

By implementing simple __getitem__ and __setitem__ methods, this class allows items in this list to be accessed using

configuration['executionMode'] = 'CLUSTER_BATCH'

Instead of the more verbose

for property in configuration:
    if property['name'] == 'executionMode':
        property['value'] = 'CLUSTER_BATCH'
    break
Parameters
  • configuration (str) – List of dictionaries comprising the configuration.

  • property_key (str, optional) – The dictionary entry denoting the property key. Default: name

  • property_value (str, optional) – The dictionary entry denoting the property value. Default: value

get(key, default=None)[source]

Return the value of key or, if not in the configuration, the default value.

items()[source]

Gets the configuration’s items.

Returns

A new view of the configuration’s items ((key, value) pairs).

update(configs)[source]

Update instance with a collection of configurations.

Parameters

configs (dict) – Dictionary of configurations to use.

Exceptions

Common exceptions.

exception streamsets.sdk.exceptions.BadRequestError(response)[source]

Bad request error (HTTP 400).

exception streamsets.sdk.exceptions.ConnectionError(code, message)[source]

Connection Catalog errors.

exception streamsets.sdk.exceptions.InternalServerError(response)[source]

Internal server error.

exception streamsets.sdk.exceptions.InvalidCredentialsError(message)[source]

Invalid credentials error.

exception streamsets.sdk.exceptions.JobInactiveError(message)[source]

Job status changed into INACTIVE_ERROR.

exception streamsets.sdk.exceptions.JobRunnerError(code, message)[source]

JobRunner errors.

exception streamsets.sdk.exceptions.LegacyDeploymentInactiveError(message)[source]

Legacy deployment status changed into INACTIVE_ERROR.

exception streamsets.sdk.exceptions.MultipleIssuesError(errors)[source]

Multiple errors were returned.

exception streamsets.sdk.exceptions.TopologyIssuesError(issues)[source]

Topology has some issues.

exception streamsets.sdk.exceptions.UnprocessableEntityError(response)[source]

Unprocessable Entity Error (HTTP 422).

exception streamsets.sdk.exceptions.UnsupportedMethodError(message)[source]

An unsupported method was called.

exception streamsets.sdk.exceptions.ValidationError(issues)[source]

Validation issues.