StreamSets Control Hub#

Main interface#

This is the main entry point for interacting with StreamSets Control Hub (SCH) instances.

class streamsets.sdk.ControlHub(credential_id, token, use_websocket_tunneling=True, **kwargs)[source]#

Class to interact with StreamSets Control Hub.

Parameters

credential_id (str) – ControlHub credential ID.
token (str) – ControlHub token.
use_websocket_tunneling (bool, optional) – Whether to use WebSocket tunneling. Default: True.
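
As a minimal sketch of supplying the constructor parameters above without hard-coding secrets, the helper below reads them from environment variables. The variable names SCH_CREDENTIAL_ID, SCH_TOKEN and SCH_WEBSOCKET_TUNNELING and the helper control_hub_kwargs are hypothetical and are not part of the SDK; only the keyword names credential_id, token and use_websocket_tunneling come from the signature.

```python
import os


def control_hub_kwargs(env=None):
    """Collect ControlHub constructor arguments from environment variables.

    The variable names used here are hypothetical, not defined by the SDK.
    """
    env = os.environ if env is None else env
    missing = [name for name in ("SCH_CREDENTIAL_ID", "SCH_TOKEN") if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    # Tunneling stays on (the documented default) unless explicitly set to "false".
    tunneling = env.get("SCH_WEBSOCKET_TUNNELING", "true").strip().lower() != "false"
    return {
        "credential_id": env["SCH_CREDENTIAL_ID"],
        "token": env["SCH_TOKEN"],
        "use_websocket_tunneling": tunneling,
    }


# With the SDK installed, the result can be passed to the constructor:
#     from streamsets.sdk import ControlHub
#     sch = ControlHub(**control_hub_kwargs())
```

The helper only builds the argument dict, so it can be checked without network access or the SDK itself.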

acknowledge_event_subscription_error(subscription)[source]#

Acknowledge an error on given Event Subscription.

Parameters: subscription (streamsets.sdk.sch_models.Subscription) – A Subscription instance.
Returns: An instance of streamsets.sdk.sch_api.Command.

acknowledge_job_error(*jobs)[source]#

Acknowledge errors for one or more jobs.

Parameters: *jobs – One or more instances of streamsets.sdk.sch_models.Job.

acknowledge_legacy_deployment_error(*deployments)[source]#

Acknowledge errors for one or more deployments.

Parameters: *deployments – One or more instances of streamsets.sdk.sch_models.LegacyDeployment.

property action_audits#

Action Audits.

Returns: An instance of streamsets.sdk.sch_models.ActionAudits.

activate_api_credential(api_credential)[source]#

Activate an api credential.

Parameters: api_credential (streamsets.sdk.sch_models.ApiCredential) – ApiCredential object.
Returns: An instance of streamsets.sdk.sch_api.Command.

activate_datacollector(data_collector)[source]#

Activate data collector.

Parameters: data_collector (streamsets.sdk.sch_models.DataCollector) – Data Collector object.

activate_environment(*environments, timeout_sec=300)[source]#

Activate environments.

Parameters: *environments – One or more instances of streamsets.sdk.sch_models.Environment.
Returns: An instance of streamsets.sdk.sch_api.Command or None.

activate_provisioning_agent(provisioning_agent)[source]#

Activate provisioning agent.

Parameters: provisioning_agent (streamsets.sdk.sch_models.ProvisioningAgent) – Provisioning Agent object.
Returns: An instance of streamsets.sdk.sch_api.Command.

activate_user(*users)[source]#

Activate the given users.

Parameters: *users – One or more instances of streamsets.sdk.sch_models.User.
Returns: An instance of streamsets.sdk.aster_api.Command.

add_api_credential(api_credential)[source]#

Add an API credential. ControlHub updates some attributes of the credential, such as created_by.

Parameters: api_credential (streamsets.sdk.sch_models.ApiCredential) – ApiCredential object.
Returns: An instance of streamsets.sdk.sch_api.Command.

add_classification_rule(classification_rule, commit=False)[source]#

Add a classification rule.

Parameters

classification_rule (streamsets.sdk.sch_models.ClassificationRule) – Classification Rule object.
commit (bool, optional) – Whether to commit the rule after adding it. Default: False.

add_connection(connection)[source]#

Add a connection.

Parameters: connection (streamsets.sdk.sch_models.Connection) – Connection object.
Returns: An instance of streamsets.sdk.sch_api.Command.

add_deployment(deployment)[source]#

Add a deployment.

Parameters: deployment (streamsets.sdk.sch_models.Deployment) – Deployment object.
Returns: An instance of streamsets.sdk.sch_api.Command.

add_environment(environment)[source]#

Add an environment.

Parameters: environment (streamsets.sdk.sch_models.Environment) – Environment object.
Returns: An instance of streamsets.sdk.sch_api.Command.

add_group(group)[source]#

Add a group.

Parameters: group (streamsets.sdk.sch_models.Group) – Group object.
Returns: An instance of streamsets.sdk.sch_api.Command.

add_job(job)[source]#

Add a job.

Parameters: job (streamsets.sdk.sch_models.Job) – Job object.
Returns: An instance of streamsets.sdk.sch_models.Job.

add_legacy_deployment(deployment)[source]#

Add a legacy deployment.

Parameters: deployment (streamsets.sdk.sch_models.LegacyDeployment) – Deployment object.
Returns: An instance of streamsets.sdk.sch_api.Command.

add_organization(organization)[source]#

Add an organization.

Parameters: organization (streamsets.sdk.sch_models.Organization) – Organization object.
Returns: An instance of streamsets.sdk.sch_api.Command.

add_protection_policy(protection_policy)[source]#

Add a protection policy.

Parameters: protection_policy (streamsets.sdk.sch_models.ProtectionPolicy) – Protection Policy object.

add_report_definition(report_definition)[source]#

Add Report Definition to ControlHub.

Parameters: report_definition (streamsets.sdk.sch_models.ReportDefinition) – Report Definition instance.
Returns: An instance of streamsets.sdk.sch_api.Command.

add_subscription(subscription)[source]#

Add Subscription to ControlHub.

Parameters: subscription (streamsets.sdk.sch_models.Subscription) – A Subscription instance.
Returns: An instance of streamsets.sdk.sch_api.Command.

property alerts#

Alerts.

Returns: A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Alert.

property api_credentials#

Api Credentials.

Returns: A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.ApiCredential.

assign_administrator_role(user)[source]#

Designate the specified user as an Organization Administrator.

Parameters: user (streamsets.sdk.sch_models.User) – User object.
Returns: An instance of streamsets.sdk.aster_api.Command.

balance_data_collectors(*data_collectors)[source]#

Balance all jobs running on given Data Collectors.

Parameters: *data_collectors – One or more instances of streamsets.sdk.sch_models.DataCollector.

balance_job(*jobs)[source]#

Balance one or more jobs.

Parameters: *jobs – One or more instances of streamsets.sdk.sch_models.Job.

property connection_audits#

Connection Audits.

Returns: An instance of streamsets.sdk.sch_models.ConnectionAudits.

property connection_tags#

Connection Tags.

Returns: An instance of streamsets.sdk.sch_models.ConnectionTags.

property connections#

Connections.

Returns: An instance of streamsets.sdk.sch_models.Connections.

create_components(component_type, number_of_components=1, active=True)[source]#

Create components.

Parameters

component_type (str) – Component type.
number_of_components (int, optional) – Default: 1.
active (bool, optional) – Default: True.

Returns

An instance of streamsets.sdk.sch_api.CreateComponentsCommand.

property data_collectors#

Data Collectors registered to the ControlHub instance.

Returns: A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.DataCollector instances.

property data_protector_enabled#

Whether Data Protector is enabled for the current organization.

Type: bool

deactivate_api_credential(api_credential)[source]#

Deactivate an api credential.

Parameters: api_credential (streamsets.sdk.sch_models.ApiCredential) – ApiCredential object.
Returns: An instance of streamsets.sdk.sch_api.Command.

deactivate_datacollector(data_collector)[source]#

Deactivate data collector.

Parameters: data_collector (streamsets.sdk.sch_models.DataCollector) – Data Collector object.

deactivate_environment(*environments)[source]#

Deactivate environments.

Parameters: *environments – One or more instances of streamsets.sdk.sch_models.Environment.
Returns: An instance of streamsets.sdk.sch_api.Command.

deactivate_provisioning_agent(provisioning_agent)[source]#

Deactivate provisioning agent.

Parameters: provisioning_agent (streamsets.sdk.sch_models.ProvisioningAgent) – Provisioning Agent object.
Returns: An instance of streamsets.sdk.sch_api.Command.

deactivate_user(*users)[source]#

Deactivate the given users.

Parameters: *users – One or more instances of streamsets.sdk.sch_models.User.
Returns: An instance of streamsets.sdk.aster_api.Command.

delete_and_unregister_data_collector(data_collector)[source]#

Delete and Unregister data collector.

Parameters: data_collector (streamsets.sdk.sch_models.DataCollector) – Data Collector object.

delete_api_credentials(*api_credentials)[source]#

Delete API credentials.

Parameters: *api_credentials – One or more instances of streamsets.sdk.sch_models.ApiCredential.

delete_connection(*connections)[source]#

Delete connections.

Parameters: *connections – One or more instances of streamsets.sdk.sch_models.Connection.

delete_data_collector(data_collector)[source]#

Delete data collector.

Parameters: data_collector (streamsets.sdk.sch_models.DataCollector) – Data Collector object.

delete_deployment(*deployments)[source]#

Delete deployments.

Parameters: *deployments – One or more instances of streamsets.sdk.sch_models.Deployment.

delete_environment(*environments)[source]#

Delete environments.

Parameters: *environments – One or more instances of streamsets.sdk.sch_models.Environment.
Returns: An instance of streamsets.sdk.sch_api.Command.

delete_group(*groups)[source]#

Delete groups.

Parameters: *groups – One or more instances of streamsets.sdk.sch_models.Group.
Returns: An instance of streamsets.sdk.sch_api.Command.

delete_job(*jobs)[source]#

Delete one or more jobs.

Parameters: *jobs – One or more instances of streamsets.sdk.sch_models.Job.

delete_legacy_deployment(*deployments)[source]#

Delete legacy deployments.

Parameters: *deployments – One or more instances of streamsets.sdk.sch_models.LegacyDeployment.
Returns: An instance of streamsets.sdk.sch_api.Command.

delete_pipeline(pipeline, only_selected_version=False)[source]#

Delete a pipeline.

Parameters

pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline object.
only_selected_version (boolean, optional) – Delete only the current commit. Default: False.

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_pipeline_labels(*pipeline_labels)[source]#

Delete pipeline labels.

Parameters: *pipeline_labels – One or more instances of streamsets.sdk.sch_models.PipelineLabel.
Returns: An instance of streamsets.sdk.sch_api.Command.

delete_provisioning_agent(provisioning_agent)[source]#

Delete provisioning agent.

Parameters: provisioning_agent (streamsets.sdk.sch_models.ProvisioningAgent) – Provisioning Agent object.
Returns: An instance of streamsets.sdk.sch_api.Command.

delete_provisioning_agent_token(provisioning_agent)[source]#

Delete provisioning agent token.

Parameters: provisioning_agent (streamsets.sdk.sch_models.ProvisioningAgent) – Provisioning Agent object.
Returns: An instance of streamsets.sdk.sch_api.Command.

delete_report_definition(report_definition)[source]#

Delete an existing Report Definition.

Parameters: report_definition (streamsets.sdk.sch_models.ReportDefinition) – Report Definition instance.
Returns: An instance of streamsets.sdk.sch_api.Command.

delete_snowflake_pipeline_defaults()[source]#

Delete the Snowflake pipeline defaults for this user.

Returns: An instance of streamsets.sdk.sch_api.Command.

delete_snowflake_user_credentials()[source]#

Delete the Snowflake user credentials.

Returns: An instance of streamsets.sdk.sch_api.Command.

delete_subscription(subscription)[source]#

Delete an existing Subscription.

Parameters: subscription (streamsets.sdk.sch_models.Subscription) – A Subscription instance.
Returns: An instance of streamsets.sdk.sch_api.Command.

delete_topology(topology, only_selected_version=False)[source]#

Delete a topology.

Parameters

topology (streamsets.sdk.sch_models.Topology) – Topology object.
only_selected_version (boolean, optional) – Delete only the current commit. Default: False.

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_unregistered_auth_tokens(executor_type)[source]#

Delete auth tokens for engines that have been unregistered.

Parameters: executor_type (str) – The executor type. Acceptable values are ‘DATACOLLECTOR’ or ‘TRANSFORMER’.
Returns: An instance of streamsets.sdk.sch_api.Command.

delete_user(*users, deactivate=False)[source]#

Delete users, optionally deactivating them first.

Parameters

*users – One or more instances of streamsets.sdk.sch_models.User.
deactivate (bool, optional) – Whether to deactivate the users before deleting them. Default: False.

Returns

An instance of streamsets.sdk.aster_api.Command.

property deployments#

Deployments.

Returns: An instance of streamsets.sdk.sch_models.Deployments.

duplicate_job(job, name=None, description=None, number_of_copies=1)[source]#

Duplicate an existing job.

Parameters

job (streamsets.sdk.sch_models.Job) – Job object.
name (str, optional) – Name of the new job(s). Default: None. If not specified, name of the job with ' copy' appended to the end will be used.
description (str, optional) – Description for new job(s). Default: None.
number_of_copies (int, optional) – Number of copies. Default: 1.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Job.

duplicate_pipeline(pipeline, name=None, description='New Pipeline', number_of_copies=1)[source]#

Duplicate an existing pipeline.

Parameters

pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline object.
name (str, optional) – Name of the new pipeline(s). Default: None.
description (str, optional) – Description for new pipeline(s). Default: 'New Pipeline'.
number_of_copies (int, optional) – Number of copies. Default: 1.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Pipeline.

edit_job(job)[source]#

Edit a job.

Parameters: job (streamsets.sdk.sch_models.Job) – Job object.
Returns: An instance of streamsets.sdk.sch_models.Job.

property engine_configurations#

Deployment engine configurations.

Returns: An instance of streamsets.sdk.sch_models.DeploymentEngineConfiguration.

property environments#

Environments.

Returns: An instance of streamsets.sdk.sch_models.Environments.

export_jobs(jobs)[source]#

Export jobs to a compressed archive.

Parameters: jobs (list) – A list of streamsets.sdk.sch_models.Job instances.
Returns: An instance of type bytes indicating the content of zip file with job json files.
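
Because the export methods return raw bytes, a caller usually wants to write them to disk. The following is a minimal sketch of that step; save_export is a hypothetical helper (not part of the SDK), and it assumes the returned bytes form a standard zip archive, as the description above states.

```python
import io
import zipfile
from pathlib import Path


def save_export(data, path):
    """Write exported archive bytes to `path` and return the archive member names."""
    # Refuse to write anything that is not a zip archive.
    if not zipfile.is_zipfile(io.BytesIO(data)):
        raise ValueError("export did not return a zip archive")
    path = Path(path)
    path.write_bytes(data)
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


# Typical use, assuming `sch` is a ControlHub instance and `jobs` a list of Job objects:
#     names = save_export(sch.export_jobs(jobs), "jobs.zip")
```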

export_pipelines(pipelines, fragments=False, include_plain_text_credentials=False)[source]#

Export pipelines.

Parameters

pipelines (list) – A list of streamsets.sdk.sch_models.Pipeline instances.
fragments (bool, optional) – Whether to export fragments as well. Default: False.
include_plain_text_credentials (bool, optional) – Whether to include plain text credentials in the export. Default: False.

Returns

An instance of type bytes indicating the content of zip file with pipeline json files.

export_protection_policies(protection_policies)[source]#

Export protection policies to a compressed archive.

Parameters

protection_policies (list) – A list of streamsets.sdk.sch_models.ProtectionPolicy instances.

Returns

An instance of type bytes indicating the content of zip file with protection policy json files.

export_topologies(topologies)[source]#

Export topologies.

Parameters: topologies (list) – A list of streamsets.sdk.sch_models.Topology instances.
Returns: An instance of type bytes indicating the content of zip file with topology json files.

get_admin_tool(base_url, username, password)[source]#

Get ControlHub admin tool.

Returns: An instance of streamsets.sdk.sch_models.AdminTool.

get_api_credential_builder()[source]#

Get api credential Builder.

Returns: An instance of streamsets.sdk.sch_models.ApiCredentialBuilder.

get_classification_rule_builder()[source]#

Get a classification rule builder instance with which a classification rule can be created.

Returns: An instance of streamsets.sdk.sch_models.ClassificationRuleBuilder.

get_components(component_type_id, offset=None, len_=None, order_by='LAST_VALIDATED_ON', order='ASC')[source]#

Get components.

Parameters

component_type_id (str) – Component type id.
offset (str, optional) – Default: None.
len_ (str, optional) – Default: None.
order_by (str, optional) – Default: 'LAST_VALIDATED_ON'.
order (str, optional) – Default: 'ASC'.

get_connection_builder()[source]#

Get a connection builder instance with which a connection can be created.

Returns: An instance of streamsets.sdk.sch_models.ConnectionBuilder.

get_current_job_status(job)[source]#

Return the current status of the given job.

Parameters: job (streamsets.sdk.sch_models.Job) – Job object.

get_data_collector_labels(data_collector)[source]#

Returns all labels assigned to data collector.

Parameters: data_collector (streamsets.sdk.sch_models.DataCollector) – Data Collector object.
Returns: A list of data collector assigned labels.

get_data_sla_builder()[source]#

Get a Data SLA builder instance with which a Data SLA can be created.

Returns: An instance of streamsets.sdk.sch_models.DataSlaBuilder.

get_deployment_builder(deployment_type='SELF')[source]#

Get a deployment builder instance with which a deployment can be created.

Parameters: deployment_type (str, optional) – Valid values are 'AWS', 'GCP' and 'SELF'. Default: 'SELF'.
Returns: An instance of streamsets.sdk.sch_models.DeploymentBuilder.

get_environment_builder(environment_type='SELF')[source]#

Get an environment builder instance with which an environment can be created.

Parameters: environment_type (str, optional) – Valid values are 'AWS', 'GCP' and 'SELF'. Default: 'SELF'.
Returns: An instance of streamsets.sdk.sch_models.EnvironmentBuilder.

get_group_builder()[source]#

Get a group builder instance with which a group can be created.

Returns: An instance of streamsets.sdk.sch_models.GroupBuilder.

get_job_builder()[source]#

Get a job builder instance with which a job can be created.

Returns: An instance of streamsets.sdk.sch_models.JobBuilder.

get_legacy_deployment_builder()[source]#

Get a deployment builder instance with which a legacy deployment can be created.

Returns: An instance of streamsets.sdk.sch_models.LegacyDeploymentBuilder.

get_organization_builder()[source]#

Get an organization builder instance with which an organization can be created.

Returns: An instance of streamsets.sdk.sch_models.OrganizationBuilder.

get_pipeline_builder(engine_type, engine_id=None, fragment=False)[source]#

Get a pipeline builder instance with which a pipeline can be created.

Parameters

engine_type (str) – The type of pipeline that will be created. The options are 'data_collector', 'snowflake' or 'transformer'.
engine_id (str) – The ID of the Data Collector or Transformer in which to author the pipeline if not using Transformer for Snowflake.
fragment (boolean, optional) – Specify if a fragment builder. Default: False.

Returns

An instance of streamsets.sdk.sch_models.PipelineBuilder or streamsets.sdk.sch_models.StPipelineBuilder.
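
The relationship between engine_type and engine_id can be made explicit with a small pre-flight check. This is only a sketch of the rule as documented above (an engine ID is needed unless the pipeline targets Transformer for Snowflake); check_pipeline_builder_args and VALID_ENGINE_TYPES are hypothetical names, and the SDK may apply its own, different validation.

```python
VALID_ENGINE_TYPES = ("data_collector", "snowflake", "transformer")


def check_pipeline_builder_args(engine_type, engine_id=None):
    """Validate arguments before passing them to get_pipeline_builder."""
    if engine_type not in VALID_ENGINE_TYPES:
        raise ValueError(f"engine_type must be one of {VALID_ENGINE_TYPES}, got {engine_type!r}")
    # Only Snowflake pipelines are authored without a Data Collector or Transformer.
    if engine_type != "snowflake" and engine_id is None:
        raise ValueError(f"engine_id is required for engine_type {engine_type!r}")
    return {"engine_type": engine_type, "engine_id": engine_id}


# Typical use, assuming `sch` is a ControlHub instance and `sdc_id` an engine ID string:
#     builder = sch.get_pipeline_builder(**check_pipeline_builder_args("data_collector", sdc_id))
```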

get_protection_method_builder()[source]#

Get a protection method builder instance with which a protection method can be created.

Returns: An instance of streamsets.sdk.sch_models.ProtectionMethodBuilder.

get_protection_policy_builder()[source]#

Get a protection policy builder instance with which a protection policy can be created.

Returns: An instance of streamsets.sdk.sch_models.ProtectionPolicyBuilder.

get_scheduled_task_builder()[source]#

Get a scheduled task builder instance with which a scheduled task can be created.

Returns: An instance of streamsets.sdk.sch_models.ScheduledTaskBuilder.

get_self_managed_deployment_install_script(deployment, install_mechanism='DEFAULT')[source]#

Get install script for a Self Managed deployment.

Parameters

deployment (streamsets.sdk.sch_models.Deployment) – deployment object.
install_mechanism (str, optional) – Possible values for install are “DEFAULT”, “BACKGROUND” and “FOREGROUND”. Default: DEFAULT

Returns

An instance of streamsets.sdk.sch_api.Command.

get_snowflake_pipeline_defaults()[source]#

Get the Snowflake pipeline defaults for this user (if it exists).

Returns: A dict of Snowflake pipeline defaults.

get_snowflake_user_credentials()[source]#

Get the Snowflake user credentials (if they exist). They will be redacted.

Returns: A dict of Snowflake user credentials (redacted).

get_subscription_builder()[source]#

Get Event Subscription Builder.

Returns: An instance of streamsets.sdk.sch_models.SubscriptionBuilder.

get_topology_builder()[source]#

Get a topology builder instance with which a topology can be created.

Returns: An instance of streamsets.sdk.sch_models.TopologyBuilder.

get_user_builder()[source]#

Get a user builder instance with which a user can be created.

Returns: An instance of streamsets.sdk.sch_models.UserBuilder.

property groups#

Groups.

Returns: An instance of streamsets.sdk.sch_models.Groups.

import_jobs(archive, pipeline=True, number_of_instances=False, labels=False, runtime_parameters=False, **kwargs)[source]#

Import jobs from archived zip directory.

Parameters

archive (file) – file containing the jobs.
pipeline (boolean, optional) – Indicate if pipeline should be imported. Default: True.
number_of_instances (boolean, optional) – Indicate if number of instances should be imported. Default: False.
labels (boolean, optional) – Indicate if labels should be imported. Default: False.
runtime_parameters (boolean, optional) – Indicate if runtime parameters should be imported. Default: False.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Job.
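
Since archive is a file object rather than a path, the caller is responsible for opening and closing it. A minimal sketch, assuming the archive should be opened in binary mode (the reference does not say so explicitly); import_jobs_from_path is a hypothetical wrapper and sch stands for any ControlHub instance.

```python
def import_jobs_from_path(sch, path, **options):
    """Open `path` in binary mode and pass the file object to sch.import_jobs.

    Keyword options such as pipeline=..., labels=... are forwarded unchanged.
    """
    with open(path, "rb") as archive:
        return sch.import_jobs(archive, **options)


# Typical use:
#     jobs = import_jobs_from_path(sch, "jobs.zip", labels=True, runtime_parameters=True)
```

The with block guarantees the file handle is closed even if the import raises.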

import_pipeline(pipeline, commit_message, name=None, data_collector_instance=None)[source]#

Import pipeline from json file.

Parameters

pipeline (dict) – A python dict representation of ControlHub Pipeline.
commit_message (str) – Commit message.
name (str, optional) – Name of the pipeline. If left out, pipeline name from JSON object will be used. Default None.
data_collector_instance (streamsets.sdk.sch_models.DataCollector) – If excluded, system sdc will be used. Default None.

Returns

An instance of streamsets.sdk.sch_models.Pipeline.

import_pipelines_from_archive(archive, commit_message, fragments=False)[source]#

Import pipelines from archived zip directory.

Parameters

archive (file) – file containing the pipelines.
commit_message (str) – Commit message.
fragments (bool, optional) – Indicates if pipeline contains fragments. Default: False.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Pipeline.

import_protection_policies(policies_archive)[source]#

Import protection policies from a compressed archive.

Parameters: policies_archive (file) – file containing the protection policies.
Returns: A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.ProtectionPolicy.

import_topologies(archive, import_number_of_instances=False, import_labels=False, import_runtime_parameters=False, **kwargs)[source]#

Import topologies from archived zip directory.

Parameters

archive (file) – file containing the topologies.
import_number_of_instances (boolean, optional) – Indicate if number of instances should be imported. Default: False.
import_labels (boolean, optional) – Indicate if labels should be imported. Default: False.
import_runtime_parameters (boolean, optional) – Indicate if runtime parameters should be imported. Default: False.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Topology.

invite_user(user)[source]#

Invite a user to the DataOps Platform.

Parameters: user (streamsets.sdk.sch_models.User) – User object.
Returns: An instance of streamsets.sdk.sch_api.Command.

property jobs#

Jobs.

Returns: An instance of streamsets.sdk.sch_models.Jobs.

property ldap_enabled#

Whether LDAP is enabled.

Returns: A bool.

property legacy_deployments#

Legacy deployments.

Returns: An instance of streamsets.sdk.sch_models.LegacyDeployments.

property login_audits#

Returns: An instance of streamsets.sdk.sch_models.LoginAudits.

property metering_and_usage#

Metering and usage for the Organization. By default, this will return a report for the last 30 days of metering data. A report for a custom time window can be retrieved by indexing this object with a slice that contains a datetime object for the start (required) and stop (optional, defaults to datetime.now()).

ex. metering_and_usage[datetime(2022, 1, 1):datetime(2022, 1, 14)]
metering_and_usage[datetime.now() - timedelta(7):]

Returns: An instance of streamsets.sdk.sch_models.MeteringUsage
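
The slice syntax described for metering_and_usage is ordinary Python indexing: obj[a:b] hands the object a slice(a, b) value. The sketch below shows this with the standard library only. SliceRecorder is a hypothetical stand-in that merely echoes the slice it receives; it is not the SDK's MeteringUsage class, and usage_window is an illustrative helper.

```python
from datetime import datetime, timedelta


def usage_window(days, now=None):
    """Build the slice for the last `days` days, leaving the stop open."""
    now = datetime.now() if now is None else now
    return slice(now - timedelta(days=days), None)


class SliceRecorder:
    """Stand-in object that returns the start and stop of the slice it is indexed with."""

    def __getitem__(self, window):
        return window.start, window.stop


# An explicit start and stop, as in the first example above.
explicit = SliceRecorder()[datetime(2022, 1, 1):datetime(2022, 1, 14)]

# Typical use, assuming `sch` is a ControlHub instance:
#     report = sch.metering_and_usage[usage_window(7)]
```

A prebuilt slice object can be passed in brackets just like the literal start:stop form, which is convenient when the window is computed elsewhere.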

property organizations#

Organizations.

Returns: An instance of streamsets.sdk.sch_models.Organizations.

property pipeline_labels#

Pipeline labels.

Returns: An instance of streamsets.sdk.sch_models.PipelineLabels.

property pipelines#

Pipelines.

Returns: An instance of streamsets.sdk.sch_models.Pipelines.

preview_classification_rule(classification_rule, parameter_data, data_collector=None)[source]#

Dynamic preview of a classification rule.

Parameters

classification_rule (streamsets.sdk.sch_models.ClassificationRule) – Classification Rule object.
parameter_data (dict) – A python dict representation of raw JSON parameters required for preview.
data_collector (streamsets.sdk.sch_models.DataCollector, optional) – The Data Collector in which to preview the pipeline. If omitted, ControlHub’s first executor SDC will be used. Default: None.

Returns

An instance of streamsets.sdk.sdc_api.PreviewCommand.

property protection_policies#

Protection policies.

Returns: An instance of streamsets.sdk.sch_models.ProtectionPolicies.

property provisioning_agents#

Provisioning Agents registered to the ControlHub instance.

Returns: An instance of streamsets.sdk.sch_models.ProvisioningAgents.

publish_pipeline(pipeline, commit_message='New pipeline', draft=False)[source]#

Publish a pipeline.

Parameters

pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline object.
commit_message (str, optional) – Default: 'New pipeline'.
draft (boolean, optional) – Default: False.

publish_scheduled_task(task)[source]#

Send the scheduled task to ControlHub.

Parameters: task (streamsets.sdk.sch_models.ScheduledTask) – Scheduled task object.
Returns: An instance of streamsets.sdk.sch_api.Command.

publish_topology(topology, commit_message=None)[source]#

Publish a topology.

Parameters

topology (streamsets.sdk.sch_models.Topology) – Topology object to publish.
commit_message (str, optional) – Commit message to supply with the Topology. Default: None

Returns

An instance of streamsets.sdk.sch_api.Command.

regenerate_api_credential_auth_token(api_credential)[source]#

Regenerate the auth token for an api credential.

Parameters: api_credential (streamsets.sdk.sch_models.ApiCredential) – ApiCredential object.
Returns: An instance of streamsets.sdk.sch_api.Command.

remove_administrator_role(user)[source]#

Remove the Organization Administrator status from the specified user.

Parameters: user (streamsets.sdk.sch_models.User) – User object.
Returns: An instance of streamsets.sdk.aster_api.Command.

rename_api_credential(api_credential)[source]#

Rename an api credential.

Parameters: api_credential (streamsets.sdk.sch_models.ApiCredential) – ApiCredential object.
Returns: An instance of streamsets.sdk.sch_api.Command.

property report_definitions#

Report Definitions.

Returns: An instance of streamsets.sdk.sch_models.ReportDefinitions.

reset_origin(*jobs)[source]#

Reset all pipeline offsets for given jobs.

Parameters: *jobs – One or more instances of streamsets.sdk.sch_models.Job.
Returns: A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Job.

restart_engines(*engines)[source]#

Restart the given engine(s).

Parameters: *engines – One or more instances of streamsets.sdk.sch_models.DataCollector or streamsets.sdk.sch_models.Transformer.
Returns: An instance of streamsets.sdk.sch_api.Command.

run_pipeline_preview(pipeline, batches=1, batch_size=10, skip_targets=True, skip_lifecycle_events=True, timeout=120000, test_origin=False, read_policy=None, write_policy=None, executor=None, **kwargs)[source]#

Run pipeline preview.

Parameters

pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline object.
batches (int, optional) – Number of batches. Default: 1.
batch_size (int, optional) – Batch size. Default: 10.
skip_targets (bool, optional) – Skip targets. Default: True.
skip_lifecycle_events (bool, optional) – Skip life cycle events. Default: True.
timeout (int, optional) – Timeout. Default: 120000.
test_origin (bool, optional) – Test origin. Default: False.
read_policy (streamsets.sdk.sch_models.ProtectionPolicy) – Read Policy for preview. If not provided, uses default read policy if one available. Default: None.
write_policy (streamsets.sdk.sch_models.ProtectionPolicy) – Write Policy for preview. If not provided, uses default write policy if one available. Default: None.
executor (streamsets.sdk.sch_models.DataCollector, optional) – The Data Collector in which to preview the pipeline. If omitted, ControlHub’s first executor SDC will be used. Default: None.

Returns

An instance of streamsets.sdk.sdc_api.PreviewCommand.

run_snowflake_pipeline_preview(pipeline_id, rev=0, batches=1, batch_size=10, skip_targets=True, end_stage=None, timeout=2000, test_origin=False, stage_outputs_to_override_json=None, **kwargs)[source]#

Run Snowflake pipeline preview.

Parameters

pipeline_id (str) – The pipeline instance’s ID.
rev (int, optional) – Pipeline revision. Default: 0.
batches (int, optional) – Number of batches. Default: 1.
batch_size (int, optional) – Batch size. Default: 10.
skip_targets (bool, optional) – Skip targets. Default: True.
end_stage (str, optional) – End stage. Default: None.
timeout (int, optional) – Timeout. Default: 2000.
test_origin (bool, optional) – Test origin. Default: False
stage_outputs_to_override_json (str, optional) – Stage outputs to override. Default: None.
wait (bool, optional) – Wait for pipeline preview to finish. Default: True.

Returns

An instance of streamsets.sdk.sdc_api.PreviewCommand.

scale_legacy_deployment(deployment, num_instances)[source]#

Scale an active legacy deployment up or down.

Parameters

deployment (streamsets.sdk.sch_models.LegacyDeployment) – Deployment object.
num_instances (int) – Number of sdc instances.

Returns

An instance of streamsets.sdk.sch_api.Command.

property scheduled_tasks#

Scheduled Tasks.

Returns: An instance of streamsets.sdk.sch_models.ScheduledTasks.

set_default_read_protection_policy(protection_policy)[source]#

Set a default read protection policy.

Parameters: protection_policy (streamsets.sdk.sch_models.ProtectionPolicy) – Protection Policy object to be set as the default read policy.

Returns: An updated instance of streamsets.sdk.sch_models.ProtectionPolicy.

set_default_write_protection_policy(protection_policy)[source]#

Set a default write protection policy.

Parameters: protection_policy (streamsets.sdk.sch_models.ProtectionPolicy) – Protection Policy object to be set as the default write policy.

Returns: An updated instance of streamsets.sdk.sch_models.ProtectionPolicy.

shutdown_engines(*engines)[source]#

Call shutdown on the given engine(s).

Parameters: *engines – One or more instances of streamsets.sdk.sch_models.DataCollector or streamsets.sdk.sch_models.Transformer.
Returns: An instance of streamsets.sdk.sch_api.Command.

start_deployment(*deployments)[source]#

Start Deployments.

Parameters: *deployments – One or more instances of streamsets.sdk.sch_models.Deployment.

start_job(*jobs, wait=True, **kwargs)[source]#

Start one or more jobs.

Parameters

*jobs – One or more instances of streamsets.sdk.sch_models.Job.
wait (bool, optional) – Wait for pipelines to reach RUNNING status before returning. Default: True.
**kwargs – Arbitrary keyword arguments.

Returns

An instance of streamsets.sdk.sch_api.StartJobsCommand.

start_job_template(job_template, name=None, description=None, attach_to_template=True, delete_after_completion=False, instance_name_suffix='COUNTER', number_of_instances=1, parameter_name=None, raw_job_tags=None, runtime_parameters=None, wait_for_data_collectors=False)[source]#

Start Job instances from a Job Template.

Parameters

job_template (streamsets.sdk.sch_models.Job) – A Job instance with the property job_template set to True.
name (str, optional) – Name of the new job(s). Default: None. If not specified, name of the job template with ' copy' appended to the end will be used.
description (str, optional) – Description for new job(s). Default: None.
attach_to_template (bool, optional) – Default: True.
delete_after_completion (bool, optional) – Default: False.
instance_name_suffix (str, optional) – Suffix to be used for Job names in {‘COUNTER’, ‘TIME_STAMP’, ‘PARAM_VALUE’}. Default: COUNTER.
number_of_instances (int, optional) – Number of instances to be started using given parameters. Default: 1.
parameter_name (str, optional) – Specified when instance_name_suffix is ‘PARAM_VALUE’. Default: None.
raw_job_tags (list, optional) – Default: None.
runtime_parameters (dict) or (list) – Runtime Parameters to be used in the jobs. If a dict is specified, number_of_instances jobs will be started. If a list is specified, number_of_instances is ignored and job instances will be started using the elements of the list as Runtime Parameters for each job. If left out, Runtime Parameters from Job Template will be used. Default: None.
wait_for_data_collectors (bool, optional) – Default: False.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Job instances.
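The list form of runtime_parameters described above can be sketched as follows. This is only an illustration: the helper name start_instances_per_table and the TABLE parameter are hypothetical, and MagicMock objects stand in for a real ControlHub and job template so the snippet runs without credentials.

```python
from unittest.mock import MagicMock


def start_instances_per_table(sch, job_template, tables):
    """Start one job instance per table name (hypothetical helper)."""
    # A list of dicts: per the parameter description, number_of_instances
    # is ignored and each element supplies one job's Runtime Parameters.
    runtime_parameters = [{'TABLE': table} for table in tables]
    return sch.start_job_template(job_template,
                                  instance_name_suffix='PARAM_VALUE',
                                  parameter_name='TABLE',
                                  runtime_parameters=runtime_parameters)


# Stand-ins for a real ControlHub and job template (assumption for illustration).
sch = MagicMock()
template = MagicMock(job_template=True)
jobs = start_instances_per_table(sch, template, ['orders', 'customers'])
```

With a real ControlHub, the returned value would be the SeekableList of Job instances documented above.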

start_legacy_deployment(deployment, **kwargs)[source]#

Start Deployment.

Parameters

deployment (streamsets.sdk.sch_models.LegacyDeployment) – Deployment instance.
wait (bool, optional) – Wait for deployment to start. Default: True.
wait_for_statuses (list, optional) – Deployment statuses to wait on. Default: ['ACTIVE'].

Returns

An instance of streamsets.sdk.sch_api.DeploymentStartStopCommand.

stop_deployment(*deployments)[source]#

Stop Deployments.

Parameters: *deployments – One or more instances of streamsets.sdk.sch_models.Deployment.

stop_job(*jobs, force=False, timeout_sec=300)[source]#

Stop one or more jobs.

Parameters

*jobs – One or more instances of streamsets.sdk.sch_models.Job.
force (bool, optional) – Force job to stop. Default: False.
timeout_sec (int, optional) – Timeout in secs. Default: 300.
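A minimal sketch of pairing start_job with stop_job, using only the parameters documented above. The helper name run_then_stop is hypothetical, and MagicMock objects stand in for a live ControlHub and Job.

```python
from unittest.mock import MagicMock


def run_then_stop(sch, job, stop_timeout_sec=300):
    """Start a job, wait for RUNNING, then stop it (hypothetical helper)."""
    start_command = sch.start_job(job, wait=True)
    # force=False requests a regular stop; timeout_sec is passed through as given.
    sch.stop_job(job, force=False, timeout_sec=stop_timeout_sec)
    return start_command


sch = MagicMock()  # stand-in for streamsets.sdk.ControlHub
job = MagicMock()  # stand-in for streamsets.sdk.sch_models.Job
run_then_stop(sch, job, stop_timeout_sec=120)
```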

stop_legacy_deployment(deployment, wait_for_statuses=['INACTIVE'])[source]#

Stop Deployment.

Parameters

deployment (streamsets.sdk.sch_models.LegacyDeployment) – Deployment instance.
wait_for_statuses (list, optional) – List of statuses to wait for. Default: ['INACTIVE'].

Returns

An instance of streamsets.sdk.sch_api.DeploymentStartStopCommand.

stop_test_pipeline_run(start_pipeline_command)[source]#

Stop the test run of pipeline.

Parameters: start_pipeline_command (streamsets.sdk.sdc_api.StartPipelineCommand) –
Returns: An instance of streamsets.sdk.sdc_api.StopPipelineCommand.

property subscription_audits#

Subscription audits.

Returns: A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.SubscriptionAudit.

property subscriptions#

Event Subscriptions.

Returns: An instance of streamsets.sdk.sch_models.Subscriptions.

sync_job(*jobs)[source]#

Sync one or more jobs.

Parameters: *jobs – One or more instances of streamsets.sdk.sch_models.Job.

test_pipeline_run(pipeline, reset_origin=False, parameters=None)[source]#

Test run a pipeline.

Parameters

pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline object.
reset_origin (boolean, optional) – Default: False.
parameters (dict, optional) – Pipeline parameters. Default: None.

Returns

An instance of streamsets.sdk.sdc_api.StartPipelineCommand.
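The StartPipelineCommand returned here is the argument that stop_test_pipeline_run expects. A rough sketch of that hand-off, with a hypothetical helper name and MagicMock stand-ins in place of a real ControlHub and Pipeline:

```python
from unittest.mock import MagicMock


def trial_run_pipeline(sch, pipeline, parameters=None):
    """Test run a pipeline and then stop the run (hypothetical helper)."""
    start_command = sch.test_pipeline_run(pipeline,
                                          reset_origin=True,
                                          parameters=parameters)
    # Hand the start command back to Control Hub to stop the test run.
    return sch.stop_test_pipeline_run(start_command)


sch = MagicMock()       # stand-in for streamsets.sdk.ControlHub
pipeline = MagicMock()  # stand-in for streamsets.sdk.sch_models.Pipeline
stop_command = trial_run_pipeline(sch, pipeline, parameters={'EXAMPLE_PARAM': 'value'})
```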

property topologies#

Topologies.

Returns: An instance of streamsets.sdk.sch_models.Topologies.

property transformers#

Transformers registered to the ControlHub instance.

Returns: A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Transformer instances.

update_connection(connection)[source]#

Update a connection.

Parameters: connection (streamsets.sdk.sch_models.Connection) – Connection object.
Returns: An instance of streamsets.sdk.sch_api.Command.

update_data_collector_labels(data_collector)[source]#

Update data collector labels.

Parameters: data_collector (streamsets.sdk.sch_models.DataCollector) – Data Collector object.
Returns: An instance of streamsets.sdk.sch_models.DataCollector.

update_data_collector_resource_thresholds(data_collector, max_cpu_load=None, max_memory_used=None, max_pipelines_running=None)[source]#

Updates data collector resource thresholds.

Parameters

data_collector (streamsets.sdk.sch_models.DataCollector) – Data Collector object.
max_cpu_load (float, optional) – Max CPU load in percentage. Default: None.
max_memory_used (int, optional) – Max memory used in MB. Default: None.
max_pipelines_running (int, optional) – Max pipelines running. Default: None.

Returns

An instance of streamsets.sdk.sch_api.Command.
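As an illustration of the keyword arguments above, the sketch below sets two thresholds and leaves max_memory_used at its default. The helper name and the 0 to 100 range check are assumptions, and MagicMock objects replace a real ControlHub and DataCollector.

```python
from unittest.mock import MagicMock


def cap_data_collector(sch, data_collector, max_cpu_load, max_pipelines_running):
    """Apply CPU and pipeline-count limits to one Data Collector (hypothetical helper)."""
    # max_cpu_load is documented as a percentage; the 0-100 check is an assumption.
    if not 0 <= max_cpu_load <= 100:
        raise ValueError('max_cpu_load must be between 0 and 100')
    # max_memory_used is not passed, so it keeps its default of None.
    return sch.update_data_collector_resource_thresholds(
        data_collector,
        max_cpu_load=max_cpu_load,
        max_pipelines_running=max_pipelines_running)


sch = MagicMock()             # stand-in for streamsets.sdk.ControlHub
data_collector = MagicMock()  # stand-in for sch_models.DataCollector
cap_data_collector(sch, data_collector, max_cpu_load=75.0, max_pipelines_running=10)
```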

update_deployment(deployment)[source]#

Update a deployment.

Parameters: deployment (streamsets.sdk.sch_models.Deployment) – Deployment object.
Returns: An instance of streamsets.sdk.sch_api.Command.

update_environment(environment, timeout_sec=300)[source]#

Update an environment.

Parameters: environment (streamsets.sdk.sch_models.Environment) – Environment object.
Returns: An instance of streamsets.sdk.sch_api.Command.

update_group(group)[source]#

Update a group.

Parameters: group (streamsets.sdk.sch_models.Group) – Group object.
Returns: An instance of streamsets.sdk.sch_api.Command.

update_job(job)[source]#

Update a job.

Parameters: job (streamsets.sdk.sch_models.Job) – Job object.
Returns: An instance of streamsets.sdk.sch_models.Job.

update_legacy_deployment(deployment)[source]#

Update a deployment.

Parameters: deployment (streamsets.sdk.sch_models.LegacyDeployment) – Deployment object.
Returns: An instance of streamsets.sdk.sch_api.Command.

update_organization(organization)[source]#

Update an organization.

Parameters: organization (streamsets.sdk.sch_models.Organization) – Organization instance.
Returns: An instance of streamsets.sdk.sch_api.Command.

update_pipelines_with_different_fragment_version(pipelines, from_fragment_version, to_fragment_version)[source]#

Update pipelines with latest pipeline fragment commit version.

Parameters

pipelines (list) – List of streamsets.sdk.sch_models.Pipeline instances.
from_fragment_version (streamsets.sdk.sch_models.PipelineCommit) – commit of fragment from which the pipeline needs to be updated.
to_fragment_version (streamsets.sdk.sch_models.PipelineCommit) – commit of fragment to which the pipeline needs to be updated.

Returns

An instance of streamsets.sdk.sch_api.Command.

update_report_definition(report_definition)[source]#

Update an existing Report Definition.

Parameters: report_definition (streamsets.sdk.sch_models.ReportDefinition) – Report Definition instance.
Returns: An instance of streamsets.sdk.sch_api.Command.

update_snowflake_pipeline_defaults(account_url=None, database=None, warehouse=None, schema=None, role=None)[source]#

Create or update the Snowflake pipeline defaults for this user.

Parameters

account_url (str, optional) – Snowflake account url. Default: None
database (str, optional) – Snowflake database to query against. Default: None
warehouse (str, optional) – Snowflake warehouse. Default: None
schema (str, optional) – Schema used. Default: None
role (str, optional) – Role used. Default: None

Returns

An instance of streamsets.sdk.sch_api.Command.

update_snowflake_user_credentials(username, snowflake_login_type, password=None, private_key=None, role=None)[source]#

Create or update the Snowflake user credentials.

Parameters

username (str) – Snowflake account username.
snowflake_login_type (str) – Snowflake login type to use. Options are password and privateKey.
password (str, optional) – Snowflake account password. Default: None
private_key (str, optional) – Snowflake account private key. Default: None
role (str, optional) – Snowflake role of the account. Default: None

Returns

An instance of streamsets.sdk.sch_api.Command.
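One way to keep snowflake_login_type consistent with the secret being supplied is sketched below. The helper name and the rule that exactly one of password or private_key must be given are assumptions of this example; the two login type strings come from the parameter description above. A MagicMock stands in for a real ControlHub.

```python
from unittest.mock import MagicMock


def set_snowflake_credentials(sch, username, password=None, private_key=None, role=None):
    """Derive snowflake_login_type from the supplied secret (hypothetical helper)."""
    # Requiring exactly one secret is an assumption of this sketch.
    if (password is None) == (private_key is None):
        raise ValueError('Provide exactly one of password or private_key.')
    login_type = 'password' if password is not None else 'privateKey'
    return sch.update_snowflake_user_credentials(username, login_type,
                                                 password=password,
                                                 private_key=private_key,
                                                 role=role)


sch = MagicMock()  # stand-in for streamsets.sdk.ControlHub
set_snowflake_credentials(sch, 'example_user', private_key='<private key text>')
```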

update_subscription(subscription)[source]#

Update an existing Subscription.

Parameters: subscription (streamsets.sdk.sch_models.Subscription) – A Subscription instance.
Returns: An instance of streamsets.sdk.sch_api.Command.

update_user(user)[source]#

Update a user. Some user attributes are updated by ControlHub such as: last_modified_by, last_modified_on.

Parameters: user (streamsets.sdk.sch_models.User) – User object.
Returns: An instance of streamsets.sdk.sch_api.Command.

upgrade_job(*jobs)[source]#

Upgrade job(s) to latest pipeline version.

Parameters: *jobs – One or more instances of streamsets.sdk.sch_models.Job.
Returns: A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Job.

upload_offset(job, offset_file=None, offset_json=None)[source]#

Upload offset for given job.

Parameters

job (streamsets.sdk.sch_models.Job) – Job object.
offset_file (file, optional) – File containing the offsets. Default: None. Exactly one of offset_file, offset_json should be specified.
offset_json (dict, optional) – Contents of offset. Default: None. Exactly one of offset_file, offset_json should be specified.

Returns

An instance of streamsets.sdk.sch_models.Job.
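The "exactly one of offset_file, offset_json" rule can be checked before calling the SDK, as in this sketch. The helper name is hypothetical, the offset dict is a placeholder rather than a real offset format, and MagicMock objects replace a real ControlHub and Job.

```python
from unittest.mock import MagicMock


def upload_offset_checked(sch, job, offset_file=None, offset_json=None):
    """Enforce the exactly-one rule, then upload (hypothetical helper)."""
    # True when both are None or both are set; either case violates the rule.
    if (offset_file is None) == (offset_json is None):
        raise ValueError('Specify exactly one of offset_file or offset_json.')
    return sch.upload_offset(job, offset_file=offset_file, offset_json=offset_json)


sch = MagicMock()  # stand-in for streamsets.sdk.ControlHub
job = MagicMock()  # stand-in for streamsets.sdk.sch_models.Job
placeholder_offset = {'example': 'placeholder'}  # not a real offset payload
updated_job = upload_offset_checked(sch, job, offset_json=placeholder_offset)
```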

property users#

Users.

Returns: An instance of streamsets.sdk.sch_models.Users.

validate_pipeline(pipeline)[source]#

Validate pipeline. Only Transformer for Snowflake pipelines are supported at this time.

Parameters

pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline instance.

Raises

streamsets.sdk.exceptions.ValidationError – If validation fails.
NotImplementedError – If used for a pipeline that isn’t a Transformer for Snowflake pipeline.

verify_connection(connection)[source]#

Verify connection.

Parameters: connection (streamsets.sdk.sch_models.Connection) – Connection object.
Returns: An instance of streamsets.sdk.sch_models.ConnectionVerificationResult.

wait_for_job_metric(job, metric, value, timeout_sec=200)[source]#

Block until a job’s realtime summary reaches the desired value for the desired metric.

Parameters

job (streamsets.sdk.sch_models.Job) – The job instance.
metric (str) – The desired metric (e.g. 'output_record_count' or 'input_record_count').
value (int) – The desired value to wait for.
timeout_sec (int, optional) – Timeout to wait for metric to reach value, in seconds. Default: streamsets.sdk.sch.DEFAULT_WAIT_FOR_METRIC_TIMEOUT.

Raises

TimeoutError – If timeout_sec passes without metric reaching value.
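A short sketch of waiting on one of the example metrics named above. The helper name is hypothetical, and MagicMock objects stand in for a real ControlHub and Job so the snippet runs offline.

```python
from unittest.mock import MagicMock


def wait_for_output_records(sch, job, count, timeout_sec=200):
    """Block until output_record_count reaches count (hypothetical helper)."""
    # 'output_record_count' is one of the example metric names in the docs above.
    sch.wait_for_job_metric(job, 'output_record_count', count, timeout_sec=timeout_sec)


sch = MagicMock()  # stand-in for streamsets.sdk.ControlHub
job = MagicMock()  # stand-in for streamsets.sdk.sch_models.Job
wait_for_output_records(sch, job, 1000, timeout_sec=60)
```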

wait_for_job_metrics_record_count(job, count, timeout_sec=200)[source]#

Block until a job’s metrics reach the desired record count.

Parameters

job (streamsets.sdk.sch_models.Job) – The job instance.
count (int) – The desired value to wait for.
timeout_sec (int, optional) – Timeout to wait for metric to reach count, in seconds. Default: streamsets.sdk.sch.DEFAULT_WAIT_FOR_METRIC_TIMEOUT.

Raises

TimeoutError – If timeout_sec passes without metric reaching value.

wait_for_job_status(job, status, timeout_sec=300)[source]#

Block until a job reaches the desired status.

Parameters

job (streamsets.sdk.sch_models.Job) – The job instance.
status (str) – The desired status to wait for.
timeout_sec (int, optional) – Timeout to wait for job to reach status, in seconds. Default: streamsets.sdk.sch.DEFAULT_WAIT_FOR_STATUS_TIMEOUT.

Raises

TimeoutError – If timeout_sec passes without job reaching status.
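Callers that prefer a boolean over an exception can wrap the wait, as sketched below. This assumes the TimeoutError raised is Python's built-in exception, which the text above does not confirm. The helper name and the 'ACTIVE' status string are illustrative, and a MagicMock replaces the ControlHub.

```python
from unittest.mock import MagicMock


def reached_status(sch, job, status, timeout_sec=300):
    """Return True if the job reached status in time, else False (hypothetical helper)."""
    try:
        sch.wait_for_job_status(job, status, timeout_sec=timeout_sec)
    except TimeoutError:  # assumed to be the built-in TimeoutError
        return False
    return True


sch = MagicMock()  # stand-in for streamsets.sdk.ControlHub
job = MagicMock()  # stand-in for streamsets.sdk.sch_models.Job
on_time = reached_status(sch, job, 'ACTIVE', timeout_sec=30)

# Simulate a timeout to exercise the other branch.
sch.wait_for_job_status.side_effect = TimeoutError
timed_out = reached_status(sch, job, 'ACTIVE', timeout_sec=30)
```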

Models#

These models wrap and provide useful functionality for interacting with common SCH abstractions.

ACLs#

class streamsets.sdk.sch_models.ACL(acl, control_hub)[source]#

Represents an ACL.

Parameters

acl (dict) – JSON representation of an ACL.
control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

permissions#

A Collection of Permissions.

Type: streamsets.sdk.sch_models.Permissions

add_permission(permission)[source]#

Add new permission to the ACL.

Parameters: permission (streamsets.sdk.sch_models.Permission) – A permission object.
Returns: An instance of streamsets.sdk.sch_api.Command.

property permission_builder#

Get a permission builder instance with which a permission can be created.

Returns: An instance of streamsets.sdk.sch_models.ACLPermissionBuilder.

remove_permission(permission)[source]#

Remove a permission from ACL.

Parameters: permission (streamsets.sdk.sch_models.Permission) – A permission object.
Returns: An instance of streamsets.sdk.sch_api.Command.

class streamsets.sdk.sch_models.ACLPermissionBuilder(permission, acl)[source]#

Class to help build the ACL permission.

Parameters

permission (streamsets.sdk.sch_models.Permission) – A permission object.
acl (streamsets.sdk.sch_models.ACL) – An ACL object.

build(subject_id, subject_type, actions)[source]#

Method to help build the ACL permission.

Parameters

subject_id (str) – Id of the subject e.g. ‘test@test’.
subject_type (str) – Type of the subject e.g. ‘USER’.
actions (list) – A list of actions of type str e.g. [‘READ’, ‘WRITE’, ‘EXECUTE’].

Returns

An instance of streamsets.sdk.sch_models.Permission.
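The builder and ACL.add_permission fit together roughly as in this sketch. The helper name is hypothetical, the subject and action values are the examples given in the parameter list above, and a MagicMock stands in for an ACL taken from a real object's acl property.

```python
from unittest.mock import MagicMock


def grant_read_write(acl, user_id):
    """Build a READ/WRITE permission for a user and add it to the ACL (hypothetical helper)."""
    permission = acl.permission_builder.build(subject_id=user_id,
                                              subject_type='USER',
                                              actions=['READ', 'WRITE'])
    return acl.add_permission(permission)


acl = MagicMock()  # stand-in for streamsets.sdk.sch_models.ACL
command = grant_read_write(acl, 'test@test')
```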

class streamsets.sdk.sch_models.Permission(permission, resource_type, api_client)[source]#

A container for a permission.

Parameters

permission (dict) – A Python object representation of a permission.
resource_type (str) – String representing the type of resource e.g. ‘JOB’, ‘PIPELINE’.
api_client (streamsets.sdk.sch_api.ApiClient) – An instance of ApiClient.

resource_id#

Id of the resource e.g. Pipeline or Job.

Type: str

subject_id#

Id of the subject e.g. user id 'admin@admin'.

Type: str

subject_type#

Type of the subject e.g. 'USER'.

Type: str

last_modified_by#

User who last modified this permission e.g. 'admin@admin'.

Type: str

last_modified_on#

Timestamp at which this permission was last modified e.g. 1550785079811.

Type: int

Alerts#

class streamsets.sdk.sch_models.Alert(alert, control_hub)[source]#

Model for Alerts.

message#

The Alert’s message text.

Type: str

alert_status#

The status of the Alert.

Type: str

Parameters

alert (dict) – JSON representation of an Alert.
control_hub (streamsets.sdk.ControlHub) – Control Hub instance.

acknowledge()[source]#: Acknowledge an active Alert.

delete()[source]#: Delete an acknowledged Alert.

Classifiers#

class streamsets.sdk.sch_models.Classifier(classifier)[source]#

Classifier model.

Parameters: classifier (dict) – A Python dict representation of classifier.

Classification Rules#

class streamsets.sdk.sch_models.ClassificationRule(classification_rule, classifiers)[source]#

Classification Rule Model.

Parameters

classification_rule (dict) – A Python dict representation of classification rule.
classifiers (list) – A list of streamsets.sdk.sch_models.Classifier instances.

class streamsets.sdk.sch_models.ClassificationRuleBuilder(classification_rule, classifier)[source]#

Class with which to build instances of streamsets.sdk.sch_models.ClassificationRule.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_classification_rule_builder().

Parameters

classification_rule (dict) – Python object defining a classification rule.
classifier (dict) – Python object defining a classifier.

add_classifier(patterns=None, match_with=None, regular_expression_type='RE2/J', case_sensitive=False)[source]#

Add classifier to the classification rule.

Parameters

patterns (list, optional) – List of strings of patterns. Default: None.
match_with (str, optional) – Default: None.
regular_expression_type (str, optional) – Default: 'RE2/J'.
case_sensitive (bool, optional) – Default: False.

Returns

An instance of streamsets.sdk.sch_models.Classifier.

build(name, category, score)[source]#

Build the classification rule.

Parameters

name (str) – Classification Rule name.
category (str) – Classification Rule category.
score (float) – Classification Rule score.

Returns

An instance of streamsets.sdk.sch_models.ClassificationRule.

Connections#

class streamsets.sdk.sch_models.Connection(connection, control_hub)[source]#

Model for connection.

Parameters

connection (dict) – A Python object representation of Connection.
control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

property acl#

Get Connection ACL.

Returns: An instance of streamsets.sdk.sch_models.ACL.

add_tag(*tags)[source]#

Add one or more tags.

Parameters: *tags – One or more instances of str

property pipeline_commits#

Get the pipeline commits using this connection.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.PipelineCommit instances.

remove_tag(*tags)[source]#

Remove one or more tags.

Parameters: *tags – One or more instances of str

property tags#

Get the connection tags.

Returns: A streamsets.sdk.utils.SeekableList of instances of streamsets.sdk.sch_models.Tag.

class streamsets.sdk.sch_models.Connections(control_hub, organization)[source]#

Collection of streamsets.sdk.sch_models.Connection instances.

Parameters

control_hub – An instance of streamsets.sdk.sch.ControlHub.
organization (str) – Organization Id.

class streamsets.sdk.sch_models.ConnectionBuilder(connection, control_hub)[source]#

Class with which to build instances of streamsets.sdk.sch_models.Connection.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_connection_builder().

Parameters

connection (dict) – Python object built from our Swagger ConnectionJson definition.
control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

build(title, connection_type, authoring_data_collector, tags=None)[source]#

Define the connection.

Parameters

title (str) – Connection title.
connection_type (str) – Type of connection.
authoring_data_collector (streamsets.sdk.sch_models.DataCollector) – Authoring Data Collector.
tags (list, optional) – List of tags (strings). Default: None.

Returns

An instance of streamsets.sdk.sch_models.Connection.

class streamsets.sdk.sch_models.ConnectionVerificationResult(connection_preview_json)[source]#

Model for connection verification result.

Parameters: connection_preview_json (dict) – Dynamic preview API response JSON.

property issue_count#

The count of the number of issues for the connection verification result.

Returns: An int that represents the number of issues.

property issue_message#

The message provided for the connection verification result.

Returns: A str message detailing the response for the connection verification result.
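A possible way to consume the verification result returned by ControlHub.verify_connection is sketched below. Treating an issue_count of zero as success is an assumption of this example, the helper name is hypothetical, and MagicMock objects replace the ControlHub, Connection and result.

```python
from unittest.mock import MagicMock


def check_connection(sch, connection):
    """Return (ok, message) for a connection (hypothetical helper)."""
    result = sch.verify_connection(connection)
    # Assumption: zero reported issues means the verification succeeded.
    return result.issue_count == 0, result.issue_message


sch = MagicMock()  # stand-in for streamsets.sdk.ControlHub
sch.verify_connection.return_value = MagicMock(issue_count=1,
                                               issue_message='example issue text')
ok, message = check_connection(sch, MagicMock())
```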

DataCollectors#

class streamsets.sdk.sch_models.DataCollector(data_collector, control_hub)[source]#

Model for Data Collector.

execution_mode#

True for Edge and False for SDC.

Type: bool

id#

Data Collector id.

Type: str

labels#

Labels for Data Collector.

Type: list

last_validated_on#

Last validated time for Data Collector.

Type: str

reported_labels#

Reported labels for Data Collector.

Type: list

url#

Data Collector’s url.

Type: str

version#

Data Collector’s version.

Type: str

property accessible#: Returns a bool for whether the Data Collector instance is accessible.

property acl#

Get DataCollector ACL.

Returns: An instance of streamsets.sdk.sch_models.ACL.

property attributes#: Returns a dict of Data Collector attributes.

property pipelines_committed#

ControlHub Job IDs that are about to be started but have no corresponding pipeline status yet.

Returns: A list of Job IDs (str objects).

property resource_thresholds#

Return DataCollector resource thresholds.

Returns

A dict of DataCollector thresholds named as “max_memory_used”, “max_cpu_load” and “max_pipelines_running”.

property responding#: Returns a bool for whether the Data Collector instance is responding.

Deployments#

class streamsets.sdk.sch_models.AzureVMDeployment(deployment, control_hub=None, **kwargs)[source]#

Model for Azure VM Deployment.

attach_public_ip#

Type: bool

azure_tags#

Azure tags to apply to all provisioned resources in this Azure deployment.

Type: dict

key_pair_name#

SSH key pair.

Type: str

managed_identity#

Type: str

public_ssh_key#

Type: str

resource_group#

Type: str

ssh_key_source#

SSH key source.

Type: str

vm_size#

Type: str

zones#

Type: list

static get_ui_value_mappings()[source]#: This method returns a map for the values shown in the UI and internal constants to be used.

is_complete()[source]#: Checks if all required fields are set in this deployment.

class streamsets.sdk.sch_models.Deployment(deployment, control_hub=None, **kwargs)[source]#

Model for Deployment. This is an abstract class.

created_by#

User that created this deployment.

Type: str

created_on#

Time at which this deployment was created.

Type: int

deployment_id#

Id of the deployment.

Type: str

deployment_name#

Name of the deployment.

Type: str

deployment_tags#

Raw deployment tags of the deployment.

Type: list

deployment_type#

Type of the deployment.

Type: str

desired_instances#

Type: int

engine_configuration#

Type: DeploymentEngineConfiguration

engine_type#

Type: str

engine_version#

Type: str

environment_id#

Enabled environment where engine will be deployed.

Type: list

last_modified_by#

User that last modified this deployment.

Type: str

last_modified_on#

Time at which this deployment was last modified.

Type: int

network_tags#

Type: list

scala_binary_version#

Scala binary version.

Type: str

organization#

Type: str

json_state#

State of the deployment - this is json field called state.

Type: str

state#

State of the deployment - displayed as state on UI.

Type: str

status#

Status of the deployment.

Type: str

status_detail#

Status detail of the deployment.

Type: str

class ENGINE_TYPES(value)[source]#: An enumeration.

class TYPES(value)[source]#: An enumeration.

property acl#

Get deployment ACL.

Returns: An instance of streamsets.sdk.sch_models.ACL.

property engine_configuration#

The engine configuration of the deployment engine.

Returns: An instance of streamsets.sdk.sch_models.DeploymentEngineConfiguration.

property tags#

Get the tags.

Returns: A streamsets.sdk.utils.SeekableList of str instances.

class streamsets.sdk.sch_models.DeploymentBuilder(deployment, control_hub)[source]#

Class with which to build instances of streamsets.sdk.sch_models.Deployment.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_deployment_builder().

Parameters

deployment (dict) – Python object built from our Swagger CspDeploymentJson definition.
control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

build(deployment_name, engine_type, engine_version, environment, **kwargs)[source]#

Define the deployment.

Parameters

deployment_name (str) – Deployment name.
engine_type (str) – Type of engine to deploy.
engine_version (str) – Version of engine to deploy.
environment (str) – ID of the enabled environment where the engine will be deployed.
deployment_tags (list, optional) – List of tags (strings). Default: None.
engine_build (str, optional) – Build of engine to deploy. Default: None.
scala_binary_version (str, optional) – Scala binary version, required when engine_type is ‘TF’. Default: None.

Returns

An instance of subclass of streamsets.sdk.sch_models.Deployment.
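The build arguments above, including the scala_binary_version requirement for engine_type ‘TF’, could be wrapped as in the following sketch. The helper name, the version strings and the tag are placeholders, and get_deployment_builder() is called without arguments, as in the reference above. MagicMock objects stand in for the ControlHub and the environment value.

```python
from unittest.mock import MagicMock


def build_deployment(sch, name, engine_type, engine_version, environment,
                     scala_binary_version=None, tags=None):
    """Build a deployment, checking the documented 'TF' requirement (hypothetical helper)."""
    if engine_type == 'TF' and scala_binary_version is None:
        raise ValueError("scala_binary_version is required when engine_type is 'TF'")
    # Only forward the optional keyword arguments that were actually supplied.
    optional = {}
    if tags is not None:
        optional['deployment_tags'] = list(tags)
    if scala_binary_version is not None:
        optional['scala_binary_version'] = scala_binary_version
    builder = sch.get_deployment_builder()
    return builder.build(deployment_name=name,
                         engine_type=engine_type,
                         engine_version=engine_version,
                         environment=environment,
                         **optional)


sch = MagicMock()          # stand-in for streamsets.sdk.ControlHub
environment = MagicMock()  # stand-in for the documented environment argument
deployment = build_deployment(sch, 'example-tf', 'TF', '0.0.0', environment,
                              scala_binary_version='2.12', tags=['example-tag'])
```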

class streamsets.sdk.sch_models.DeploymentEngineAdvancedConfiguration(deployment_engine_advanced_config, deployment=None)[source]#: Model for advanced configuration in Deployment engine configuration.

class streamsets.sdk.sch_models.DeploymentEngineConfiguration(deployment_engine_config, deployment=None, control_hub=None)[source]#

Model for Deployment engine configuration.

advanced_configuration#

Type: str

aws_image_id#

Type: str

azure_image_id#

Type: str

created_by#

User that created this deployment.

Type: str

created_on#

Time at which this deployment was created.

Type: int

download_url#

Type: list

engine_type#

Engine type.

Type: list

engine_version#

Engine version to deploy.

Type: list

id#

Id of the deployment engine configuration.

Type: str

gcp_image_id#

Type: str

last_modified_by#

User that last modified this deployment.

Type: str

last_modified_on#

Time at which this deployment was last modified.

Type: int

scala_binary_version#

Scala binary version.

Type: str

class streamsets.sdk.sch_models.DeploymentEngineConfigurations(control_hub)[source]#

Collection of streamsets.sdk.sch_models.DeploymentEngineConfiguration instances.

Parameters: control_hub – An instance of streamsets.sdk.sch.ControlHub.

class streamsets.sdk.sch_models.DeploymentEngineJavaConfiguration(deployment_engine_java_configuration, deployment=None)[source]#

Model for Deployment engine Java configuration.

id#

Type: str

java_options#

Type: str

java_memory_strategy#

Java memory strategy: ABSOLUTE or PERCENTAGE.

maximum_java_heap_size_in_mb#

Type: long

minimum_java_heap_size_in_mb#

Type: long

maximum_java_heap_size_in_percent#

Type: int

minimum_java_heap_size_in_percent#

Type: int

class streamsets.sdk.sch_models.DeploymentStageLibraries(values, deployment_config)[source]#

Wrapper class for the list of stage libraries in a deployment.

Parameters

values (list) – A list of stage library names.
deployment_config (streamsets.sdk.sch_models.DeploymentEngineConfiguration) – The deployment engine configuration object that these stage libraries pertain to.

append(value)[source]#: Append object to the end of the list.

extend(libs)[source]#: Extend list by appending elements from the iterable.

remove(value)[source]#

Remove first occurrence of value.

Raises ValueError if the value is not present.
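Because remove raises ValueError for a missing entry, a small guard can be convenient. In this sketch a plain Python list stands in for DeploymentStageLibraries, on the assumption that append, extend and remove behave like their list counterparts; the helper name and the library names are hypothetical.

```python
def remove_if_present(stage_libraries, name):
    """Remove a stage library; return False instead of raising if it is absent."""
    try:
        stage_libraries.remove(name)
    except ValueError:
        return False
    return True


# A plain list stands in for DeploymentStageLibraries here (assumption).
libs = ['example-lib-a']
libs.extend(['example-lib-b', 'example-lib-c'])
removed = remove_if_present(libs, 'example-lib-b')
missing = remove_if_present(libs, 'not-installed')
```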

class streamsets.sdk.sch_models.EC2Deployment(deployment, control_hub=None, **kwargs)[source]#

Model for EC2 Deployment.

ec2_instance_type#

Type of EC2 engine instance.

Type: str

engine_instances#

Number of EC2 engine instances.

Type: int

instance_profile#

ARN of instance profile created as an environment prerequisite.

Type: str

key_pair_name#

SSH key pair.

Type: str

aws_tags#

AWS tags to apply to all provisioned resources in this AWS deployment.

Type: dict

ssh_key_source#

SSH key source.

Type: str

tracking_url#

Type: str

is_complete()[source]#: Checks if all required fields are set in this deployment.

class streamsets.sdk.sch_models.GCEDeployment(deployment, control_hub=None, **kwargs)[source]#

Model for GCE Deployment.

block_project_ssh_keys#

Blocks the use of project-wide public SSH keys to access the provisioned VM instances.

Type: bool

engine_instances#

Number of engine instances to deploy.

Type: int

gcp_labels#

GCP labels to apply to all provisioned resources in your GCP project.

Type: dict

instance_service_account#

Instance service account.

Type: str

machine_type#

Machine type to use for the provisioned VM instances.

Type: str

public_ssh_key#

Full contents of public SSH key associated with each VM instance.

Type: str

region#

Region to provision VM instances in.

Type: str

tracking_url#

Tracking URL.

Type: str

zone#

GCP zone.

Type: list

is_complete()[source]#: Checks if all required fields are set in this deployment.

class streamsets.sdk.sch_models.SelfManagedDeployment(deployment, control_hub=None, **kwargs)[source]#

Model for self managed Deployment.

install_script()[source]#

Gets the installation script to be run for this deployment.

Returns: A str instance of the installation command.

is_complete()[source]#: Checks if all required fields are set in this deployment.

Environments#

class streamsets.sdk.sch_models.AWSEnvironment(environment, control_hub=None, **kwargs)[source]#

Model for AWS Environment.

access_key_id#

AWS access key id to access AWS account. Required when credential_type is AWS_STATIC_KEYS.

Type: str

aws_tags#

AWS tags to apply to all provisioned resources in this AWS deployment.

Type: dict

credentials#

Credentials of the environment.

Type: dict

default_instance_profile#

ARN of default instance profile created as an environment prerequisite.

Type: dict

region#

AWS region where VPC is located in case of AWS environment.

Type: str

role_arn#

Role ARN of the cross-account role created as an environment prerequisite in user’s AWS account. Required when credential_type is AWS_CROSS_ACCOUNT_ROLE.

Type: str

secret_access_key_id#

AWS secret access key to access AWS account. Required when credential_type is AWS_STATIC_KEYS.

Type: str

subnet_ids#

Range of Subnet IDs in the VPC.

Type: str

security_group_id#

Security group ID.

Type: str

vpc_id#

Id of the Amazon VPC created as an environment prerequisite in user’s AWS account.

Type: str

static get_ui_value_mappings()[source]#: This method returns a map for the values shown in the UI and internal constants to be used.

is_complete()[source]#: Checks if all required fields are set in this environment.

class streamsets.sdk.sch_models.AzureEnvironment(environment, control_hub=None, **kwargs)[source]#

Model for Azure Environment.

azure_tags#

Azure tags for this environment.

Type: dict

credential_type#

Credential type of the environment.

Type: str

credentials#

Credentials of the environment.

Type: dict

default_managed_identity#

Default managed identity.

Type: str

default_resource_group#

Default resource group.

Type: str

region#

Azure region where vnet is located.

Type: str

subnet_id#

Subnet ID in the vnet.

Type: str

security_group_id#

Security group ID.

Type: str

vnet_id#

Id of the vnet.

Type: str

static get_ui_value_mappings()[source]#: This method returns a map for the values shown in the UI and internal constants to be used.

is_complete()[source]#: Checks if all required fields are set in this environment.

class streamsets.sdk.sch_models.Environment(environment, control_hub=None, **kwargs)[source]#

Model for Environment. This is an abstract class.

allow_nightly_engine_builds#

Type: bool

created_by#

User that created this environment.

Type: str

created_on#

Time at which this environment was created.

Type: int

environment_id#

Id of the environment.

Type: str

environment_name#

Name of the environment.

Type: str

environment_tags#

Raw environment tags of the environment.

Type: list

environment_type#

Type of the environment.

Type: str

last_modified_by#

User that last modified this environment.

Type: str

last_modified_on#

Time at which this environment was last modified.

Type: int

json_state#

State of the environment - which is stored in json field called state.

Type: str

state#

State of the environment stateDisplayLabel - shown on the UI as state.

Type: str

status#

Status of the environment.

Type: str

status_detail#

Status detail of the environment.

Type: str

class TYPES(value)[source]#: An enumeration.

property acl#

Get environment ACL.

Returns: An instance of streamsets.sdk.sch_models.ACL.

property tags#

Get the tags.

Returns: A streamsets.sdk.utils.SeekableList of str instances.

class streamsets.sdk.sch_models.EnvironmentBuilder(environment, control_hub)[source]#

Class with which to build instances of streamsets.sdk.sch_models.Environment.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_environment_builder().

Parameters

environment (dict) – Python object built from our Swagger CspEnvironmentJson definition.
control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

build(environment_name, **kwargs)[source]#

Define the environment.

Parameters

environment_name (str) – Environment name.
allow_nightly_engine_builds (bool, optional) – Default: False.
environment_tags (list, optional) – List of tags (strings). Default: None.

Returns

An instance of subclass of streamsets.sdk.sch_models.Environment.
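A minimal sketch of obtaining the builder from ControlHub and passing the documented keyword arguments. The helper name, environment name and tag are placeholders, and get_environment_builder() is called without arguments, as written in the reference above. A MagicMock stands in for the ControlHub.

```python
from unittest.mock import MagicMock


def build_environment(sch, name, tags=None, allow_nightly=False):
    """Build an environment through the ControlHub builder (hypothetical helper)."""
    builder = sch.get_environment_builder()
    return builder.build(environment_name=name,
                         allow_nightly_engine_builds=allow_nightly,
                         environment_tags=tags)


sch = MagicMock()  # stand-in for streamsets.sdk.ControlHub
environment = build_environment(sch, 'example-env', tags=['example-tag'])
```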

class streamsets.sdk.sch_models.GCPEnvironment(environment, control_hub=None, **kwargs)[source]#

Model for GCP Environment.

credentials#

Credentials of the environment.

Type: dict

account_key_json#

JSON file contents for the Service Account Key. Required when credential_type is GCP_SERVICE_ACCOUNT_KEY.

Type: str

project#

Project where VPC is located.

Type: str

gcp_labels#

GCP labels for this environment.

Type: dict

gcp_tags#

GCP tags to apply to all resources provisioned in GCP account.

Type: str

service_account_email#

Service account ID. Required when credential_type is GCP_SERVICE_ACCOUNT_IMPERSONATION.

Type: str

vpc_id#

Id of the GCP VPC created as an environment prerequisite in user’s GCP account.

Type: str

fetch_available_machine_types(zone_id)[source]#: Returns the available machine types for the given Environment and GCP Project and GCP Zone.

fetch_available_projects()[source]#: Returns the available projects for the given Environment.

fetch_available_regions()[source]#: Returns the available regions for the given Environment and GCP Project.

fetch_available_service_accounts()[source]#: Returns the available service accounts for the given Environment and GCP Project.

fetch_available_vpcs()[source]#: Returns the available Networks for the given Environment and GCP Project. They are called VPCs here because the UI labels them as VPC.

fetch_available_zones(region_id)[source]#: Returns the available zones for the given environment and GCP project and GCP region.

static get_ui_value_mappings()[source]#: This method returns a map for the values shown in the UI and internal constants to be used.

is_complete()[source]#: Checks if all required fields are set in this environment.

class streamsets.sdk.sch_models.SelfManagedEnvironment(environment, control_hub=None, **kwargs)[source]#: Model for Self Managed Environment.

Transformers#

class streamsets.sdk.sch_models.Transformer(transformer, control_hub)[source]#

Model for Transformer.

execution_mode#

Type: str

id#

Transformer id.

Type: str

labels#

Labels for Transformer.

Type: list

last_validated_on#

Last validated time for Transformer.

Type: str

reported_labels#

Reported labels for Transformer.

Type: list

url#

Transformer’s url.

Type: str

version#

Transformer’s version.

Type: str

property accessible#: Returns a bool for whether the Transformer instance is accessible.

property acl#

Get Transformer ACL.

Returns: An instance of streamsets.sdk.sch_models.ACL.

property attributes#: Returns a dict of Transformer attributes.

Group#

class streamsets.sdk.sch_models.Group(group, roles)[source]#

Model for Group.

Parameters

group (dict) – A Python object representation of Group.
roles (dict) – A mapping of role IDs to role labels.

class streamsets.sdk.sch_models.Groups(control_hub, roles, organization)[source]#

Collection of streamsets.sdk.sch_models.Group instances.

Parameters

control_hub – An instance of streamsets.sdk.sch.ControlHub.
roles (dict) – A mapping of role IDs to role labels.
organization (str) – Organization ID.

class streamsets.sdk.sch_models.GroupBuilder(group, roles, control_hub=None)[source]#

Class with which to build instances of streamsets.sdk.sch_models.Group.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_group_builder().

Parameters

group (dict) – Python object built from our Swagger GroupJson definition.
roles (dict) – A mapping of role IDs to role labels.
control_hub (streamsets.sdk.sch.ControlHub, optional) – ControlHub object. Default: None

build(display_name, group_id=None, ldap_groups=None)[source]#

Build the group.

Parameters

display_name (str) – Group display name.
group_id (str, optional) – Group ID. Can only include letters, numbers, underscores and periods. Default: None.
ldap_groups (list, optional) – List of LDAP groups (strings).

Returns

An instance of streamsets.sdk.sch_models.Group.
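
The group_id constraint above can be checked before calling build(). The sketch below is one possible client-side check. It treats "letters" as ASCII letters, which is an assumption, and the function name is_valid_group_id is hypothetical.

```python
import re

# Letters, numbers, underscores and periods only, per the group_id description.
# Restricting "letters" to ASCII is an assumption of this sketch.
_GROUP_ID_PATTERN = re.compile(r"[A-Za-z0-9_.]+")


def is_valid_group_id(group_id):
    """Return True if group_id uses only the documented character set."""
    return bool(_GROUP_ID_PATTERN.fullmatch(group_id))
```

Control Hub may apply further rules on the server side, so a passing check here does not guarantee that the ID is accepted.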

Jobs#

class streamsets.sdk.sch_models.Job(job, control_hub=None)[source]#

Model for Job.

archived#

Flag that indicates if this job is archived.

Type: bool

commit_id#

Pipeline commit id.

Type: str

commit_label#

Pipeline commit label.

Type: str

created_by#

User that created this job.

Type: str

created_on#

Time at which this job was created.

Type: int

data_collector_labels#

Labels of the data collectors.

Type: list

delete_after_completion#

Flag that indicates if this job should be deleted after completion.

Type: bool

description#

Job description.

Type: str

destroyer#

Job destroyer.

Type: str

enable_failover#

Flag that indicates if failover is enabled.

Type: bool

enable_time_series_analysis#

Flag that indicates if time series is enabled.

Type: bool

execution_mode#

True for Edge and False for SDC.

Type: bool

job_deleted#

Flag that indicates if this job is deleted.

Type: bool

job_id#

Id of the job.

Type: str

job_name#

Name of the job.

Type: str

last_modified_by#

User that last modified this job.

Type: str

last_modified_on#

Time at which this job was last modified.

Type: int

number_of_instances#

Number of instances.

Type: int

pipeline_force_stop_timeout#

Timeout for Pipeline force stop.

Type: int

pipeline_id#

Id of the pipeline that is running the job.

Type: str

pipeline_name#

Name of the pipeline that is running the job.

Type: str

pipeline_rule_id#

Rule Id of the pipeline that is running the job.

Type: str

read_policy#

Read Policy of the job.

Type: streamsets.sdk.sch_models.ProtectionPolicy

runtime_parameters#

Run-time parameters of the job.

Type: str

static_parameters#

List of parameters that cannot be overridden.

Type: list

statistics_refresh_interval_in_millisecs#

Refresh interval for statistics in milliseconds.

Type: int

status#

Status of the job.

Type: str

template_run_history_list#

List of job template run history.

Type: list

write_policy#

Write Policy of the job.

Type: streamsets.sdk.sch_models.ProtectionPolicy

property acl#

Get job ACL.

Returns: An instance of streamsets.sdk.sch_models.ACL.

add_tag(*tags)[source]#

Add a tag

Parameters: *tags – One or more instances of str

property commit#

Get pipeline commit of the job.

Returns: An instance of streamsets.sdk.sch_models.PipelineCommit.

property committed_offsets#

Get the committed offsets for a given job id.

Returns: An instance of streamsets.sdk.sch_models.JobCommittedOffset.

property metrics#

The metrics from all runs of a Job.

Returns: A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.JobMetrics instances.

property pipeline#: Get the pipeline object corresponding to this job.

remove_tag(*tags)[source]#

Remove a tag

Parameters: *tags – One or more instances of str

property system_job#

Get the system Job for this job, if one exists.

Returns: An instance of streamsets.sdk.sch_models.Job.

property tag#

Get pipeline tag of the job.

Returns: An instance of streamsets.sdk.sch_models.PipelineTag.

property tags#

Get the job tags.

Returns: A streamsets.sdk.utils.SeekableList of instances of streamsets.sdk.sch_models.Tag.

time_series_metrics(metric_type, time_filter_condition='LAST_5M', **kwargs)[source]#

Get historic time series metrics for the job.

Parameters

metric_type (str) – metric type in {‘Record Count Time Series’, ‘Record Throughput Time Series’, ‘Batch Throughput Time Series’, ‘Stage Batch Processing Timer seconds’}.
time_filter_condition (str, optional) – Default: 'LAST_5M'.

Returns

An instance of streamsets.sdk.sch_models.JobTimeSeriesMetrics.
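
A minimal sketch of calling time_series_metrics() with a guard on metric_type. The set of accepted values is copied from the parameter description above; the helper name fetch_time_series is hypothetical, and job is assumed to be a Job instance retrieved elsewhere.

```python
# Metric types listed in the metric_type description above.
METRIC_TYPES = frozenset({
    "Record Count Time Series",
    "Record Throughput Time Series",
    "Batch Throughput Time Series",
    "Stage Batch Processing Timer seconds",
})


def fetch_time_series(job, metric_type, time_filter_condition="LAST_5M"):
    """Validate metric_type locally, then delegate to job.time_series_metrics()."""
    if metric_type not in METRIC_TYPES:
        raise ValueError(f"Unsupported metric_type: {metric_type!r}")
    return job.time_series_metrics(
        metric_type, time_filter_condition=time_filter_condition
    )
```

Failing early on an unknown metric type avoids a round trip to Control Hub for a request that the documented values suggest would not succeed.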

class streamsets.sdk.sch_models.Jobs(control_hub)[source]#

Collection of streamsets.sdk.sch_models.Job instances.

Parameters: control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

count(status)[source]#

Get job counts by status.

Parameters: status (str) – Status of the jobs in {‘ACTIVE’, ‘INACTIVE’, ‘ACTIVATING’, ‘DEACTIVATING’, ‘INACTIVE_ERROR’, ‘ACTIVE_GREEN’, ‘ACTIVE_RED’, ‘’}
Returns: An instance of int indicating the count of jobs with specified status.
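
For a quick overview, count() can be called once per status. The sketch below assumes jobs is a Jobs collection (for example, the jobs attribute of a ControlHub instance, which is not shown in this section) and uses the non-empty statuses listed above. The helper name job_counts_by_status is hypothetical.

```python
# Non-empty statuses from the count() description above.
JOB_STATUSES = (
    "ACTIVE",
    "INACTIVE",
    "ACTIVATING",
    "DEACTIVATING",
    "INACTIVE_ERROR",
    "ACTIVE_GREEN",
    "ACTIVE_RED",
)


def job_counts_by_status(jobs):
    """Return a dict mapping each documented status to jobs.count(status)."""
    return {status: jobs.count(status) for status in JOB_STATUSES}
```

Each status results in a separate count() call, so this issues seven requests against a real Control Hub instance.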

class streamsets.sdk.sch_models.JobBuilder(job, control_hub)[source]#

Class with which to build instances of streamsets.sdk.sch_models.Job.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_job_builder().

Parameters: job (dict) – Python object built from our Swagger JobJson definition.

build(job_name, pipeline, job_template=False, runtime_parameters=None, pipeline_commit=None, pipeline_tag=None, pipeline_commit_or_tag=None, tags=None)[source]#

Build the job.

Parameters

job_name (str) – Name of the job.
pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline object.
job_template (boolean, optional) – Indicate if it is a Job Template. Default: False.
runtime_parameters (dict, optional) – Runtime Parameters for the Job or Job Template. Default: None.
pipeline_commit (streamsets.sdk.sch_models.PipelineCommit, optional) – Default: None, which resolves to the latest pipeline commit.
pipeline_tag (streamsets.sdk.sch_models.PipelineTag, optional) – Default: None, which resolves to the latest pipeline tag.
pipeline_commit_or_tag (str, optional) – Default: None, which resolves to the latest pipeline commit.
tags (list, optional) – Job tags. Default: None.

Returns

An instance of streamsets.sdk.sch_models.Job.
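
A minimal sketch of the build() call, assuming job_builder came from ControlHub.get_job_builder() and pipeline is a Pipeline already published to Control Hub. The helper name build_job_for_pipeline is hypothetical. The commit and tag arguments are left at their defaults, so the latest pipeline commit is used as described above.

```python
def build_job_for_pipeline(job_builder, pipeline, job_name,
                           runtime_parameters=None, tags=None):
    """Build a Job for `pipeline` using only arguments documented for build()."""
    return job_builder.build(
        job_name=job_name,
        pipeline=pipeline,
        # Copy caller-supplied containers so later mutation does not leak in.
        runtime_parameters=dict(runtime_parameters) if runtime_parameters else None,
        tags=list(tags) if tags else None,
    )
```

The result is a Job object built locally; adding it to Control Hub is a separate call that is not covered by this sketch.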

class streamsets.sdk.sch_models.JobMetrics(metrics)[source]#

Model for job metrics.

error_count#

The number of error records generated by this run of the Job.

Type: int

error_records_per_sec#

The number of error records per second generated by this run of the Job.

Type: float

input_count#

The number of records ingested by this run of the Job.

Type: int

input_records_per_sec#

The number of records per second ingested by this run of the Job.

Type: float

output_count#

The number of records output by this run of the Job.

Type: int

output_records_per_sec#

The number of records output per second by this run of the Job.

Type: float

pipeline_version#

The version of the pipeline that was used in this Job run.

Type: str

run_count#

The count corresponding to this Job run.

Type: int

sdc_id#

The ID of the SDC instance on which this Job run was executed.

Type: str

stage_errors_count#

The number of stage error records generated by this run of the Job.

Type: int

stage_error_records_per_sec#

The number of stage error records generated per second by this run of the job.

Type: float

total_error_count#

The total number of both error records and stage errors generated by this run of the job.

Type: int

Parameters: metrics (dict) – Metrics counts in JSON format.
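
Because Job.metrics yields one JobMetrics instance per run, totals across runs can be computed by iterating it. The sketch below sums three of the counters documented above; the helper name summarize_job_metrics is hypothetical, and it assumes every run reports integer values for these attributes.

```python
def summarize_job_metrics(job):
    """Sum record counters over all runs exposed by job.metrics."""
    totals = {"runs": 0, "input_count": 0, "output_count": 0, "total_error_count": 0}
    for run in job.metrics:
        totals["runs"] += 1
        totals["input_count"] += run.input_count
        totals["output_count"] += run.output_count
        totals["total_error_count"] += run.total_error_count
    return totals
```

Only the cumulative counters are summed here; the per-second rates are not additive across runs and are left out.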

class streamsets.sdk.sch_models.JobOffset(offset)[source]#

Model for offset.

Parameters: offset (dict) – Offset in JSON format.

class streamsets.sdk.sch_models.JobRunEvent(event)[source]#

Model for an event in a Job Run.

Parameters: event (dict) – Job Run Event in JSON format.

class streamsets.sdk.sch_models.JobStatus(status, control_hub, **kwargs)[source]#

Model for Job Status.

run_history#

History of a particular job run, as streamsets.sdk.utils.JobRunHistoryEvent instances.

Type: streamsets.sdk.utils.SeekableList

offsets#

Offsets after the job run, as streamsets.sdk.utils.JobPipelineOffset instances.

Type: streamsets.sdk.utils.SeekableList

Parameters

status (dict) – Job status in JSON format.
control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

class streamsets.sdk.sch_models.JobTimeSeriesMetric(metric, metric_type)[source]#

Model for job metrics.

name#

Name of measurement.

Type: str

values#

Timeseries data.

Type: list

time_series#

Timeseries data with timestamp as key and metric value as value.

Type: dict

Parameters: metric (dict) – Metrics in JSON format.
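
Since time_series maps timestamps to metric values, ordinary dict operations apply. As a small sketch, the hypothetical helper below returns the timestamp and value of the largest measurement; it assumes the values are mutually comparable numbers.

```python
def peak_measurement(metric):
    """Return (timestamp, value) for the largest value in metric.time_series.

    Returns None when the series is empty.
    """
    series = metric.time_series
    if not series:
        return None
    # max() over the keys, ranked by their associated value.
    timestamp = max(series, key=series.get)
    return timestamp, series[timestamp]
```

When several timestamps share the largest value, the first one encountered during iteration is returned.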

class streamsets.sdk.sch_models.JobTimeSeriesMetrics(metrics, metric_type)[source]#

Model for job metrics.

input_records#

Appears when queried for ‘Record Count Time Series’ or ‘Record Throughput Time Series’.

Type: streamsets.sdk.sch_models.JobTimeSeriesMetric

output_records#

Appears when queried for ‘Record Count Time Series’ or ‘Record Throughput Time Series’.

Type: streamsets.sdk.sch_models.JobTimeSeriesMetric

error_records#

Appears when queried for ‘Record Count Time Series’ or ‘Record Throughput Time Series’.

Type: streamsets.sdk.sch_models.JobTimeSeriesMetric

batch_counter#

Appears when queried for ‘Batch Throughput Time Series’.

Type: streamsets.sdk.sch_models.JobTimeSeriesMetric

batch_processing_timer#

Appears when queried for ‘Batch Processing Timer seconds’.

Type: streamsets.sdk.sch_models.JobTimeSeriesMetric

Parameters: metrics (dict) – Metrics in JSON format.

class streamsets.sdk.sch_models.RuntimeParameters(runtime_parameters, job)[source]#

Wrapper for ControlHub job runtime parameters.

Parameters

runtime_parameters (str) – Runtime parameter.
job (streamsets.sdk.sch_models.Job) – Job object.

Organizations#

class streamsets.sdk.sch_models.Organization(organization, organization_admin_user=None, api_client=None)[source]#

Model for Organization.

Parameters

organization (str) – Organization Id.
organization_admin_user (str, optional) – Default: None.
api_client (streamsets.sdk.sch_api.ApiClient, optional) – Default: None.

class streamsets.sdk.sch_models.Organizations(control_hub)[source]#: Collection of streamsets.sdk.sch_models.Organization instances.

class streamsets.sdk.sch_models.OrganizationBuilder(organization, organization_admin_user)[source]#

Class with which to build instances of streamsets.sdk.sch_models.Organization.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_organization_builder().

Parameters: organization (dict) – Python object built from our Swagger UserJson definition.

build(id, name, admin_user_id, admin_user_display_name, admin_user_email_address, admin_user_ldap_user_name=None)[source]#

Build the organization.

Parameters

id (str) – Organization ID.
name (str) – Organization name.
admin_user_id (str) – User Id of the admin of this organization.
admin_user_display_name (str) – User display name of admin of this organization.
admin_user_email_address (str) – User email address of admin of this organization.
admin_user_ldap_user_name (str, optional) – LDAP username. Default: None.

Returns

An instance of streamsets.sdk.sch_models.Organization.

Pipelines#

class streamsets.sdk.sch_models.Pipeline(pipeline, builder, pipeline_definition, rules_definition, control_hub=None, library_definitions=None)[source]#

Model for Pipeline.

Parameters

pipeline (dict) – Pipeline in JSON format.
builder (streamsets.sdk.sch_models.PipelineBuilder) – Pipeline Builder object.
pipeline_definition (dict) – Pipeline Definition in JSON format.
rules_definition (dict) – Rules Definition in JSON format.
control_hub (streamsets.sdk.sch.ControlHub, optional) – ControlHub object. Default: None.
library_definitions (dict, optional) – Library Definition in JSON format. Default: None.

property Labels#

Get the pipeline labels. This attribute will be deprecated in a future release. Please use labels instead.

Returns: A list of dict

property acl#

Get pipeline ACL.

Returns: An instance of streamsets.sdk.sch_models.ACL.

add_label(*labels)[source]#

Add a label

Parameters: *labels – One or more instances of str

property commits#

Get commits for this pipeline.

Returns: A streamsets.sdk.utils.SeekableList of instances of streamsets.sdk.sch_models.PipelineCommit.

property configuration#

Get pipeline’s configuration.

Returns: An instance of streamsets.sdk.sch_models.Configuration.

property labels#

Get the pipeline labels.

Returns: A streamsets.sdk.utils.SeekableList of instances of streamsets.sdk.sch_models.PipelineLabel.

property parameters#

Get the pipeline parameters.

Returns: A dict-like streamsets.sdk.sch_models.PipelineParameters object of parameter key-value pairs.

remove_label(*labels)[source]#

Remove a label

Parameters: *labels – One or more instances of str

property tags#

Get tags for this pipeline.

Returns: A streamsets.sdk.utils.SeekableList of instances of streamsets.sdk.sch_models.PipelineTag.

class streamsets.sdk.sch_models.Pipelines(control_hub, organization)[source]#

Collection of streamsets.sdk.sch_models.Pipeline instances.

Parameters

control_hub – An instance of streamsets.sdk.sch.ControlHub.
organization (str) – Organization Id.

class streamsets.sdk.sch_models.PipelineBuilder(pipeline, data_collector_pipeline_builder, control_hub=None, fragment=False)[source]#

Class with which to build instances of streamsets.sdk.sch_models.Pipeline.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_pipeline_builder().

Parameters

pipeline (dict) – Python object built from our Swagger PipelineJson definition.
data_collector_pipeline_builder (streamsets.sdk.sdc_models.PipelineBuilder) – Data Collector Pipeline Builder object.
control_hub (streamsets.sdk.sch.ControlHub) – Default: None.
fragment (boolean, optional) – Specify if a fragment builder. Default: False.

add_stage(label=None, name=None, type=None, library=None)[source]#

Add a stage to the pipeline.

When specifying a stage, either label or name must be used. type and library may also be used to select a particular stage if ambiguities exist. If type and/or library are omitted, the first stage definition matching the given label or name will be used.

Parameters

label (str, optional) – Stage label to use when selecting stage from definitions. Default: None.
name (str, optional) – Stage name to use when selecting stage from definitions. Default: None.
type (str, optional) – Stage type to use when selecting stage from definitions (e.g. origin, destination, processor, executor). Default: None.
library (str, optional) – Stage library to use when selecting stage from definitions. Default: None.

Returns

An instance of streamsets.sdk.sch_models.SchSdcStage.

build(title='Pipeline', labels=None, **kwargs)[source]#

Build the pipeline.

Parameters

title (str) – title of the pipeline.
labels (list, optional) – List of pipeline labels of type str. Default: None.

Returns

An instance of streamsets.sdk.sch_models.Pipeline.
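
A minimal sketch of the add_stage() and build() sequence, assuming pipeline_builder came from ControlHub.get_pipeline_builder(). The helper name build_two_stage_pipeline is hypothetical. Connecting the two stages with the >> operator is an assumption here, since stage wiring is not documented in this section.

```python
def build_two_stage_pipeline(pipeline_builder, origin_label, destination_label,
                             title, labels=None):
    """Add an origin and a destination by label, connect them, and build."""
    # `type` narrows the lookup when several stage definitions share a label.
    origin = pipeline_builder.add_stage(label=origin_label, type="origin")
    destination = pipeline_builder.add_stage(label=destination_label, type="destination")
    # Assumption: stages are wired with >>, which this section does not document.
    origin >> destination
    return pipeline_builder.build(title=title, labels=labels)
```

The stage labels must match definitions available to the builder; which labels exist depends on the engine the builder was created for.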

class streamsets.sdk.sch_models.PipelineCommit(pipeline_commit, control_hub=None)[source]#

Model for pipeline commit.

Parameters

pipeline_commit (dict) – Pipeline commit in JSON format.
control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

class streamsets.sdk.sch_models.PipelineLabel(pipeline_label)[source]#

Model for pipeline label.

Parameters: pipeline_label (dict) – Pipeline label in JSON format.

class streamsets.sdk.sch_models.PipelineLabels(control_hub, organization)[source]#

Collection of streamsets.sdk.sch_models.PipelineLabel instances.

Parameters

control_hub – An instance of streamsets.sdk.sch.ControlHub.
organization (str) – Organization Id.

class streamsets.sdk.sch_models.PipelineParameters(pipeline)[source]#

Parameters for pipelines.

Parameters: pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline Instance.

update(parameters_dict)[source]#

Update existing parameters. Works similar to Python dictionary update.

Parameters: parameters_dict (dict) – Dictionary of key-value pairs to be used as parameters.
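
Because update() behaves like dict.update, a caller may want to avoid introducing keys by accident. The sketch below applies only overrides whose keys are already present. It assumes the parameters object supports the in operator, as a dict-like object normally would, and the helper name override_existing_parameters is hypothetical.

```python
def override_existing_parameters(parameters, overrides):
    """Apply overrides for keys already in `parameters`.

    Returns the sorted list of override keys that were skipped.
    """
    # Assumption: `parameters` supports membership tests like a dict.
    known = {key: value for key, value in overrides.items() if key in parameters}
    parameters.update(known)
    return sorted(set(overrides) - set(known))
```

With a Pipeline instance, parameters would be pipeline.parameters; whether the change must then be published to Control Hub is outside the scope of this sketch.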

class streamsets.sdk.sch_models.StPipelineBuilder(pipeline, transformer_pipeline_builder, control_hub=None, fragment=False)[source]#

Class with which to build instances of streamsets.sdk.sch_models.Pipeline.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_pipeline_builder().

Parameters

pipeline (dict) – Python object built from our Swagger PipelineJson definition.
transformer_pipeline_builder (streamsets.sdk.sdc_models.PipelineBuilder) – Transformer Pipeline Builder object.
control_hub (streamsets.sdk.sch.ControlHub) – Default: None.
fragment (boolean, optional) – Specify if a fragment builder. Default: False.

add_stage(label=None, name=None, type=None, library=None)[source]#

Add a stage to the pipeline.

Parameters

label (str, optional) – Transformer stage label to use when selecting stage from definitions. Default: None.
name (str, optional) – Transformer stage name to use when selecting stage from definitions. Default: None.
type (str, optional) – Transformer stage type to use when selecting stage from definitions (e.g. origin, destination, processor, executor). Default: None.
library (str, optional) – Transformer stage library to use when selecting stage from definitions. Default: None.

Returns

An instance of streamsets.sdk.sch_models.SchStStage.

build(title='Pipeline', **kwargs)[source]#

Build the pipeline.

Parameters: title (str) – title of the pipeline.
Returns: An instance of streamsets.sdk.sch_models.Pipeline.

Protection Methods#

class streamsets.sdk.sch_models.ProtectionMethod(stage)[source]#

Protection Method Model.

Parameters: stage (dict) – JSON representation of a stage.

class streamsets.sdk.sch_models.ProtectionMethodBuilder(pipeline_builder)[source]#

Class with which to build instances of streamsets.sdk.sch_models.ProtectionMethod.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_protection_method_builder().

Parameters: pipeline_builder (streamsets.sdk.sch_models.PipelineBuilder) – Pipeline Builder object.

Protection Policies#

class streamsets.sdk.sch_models.ProtectionPolicy(protection_policy, procedures=None)[source]#

Model for Protection Policy.

Parameters

protection_policy (dict) – JSON representation of Protection Policy.
procedures (list, optional) – A list of streamsets.sdk.sch_models.PolicyProcedure instances. Default: None.

class streamsets.sdk.sch_models.ProtectionPolicies(control_hub)[source]#: Collection of streamsets.sdk.sch_models.ProtectionPolicy instances.

class streamsets.sdk.sch_models.ProtectionPolicyBuilder(control_hub, protection_policy, policy_procedure)[source]#

Class with which to build instances of streamsets.sdk.sch_models.ProtectionPolicy.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_protection_policy_builder().

Parameters

protection_policy (dict) – Python object defining a protection policy.
policy_procedure (dict) – Python object defining a policy procedure.

class streamsets.sdk.sch_models.PolicyProcedure(policy_procedure)[source]#

Model for Policy Procedure.

Parameters: policy_procedure (dict) – JSON representation of Policy Procedure.

ProvisioningAgents#

class streamsets.sdk.sch_models.Deployment(deployment, control_hub=None, **kwargs)[source]#

Model for Deployment. This is an abstract class.

created_by#

User that created this deployment.

Type: str

created_on#

Time at which this deployment was created.

Type: int

deployment_id#

Id of the deployment.

Type: str

deployment_name#

Name of the deployment.

Type: str

deployment_tags#

Raw deployment tags of the deployment.

Type: list

deployment_type#

Type of the deployment.

Type: str

desired_instances#

Type: int

engine_configuration#

Type: DeploymentEngineConfiguration

engine_type#

Type: str

engine_version#

Type: str

environment_id#

Enabled environment where engine will be deployed.

Type: list

last_modified_by#

User that last modified this deployment.

Type: str

last_modified_on#

Time at which this deployment was last modified.

Type: int

network_tags#

Type: list

scala_binary_version#

Scala binary version.

Type: str

organization#

Type: str

json_state#

State of the deployment; this is the JSON field called state.

Type: str

state#

State of the deployment - displayed as state on UI.

Type: str

status#

Status of the deployment.

Type: str

status_detail#

Status detail of the deployment.

Type: str

class ENGINE_TYPES(value)[source]#: An enumeration.

class TYPES(value)[source]#: An enumeration.

property acl#

Get deployment ACL.

Returns: An instance of streamsets.sdk.sch_models.ACL.

property engine_configuration#

The engine configuration of the deployment engine.

Returns: An instance of streamsets.sdk.sch_models.DeploymentEngineConfiguration.

property tags#

Get the tags.

Returns: A streamsets.sdk.utils.SeekableList of str instances.

class streamsets.sdk.sch_models.Deployments(control_hub, organization)[source]#

Collection of streamsets.sdk.sch_models.Deployment instances.

Parameters

control_hub – An instance of streamsets.sdk.sch.ControlHub.
organization (str) – Organization Id.

class streamsets.sdk.sch_models.DeploymentBuilder(deployment, control_hub)[source]#

Class with which to build instances of streamsets.sdk.sch_models.Deployment.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_deployment_builder().

Parameters

deployment (dict) – Python object built from our Swagger CspDeploymentJson definition.
control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

build(deployment_name, engine_type, engine_version, environment, **kwargs)[source]#

Define the deployment.

Parameters

deployment_name (str) – deployment name.
engine_type (str) – Type of engine to deploy.
engine_version (str) – Version of engine to deploy.
environment (str) – ID of the enabled environment where the engine will be deployed.
deployment_tags (list, optional) – List of tags (strings). Default: None.
engine_build (str, optional) – Build of engine to deploy. Default: None.
scala_binary_version (str, optional) – Scala binary version, required when engine_type=’TF’. Default: None.

Returns

An instance of a subclass of streamsets.sdk.sch_models.Deployment.
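
A minimal sketch of the build() call that also enforces the rule that scala_binary_version accompanies engine_type 'TF'. It assumes builder came from ControlHub.get_deployment_builder(); the helper name build_deployment is hypothetical, and the arguments are passed exactly as documented above.

```python
def build_deployment(builder, name, engine_type, engine_version, environment,
                     tags=None, scala_binary_version=None):
    """Build a deployment, checking the documented Transformer requirement first."""
    if engine_type == "TF" and not scala_binary_version:
        raise ValueError("scala_binary_version is required when engine_type is 'TF'")
    optional = {"deployment_tags": list(tags) if tags else None}
    if scala_binary_version:
        optional["scala_binary_version"] = scala_binary_version
    return builder.build(
        deployment_name=name,
        engine_type=engine_type,
        engine_version=engine_version,
        environment=environment,
        **optional,
    )
```

Checking the requirement locally gives a clear error message before any request is made.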

class streamsets.sdk.sch_models.ProvisioningAgent(provisioning_agent, control_hub)[source]#

Model for Provisioning Agent.

Parameters

provisioning_agent (dict) – A Python object representation of Provisioning Agent.
control_hub – An instance of streamsets.sdk.sch.ControlHub.

property acl#

Get the ACL of a Provisioning Agent.

Returns: An instance of streamsets.sdk.sch_models.ACL.

property deployments#

Get the deployments associated with the Provisioning Agent.

Returns: A list of streamsets.sdk.sch_models.LegacyDeployment instances.

class streamsets.sdk.sch_models.ProvisioningAgents(control_hub, organization)[source]#

Collection of streamsets.sdk.sch_models.ProvisioningAgent instances.

Parameters

control_hub – An instance of streamsets.sdk.sch.ControlHub.
organization (str) – Organization Id.

Reports#

class streamsets.sdk.sch_models.GenerateReportCommand(control_hub, report_defintion, response)[source]#

Command to interact with the response from generate_report.

Parameters

control_hub (streamsets.sdk.sch.ControlHub) – ControlHub instance.
report_definition (dict) – JSON representation of Report Definition.
response (dict) – Api response from generating the report.

class streamsets.sdk.sch_models.Report(report, control_hub, report_definition_id)[source]#

Model for Report.

Parameters: report (dict) – JSON representation of Report.

download()[source]#

Download the Report in PDF format.

Returns: An instance of bytes.
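
Since download() returns bytes, the result can be written straight to disk. The sketch below is one way to do that; the helper name save_report_pdf is hypothetical, and report is assumed to be a Report instance obtained elsewhere.

```python
from pathlib import Path


def save_report_pdf(report, destination):
    """Write the bytes returned by report.download() to `destination`.

    Returns the number of bytes written.
    """
    data = report.download()
    Path(destination).write_bytes(data)
    return len(data)
```

The file is opened in binary mode by write_bytes(), so the PDF content is stored unchanged.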

class streamsets.sdk.sch_models.Reports(control_hub, report_definition_id)[source]#

Collection of streamsets.sdk.sch_models.Report instances.

Parameters

control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.
report_definition_id (str) – Report Definition Id.

class streamsets.sdk.sch_models.ReportDefinition(report_definition, control_hub)[source]#

Model for Report Definition.

Parameters

report_definition (dict) – JSON representation of Report Definition.
control_hub (streamsets.sdk.sch.ControlHub) – ControlHub instance.

property acl#

Get Report Definition ACL.

Returns: An instance of streamsets.sdk.sch_models.ACL.

generate_report()[source]#

Generate a Report for Report Definition.

Returns: An instance of streamsets.sdk.sch_models.Report.

property report_resources#

Get Report Resources of the Report Definition.

Returns: An instance of streamsets.sdk.sch_models.ReportResources.

property reports#

Get Reports of the Report Definition.

Returns: An instance of streamsets.sdk.sch_models.Reports.

class streamsets.sdk.sch_models.ReportDefinitions(control_hub)[source]#: Collection of streamsets.sdk.sch_models.ReportDefinition instances.

class streamsets.sdk.sch_models.ReportDefinitionBuilder(report_definition, control_hub)[source]#

Class with which to build instances of streamsets.sdk.sch_models.ReportDefinition.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_report_definition_builder().

Parameters

report_definition (dict) – JSON representation of Report Definition.
control_hub (streamsets.sdk.sch.ControlHub) – ControlHub instance.

add_report_resource(resource)[source]#

Add a given resource to Report Definition resources.

Parameters: resource (streamsets.sdk.sch_models.Job) or (streamsets.sdk.sch_models.Topology) –

build(name, description=None)[source]#

Build the report definition.

Parameters

name (str) – Name of the Report Definition.
description (str, optional) – Description of the Report Definition. Default: None.

Returns

An instance of streamsets.sdk.sch_models.ReportDefinition.

import_report_definition(report_definition)[source]#

Import an existing Report Definition to update it.

Parameters: report_definition (streamsets.sdk.sch_models.ReportDefinition) – Report Definition object.

remove_report_resource(resource)[source]#

Remove a resource from Report Definition Resources.

Parameters: resource (streamsets.sdk.sch_models.Job) or (streamsets.sdk.sch_models.Topology) –
Returns: A resource of type dict that is removed from Report Definition Resources.

set_data_retrieval_period(start_time, end_time)[source]#

Set the time range over which the report will be generated.

Parameters

start_time (str) or (int) – Absolute or relative start time for the Report.
end_time (str) or (int) – Absolute or relative end time for the Report.
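
The builder methods above are typically combined: add resources, set the retrieval period, then build. The sketch below shows that order, assuming builder came from ControlHub.get_report_definition_builder(). The helper name build_report_definition is hypothetical, and start_time and end_time are passed through unchanged because their accepted formats are not specified in this section.

```python
def build_report_definition(builder, name, resources, start_time, end_time,
                            description=None):
    """Add resources, set the data retrieval period, then build the definition."""
    for resource in resources:
        # Each resource is expected to be a Job or Topology instance.
        builder.add_report_resource(resource)
    builder.set_data_retrieval_period(start_time, end_time)
    return builder.build(name=name, description=description)
```

Building only creates the ReportDefinition object; adding it to Control Hub and generating a report are separate steps.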

class streamsets.sdk.sch_models.ReportResource(report_resource)[source]#

Model for Report Resource.

Parameters: report_resource (dict) – JSON representation of Report Resource.

class streamsets.sdk.sch_models.ReportResources(report_resources, report_definition)[source]#

Model for the collection of Report Resources.

Parameters

report_resources (list) – List of Report Resources.
report_definition (streamsets.sdk.sch_models.ReportDefinition) – Report Definition object.

Roles#

class streamsets.sdk.sch_models.Roles(values, entity, role_label_to_id)[source]#

Wrapper class over the list of Roles.

Parameters

values (list) – List of roles.
entity (streamsets.sdk.sch_models.Group or streamsets.sdk.sch_models.User) – Group or User object.
role_label_to_id (dict) – Role label to Role ID mapping.

append(value)[source]#: Append object to the end of the list.

remove(value)[source]#

Remove first occurrence of value.

Raises ValueError if the value is not present.
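
Because remove() raises ValueError for a missing value, code that revokes a role unconditionally needs a guard. A minimal sketch, with the hypothetical helper name remove_role_if_present, that works with any object exposing the list-style remove() described above:

```python
def remove_role_if_present(roles, value):
    """Remove `value` from `roles`; return False instead of raising if absent."""
    try:
        roles.remove(value)
    except ValueError:
        return False
    return True
```

With a Group or User, roles would be the Roles wrapper attached to that entity; persisting the change to Control Hub is not covered here.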

Scheduler#

class streamsets.sdk.sch_models.ScheduledTask(task, control_hub=None)[source]#

Model for Scheduled Task.

Parameters

task (dict) – JSON representation of task.
control_hub (streamsets.sdk.sch.ControlHub) – ControlHub instance.

property acl#

Get the ACL of a Scheduled Task.

Returns: An instance of streamsets.sdk.sch_models.ACL.

property audits#

Get Scheduled Task Audits.

Returns: A streamsets.sdk.utils.SeekableList of inherited instances of streamsets.sdk.sch_models.ScheduledTaskAudit.

property runs#

Get Scheduled Task Runs.

Returns: A streamsets.sdk.utils.SeekableList of inherited instances of streamsets.sdk.sch_models.ScheduledTaskRun.

class streamsets.sdk.sch_models.ScheduledTaskAudit(audit)[source]#

Scheduled Task Audit.

Parameters: audit (dict) – JSON representation of scheduled task audit.

class streamsets.sdk.sch_models.ScheduledTaskBaseModel(data, attributes_to_ignore=None, attributes_to_remap=None, repr_metadata=None)[source]#: Base Model for Scheduled Task related classes.

class streamsets.sdk.sch_models.ScheduledTaskBuilder(job_selection_types, control_hub)[source]#

Builder for Scheduled Task.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_scheduled_task_builder().

Parameters

job_selection_types (dict) – JSON representation of job selection types.
control_hub (streamsets.sdk.sch.ControlHub) – ControlHub instance.

build(task_object, action='START', name=None, description=None, cron_expression='0 0 1/1 * ? *', time_zone='UTC', status='RUNNING', start_time=None, end_time=None, missed_execution_handling='IGNORE')[source]#

Builder for Scheduled Task.

Parameters

task_object (streamsets.sdk.sch_models.Job or streamsets.sdk.sch_models.ReportDefinition) – Job or ReportDefinition object.
action (str, optional) – One of the {‘START’, ‘STOP’, ‘UPGRADE’} actions. Default: START.
name (str, optional) – Name of the task. Default: None.
description (str, optional) – Description of the task. Default: None.
cron_expression (str, optional) – Schedule in cron syntax. Default: "0 0 1/1 * ? *" (daily at 12).
time_zone (str, optional) – Time zone. Default: "UTC".
status (str, optional) – One of the {‘RUNNING’, ‘PAUSED’} statuses. Default: RUNNING.
start_time (str, optional) – Start time of task. Default: None.
end_time (str, optional) – End time of task. Default: None.
missed_execution_handling (str, optional) – One of {‘IGNORE’, ‘RUN ALL’, ‘RUN ONCE’}. Default: IGNORE.

Returns

An instance of streamsets.sdk.sch_models.ScheduledTask.
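As an illustration of the builder flow described above, the following is a minimal sketch. It assumes `sch` is an already connected ControlHub instance and `job` is an existing Job; the helper name `schedule_job_start` and the task name and description are hypothetical. The sketch stops at build() and does not cover publishing the task to Control Hub.

```python
def schedule_job_start(sch, job, name="Scheduled job start"):
    """Build a scheduled task that starts `job`.

    Assumptions: `sch` is a connected ControlHub instance and `job` is an
    existing Job. The helper name and the default task name are illustrative.
    """
    builder = sch.get_scheduled_task_builder()
    # Keyword names follow the build() signature documented above.
    return builder.build(
        task_object=job,
        action="START",
        name=name,
        description="Starts the job on a fixed schedule",  # illustrative text
        cron_expression="0 0 1/1 * ? *",  # the documented default
        time_zone="UTC",
        status="RUNNING",
        missed_execution_handling="IGNORE",
    )
```

Passing every argument by keyword keeps the call readable and guards against mixing up the many optional parameters.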

class streamsets.sdk.sch_models.ScheduledTaskRun(run)[source]#

Scheduled Task Run.

Parameters: run (dict) – JSON representation of scheduled task run.

class streamsets.sdk.sch_models.ScheduledTasks(control_hub)[source]#: Collection of streamsets.sdk.sch_models.ScheduledTask instances.

Subscriptions#

class streamsets.sdk.sch_models.Subscription(subscription, control_hub)[source]#

Subscription.

Parameters

subscription (dict) – JSON representation of Subscription.
control_hub (streamsets.sdk.sch.ControlHub) – ControlHub instance.

property acl#

Get the ACL of an Event Subscription.

Returns: An instance of streamsets.sdk.sch_models.ACL.

property action#: Action of the Subscription.

property events#: Events of the Subscription.

class streamsets.sdk.sch_models.Subscriptions(control_hub)[source]#

Collection of streamsets.sdk.sch_models.Subscription instances.

Parameters: control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

class streamsets.sdk.sch_models.SubscriptionAction(action)[source]#

Action to take when the Subscription is triggered.

Parameters: action (dict) – JSON representation of an Action for a Subscription.

class streamsets.sdk.sch_models.SubscriptionAudit(audit)[source]#

Model for subscription audit.

Parameters: audit (dict) – JSON representation of a subscription audit.

class streamsets.sdk.sch_models.SubscriptionBuilder(subscription, control_hub)[source]#

Builder for Subscription.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_subscription_builder().

Parameters

subscription (dict) – JSON representation of event subscription.
control_hub (streamsets.sdk.sch.ControlHub) – ControlHub instance.

add_event(event_type, filter=None)[source]#

Add event to the Subscription.

Parameters

event_type (str) – Type of event in {‘Job Status Change’, ‘Data SLA Triggered’, ‘Pipeline Committed’, ‘Pipeline Status Change’, ‘Report Generated’, ‘Data Collector not Responding’}.
filter (str, optional) – Filter to be applied on event. Default: None.

build(name, description=None)[source]#

Build the Subscription.

Parameters

name (str) – Name of Subscription.
description (str, optional) – Description of subscription. Default: None.

Returns

An instance of streamsets.sdk.sch_models.Subscription.

import_subscription(subscription)[source]#

Import an existing Subscription into the builder to update it.

Parameters: subscription (streamsets.sdk.sch_models.Subscription) – Subscription instance.

remove_event(event_type)[source]#

Remove event from the subscription.

Parameters: event_type (str) – Type of event in {‘Job Status Change’, ‘Data SLA Triggered’, ‘Pipeline Committed’, ‘Pipeline Status Change’, ‘Report Generated’, ‘Data Collector not Responding’}.
Returns: An instance of streamsets.sdk.sch_models.SubscriptionEvent.

set_email_action(recipients, subject=None, body=None)[source]#

Set the Email action.

Parameters

recipients (list) – List of email addresses.
subject (str, optional) – Subject of the email. Default: None.
body (str, optional) – Body of the email. Default: None.

set_webhook_action(uri, method='GET', content_type=None, payload=None, auth_type=None, username=None, password=None, timeout=30000, headers=None)[source]#

Set the Webhook action.

Parameters

uri (str) – URI for the Webhook.
method (str, optional) – HTTP method to use. Default: 'GET'.
content_type (str, optional) – Content Type of the request. Default: None.
payload (str, optional) – Payload of the request. Default: None.
auth_type (str, optional) – 'basic' or None. Default: None.
username (str, optional) – username for the authentication. Default: None.
password (str, optional) – password for the authentication. Default: None.
timeout (int, optional) – timeout for the Webhook action. Default: 30000.
headers (dict, optional) – Headers to be sent to the Webhook call. Default: None.
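The SubscriptionBuilder methods above can be combined roughly as follows. This is a sketch, assuming `sch` is a connected ControlHub instance; the helper name, subscription name, and email text are hypothetical, and the event type is one of the values listed for add_event().

```python
def build_job_status_subscription(sch, recipients):
    """Build a subscription that emails `recipients` on job status changes.

    Assumptions: `sch` is a connected ControlHub instance. The helper name,
    subscription name, and email text are illustrative only.
    """
    builder = sch.get_subscription_builder()
    # Event type taken from the documented set of allowed values.
    builder.add_event(event_type="Job Status Change")
    builder.set_email_action(
        recipients=list(recipients),
        subject="Job status changed",
        body="A job in Control Hub changed status.",
    )
    return builder.build(
        name="Job status alerts",
        description="Emails the team when a job changes status",
    )
```

A webhook-based variant would call set_webhook_action() in place of set_email_action(), using the parameters documented above.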

class streamsets.sdk.sch_models.SubscriptionEvent(event, control_hub)[source]#

An Event of a Subscription.

Parameters

event (dict) – JSON representation of Events of a Subscription.
control_hub (streamsets.sdk.sch.ControlHub) – ControlHub instance.

Topologies#

class streamsets.sdk.sch_models.DataSla(data_sla, control_hub)[source]#

Model for DataSla.

Parameters

data_sla (dict) – JSON representation of SLA.
control_hub (streamsets.sdk.sch.ControlHub) – An instance of the Control Hub.

class streamsets.sdk.sch_models.DataSlaBuilder(data_sla, control_hub)[source]#

Class with which to build instances of streamsets.sdk.sch_models.DataSla.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_data_sla_builder().

Parameters

data_sla (dict) – Python object built from our Swagger DataSlaJson definition.
control_hub (streamsets.sdk.sch.ControlHub) – An instance of the Control Hub.

build(topology, label, job, alert_text, qos_parameter='THROUGHPUT_RATE', function_type='Max', min_max_value=100, enabled=True)[source]#

Build the Data Sla.

Parameters

topology (streamsets.sdk.sch_models.Topology) – Topology object.
label (str) – Label for the SLA.
job (list) – List of streamsets.sdk.sch_models.Job objects.
alert_text (str) – Alert text.
qos_parameter (str, optional) – Parameter in {‘THROUGHPUT_RATE’, ‘ERROR_RATE’}. Default: 'THROUGHPUT_RATE'.
function_type (str, optional) – Parameter in {‘Max’, ‘Min’}. Default: 'Max'.
min_max_value (str, optional) – Default: 100.
enabled (boolean, optional) – Default: True.

Returns

An instance of streamsets.sdk.sch_models.DataSla.
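A possible way to call DataSlaBuilder.build() is sketched below. It assumes `sch` is a connected ControlHub instance, and that `topology` and `job` already exist; the helper name, label, and alert text are hypothetical.

```python
def build_throughput_sla(sch, topology, job, threshold=100):
    """Build a throughput Data SLA for `job` within `topology`.

    Assumptions: `sch` is a connected ControlHub instance; `topology` and
    `job` are existing objects. Label and alert text are illustrative.
    """
    builder = sch.get_data_sla_builder()
    return builder.build(
        topology=topology,
        label="Throughput SLA",
        job=[job],  # documented as a list of Job objects
        alert_text="Throughput SLA triggered",
        qos_parameter="THROUGHPUT_RATE",
        function_type="Max",
        min_max_value=threshold,
        enabled=True,
    )
```

The resulting DataSla can then be passed to the add_data_sla() and activate_data_sla() methods documented on the Topology class.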

class streamsets.sdk.sch_models.Topology(topology, control_hub=None)[source]#

Model for Topology.

Parameters: topology (dict) – JSON representation of Topology.

commit_id#

Pipeline commit id.

Type: str

commit_message#

Commit Message.

Type: str

commit_time#

Time at which commit was made.

Type: int

committed_by#

User that made the commit.

Type: str

default_topology#

Default Topology.

Type: bool

description#

Topology description.

Type: str

draft#

Indicates whether this topology is a draft.

Type: bool

last_modified_by#

User that last modified this topology.

Type: str

last_modified_on#

Time at which this topology was last modified.

Type: int

new_pipeline_version_available#

Whether any job in the topology has a new pipeline version to be updated to.

Type: bool

organization#

Id of the organization.

Type: str

parent_version#

Version of the parent topology.

Type: str

topology_definition#

Definition of the topology.

Type: str

topology_id#

Id of the topology.

Type: str

topology_name#

Name of the topology.

Type: str

validation_issues#

Any validation issues that exist for this Topology.

Type: dict

version#

Version of this topology.

Type: str

acknowledge_job_errors()[source]#

Acknowledge all errors for the jobs in a topology.

Returns: An instance of streamsets.sdk.sch_api.Command.

property acl#

Get the ACL of a Topology.

Returns: An instance of streamsets.sdk.sch_models.ACL.

activate_data_sla(*data_slas)[source]#

Activate Data SLAs.

Parameters: *data_slas – One or more instances of streamsets.sdk.sch_models.DataSla.
Returns: An instance of streamsets.sdk.sch_api.Command.

add_data_sla(data_sla)[source]#

Add SLA.

Parameters: data_sla (streamsets.sdk.sch_models.DataSla) – Data SLA object.
Returns: An instance of streamsets.sdk.sch_api.Command.

auto_discover_connections()[source]#

Auto discover connecting systems between nodes in a Topology.

Returns: An instance of streamsets.sdk.sch_api.Command.

auto_fix()[source]#

Auto-fix a topology by rectifying invalid or removed jobs, outdated jobs, etc.

Returns: An instance of streamsets.sdk.sch_api.Command.

deactivate_data_sla(*data_slas)[source]#

Deactivate Data SLAs.

Parameters: *data_slas – One or more instances of streamsets.sdk.sch_models.DataSla.
Returns: An instance of streamsets.sdk.sch_api.Command.

delete_data_sla(*data_slas)[source]#

Delete Data SLAs.

Parameters: *data_slas – One or more instances of streamsets.sdk.sch_models.DataSla.
Returns: An instance of streamsets.sdk.sch_api.Command.

property jobs#

Get the jobs that are contained within the Topology.

Returns: A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Job instances.

property new_pipeline_version_available#

Determine if a new pipeline version is available for any jobs in the Topology.

Returns: A (bool) value.

property nodes#

Get the job and system nodes that make up the Topology.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.TopologyNode instances.

start_all_jobs()[source]#

Start all jobs of a topology.

Returns: An instance of streamsets.sdk.sch_api.Command.

stop_all_jobs(force=False)[source]#

Stop all jobs of a topology.

Parameters: force (bool, optional) – Force topology jobs to stop. Default: False.
Returns: An instance of streamsets.sdk.sch_api.Command.

update_jobs_to_latest_change()[source]#

Upgrade a topology’s job(s) to the latest corresponding pipeline change.

Returns: An instance of streamsets.sdk.sch_api.Command.

property validation_issues#

Get any validation issues that are detected for the Topology.

Returns: A (list) of validation issues in JSON format.
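The Topology properties and methods above lend themselves to a small maintenance routine. The following is only a sketch of one way to combine them: the helper name and the decision logic are assumptions, and whether auto_fix() resolves a particular issue depends on the issue itself.

```python
def refresh_topology(topology):
    """Run the documented maintenance calls on `topology` when applicable.

    Assumption: `topology` is an existing Topology instance. The helper name
    and the decision logic are illustrative, not part of the SDK.
    Returns the names of the methods that were invoked.
    """
    actions = []
    # validation_issues is documented as a list; an empty list means no issues.
    if topology.validation_issues:
        topology.auto_fix()
        actions.append("auto_fix")
    # True when any job in the topology has a newer pipeline version.
    if topology.new_pipeline_version_available:
        topology.update_jobs_to_latest_change()
        actions.append("update_jobs_to_latest_change")
    return actions
```

Returning the list of invoked methods makes the routine easy to log or assert against in a script.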

class streamsets.sdk.sch_models.Topologies(control_hub)[source]#

Collection of streamsets.sdk.sch_models.Topology instances.

Parameters: control_hub (streamsets.sdk.sch.ControlHub) – An instance of the ControlHub.

class streamsets.sdk.sch_models.TopologyBuilder(topology, control_hub=None)[source]#

Class with which to build instances of streamsets.sdk.sch_models.Topology.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_topology_builder().

Parameters

topology (dict) – Python object built from our Swagger TopologyJson definition.
control_hub (streamsets.sdk.ControlHub, optional) – Control Hub instance. Default: None.

add_job(job)[source]#

Add a job node to the Topology being built.

Parameters: job (streamsets.sdk.sch_models.Job) – An instance of a job to be added.

add_system(name)[source]#

Add a system node to the Topology being built.

Parameters: name (str) – The name of the system to add to the topology.

build(topology_name=None, description=None)[source]#

Build the topology.

Parameters

topology_name (str, optional) – Name of the topology. This parameter is required when building a new topology.
description (str, optional) – Description of the topology. Default: None.

Returns

An instance of streamsets.sdk.sch_models.Topology.

delete_node(topology_node)[source]#

Delete a system or job node from the topology.

Parameters: topology_node (streamsets.sdk.sch_models.TopologyNode) – An instance of a TopologyNode to delete from the topology.

import_topology(topology)[source]#

Import an existing topology to be used in the builder.

Parameters: topology (streamsets.sdk.sch_models.Topology) – An existing Topology instance to modify.

property topology_nodes#

Get all of the nodes currently part of the topology held by the TopologyBuilder.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.TopologyNode instances.
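To show how the TopologyBuilder methods fit together, here is a minimal sketch. It assumes `sch` is a connected ControlHub instance and `jobs` are existing Job objects; the helper name and its parameters are hypothetical.

```python
def build_topology(sch, jobs, system_names, topology_name, description=None):
    """Build a new topology from job nodes and system nodes.

    Assumptions: `sch` is a connected ControlHub instance and `jobs` are
    existing Job objects. The helper name and arguments are illustrative.
    """
    builder = sch.get_topology_builder()
    for job in jobs:
        builder.add_job(job)  # one job node per job
    for system_name in system_names:
        builder.add_system(system_name)  # one system node per name
    # topology_name is required when building a new topology.
    return builder.build(topology_name=topology_name, description=description)
```

To modify an existing topology instead, import_topology() would be called on the builder before adding or deleting nodes.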

class streamsets.sdk.sch_models.TopologyNode(topology_node_json)[source]#

Model for a node within a Topology.

Parameters: topology_node_json (dict) – JSON representation of a Topology Node.

node_type#

The type of this node, e.g. SYSTEM or JOB.

Type: str

instance_name#

The name of this node instance.

Type: str

stage_name#

The name of the stage in this node.

Type: str

stage_version#

The version of the stage in this node.

Type: str

job_id#

The ID of the job in this node.

Type: str

pipeline_id#

The pipeline ID associated with the job in this node.

Type: str

pipeline_commit_id#

The commit ID of the pipeline.

Type: str

pipeline_version#

The version of the pipeline.

Type: str

input_lanes#

A list of str representing the input lanes for this node.

Type: list

output_lanes#

A list of str representing the output lanes for this node.

Type: list

Users#

class streamsets.sdk.sch_models.User(user, roles)[source]#

Model for User.

Parameters

user (dict) – JSON representation of User.
roles (dict) – A mapping of role IDs to role labels.

active#

Whether the user is active or not.

Type: bool

created_by#

Creator of this user.

Type: str

created_on#

Creation time of this user.

Type: str

display_name#

Display name of this user.

Type: str

email_address#

Email address of this user.

Type: str

id#

Id of this user.

Type: str

groups#

Groups this user belongs to.

Type: list

last_modified_by#

User who last modified this user.

Type: str

last_modified_on#

Time at which this user was last modified.

Type: str

password_expires_on#

User’s password expiration time.

Type: str

password_system_generated#

Whether User’s password is system generated or not.

Type: bool

roles#

A set of role labels.

Type: set

saml_user_name#

SAML username of user.

Type: str

class streamsets.sdk.sch_models.Users(control_hub, roles, organization)[source]#

Collection of streamsets.sdk.sch_models.User instances.

Parameters

control_hub – An instance of streamsets.sdk.sch.ControlHub.
roles (dict) – A mapping of role IDs to role labels.
organization (str) – Organization ID.

class streamsets.sdk.sch_models.UserBuilder(user, roles, control_hub=None)[source]#

Class with which to build instances of streamsets.sdk.sch_models.User.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_user_builder().

Parameters

user (dict) – Python object built from our Swagger UserJson definition.
roles (dict) – A mapping of role IDs to role labels.
control_hub – An instance of streamsets.sdk.sch.ControlHub. Default: None.

build(email_address)[source]#

Build the user.

Parameters: email_address (str) – User Email Address.
Returns: An instance of streamsets.sdk.sch_models.User.

Common#

Models used by StreamSets Data Collector and StreamSets Control Hub:

class streamsets.sdk.models.Configuration(configuration=None, property_key='name', property_value='value', **kwargs)[source]#

Abstraction for stage configurations.

This class enables easy access to and modification of data stored as a list of dictionaries. As an example, SDC’s pipeline configuration is stored in the form

[ {
  "name" : "executionMode",
  "value" : "STANDALONE"
}, {
  "name" : "deliveryGuarantee",
  "value" : "AT_LEAST_ONCE"
}, ... ]

By implementing simple __getitem__ and __setitem__ methods, this class allows items in this list to be accessed using

configuration['executionMode'] = 'CLUSTER_BATCH'

instead of the more verbose

for property in configuration:
    if property['name'] == 'executionMode':
        property['value'] = 'CLUSTER_BATCH'
        break

Parameters

configuration (list) – List of dictionaries comprising the configuration.
property_key (str, optional) – The dictionary entry denoting the property key. Default: name
property_value (str, optional) – The dictionary entry denoting the property value. Default: value

get(key, default=None)[source]#: Return the value of key or, if not in the configuration, the default value.

items()[source]#

Gets the configuration’s items.

Returns: A new view of the configuration’s items ((key, value) pairs).

update(configs)[source]#

Update instance with a collection of configurations.

Parameters: configs (dict) – Dictionary of configurations to use.
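The idea behind this class can be illustrated with a simplified stand-in. The sketch below is not the SDK's implementation: the class name `ListBackedConfig` is hypothetical, and details such as raising KeyError for unknown keys are assumptions made for the example.

```python
class ListBackedConfig:
    """Simplified stand-in for a dict-like view over a list of dictionaries.

    Not the SDK's Configuration class; behavior for unknown keys (KeyError)
    is an assumption made for this sketch.
    """

    def __init__(self, configuration, property_key="name", property_value="value"):
        self._data = configuration
        self._key = property_key
        self._value = property_value

    def __getitem__(self, key):
        for entry in self._data:
            if entry[self._key] == key:
                return entry[self._value]
        raise KeyError(key)

    def __setitem__(self, key, value):
        for entry in self._data:
            if entry[self._key] == key:
                entry[self._value] = value  # mutate the underlying list entry
                return
        raise KeyError(key)

    def get(self, key, default=None):
        try:
            return self[key]
        except KeyError:
            return default

    def items(self):
        return [(e[self._key], e[self._value]) for e in self._data]

    def update(self, configs):
        for key, value in configs.items():
            self[key] = value


raw = [
    {"name": "executionMode", "value": "STANDALONE"},
    {"name": "deliveryGuarantee", "value": "AT_LEAST_ONCE"},
]
config = ListBackedConfig(raw)
config["executionMode"] = "CLUSTER_BATCH"
print(config["executionMode"])  # CLUSTER_BATCH
print(raw[0]["value"])          # CLUSTER_BATCH (the wrapped list is updated in place)
```

Because the wrapper holds a reference to the original list, changes made through it are visible in the underlying data without any copy-back step.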

Exceptions#

Common exceptions.

exception streamsets.sdk.exceptions.BadRequestError(response)[source]#: Bad request error (HTTP 400).

exception streamsets.sdk.exceptions.ConnectionError(code, message)[source]#: Connection Catalog errors.

exception streamsets.sdk.exceptions.InternalServerError(response)[source]#: Internal server error.

exception streamsets.sdk.exceptions.InvalidCredentialsError(message)[source]#: Invalid credentials error.

exception streamsets.sdk.exceptions.JobInactiveError(message)[source]#: Job status changed into INACTIVE_ERROR.

exception streamsets.sdk.exceptions.JobRunnerError(code, message)[source]#: JobRunner errors.

exception streamsets.sdk.exceptions.LegacyDeploymentInactiveError(message)[source]#: Legacy deployment status changed into INACTIVE_ERROR.

exception streamsets.sdk.exceptions.MultipleIssuesError(errors)[source]#: Multiple errors were returned.

exception streamsets.sdk.exceptions.TopologyIssuesError(issues)[source]#: Topology has some issues.

exception streamsets.sdk.exceptions.UnprocessableEntityError(response)[source]#: Unprocessable Entity Error (HTTP 422).

exception streamsets.sdk.exceptions.UnsupportedMethodError(message)[source]#: An unsupported method was called.

exception streamsets.sdk.exceptions.ValidationError(issues)[source]#: Validation issues.

Platform SDK for Python Version 4.3.0
