StreamSets Control Hub#

Main interface#

This is the main entry point used by users when interacting with SCH instances.

class streamsets.sdk.ControlHub(credential_id, token, use_websocket_tunneling=True, **kwargs)[source]#

Class to interact with StreamSets Control Hub.

Parameters

credential_id (str) – ControlHub credential ID.
token (str) – ControlHub token.
use_websocket_tunneling (bool, optional) – use websocket tunneling. Default: True.

version#

Version of ControlHub.

Type: str

ldap_enabled#

Indication if LDAP is enabled or not.

Type: bool

organization_global_configuration#

Organization’s Global Configuration instance.

Type: :py:class:`streamsets.sdk.models.Configuration`

pipeline_labels#

Pipeline Labels.

Type: streamsets.sdk.sch_models.PipelineLabels

users#

Organization’s Users.

Type: streamsets.sdk.sch_models.Users

login_audits#

ControlHub Login Audits.

Type: streamsets.sdk.sch_models.LoginAudits

action_audits#

ControlHub Action Audits.

Type: streamsets.sdk.sch_models.ActionAudits

connection_audits#

ControlHub Connection Audits.

Type: streamsets.sdk.sch_models.ConnectionAudits

groups#

ControlHub Groups.

Type: streamsets.sdk.sch_models.Groups

data_collectors#

Data Collector instances registered under ControlHub.

Type: Returns a streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.DataCollector instances.

transformers#

Transformer instances registered under ControlHub.

Type: Returns a streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Transformer instances.

provisioning_agents (: py:class:`streamsets.sdk.sch_models.ProvisioningAgents): Provisioning Agents registered to the ControlHub instance.

legacy_deployments#

LegacyDeployments instances.

Type: streamsets.sdk.sch_models.LegacyDeployments

organizations#

All the organizations that the current user belongs to.

Type: streamsets.sdk.sch_models.Organizations

api_credentials#

ControlHub Api Credentials.

Type: A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.ApiCredential

pipelines#

ControlHub Pipelines.

Type: streamsets.sdk.sch_models.Pipelines

draft_runs#

ControlHub Draft Runs.

Type: streamsets.sdk.sch_models.DraftRuns

jobs#

ControlHub Jobs.

Type: streamsets.sdk.sch_models.Jobs

data_protector_enabled (: bool): Whether Data Protector is enabled for the current organization.

connection_tags#

ControlHub Connection Tags.

Type: streamsets.sdk.sch_models.ConnectionTags

alerts#

ControlHub Alerts.

Type: A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Alert

protection_policies#

ControlHub Protection Policies.

Type: streamsets.sdk.sch_models.ProtectionPolicies

scheduled_tasks#

ControlHub Scheduled Tasks.

Type: streamsets.sdk.sch_models.ScheduledTasks

subscriptions#

ControlHub Subscriptions.

Type: streamsets.sdk.sch_models.Subscriptions

subscription_audits#

ControlHub Subscription audits.

Type: A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.SubscriptionAudit

topologies#

ControlHub Topologies.

Type: streamsets.sdk.sch_models.Topologies

connections#

ControlHub Connections.

Type: streamsets.sdk.sch_models.Connections

environments#

ControlHub Environments.

Type: streamsets.sdk.sch_models.Environments

engine_versions#

ControlHub Deployment Engine Configuration.

Type: streamsets.sdk.sch_models.DeploymentEngineConfiguration

deployments#

ControlHub Deployments.

Type: streamsets.sdk.sch_models.Deployments

engines#

ControlHub Engines.

Type: streamsets.sdk.sch_models.Engines

saql_saved_searches_pipeline#

ControlHub SAQL Searches for type Pipeline.

Type: streamsets.sdk.sch_models.SAQLSearches

saql_saved_searches_fragment#

ControlHub SAQL Searches for type Fragment.

Type: streamsets.sdk.sch_models.SAQLSearches

saql_saved_searches_job_instance#

ControlHub SAQL Searches for type Job Instance.

Type: streamsets.sdk.sch_models.SAQLSearches

saql_saved_searches_job_template#

ControlHub SAQL Searches for type Job Template.

Type: streamsets.sdk.sch_models.SAQLSearches

saql_saved_searches_draft_run#

ControlHub SAQL Searches for type Draft Run.

Type: streamsets.sdk.sch_models.SAQLSearches

acknowledge_event_subscription_error(subscription)[source]#

Acknowledge an error on given Event Subscription.

Parameters: subscription (streamsets.sdk.sch_models.Subscription) – A Subscription instance.
Returns: An instance of streamsets.sdk.sch_api.Command.

acknowledge_job_error(*jobs)[source]#

Acknowledge errors for one or more jobs.

Parameters: *jobs – One or more instances of streamsets.sdk.sch_models.Job.

acknowledge_legacy_deployment_error(*deployments)[source]#

Acknowledge errors for one or more deployments.

Parameters: *deployments – One or more instances of streamsets.sdk.sch_models.LegacyDeployment.

property action_audits#

Action Audits.

Returns: An instance of streamsets.sdk.sch_models.ActionAudits.

activate_api_credential(api_credential)[source]#

Activate an api credential.

Parameters: api_credential (streamsets.sdk.sch_models.ApiCredential) – ApiCredential object.
Returns: An instance of streamsets.sdk.sch_api.Command.

activate_engine(engine)[source]#

Activate an engine.

Parameters: engine (streamsets.sdk.sch_models.Engine) – Engine instance.

activate_environment(*environments, timeout_sec=300)[source]#

Activate environments.

Parameters: *environments – One or more instances of streamsets.sdk.sch_models.Environment.
Returns: An instance of streamsets.sdk.sch_api.Command or None.

activate_provisioning_agent(provisioning_agent)[source]#

Activate provisioning agent.

Parameters: provisioning_agent (streamets.sdk.sch_models.ProvisioningAgent) –
Returns: An instance of streamsets.sdk.sch_api.Command.

activate_user(*users)[source]#

Activate users for all given User IDs.

Parameters: *users – One or more instance of streamsets.sdk.sch_models.User.
Returns: An instance of streamsets.sdk.aster_api.Command.

add_api_credential(api_credential)[source]#

Add an api credential. Some api credential attributes are updated by ControlHub such as: created_by.

Parameters: api_credential (streamsets.sdk.sch_models.ApiCredential) – An API credential instance, built via the streamsets.sdk.sch_models.ApiCredentialBuilder.build() method.
Returns: An instance of streamsets.sdk.sch_api.Command.

add_classification_rule(classification_rule, commit=False)[source]#

Add a classification rule.

Parameters

classification_rule (streamsets.sdk.sch_models.ClassificationRule) – Classification Rule object.
commit (bool, optional) – Whether to commit the rule after adding it. Default: False.

add_connection(connection)[source]#

Add a connection.

Parameters: connection (streamsets.sdk.sch_models.Connection) – Connection object.
Returns: An instance of streamsets.sdk.sch_api.Command.

add_deployment(deployment)[source]#

Add a deployment.

Parameters: deployment (streamsets.sdk.sch_models.Deployment) – Deployment object.
Returns: An instance of streamsets.sdk.sch_api.Command.

add_environment(environment)[source]#

Add an environment.

Parameters: environment (streamsets.sdk.sch_models.Environment) – Environment object.
Returns: An instance of streamsets.sdk.sch_api.Command.

add_group(group)[source]#

Add a group.

Parameters: group (streamsets.sdk.sch_models.Group) – Group object.
Returns: An instance of streamsets.sdk.sch_api.Command.

add_job(job)[source]#

Add a job.

Parameters: job (streamsets.sdk.sch_models.Job) – Job object.
Returns: An instance of streamsets.sdk.sch_api.Command.

add_legacy_deployment(deployment)[source]#

Add a legacy deployment.

Parameters: deployment (streamsets.sdk.sch_models.LegacyDeployment) – Deployment object.
Returns: An instance of streamsets.sdk.sch_api.Command.

add_organization(organization)[source]#

Add an organization.

Parameters: organization (streamsets.sdk.sch_models.Organization) – Organization object.
Returns: An instance of streamsets.sdk.sch_api.Command.

add_protection_policy(protection_policy)[source]#

Add a protection policy.

Parameters: protection_policy (streamsets.sdk.sch_models.ProtectionPolicy) – Protection Policy object.

add_report_definition(report_definition)[source]#

Add Report Definition to ControlHub.

Parameters: report_definition (streamsets.sdk.sch_models.ReportDefinition) – Report Definition instance.
Returns: An instance of streamsets.sdk.sch_api.Command.

add_scheduled_task(task)[source]#

Add the scheduled task to Control Hub.

Parameters: task (streamsets.sdk.sch_models.ScheduledTask) – Scheduled task object.
Returns: An instance of streamsets.sdk.sch_api.Command.
Raises: Exception – Thrown if publishing the task was unsuccessful

add_subscription(subscription)[source]#

Add Subscription to ControlHub.

Parameters: subscription (streamsets.sdk.sch_models.Subscription) – A Subscription instance.
Returns: An instance of streamsets.sdk.sch_api.Command.

property alerts#

Alerts.

Returns: A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Alert.

property api_credentials#

Api Credentials.

Returns: A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.ApiCredential.

assign_administrator_role(user)[source]#

Designate the specified user as an Organization Administrator.

Parameters: user (streamsets.sdk.sch_models.User) – User object.
Returns: An instance of streamsets.sdk.aster_api.Command.

balance_engines(*engines)[source]#

Balance all jobs running on given Engine.

Parameters: *engines – One or more instances of streamsets.sdk.sch_models.Engine.

balance_job(*jobs)[source]#

Balance one or more jobs.

Parameters: *jobs – One or more instances of streamsets.sdk.sch_models.Job.

check_snowflake_account_validation(token)[source]#

Check the status of a snowflake account validation process.

Parameters: token (str) – Token of the validation process.
Returns: An instance of streamsets.sdk.sch_api.Command.

clone_deployment(deployment, name=None, deployment_tags=None, engine_labels=None, engine_version=None)[source]#

Clone a deployment.

Parameters

deployment (streamsets.sdk.sch_models.Deployment) – Deployment object.
name (str, optional) – Name of the new deployment. Default: None.
deployment_tags (list, optional) – Tags to add to the cloned deployment. Default: `None.
engine_labels (list, optional) – Labels to assign to engines belonging to this deployment. Default: None.
engine_version (str, optional) – Engine Version ID Default: None.

Returns

An instance of streamsets.sdk.sch_models.Deployment.

property connection_audits#

Connection Audits.

Returns: An instance of streamsets.sdk.sch_models.ConnectionAudits.

property connection_tags#

Connection Tags.

Returns: An instance of streamsets.sdk.sch_models.ConnectionTags.

property connections#

Connections.

Returns: An instance of streamsets.sdk.sch_models.Connections.

create_components(component_type, number_of_components=1, active=True)[source]#

Create components.

Parameters

component_type (str) – Component type.
number_of_components (int, optional) – Default: 1.
active (bool, optional) – Default: True.

Returns

An instance of streamsets.sdk.sch_api.CreateComponentsCommand.

property data_collectors#

Data Collectors registered to the ControlHub instance.

Returns: Returns a streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.DataCollector instances.

property data_protector_enabled#

Whether Data Protector is enabled for the current organization.

Type: bool

deactivate_api_credential(api_credential)[source]#

Deactivate an api credential.

Parameters: api_credential (streamsets.sdk.sch_models.ApiCredential) – ApiCredential object.
Returns: An instance of streamsets.sdk.sch_api.Command.

deactivate_engine(engine)[source]#

Deactivate an engine.

Parameters: engine (streamsets.sdk.sch_models.Engine) – Engine instance.

deactivate_environment(*environments)[source]#

Deactivate environments.

Parameters: *environments – One or more instances of streamsets.sdk.sch_models.Environment.
Returns: An instance of streamsets.sdk.sch_api.Command.

deactivate_provisioning_agent(provisioning_agent)[source]#

Deactivate provisioning agent.

Parameters: provisioning_agent (streamets.sdk.sch_models.ProvisioningAgent) –
Returns: An instance of streamsets.sdk.sch_api.Command.

deactivate_user(*users)[source]#

Deactivate Users for all given User IDs.

Parameters: *users – One or more instances of streamsets.sdk.sch_models.User.
Returns: An instance of streamsets.sdk.aster_api.Command.

delete_and_unregister_engine(engine)[source]#

Delete and Unregister an engine.

Parameters: engine (streamsets.sdk.sch_models.Engine) – Engine instance.

delete_api_credentials(*api_credentials)[source]#

Delete api_credentials.

Parameters: *api_credentials – One or more instances of streamsets.sdk.sch_models.ApiCredential.

delete_connection(*connections)[source]#

Delete connections.

Parameters: *connections – One or more instances of streamsets.sdk.sch_models.Connection.

delete_deployment(*deployments, stop=True)[source]#

Delete deployments.

Parameters

*deployments (streamsets.sdk.sch_models.Deployment) – One or more deployments.
stop (bool) – Stop the deployment. Default True.

delete_draft_run(*draft_runs)[source]#

Remove a draft run.

Parameters: *draft_runs – One or more instances of streamsets.sdk.sch_models.DraftRun.

delete_engine(engine)[source]#

Delete an engine.

Parameters: engine (streamsets.sdk.sch_models.Engine) – Engine instance.

delete_environment(*environments)[source]#

Delete environments.

Parameters: *environments – One or more instances of streamsets.sdk.sch_models.Environment.
Returns: An instance of streamsets.sdk.sch_api.Command.

delete_group(*groups)[source]#

Delete groups.

Parameters: *groups – One or more instances of streamsets.sdk.sch_models.Group.
Returns: An instance of streamsets.sdk.sch_api.Command.

delete_job(*jobs)[source]#

Delete one or more jobs.

Parameters: *jobs – One or more instances of streamsets.sdk.sch_models.Job.

delete_job_sequences(*job_sequences)[source]#

Delete Job Sequences.

Parameters: *job_sequences – One or more instances of streamsets.sdk.sch_models.JobSequence
Returns: An instance of streamsets.sdk.sch_api.Command.

delete_legacy_deployment(*deployments)[source]#

Delete deployments.

Parameters: *deployments – One or more instances of streamsets.sdk.sch_models.LegacyDeployment.
Returns: An instance of streamsets.sdk.sch_api.Command.

delete_pipeline(pipeline, only_selected_version=False)[source]#

Delete a pipeline.

Parameters

pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline object.
only_selected_version (boolean) – Delete only current commit.

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_pipeline_labels(*pipeline_labels)[source]#

Delete pipeline labels.

Parameters: *pipeline_labels – One or more instances of streamsets.sdk.sch_models.PipelineLabel.
Returns: An instance of streamsets.sdk.sch_api.Command.

delete_provisioning_agent(provisioning_agent)[source]#

Delete provisioning agent.

Parameters: provisioning_agent (streamets.sdk.sch_models.ProvisioningAgent) –
Returns: An instance of streamsets.sdk.sch_api.Command.

delete_provisioning_agent_token(provisioning_agent)[source]#

Delete provisioning agent token.

Parameters: provisioning_agent (streamets.sdk.sch_models.ProvisioningAgent) –
Returns: An instance of streamsets.sdk.sch_api.Command.

delete_report_definition(report_definition)[source]#

Delete an existing Report Definition.

Parameters: report_definition (streamsets.sdk.sch_models.ReportDefinition) – Report Definition instance.
Returns: An instance of streamsets.sdk.sch_api.Command.

delete_scheduled_tasks(*scheduled_tasks)[source]#

Delete given scheduled tasks.

Parameters: *scheduled_tasks – One or more instances of streamsets.sdk.sch_models.ScheduledTask
Returns: An instance of streamsets.sdk.sch_api.Command.

delete_snowflake_pipeline_defaults()[source]#

Delete the Snowflake pipeline defaults for this user.

Returns: An instance of streamsets.sdk.sch_api.Command.

delete_snowflake_user_credentials()[source]#

Delete the Snowflake user credentials.

Returns: An instance of streamsets.sdk.sch_api.Command.

delete_subscription(subscription)[source]#

Delete an exisiting Subscription.

Parameters: subscription (streamsets.sdk.sch_models.Subscription) – A Subscription instance.
Returns: An instance of streamsets.sdk.sch_api.Command.

delete_topology(topology, only_selected_version=False)[source]#

Delete a topology.

Parameters

topology (streamsets.sdk.sch_models.Topology) – Topology object.
only_selected_version (boolean) – Delete only current commit.

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_unregistered_auth_tokens(executor_type)[source]#

Delete auth tokens for engines that have been unregistered.

Parameters: executor_type (str) – The executor type. Acceptable values are ‘DATACOLLECTOR’ or ‘TRANSFORMER’.
Returns: An instance of streamsets.sdk.sch_api.Command.

delete_user(*users, deactivate=False)[source]#

Delete users. Deactivate users before deleting if configured.

Parameters

*users – One or more instances of streamsets.sdk.sch_models.User.
deactivate (bool, optional) – Default: False.

Returns

An instance of streamsets.sdk.aster_api.Command.

property deployments#

Deployments.

Returns: An instance of streamsets.sdk.sch_models.Deployments.

disable_job_sequence(job_sequence)[source]#

Disable the Job Sequence.

Parameters: job_sequence – An instance of streamsets.sdk.sch_models.JobSequence
Returns: An instance of streamsets.sdk.sch_api.Command.

property draft_runs#

Draft Runs.

Returns: An instance of streamsets.sdk.sch_models.DraftRuns.

duplicate_job(job, name=None, description=None, number_of_copies=1)[source]#

Duplicate an existing job.

Parameters

job (streamsets.sdk.sch_models.Job) – Job object.
name (str, optional) – Name of the new job(s). Default: None. If not specified, name of the job with ' copy' appended to the end will be used.
description (str, optional) – Description for new job(s). Default: None.
number_of_copies (int, optional) – Number of copies. Default: 1.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Job.

duplicate_pipeline(pipeline, name=None, description='New Pipeline', number_of_copies=1)[source]#

Duplicate an existing pipeline.

Parameters

pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline object.
name (str, optional) – Name of the new pipeline(s). Default: None.
description (str, optional) – Description for new pipeline(s). Default: 'New Pipeline'.
number_of_copies (int, optional) – Number of copies. Default: 1.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Pipeline.

edit_job(job)[source]#

Edit a job.

Parameters: job (streamsets.sdk.sch_models.Job) – Job object.
Returns: An instance of streamsets.sdk.sch_models.Job.

enable_job_sequence(job_sequence)[source]#

Enable the Job Sequence.

Parameters: job_sequence – An instance of streamsets.sdk.sch_models.JobSequence
Returns: An instance of streamsets.sdk.sch_api.Command.

property engine_versions#

Deployment engine version. The engine_versions and engine_configuration (Deployment) properties: have the same object structure represented by DeploymentEngineConfiguration.

Returns: An instance of streamsets.sdk.sch_models.DeploymentEngineConfiguration.

property environments#

environments.

Returns: An instance of streamsets.sdk.sch_models.Environments.

export_jobs(jobs)[source]#

Export jobs to a compressed archive.

Parameters: jobs (list) – A list of streamsets.sdk.sch_models.Job instances.
Returns: An instance of type bytes indicating the content of zip file with job json files.

export_pipelines(pipelines, fragments=False, include_plain_text_credentials=False)[source]#

Export pipelines.

Parameters

pipelines (list) – A list of streamsets.sdk.sch_models.Pipeline instances.
fragments (bool) – Indicates if exporting fragments is needed.
include_plain_text_credentials (bool) – Indicates if plain text credentials should be included.

Returns

An instance of type bytes indicating the content of zip file with pipeline json files.

export_protection_policies(protection_policies)[source]#

Export protection policies to a compressed archive.

Parameters

protection_policies (list) – A list of streamsets.sdk.sch_models.ProtectionPolicy
instances. –

Returns

An instance of type bytes indicating the content of zip file with protection policy json files.

export_topologies(topologies)[source]#

Export topologies.

Parameters: topologies (list) – A list of streamsets.sdk.sch_models.Topology instances.
Returns: An instance of type bytes indicating the content of zip file with pipeline json files.

get_admin_tool(base_url, username, password)[source]#

Get ControlHub admin tool.

Returns: An instance of streamsets.sdk.sch_models.AdminTool.

get_api_credential_builder()[source]#

Get api credential Builder.

Returns: An instance of streamsets.sdk.sch_models.ApiCredentialBuilder.

get_classification_rule_builder()[source]#

Get a classification rule builder instance with which a classification rule can be created.

Returns: An instance of streamsets.sdk.sch_models.ClassificationRuleBuilder.

get_components(component_type_id, offset=None, len_=None, order_by='LAST_VALIDATED_ON', order='ASC')[source]#

Get components.

Parameters

component_type_id (str) – Component type id.
offset (str, optional) – Default: None.
len (str, optional) – Default: None.
order_by (str, optional) – Default: 'LAST_VALIDATED_ON'.
order (str, optional) – Default: 'ASC'.

get_connection_builder()[source]#

Get a connection builder instance with which a connection can be created.

Returns: An instance of streamsets.sdk.sch_models.ConnectionBuilder.

get_current_job_status(job)[source]#

Returns the current job status for given job id.

Parameters: job (streamsets.sdk.sch_models.Job) – Job object.

get_data_sla_builder()[source]#

Get a Data SLA builder instance with which a Data SLA can be created.

Returns: An instance of streamsets.sdk.sch_models.DataSlaBuilder.

get_deployment_builder(deployment_type='SELF')[source]#

Get a deployment builder instance with which a deployment can be created.

Parameters: ( (deployment_type) – obj: str, optional) Valid values are ‘AWS’, ‘GCP’ and ‘SELF’. Default 'SELF'
Returns: An instance of streamsets.sdk.sch_models.DeploymentBuilder.

get_engine_labels(engine)[source]#

Returns all labels assigned to an engine.

Parameters: engine (streamsets.sdk.sch_models.Engine) – Engine instance.
Returns: A list of engine assigned labels.

get_environment_builder(environment_type='SELF')[source]#

Get an environment builder instance with which an environment can be created.

Parameters: ( (environment_type) – obj: str, optional) Valid values are ‘AWS’, ‘GCP’, ‘KUBERNETES’ and ‘SELF’. Default 'SELF'
Returns: An instance of streamsets.sdk.sch_models.EnvironmentBuilder.

get_group_builder()[source]#

Get a group builder instance with which a group can be created.

Returns: An instance of streamsets.sdk.sch_models.GroupBuilder.

get_job_builder()[source]#

Get a job builder instance with which a job can be created.

Returns: An instance of streamsets.sdk.sch_models.JobBuilder.

get_job_sequence_builder()[source]#

Get a job sequence builder instance with which a job sequence can be created.

Returns: An instance of streamsets.sdk.sch_models.JobSequenceBuilder.

get_kubernetes_apply_agent_yaml_command(environment)[source]#

Get install script for a Kubernetes environment.

Parameters: environment (streamsets.sdk.sch_models.Environment) – Environment object.
Returns: An instance of str.

get_kubernetes_environment_yaml(environment)[source]#

Get the YAML file for a Kubernetes environment.

Parameters: environment (streamsets.sdk.sch_models.Environment) – Environment for which the YAML file should be fetched.
Returns: An instance of str.

get_legacy_deployment_builder()[source]#

Get a deployment builder instance with which a legacy deployment can be created.

Returns: An instance of streamsets.sdk.sch_models.LegacyDeploymentBuilder.

get_organization_builder()[source]#

Get an organization builder instance with which an organization can be created.

Returns: An instance of streamsets.sdk.sch_models.OrganizationBuilder.

get_pipeline_builder(engine_type=None, engine_id=None, fragment=False, **kwargs)[source]#

Get a pipeline builder instance with which a pipeline can be created.

Parameters

engine_type (str) – The type of pipeline that will be created. The options are 'data_collector', 'snowflake' or 'transformer'.
engine_id (str) – The ID of the Data Collector or Transformer in which to author the pipeline if not using Transformer for Snowflake.
fragment (boolean, optional) – Specify if a fragment builder. Default: False.

Returns

An instance of streamsets.sdk.sch_models.PipelineBuilder or streamsets.sdk.sch_models.StPipelineBuilder.

get_protection_method_builder()[source]#

Get a protection method builder instance with which a protection method can be created.

Returns: An instance of streamsets.sdk.sch_models.ProtectionMethodBuilder.

get_protection_policy_builder()[source]#

Get a protection policy builder instance with which a protection policy can be created.

Returns: An instance of streamsets.sdk.sch_models.ProtectionPolicyBuilder.

get_saql_search_builder(saql_search_type, mode='BASIC')[source]#

Get SAQL Search Builder.

saql_search_type (str): Type of SAQL search object, limited to``’PIPELINE’`` or 'FRAGMENT',

'JOB_INSTANCE', 'JOB_TEMPLATE',``’JOB_SEQUENCE’, '``JOB_RUN' and: 'JOB_DRAFT_RUN'.

mode (str, optional): mode of SAQL Search. Default: 'BASIC'

Returns: An instance of streamsets.sdk.sch_models.SAQLSearchBuilder.

get_scheduled_task_builder()[source]#

Get a scheduled task builder instance with which a scheduled task can be created.

Returns: An instance of streamsets.sdk.sch_models.ScheduledTaskBuilder.

get_self_managed_deployment_install_script(deployment, install_mechanism='DEFAULT', install_type=None, java_version=None)[source]#

Get install script for a Self Managed deployment.

Parameters

deployment (streamsets.sdk.sch_models.Deployment) – deployment object.
install_mechanism (str, optional) – Possible values for install are “DEFAULT”, “BACKGROUND” and “FOREGROUND”. Default: DEFAULT
install_type (str, optional) – Possible values for install are “DOCKER”, “TARBALL”. Default: None. If not provided will use the value supplied via the deployment.
java_version (str, optional) – Java Development Kit Version to be used. Example: “8”. Default: None.

Returns

An instance of streamsets.sdk.sch_api.Command.

get_snowflake_pipeline_defaults()[source]#

Get the Snowflake pipeline defaults for this user (if it exists).

Returns: A dict of Snowflake pipeline defaults.

get_snowflake_user_credentials()[source]#

Get the Snowflake user credentials (if they exist). They will be redacted.

Returns: A dict of Snowflake user credentials (redacted).

get_subscription_builder()[source]#

Get Event Subscription Builder.

Returns: An instance of streamsets.sdk.sch_models.SubscriptionBuilder.

get_topology_builder()[source]#

Get a topology builder instance with which a topology can be created.

Returns: An instance of streamsets.sdk.sch_models.TopologyBuilder.

get_user_builder()[source]#

Get a user builder instance with which a user can be created.

Returns: An instance of streamsets.sdk.sch_models.UserBuilder.

property groups#

Groups.

Returns: An instance of streamsets.sdk.sch_models.Groups.

import_jobs(archive, pipeline=True, number_of_instances=False, labels=False, runtime_parameters=False, **kwargs)[source]#

Import jobs from archived zip directory.

Parameters

archive (file) – file containing the jobs.
pipeline (boolean, optional) – Indicate if pipeline should be imported. Default: True.
number_of_instances (boolean, optional) – Indicate if number of instances should be imported. Default: False.
labels (boolean, optional) – Indicate if labels should be imported. Default: False.
runtime_parameters (boolean, optional) – Indicate if runtime parameters should be imported. Default: False.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Job.

import_pipeline(pipeline, commit_message, name=None, data_collector_instance=None)[source]#

Import pipeline from json file.

Parameters

pipeline (dict) – A python dict representation of ControlHub Pipeline.
commit_message (str) – Commit message.
name (str, optional) – Name of the pipeline. If left out, pipeline name from JSON object will be used. Default None.
data_collector_instance (streamsets.sdk.sch_models.DataCollector) – If excluded, system sdc will be used. Default None.

Returns

An instance of streamsets.sdk.sch_models.Pipeline.

import_pipelines_from_archive(archive, commit_message, fragments=False)[source]#

Import pipelines from archived zip directory.

Parameters

archive (file) – file containing the pipelines.
commit_message (str) – Commit message.
fragments (bool, optional) – Indicates if pipeline contains fragments.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Pipeline.

import_protection_policies(policies_archive)[source]#

Import protection policies from a compressed archive.

Parameters: policies_archive (file) – file containing the protection policies.
Returns: A py:class:streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.ProtectionPolicy.

import_topologies(archive, import_number_of_instances=False, import_labels=False, import_runtime_parameters=False, **kwargs)[source]#

Import topologies from archived zip directory.

Parameters

archive (file) – file containing the topologies.
import_number_of_instances (boolean, optional) – Indicate if number of instances should be imported. Default: False.
import_labels (boolean, optional) – Indicate if labels should be imported. Default: False.
import_runtime_parameters (boolean, optional) – Indicate if runtime parameters should be imported. Default: False.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Topology.

invite_user(user)[source]#

Invite a user to the StreamSets Platform.

Parameters: user (streamsets.sdk.sch_models.User) – User object.
Returns: An instance of streamsets.sdk.sch_api.Command.

property job_sequences#

Job Sequences.

Returns: An instance of streamsets.sdk.sch_models.JobSequences.

property jobs#

Jobs.

Returns: An instance of streamsets.sdk.sch_models.Jobs.

kill_scheduled_tasks(*scheduled_tasks)[source]#

Kill given scheduled tasks.

Parameters: *scheduled_tasks – One or more instances of streamsets.sdk.sch_models.ScheduledTask
Returns: An instance of streamsets.sdk.sch_api.Command.

property ldap_enabled#

Indication if LDAP is enabled or not.

Returns: An instance of boolean.

property legacy_deployments#

Deployments.

Returns: An instance of streamsets.sdk.sch_models.LegacyDeployments.

lock_deployment(deployment)[source]#

Lock deployment.

Parameters: deployment – An instance of streamsets.sdk.sch_models.Deployment.

property login_audits#

Returns: An instance of streamsets.sdk.sch_models.LoginAudits.

mark_saql_search_as_favorite(saql_search)[source]#

Marks a saql search as a favorite. In other words, it stars an existing search query.

Parameters: saql_search (streamsets.sdk.sch_models.SAQLSearch) – The SAQL Search.
Returns: An instance of streamsets.sdk.sch_api.Command.

property metering_and_usage#

Metering and usage for the Organization. By default, this will return a report for the last 30 days of metering data. A report for a custom time window can be retrieved by indexing this object with a slice that contains a datetime object for the start (required) and stop (optional, defaults to datetime.now()).

ex. metering_and_usage[datetime(2022, 1, 1):datetime(2022, 1, 14)]
metering_and_usage[datetime.now() - timedelta(7):]

Returns: An instance of streamsets.sdk.sch_models.MeteringUsage

property organizations#

Organizations.

Returns: An instance of streamsets.sdk.sch_models.Organizations.

pause_scheduled_tasks(*scheduled_tasks)[source]#

Pause given scheduled tasks.

Parameters: *scheduled_tasks – One or more instances of streamsets.sdk.sch_models.ScheduledTask
Returns: An instance of streamsets.sdk.sch_api.Command.

property pipeline_labels#

Pipeline labels.

Returns: An instance of streamsets.sdk.sch_models.PipelineLabels.

property pipelines#

Pipelines.

Returns: An instance of streamsets.sdk.sch_models.Pipelines.

preview_classification_rule(classification_rule, parameter_data, data_collector=None)[source]#

Dynamic preview of a classification rule.

Parameters

classification_rule (streamsets.sdk.sch_models.ClassificationRule) – Classification Rule object.
parameter_data (dict) – A python dict representation of raw JSON parameters required for preview.
data_collector (streamsets.sdk.sch_models.DataCollector, optional) – The Data Collector in which to preview the pipeline. If omitted, ControlHub’s first executor SDC will be used. Default: None.

Returns

An instance of streamsets.sdk.sdc_api.PreviewCommand.

property protection_policies#

Protection policies.

Returns: An instance of streamsets.sdk.sch_models.ProtectionPolicies.

property provisioning_agents#

Provisioning Agents registered to the ControlHub instance.

Returns: An instance of streamsets.sdk.sch_models.ProvisioningAgents.

publish_job_sequence(job_sequence)[source]#

Publishes an Job Sequence to ControlHub.

Parameters: job_sequence – An instance of streamsets.sdk.sch_models.JobSequence
Returns: An instance of streamsets.sdk.sch_api.Command.

publish_pipeline(pipeline, commit_message='New pipeline', draft=False, validate=False)[source]#

Publish a pipeline.

Parameters

pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline object.
commit_message (str, optional) – Default: 'New pipeline'.
draft (boolean, optional) – Default: False.
validate (boolean, optional) – Default: False.

Returns

An instance of streamsets.sdk.sch_api.Command.

publish_scheduled_task(task)[source]#

Send the scheduled task to Control Hub. DEPRECATED

Parameters: task (streamsets.sdk.sch_models.ScheduledTask) – Scheduled task object.
Returns: An instance of streamsets.sdk.sch_api.Command.

publish_topology(topology, commit_message=None)[source]#

Publish a topology.

Parameters

topology (streamsets.sdk.sch_models.Topology) – Topology object to publish.
commit_message (str, optional) – Commit message to supply with the Topology. Default: None

Returns

An instance of streamsets.sdk.sch_api.Command.

regenerate_api_credential_auth_token(api_credential)[source]#

Regenerate the auth token for an api credential.

Parameters: api_credential (streamsets.sdk.sch_models.ApiCredential) – ApiCredential object.
Returns: An instance of streamsets.sdk.sch_api.Command.

remove_administrator_role(user)[source]#

Remove the Organization Administrator status from the specified user.

Parameters: user (streamsets.sdk.sch_models.User) – User object.
Returns: An instance of streamsets.sdk.aster_api.Command.

remove_saql_search(saql_search)[source]#

Remove a saql search.

Parameters: saql_search (streamsets.sdk.sch_models.SAQLSearch) – The SAQL Search object.
Returns: An instance of streamsets.sdk.sch_api.Command.

rename_api_credential(api_credential)[source]#

Rename an api credential.

Parameters: api_credential (streamsets.sdk.sch_models.ApiCredential) – ApiCredential object.
Returns: An instance of streamsets.sdk.sch_api.Command.

property report_definitions#

Report Definitions.

Returns: An instance of streamsets.sdk.sch_models.ReportDefinitions.

reset_origin(*jobs)[source]#

Reset all pipeline offsets for given jobs.

Parameters: *jobs – One or more instances of streamsets.sdk.sch_models.Job.
Returns: A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Job.

restart_engines(*engines)[source]#

Restart the given engine(s).

Parameters: *engines – One or more instances of streamsets.sdk.sch_models.DataCollector or streamsets.sdk.sch_models.Transformer.
Returns: An instance of streamsets.sdk.sch_api.Command.

resume_scheduled_tasks(*scheduled_tasks)[source]#

Resume given scheduled tasks.

Parameters: *scheduled_tasks – One or more instances of streamsets.sdk.sch_models.ScheduledTask
Returns: An instance of streamsets.sdk.sch_api.Command.

run_job_sequence(job_sequence, start_from_step_number=None, single_step=False)[source]#

Run the Job Sequence.

Parameters

job_sequence – An instance of streamsets.sdk.sch_models.JobSequence.
start_from_step_number (int, optional) – The Step to start execution from. Default: None.
single_step (bool, optional) – Whether to only run the specified step. Default: False.

Returns

An instance of streamsets.sdk.sch_api.Command.

run_pipeline_preview(pipeline, batches=1, batch_size=10, skip_targets=True, skip_lifecycle_events=True, end_stage=None, only_schema=None, timeout=120000, test_origin=False, read_policy=None, write_policy=None, executor=None, **kwargs)[source]#

Run pipeline preview.

Parameters

pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline object.
batches (int, optional) – Number of batches. Default: 1.
batch_size (int, optional) – Batch size. Default: 10.
skip_targets (bool, optional) – Skip targets. Default: True.
skip_lifecycle_events (bool, optional) – Skip life cycle events. Default: True.
end_stage (str, optional) – End stage. Default None.
only_schema (bool, optional) – Only schema. Default: None.
timeout (int, optional) – Timeout. Default: 120000.
test_origin (bool, optional) – Test origin. Default: False.
read_policy (streamsets.sdk.sch_models.ProtectionPolicy) – Read Policy for preview. If not provided, uses default read policy if one available. Default: None.
write_policy (streamsets.sdk.sch_models.ProtectionPolicy) – Write Policy for preview. If not provided, uses default write policy if one available. Default: None.
executor (streamsets.sdk.sch_models.DataCollector, optional) – The Data Collector in which to preview the pipeline. If omitted, ControlHub’s first executor SDC will be used. Default: None.

Returns

An instance of streamsets.sdk.sdc_api.PreviewCommand.

run_snowflake_pipeline_preview(pipeline_id, rev=0, batches=1, batch_size=10, skip_targets=True, end_stage=None, only_schema=None, push_limit_down=True, timeout=2000, test_origin=False, stage_outputs_to_override_json=None, **kwargs)[source]#

Run Snowflake pipeline preview.

Parameters

pipeline_id (str) – The pipeline instance’s ID.
rev (int, optional) – Pipeline revision. Default: 0.
batches (int, optional) – Number of batches. Default: 1.
batch_size (int, optional) – Batch size. Default: 10.
skip_targets (bool, optional) – Skip targets. Default: True.
end_stage (str, optional) – End stage. Default: None.
only_schema (bool, optional) – Only schema. Default: None.
push_limit_down (bool, optional) – Push limit down. Default: True.
timeout (int, optional) – Timeout. Default: 2000.
test_origin (bool, optional) – Test origin. Default: False
stage_outputs_to_override_json (str, optional) – Stage outputs to override. Default: None.
wait (bool, optional) – Wait for pipeline preview to finish. Default: True.

Returns

An instance of streamsets.sdk.sdc_api.PreviewCommand.

property saql_saved_searches_draft_run#

Get SAQL Searches for type Draft Run.

Returns: An instance of streamsets.sdk.sch_models.SAQLSearches.

property saql_saved_searches_fragment#

Get SAQL Searches for type Fragment.

Returns: An instance of streamsets.sdk.sch_models.SAQLSearches.

property saql_saved_searches_job_instance#

Get SAQL Searches for type Job Instance.

Returns: An instance of streamsets.sdk.sch_models.SAQLSearches.

property saql_saved_searches_job_template#

Get SAQL Searches for type Job Template.

Returns: An instance of streamsets.sdk.sch_models.SAQLSearches.

property saql_saved_searches_pipeline#

Get SAQL Searches for type Pipeline.

Returns: An instance of streamsets.sdk.sch_models.SAQLSearches.

save_saql_search(saql_search)[source]#

Save a saql search query.

Parameters: saql_search (streamsets.sdk.sch_models.SAQLSearch) – The SAQL Search object.
Returns: An instance of streamsets.sdk.sch_api.Command.

scale_legacy_deployment(deployment, num_instances)[source]#

Scale up/down active deployment.

Parameters

deployment (streamsets.sdk.sch_models.LegacyDeployment) – Deployment object.
num_instances (int) – Number of sdc instances.

Returns

An instance of streamsets.sdk.sch_api.Command.

property scheduled_tasks#

Scheduled Tasks.

Returns: An instance of streamsets.sdk.sch_models.ScheduledTasks.

set_default_read_protection_policy(protection_policy)[source]#

Set a default read protection policy.

Parameters: protection_policy –

:param (streamsets.sdk.sch_models.ProtectionPolicy): Protection :param Policy object to be set as the default read policy.:

Returns: An updated instance of streamsets.sdk.sch_models.ProtectionPolicy.

set_default_write_protection_policy(protection_policy)[source]#

Set a default write protection policy.

Parameters: protection_policy –

:param (streamsets.sdk.sch_models.ProtectionPolicy): Protection :param Policy object to be set as the default write policy.:

Returns: An updated instance of streamsets.sdk.sch_models.ProtectionPolicy.

shutdown_engines(*engines)[source]#

Call shutdown on the given engine(s).

Parameters: *engines – One or more instances of streamsets.sdk.sch_models.DataCollector or streamsets.sdk.sch_models.Transformer.
Returns: An instance of streamsets.sdk.sch_api.Command.

start_deployment(*deployments)[source]#

Start Deployments.

Parameters: *deployments – One or more instances of streamsets.sdk.sch_models.Deployment.

start_draft_run(pipeline, reset_origin=False, runtime_parameters=None)[source]#

Start a draft run.

Parameters

pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline object. (Must be in draft mode)
reset_origin (boolean, optional) – Default: False.
runtime_parameters (dict, optional) – Pipeline runtime parameters. Default: None.

Returns

An instance of streamsets.sdk.sch_api.Command.

start_job(*jobs, wait=True, **kwargs)[source]#

Start one or more jobs.

Parameters

*jobs – One or more instances of streamsets.sdk.sch_models.Job.
wait (bool, optional) – Wait for pipelines to reach RUNNING status before returning. Default: True.
**kwargs – Arbitrary keyword arguments.

Returns

An instance of streamsets.sdk.sch_api.StartJobsCommand.

start_job_template(job_template, name=None, description=None, attach_to_template=True, delete_after_completion=False, instance_name_suffix='COUNTER', number_of_instances=1, parameter_name=None, raw_job_tags=None, runtime_parameters=None, wait_for_data_collectors=False, inherit_permissions=False, wait=True, asynchronous=False)[source]#

Start Job instances from a Job Template.

Parameters

job_template (streamsets.sdk.sch_models.Job) – A Job instance with the property job_template set to True.
name (str, optional) – Name of the new job(s). Default: None. If not specified, name of the job template with ' copy' appended to the end will be used.
description (str, optional) – Description for new job(s). Default: None.
attach_to_template (bool, optional) – Default: True.
delete_after_completion (bool, optional) – Default: False.
instance_name_suffix (str, optional) – Suffix to be used for Job names in {‘COUNTER’, ‘TIME_STAMP’, ‘PARAM_VALUE’}. Default: COUNTER.
number_of_instances (int, optional) – Number of instances to be started using given parameters. Default: 1.
parameter_name (str, optional) – Specified when instance_name_suffix is ‘PARAM_VALUE’. Default: None.
raw_job_tags (list, optional) – Default: None.
runtime_parameters (dict) or (list) – Runtime Parameters to be used in the jobs. If a dict is specified, number_of_instances jobs will be started. If a list is specified, number_of_instances is ignored and job instances will be started using the elements of the list as Runtime Parameters for each job. If left out, Runtime Parameters from Job Template will be used. Default: None.
wait_for_data_collectors (bool, optional) – Wait for the created job instance(s) to be assigned an engine before proceeding. Default: False.
inherit_permissions (bool, optional) – Parameter to determine if the user wants to inherit the ACL from the template instead of getting the default ACL for it. Default: False.
wait (bool, optional) – Wait for jobs to reach ACTIVE status before returning. Default: True.
asynchronous (bool, optional) – Whether to start and create the jobs asynchronously or not. Default: False.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Job instances.

start_legacy_deployment(deployment, **kwargs)[source]#

Start Deployment.

Parameters

deployment (streamsets.sdk.sch_models.LegacyDeployment) – Deployment instance.
wait (bool, optional) – Wait for deployment to start. Default: True.
wait_for_statuses (list, optional) – Deployment statuses to wait on. Default: ['ACTIVE'].

Returns

An instance of streamsets.sdk.sch_api.DeploymentStartStopCommand.

stop_deployment(*deployments)[source]#

Stop Deployments.

Parameters: *deployments – One or more instances of streamsets.sdk.sch_models.Deployment.

stop_draft_run(*draft_runs, force=False, timeout_sec=300)[source]#

Stop a draft run.

Parameters

*draft_runs – One or more instances of streamsets.sdk.sch_models.DraftRuns.
force (bool, optional) – Force draft run to stop. Default: False.
timeout_sec (int, optional) – Timeout in secs. Default: 300.

Returns

An instance of streamsets.sdk.sch_api.StopJobsCommand.

stop_job(*jobs, force=False, timeout_sec=300)[source]#

Stop one or more jobs.

Parameters

*jobs – One or more instances of streamsets.sdk.sch_models.Job.
force (bool, optional) – Force job to stop. Default: False.
timeout_sec (int, optional) – Timeout in secs. Default: 300.

Returns

An instance of streamsets.sdk.sch_api.StopJobsCommand.

stop_legacy_deployment(deployment, wait_for_statuses=['INACTIVE'])[source]#

Stop Deployment.

Parameters

deployment (streamsets.sdk.sch_models.LegacyDeployment) – Deployment instance.
wait_for_statuses (list, optional) – List of statuses to wait for. Default: ['INACTIVE'].

Returns

An instance of streamsets.sdk.sch_api.DeploymentStartStopCommand.

stop_test_pipeline_run(start_pipeline_command)[source]#

Stop the test run of pipeline.

Parameters: start_pipeline_command (streamsets.sdk.sdc_api.StartPipelineCommand) –
Returns: An instance of streamsets.sdk.sdc_api.StopPipelineCommand.

property subscription_audits#

Subscription audits.

Returns: A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.SubscriptionAudit.

property subscriptions#

Event Subscriptions.

Returns: An instance of streamsets.sdk.sch_models.Subscriptions.

sync_job(*jobs)[source]#

Sync one or more jobs.

Parameters: *jobs – One or more instances of streamsets.sdk.sch_models.Job.

test_pipeline_run(pipeline, reset_origin=False, parameters=None)[source]#

Test run a pipeline.

Parameters

pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline object.
reset_origin (boolean, optional) – Default: False.
parameters (dict, optional) – Pipeline parameters. Default: None.

Returns

An instance of streamsets.sdk.sdc_api.StartPipelineCommand.

property topologies#

Topologies.

Returns: An instance of streamsets.sdk.sch_models.Topologies.

property transformers#

Transformers registered to the ControlHub instance.

Returns: Returns a streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Transformer instances.

unlock_deployment(deployment)[source]#

Unlock deployment.

Parameters: deployment – An instance of streamsets.sdk.sch_models.Deployment.

update_connection(connection)[source]#

Update a connection.

Parameters: connection (streamsets.sdk.sch_models.Connection) – Connection object.
Returns: An instance of streamsets.sdk.sch_api.Command.

update_deployment(deployment)[source]#

Update a deployment.

Parameters: deployment (streamsets.sdk.sch_models.Deployment) – deployment object.
Returns: An instance of streamsets.sdk.sch_api.Command.

update_engine_labels(engine)[source]#

Update an engine’s labels.

Parameters: engine (streamsets.sdk.sch_models.Engine) – Engine instance.
Returns: An instance of streamsets.sdk.sch_models.Engine.

update_engine_resource_thresholds(engine, max_cpu_load=None, max_memory_used=None, max_pipelines_running=None)[source]#

Updates engine resource thresholds.

Parameters

engine (streamsets.sdk.sch_models.Engine) – Engine instance.
max_cpu_load (float, optional) – Max CPU load in percentage. Default: None.
max_memory_used (int, optional) – Max memory used in MB. Default: None.
max_pipelines_running (int, optional) – Max pipelines running. Default: None.

Returns

An instance of streamsets.sdk.sch_api.Command.

update_environment(environment, timeout_sec=300)[source]#

Update an environment.

Parameters: environment (streamsets.sdk.sch_models.Environment) – environment object.
Returns: An instance of streamsets.sdk.sch_api.Command.

update_group(group)[source]#

Update a group.

Parameters: group (streamsets.sdk.sch_models.Group) – Group object.
Returns: An instance of streamsets.sdk.sch_api.Command.

update_job(job)[source]#

Update a job.

Parameters: job (streamsets.sdk.sch_models.Job) – Job object.
Returns: An instance of streamsets.sdk.sch_models.Job.

update_job_sequence_metadata(job_sequence)[source]#

Update Job Sequence metadata.

Parameters: job_sequence – An instance of streamsets.sdk.sch_models.JobSequence
Returns: An instance of streamsets.sdk.sch_api.Command.

update_legacy_deployment(deployment)[source]#

Update a deployment.

Parameters: deployment (streamsets.sdk.sch_models.LegacyDeployment) – Deployment object.
Returns: An instance of streamsets.sdk.sch_api.Command.

update_organization(organization)[source]#

Update an organization.

Parameters: organization (streamsets.sdk.sch_models.Organization) – Organization instance.
Returns: An instance of streamsets.sdk.sch_api.Command.

update_pipelines_with_different_fragment_version(pipelines, from_fragment_version, to_fragment_version)[source]#

Update pipelines with latest pipeline fragment commit version.

Parameters

pipelines (list) – List of streamsets.sdk.sch_models.Pipeline instances.
from_fragment_version (streamsets.sdk.sch_models.PipelineCommit) – commit of fragment from which the pipeline needs to be updated.
to_fragment_version (streamsets.sdk.sch_models.PipelineCommit) – commit of fragment to which the pipeline needs to be updated.

Returns

An instance of streamsets.sdk.sch_api.Command

update_report_definition(report_definition)[source]#

Update an existing Report Definition.

Parameters: report_definition (streamsets.sdk.sch_models.ReportDefinition) – Report Definition instance.
Returns: An instance of streamsets.sdk.sch_api.Command.

update_saql_search(saql_search)[source]#

Update a saql search query.

Parameters: saql_search (streamsets.sdk.sch_models.SAQLSearch) – The SAQL Search object.
Returns: An instance of streamsets.sdk.sch_api.Command.

update_scheduled_task(task)[source]#

Update an existing scheduled task in Control Hub.

Parameters: task (streamsets.sdk.sch_models.ScheduleTask) – Scheduled task object.
Returns: An instance of py:class:streamsets.sdk.sch_api.Command.

update_snowflake_pipeline_defaults(account_url=None, database=None, warehouse=None, schema=None, role=None)[source]#

Create or update the Snowflake pipeline defaults for this user.

Parameters

account_url (str, optional) – Snowflake account url. Default: None
database (str, optional) – Snowflake database to query against. Default: None
warehouse (str, optional) – Snowflake warehouse. Default: None
schema (str, optional) – Schema used. Default: None
role (str, optional) – Role used. Default: None

Returns

An instance of streamsets.sdk.sch_api.Command.

update_snowflake_user_credentials(username, snowflake_login_type, password=None, private_key=None, role=None)[source]#

Create or update the Snowflake user credentials.

Parameters

username (str) – Snowflake account username.
snowflake_login_type (str) – Snowflake login type to use. Options are password and privateKey.
password (str, optional) – Snowflake account password. Default: None
private_key (str, optional) – Snowflake account private key. Default: None
role (str, optional) – Snowflake role of the account. Default: None

Returns

An instance of streamsets.sdk.sch_api.Command.

update_subscription(subscription)[source]#

Update an existing Subscription.

Parameters: subscription (streamsets.sdk.sch_models.Subscription) – A Subscription instance.
Returns: An instance of streamsets.sdk.sch_api.Command.

update_user(user)[source]#

Update a user. Some user attributes are updated by ControlHub such as: last_modified_by, last_modified_on.

Parameters: user (streamsets.sdk.sch_models.User) – User object.
Returns: An instance of streamsets.sdk.sch_api.Command.

upgrade_job(*jobs)[source]#

Upgrade job(s) to latest pipeline version.

Parameters: *jobs – One or more instances of streamsets.sdk.sch_models.Job.
Returns: A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Job.

upload_offset(job, offset_file=None, offset_json=None)[source]#

Upload offset for given job.

Parameters

job (streamsets.sdk.sch_models.Job) – Job object.
offset_file (file, optional) – File containing the offsets. Default: None. Exactly one of offset_file, offset_json should specified.
offset_json (dict, optional) – Contents of offset. Default: None. Exactly one of offset_file, offset_json should specified.

Returns

An instance of streamsets.sdk.sch_models.Job.

property users#

Users.

Returns: An instance of streamsets.sdk.sch_models.Users.

validate_pipeline(pipeline)[source]#

Validate pipeline. Only Transformer for Snowflake pipelines are supported at this time.

Parameters: pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline instance.
Raises: streamsets.sdk.exceptions.ValidationError – If validation fails.

validate_snowflake_account(account_url, mode, snowflake_login_type, username, password=None, private_key=None)[source]#

Start a snowflake account validation process.

Parameters

account_url (str) – Snowflake url where the account should exist.
mode (str) – Mode of the account. Possible values are “EDIT”, “PUBLISHED” and “CURRENT_CRED”.
snowflake_login_type (str) – Login type for the snowflake account. Possible values are “PASSWORD” and “PRIVATE_KEY”.
username (str) – Username of the snowflake account.
password (str, optional) – Password of the snowflake account. Default: None.
private_key (str, optional) – Private key of the snowflake account. Default: None.

Returns

An instance of streamsets.sdk.sch_api.Command.

verify_connection(connection, library=None)[source]#

Verify connection.

Parameters

connection (streamsets.sdk.sch_models.Connection) – Connection object.
library (:py:`str`, optional) – Specify the library to test against. Default: None.

Returns

An instance of streamsets.sdk.sch_models.ConnectionVerificationResult.

wait_for_job_metric(job, metric, value, timeout_sec=200)[source]#

Block until a job’s realtime summary reaches the desired value for the desired metric.

Parameters

job (streamsets.sdk.sch_models.Job) – The job instance.
metric (str) – The desired metric (e.g. 'output_record_count' or 'input_record_count').
value (int) – The desired value to wait for.
timeout (int, optional) – Timeout to wait for metric to reach value, in seconds. Default: streamsets.sdk.sch.DEFAULT_WAIT_FOR_METRIC_TIMEOUT.

Raises

TimeoutError – If timeout passes without metric reaching value.

wait_for_job_metrics_record_count(job, count, timeout_sec=200)[source]#

Block until a job’s metrics reaches the desired count.

Parameters

job (streamsets.sdk.sch_models.Job) – The job instance.
count (int) – The desired value to wait for.
timeout_sec (int, optional) – Timeout to wait for metric to reach count, in seconds. Default: streamsets.sdk.sch.DEFAULT_WAIT_FOR_METRIC_TIMEOUT.

Raises

TimeoutError – If timeout_sec passes without metric reaching value.

wait_for_job_sequence_status(job_sequence, expected_status, timeout_sec=300)[source]#

Block until the job sequence reaches the desired status.

Parameters

job_sequence (streamsets.sdk.sch_models.JobSequence) – The JobSequence instance.
expected_status (str) – The desired status to wait for.
timeout_sec (int, optional) – Timeout to wait for job_sequence to reach expected_status, in seconds. Default: streamsets.sdk.sch.DEFAULT_WAIT_FOR_STATUS_TIMEOUT.

Raises

TimeoutError – If timeout_sec passes without job_sequence reaching expected_status.
TypeError – If job_sequence is not a streamsets.sdk.sch_models.JobSequence instance.

wait_for_job_status(job, status, check_failures=False, timeout_sec=300)[source]#

Block until a job reaches the desired status.

Parameters

job (streamsets.sdk.sch_models.Job) – The job instance.
status (str) – The desired status to wait for.
check_failures (bool, optional) – Flag to check for job exceptions. Default: False
timeout_sec (int, optional) – Timeout to wait for job to reach status, in seconds. Default: streamsets.sdk.sch.DEFAULT_WAIT_FOR_STATUS_TIMEOUT.

Raises

TimeoutError – If timeout_sec passes without job reaching status.

Models#

These models wrap and provide useful functionality for interacting with common SCH abstractions.

ACLs#

class streamsets.sdk.sch_models.ACL(*args, **kwargs)[source]#

Represents an ACL.

Parameters

acl (dict) – JSON representation of an ACL.
control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

resource_id#

Resource ID of the ACL.

Type: str

resource_owner#

Resource owner of the ACL.

Type: str

resource_created_time#

Creation time of the resource.

Type: str

resource_type#

Resource type of the ACL.

Type: str

last_modified_by#

Who the resource was last modified by.

Type: str

last_modified_on#

Last modified time of the resource.

Type: str

permissions#

A Collection of Permissions.

Type: streamsets.sdk.sch_models.Permissions

permission_builder#

A Permission Builder instance.

Type: streamsets.sdk.sch_models.ACLPermissionBuilder

add_permission(permission)[source]#

Add new permission to the ACL.

Parameters: permission (streamsets.sdk.sch_models.Permission) – A permission object.
Returns: An instance of streamsets.sdk.sch_api.Command

property permission_builder#

Get a permission builder instance with which a pipeline can be created.

Returns: An instance of streamsets.sdk.sch_models.ACLPermissionBuilder.

remove_permission(permission)[source]#

Remove a permission from ACL.

Parameters: permission (streamsets.sdk.sch_models.Permission) – A permission object.
Returns: An instance of streamsets.sdk.sch_api.Command

class streamsets.sdk.sch_models.ACLPermissionBuilder(permission, acl)[source]#

Class to help build the ACL permission.

Parameters

permission (streamsets.sdk.sch_models.Permission) – A permission object.
acl (streamsets.sdk.sch_models.ACL) – An ACL object.

build(subject_id, subject_type, actions)[source]#

Method to help build the ACL permission.

Parameters

subject_id (str) – ID of the USER or GROUP.
subject_type (str) – Type of the subject. Accepted Values are ‘USER’, ‘GROUP’.
actions (list) – A list of actions of type str e.g. [‘READ’, ‘WRITE’, ‘EXECUTE’].

Returns

An instance of streamsets.sdk.sch_models.Permission.

class streamsets.sdk.sch_models.Permission(*args, **kwargs)[source]#

A container for a permission.

Parameters

permission (dict) – A Python object representation of a permission.
resource_type (str) – String representing the type of resource e.g. ‘JOB’, ‘PIPELINE’.
api_client (streamsets.sdk.sch_api.ApiClient) – An instance of ApiClient.

resource_id#

Id of the resource e.g. Pipeline or Job.

Type: str

subject_id#

Id of the subject e.g. user id 'admin@admin'.

Type: str

subject_type#

Type of the subject e.g. 'USER'.

Type: str

last_modified_by#

User who last modified this permission e.g. 'admin@admin'.

Type: str

last_modified_on#

Timestamp at which this permission was last modified e.g. 1550785079811.

Type: int

class VALID_ACTIONS(value)[source]#: An enumeration.

Alerts#

class streamsets.sdk.sch_models.Alert(*args, **kwargs)[source]#

Model for Alerts.

message#

The Alert’s message text.

Type: str

alert_status#

The status of the Alert.

Type: str

Parameters

alert (dict) – JSON representation of an Alert.
control_hub (streamsets.sdk.ControlHub) – Control Hub instance.

acknowledge()[source]#: Acknowledge an active Alert.

delete()[source]#: Delete an acknowledged Alert.

Api Credentials#

class streamsets.sdk.sch_models.ApiCredential(*args, **kwargs)[source]#

Model for ApiCredential.

Parameters

api_credential (dict) – A Python object representation of ApiCredential.
control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object. Default: None

active#

Whether the API Credential is active or not.

Type: bool

auth_token#

Auth token to be used while making an API call.

Type: str

created_by#

User that created this API Credential.

Type: str

created_for#

The user who should use the API Credential.

Type: str

credential_id#

Credential ID to be used while making an API call.

Type: str

name#

Name of the API Credential.

Type: str

class streamsets.sdk.sch_models.ApiCredentials(control_hub)[source]#

Collection of streamsets.sdk.sch_models.ApiCredential instances.

Parameters: control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

class streamsets.sdk.sch_models.ApiCredentialBuilder(api_credential, control_hub)[source]#

Class with which to build instances of streamsets.sdk.sch_models.ApiCredential.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_api_credential_builder().

Parameters

api_credential (dict) – Python object built from our Swagger ApiCredentialJson definition.
control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

build(name, user_id=None)[source]#

Build the ApiCredential.

Parameters

name (str) – ApiCredential name.
user_id (str, optional) – User ID for whom the credentials should be created, can only be used as an Org-Admin.

Returns

An instance of streamsets.sdk.sch_models.ApiCredential.

Classifiers#

class streamsets.sdk.sch_models.Classifier(*args, **kwargs)[source]#

Classifier model.

patterns#

Classifier patterns.

Type: str

Parameters: classifier (dict) – A Python dict representation of classifier.

Classification Rules#

class streamsets.sdk.sch_models.ClassificationRule(*args, **kwargs)[source]#

Classification Rule Model.

Parameters

classification_rule (dict) – A Python dict representation of classification rule.
classifiers (list) – A list of streamsets.sdk.sch_models.Classifier instances.

class streamsets.sdk.sch_models.ClassificationRuleBuilder(classification_rule, classifier)[source]#

Class with which to build instances of streamsets.sdk.sch_models.ClassificationRule.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_classification_rule_builder().

Parameters

classification_rule (dict) – Python object defining a classification rule.
classifier (dict) – Python object defining a classifier.

add_classifier(patterns=None, match_with=None, regular_expression_type='RE2/J', case_sensitive=False)[source]#

Add classifier to the classification rule.

Parameters

patterns (list, optional) – List of strings of patterns. Default: None.
match_with (str, optional) – Default: None.
regular_expression_type (str, optional) – Default: 'RE2/J'.
case_sensitive (bool, optional) – Default: False.

Returns

An instance of streamsets.sdk.sch_models.Classifier.

build(name, category, score)[source]#

Build the classification rule.

Parameters

name (str) – Classification Rule name.
category (str) – Classification Rule category.
score (float) – Classification Rule score.

Returns

An instance of streamsets.sdk.sch_models.ClassificationRule.

Connections#

class streamsets.sdk.sch_models.Connection(*args, **kwargs)[source]#

Model for connection.

Parameters

connection (dict) – A Python object representation of Connection.
control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

connection_definition#

The Connection Definition JSON.

Type: dict

library_definition#

The Library Definition JSON.

Type: dict

pipeline_commits (:py:class:`streamsets.sdk.utils.SeekableList` of: streamsets.sdk.sch_models.PipelineCommit instances): Pipeline commits using this connection.

acl#

Update Connection ACL.

Type: streamsets.sdk.sch_models.ACL

tags#

Connection tags.

Type: streamsets.sdk.utils.SeekableList of instances of streamsets.sdk.sch_models.Tag

property acl#

Get Connection ACL.

Returns: An instance of streamsets.sdk.sch_models.ACL.

add_tag(*tags)[source]#

Add a tag

Parameters: *tags – One or more instances of str

property pipeline_commits#

Get the pipeline commits using this connection.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.PipelineCommit: instances.

remove_tag(*tags)[source]#

Remove a tag

Parameters: *tags – One or more instances of str

property tags#

Get the connection tags.

Returns: A streamsets.sdk.utils.SeekableList of instances of streamsets.sdk.sch_models.Tag.

class streamsets.sdk.sch_models.Connections(control_hub, organization)[source]#

Collection of streamsets.sdk.sch_models.Connection instances.

Parameters

control_hub – An instance of streamsets.sdk.sch.ControlHub.
organization (str) – Organization Id.

class streamsets.sdk.sch_models.ConnectionBuilder(connection, control_hub)[source]#

Class with which to build instances of streamsets.sdk.sch_models.Connection.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_connection_builder().

Parameters

connection (dict) – Python object built from our Swagger ConnectionJson definition.
control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

build(title, connection_type, authoring_data_collector=None, tags=None, **kwargs)[source]#

Define the connection.

Parameters

title (str) – Connection title.
connection_type (str) – Type of connection. The options are ‘STREAMSETS_AWS_EMR_CLUSTER’, ‘STREAMSETS_MYSQL’, ‘STREAMSETS_SNOWFLAKE’, ‘STREAMSETS_COAP_CLIENT’, ‘STREAMSETS_OPC_UA_CLIENT’, ‘STREAMSETS_GOOGLE_PUB_SUB’, ‘STREAMSETS_MQTT’, ‘STREAMSETS_POSTGRES’, ‘STREAMSETS_GOOGLE_CLOUD_STORAGE’, ‘STREAMSETS_AWS_REDSHIFT’, ‘STREAMSETS_GOOGLE_BIG_QUERY’, ‘STREAMSETS_ORACLE’, ‘STREAMSETS_AWS_S3’, ‘STREAMSETS_REMOTE_FILE’, ‘STREAMSETS_SQLSERVER’, ‘STREAMSETS_AWS_SQS’, ‘STREAMSETS_SNOWPIPE’, and ‘STREAMSETS_JDBC’.
authoring_data_collector (streamsets.sdk.sch.DataCollector) – Authoring Data Collector.
tags (list, optional) – List of tags (strings). Default: None.

Returns

An instance of streamsets.sdk.sch_models.Connection.

class streamsets.sdk.sch_models.ConnectionVerificationResult(*args, **kwargs)[source]#

Model for connection verification result.

Parameters: connection_preview_json (dict) – dynamic preview API response JSON.

issue_count#

The count of the number of issues for the connection verification result.

Type: int

issue_message#

The message provided for the connection verification result.

Type: str

property issue_count#

The count of the number of issues for the connection verification result.

Returns: A int that represents the number of issues.

property issue_message#

The message provided for the connection verification result.

Returns: A str message detailing the response for the connection verification result.

DataCollectors#

class streamsets.sdk.sch_models.DataCollector(*args, **kwargs)[source]#

Model for Data Collector.

execution_mode#

True for Edge and False for SDC.

Type: bool

id#

Data Collectort id.

Type: str

labels#

Labels for Data Collector.

Type: list

last_validated_on#

Last validated time for Data Collector.

Type: str

reported_labels#

Reported labels for Data Collector.

Type: list

url#

Data Collector’s url.

Type: str

accessible#

Whether the Data Collector instance is accessible.

Type: bool

responding#

Whether the Data Collector instance is responding.

Type: bool

attributes#

Data Collector’s attributes.

Type: dict

attributes_updated_on#

When the Data Collector’s attributes were last updated on.

Type: str

authentication_token_generated_on#

When the Data Collector authentication token was generated.

Type: str

jobs#

The Data Collector’s jobs.

Type: streamsets.sdk.sch_models.Job

job_ids#

Data Collector’s job ids.

Type: str

registered_by#

Who registered the Data Collector.

Type: str

pipelines_committed#

Pipelines that have been committed.

Type: list of (str objects)

resource_thresholds#

DataCollector resource thresholds.

Type: str

acl#

DataCollector ACL.

Type: streamsets.sdk.sch_models.ACL

property accessible#: Returns a bool for whether the Data Collector instance is accessible.

property acl#

Get DataCollector ACL.

Returns: An instance of streamsets.sdk.sch_models.ACL.

property attributes#: Returns a dict of Data Collector attributes.

property attributes_updated_on#: Returns when the Data Collector attributes was updated.

property authentication_token_generated_on#: Returns when the Data Collector authentication token was generated.

property job_ids#: Returns the Data Collector Job ids.

property jobs#: Returns the Data Collector streamsets.sdk.sch_models.Job instances.

property pipelines_committed#

Pipelines that have been committed.

Returns: A list of Job IDs (str objects).

property registered_by#: Returns who the Data Collector was registered by.

property resource_thresholds#

Return DataCollector resource thresholds.

Returns

A dict of DataCollector thresholds named as “max_memory_used”, “max_cpu_load”: and “max_pipelines_running”

property responding#: Returns a bool for whether the Data Collector instance is responding.

Deployments#

class streamsets.sdk.sch_models.AzureVMDeployment(deployment, control_hub=None, **kwargs)[source]#

Model for Azure VM Deployment.

attach_public_ip#

Type: bool

azure_tags#

Azure tags to apply to all provisioned resources in this Azure deployment.

Type: dict

desired_instances#

Number of engine instances to deploy.

Type: int

key_pair_name#

SSH key pair.

Type: str

managed_identity#

Type: str

public_ssh_key#

Type: str

resource_group#

Type: str

ssh_key_source#

SSH key source.

Type: str

vm_size#

Type: str

zones#

Type: list

static get_ui_value_mappings()[source]#: This method returns a map for the values shown in the UI and internal constants to be used.

is_complete()[source]#: Checks if all required fields are set in this deployment.

class streamsets.sdk.sch_models.Deployment(deployment, control_hub=None, **kwargs)[source]#

Model for Deployment. This is an abstract class.

acl#

The Deployment ACL instance.

Type: streamsets.sdk.sch_models.ACL

created_by#

User that created this deployment.

Type: str

created_on#

Time at which this deployment was created.

Type: int

deployment_events#

Name of the deployment.

Type: DeploymentEvents

deployment_id#

Id of the deployment.

Type: str

deployment_name#

Name of the deployment.

Type: str

deployment_tags#

Raw deployment tags of the deployment.

Type: list

deployment_type#

Type of the deployment.

Type: str

desired_instances#

The Deployment desired number of instances.

Type: int

engine_configuration#

The Deployment Engine Configuration.

Type: DeploymentEngineConfiguration

engine_type#

The Deployment Engine type.

Type: str

engine_version#

The Deployment Engine Version.

Type: str

environment_id#

Enabled environment where engine will be deployed.

Type: list

last_modified_by#

User that last modified this deployment.

Type: str

last_modified_on#

Time at which this deployment was last modified.

Type: int

network_tags#

The Deployment network tags.

Type: list

scala_binary_version#

scala Binary Version.

Type: str

organization#

The Deployment organization.

Type: str

json_state#

State of the deployment - this is json field called state.

Type: str

state#

State of the deployment - displayed as state on UI.

Type: str

status#

Status of the deployment.

Type: str

status_detail#

Status detail of the deployment.

Type: str

registered_engines#

Registered Engines of the deployment.

Type: RegisteredEngines

tags#

Tags of the deployment.

Type: a streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Tag instances

class ENGINE_TYPES(value)[source]#: An enumeration.

class TYPES(value)[source]#: An enumeration.

property acl#

Get deployment ACL.

Returns: An instance of streamsets.sdk.sch_models.ACL.

property deployment_events#

Get the events for a deployment.

Returns: An instance of streamsets.sdk.sch_models.DeploymentEvents

property engine_configuration#: The engine configuration of deployment engine. :returns: An instance of streamsets.sdk.sch_models.DeploymentEngineConfiguration.

property registered_engines#

Get the registered engines.

Returns: An instance of streamsets.sdk.sch_models.RegisteredEngines

property tags#

Get the tags.

Returns: A streamsets.sdk.utils.SeekableList of str instances.

class streamsets.sdk.sch_models.DeploymentBuilder(deployment, control_hub)[source]#

Class with which to build instances of streamsets.sdk.sch_models.Deployment.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_deployment_builder().

Parameters

deployment (dict) – Python object built from our Swagger CspDeploymentJson definition.
control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

build(deployment_name, engine_type, engine_version, environment, external_resource_location=None, engine_labels=None, max_cpu_load=80.0, max_memory_used=100.0, max_pipelines_running=1000000, deployment_tags=None, engine_build=None, scala_binary_version=None, **kwargs)[source]#

Define the deployment.

Parameters

deployment_name (str) – deployment name.
engine_type (str) – Type of engine to deploy.
engine_version (str) – Version of engine to deploy.
environment (streamsets.sdk.sch_models.Environment) – The environment instance.
external_resource_location (str, optional) – External Resource Location URL. Default: None.
engine_labels (list, optional) – List of labels (strings). Default: None.
max_cpu_load (float or int, optional) – Max CPU Load (percent). Default: 80.0.
max_memory_used (float or int, optional) – Max Memory Used (percent). Default: 100.0.
max_pipelines_running (int, optional) – Max Running Pipeline Count. Default: 1000000.
deployment_tags (list, optional) – List of tags (strings). Default: None.
engine_build (str, optional) – Build of engine to deploy. Default: None.
scala_binary_version (str, optional) – Scala binary version required in case of engine_type=’TF’ Default: None.

Returns

An instance of subclass of streamsets.sdk.sch_models.Deployment.

class streamsets.sdk.sch_models.DeploymentEngineAdvancedConfiguration(deployment_engine_advanced_config, deployment=None)[source]#

Model for advanced configuration in Deployment engine configuration.

data_collector_configuration#

The Data Collector Configuration.

Type: dict

transformer_configuration#

The Transformer Configuration.

Type: dict

credential_stores#

Type: dict

proxy_properties#

Proxy Properties for the Deployment engine configuration.

Type: dict

security_policy#

Security Policy for the Deployment engine configuration.

Type: dict

class streamsets.sdk.sch_models.DeploymentEngineConfiguration(*args, **kwargs)[source]#

Model for Deployment engine configuration.

advanced_configuration#

Type: str

aws_image_id#

Type: str

azure_image_id#

Type: str

created_by#

User that created this deployment.

Type: str

created_on#

Time at which this deployment was created.

Type: int

download_url#

Type: list

engine_type#

engine type.

Type: list

engine_version#

Engine version to deploy.

Type: list

id#

User that created this deployment.

Type: str

gcp_image_id#

Type: str

last_modified_by#

User that last modified this deployment.

Type: str

last_modified_on#

Time at which this deployment was last modified.

Type: int

scala_binary_version#

scala Binary Version.

Type: str

java_configuration#

Java Configuration instance.

Type: DeploymentEngineJavaConfiguration

stage_libs#

DeploymentStageLibraries instance.

Type: DeploymentStageLibraries

class streamsets.sdk.sch_models.DeploymentEngineConfigurations(control_hub)[source]#

Collection of streamsets.sdk.sch_models.DeploymentEngineConfiguration instances.

Parameters: control_hub – An instance of streamsets.sdk.sch.ControlHub.

class streamsets.sdk.sch_models.DeploymentEngineJavaConfiguration(deployment_engine_java_configuration, deployment=None)[source]#

Model for Deployment engine Java configuration.

id#

Type: str

java_options#

Type: str

java_memory_strategy (: object:ABSOLUTE/PERCENTAGE)

maximum_java_heap_size_in_mb#

Type: long

minimum_java_heap_size_in_mb#

Type: long

maximum_java_heap_size_in_percent#

Type: int

minimum_java_heap_size_in_percent#

Type: int

class streamsets.sdk.sch_models.DeploymentStageLibraries(values, deployment_config)[source]#

Wrapper class for the list of stage libraries in a deployment.

Parameters

values (list) – A list of stage library names.
deployment_config (streamsets.sdk.sch_models.DeploymentEngineConfiguration) – The deployment engine configuration object that these stage libraries pertain to.

append(value)[source]#: Append object to the end of the list.

extend(libs)[source]#: Extend list by appending elements from the iterable.

remove(value)[source]#

Remove first occurrence of value.

Raises ValueError if the value is not present.

class streamsets.sdk.sch_models.EC2Deployment(deployment, control_hub=None, **kwargs)[source]#

Model for EC2 Deployment.

ec2_instance_type#

Type EC2 engine instance.

Type: str

desired_instances#

No. of EC2 engine instances.

Type: int

instance_profile#

arn of instance profile created as an environment prerequisite.

Type: str

key_pair_name#

SSH key pair.

Type: str

aws_tags#

AWS tags to apply to all provisioned resources in this AWS deployment.

Type: dict

ssh_key_source#

SSH key source.

Type: str

tracking_url#

Type: str

is_complete()[source]#: Checks if all required fields are set in this deployment.

class streamsets.sdk.sch_models.GCEDeployment(deployment, control_hub=None, **kwargs)[source]#

Model for GCE Deployment.

block_project_ssh_keys#

Blocks the use of project-wide public SSH keys to access the provisioned VM instances.

Type: bool

desired_instances#

Number of engine instances to deploy.

Type: int

gcp_labels#

GCP labels to apply to all provisioned resources in your GCP project.

Type: dict

instance_service_account#

Instance service account.

Type: str

machine_type#

machine type to use for the provisioned VM instances.

Type: str

public_ssh_key#

full contents of public SSH key associated with each VM instance.

Type: str

region#

Region to provision VM instances in.

Type: str

tracking_url#

Tracking URL.

Type: str

zone#

GCP zone

Type: list

is_complete()[source]#: Checks if all required fields are set in this deployment.

class streamsets.sdk.sch_models.KubernetesDeployment(deployment, control_hub=None, **kwargs)[source]#

Model for Kubernetes Deployment.

kubernetes_labels#

Kubernetes labels to apply to all provisioned resources in your Kubernetes project.

Type: dict

is_complete()[source]#: Checks if all required fields are set in this deployment.

property kubernetes_labels#

Kubernetes labels for the deployment.

Returns: value pairs of labels.
Return type: A (dict) of key

class streamsets.sdk.sch_models.SelfManagedDeployment(deployment, control_hub=None, **kwargs)[source]#

Model for self managed Deployment.

class InstallMechanism(value)[source]#: An enumeration.

class InstallType(value)[source]#: An enumeration.

install_script(install_mechanism='DEFAULT', install_type=None, java_version=None)[source]#

Gets the installation script to be run for this deployment.

Parameters

install_mechanism (str, optional) – Possible values for install are “DEFAULT”, “BACKGROUND” and “FOREGROUND”. Default: DEFAULT.
install_type (str, optional) – Possible values for install are “DOCKER”, “TARBALL”. Default: None. If not provided will use the value supplied via the deployment.
java_version (str, optional) – Java Development Kit Version to be used. Example: “8”. Default: None.

Returns

An str instance of the installation command.

is_complete()[source]#: Checks if all required fields are set in this deployment.

DraftRun#

class streamsets.sdk.sch_models.DraftRun(*args, **kwargs)[source]#

Model for DraftRun.

commit_id#

Pipeline commit id.

Type: str

commit_label#

Pipeline commit label.

Type: str

created_by#

User that created this job.

Type: str

created_on#

Time at which this job was created.

Type: int

data_collector_labels#

Labels of the data collectors.

Type: list

delete_after_completion#

Flag that indicates if this job should be deleted after completion.

Type: bool

description#

Job description.

Type: str

enable_failover#

Flag that indicates if failover is enabled.

Type: bool

enable_time_series_analysis#

Flag that indicates if time series is enabled.

Type: bool

execution_mode#

True for Edge and False for SDC.

Type: bool

job_id#

Id of the job.

Type: str

job_name#

Name of the job.

Type: str

last_modified_by#

User that last modified this job.

Type: str

last_modified_on#

Time at which this job was last modified.

Type: int

number_of_instances#

Number of instances.

Type: int

pipeline_force_stop_timeout#

Timeout for Pipeline force stop.

Type: int

pipeline_id#

Id of the pipeline that is running the job.

Type: str

pipeline_name#

Name of the pipeline that is running the job.

Type: str

pipeline_rule_id#

Rule Id of the pipeline that is running the job.

Type: str

read_policy#

Read Policy of the job.

Type: streamsets.sdk.sch_models.ProtectionPolicy

runtime_parameters#

Run-time parameters of the job.

Type: str

static_parameters#

List of parameters wjat cannot be overriden.

Type: list

statistics_refresh_interval_in_millisecs#

Refresh interval for statistics in milliseconds.

Type: int

status#

Status of the job.

Type: string

snapshots#

Snapshots belonging to the draft run.

Type: streamsets.sdk.utils.SeekableList) of (streamsets.sdk.sch_models.Snapshot

capture_snapshot()[source]#

Generate a new snapshot.

Returns: An instance of streamsets.sdk.sch_api.Command.

get_logs(ending_offset=- 1)[source]#

Retrieve the logs from the engine.

Parameters: ending_offset (int, optional) – The offset to capture logs up until. Default: -1
Returns: A list of dict instances, one dictionary per log line.

remove_snapshot(snapshot)[source]#

Remove a snapshot.

Parameters: snapshot (streamsets.sdk.sch_models.Snapshot) – Snapshot object.
Returns: An instance of streamsets.sdk.sch_api.Command.

property snapshots#

Snapshots belonging to the draft run.

Returns: A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Snapshot instances.

DraftRuns#

class streamsets.sdk.sch_models.DraftRuns(control_hub)[source]#

Collection of streamsets.sdk.sch_models.DraftRun instances.

Parameters: control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

Engines#

class streamsets.sdk.sch_models.Engine(*args, **kwargs)[source]#

Model for Platform Engines.

acl#

The engine ACL instance.

Type: streamsets.sdk.sch_models.ACL

configuration#

The engine Configuration instance.

Type: streamsets.sdk.models.Configuration

id#

ID of the engine.

Type: str

organization#

Organization that the engine belongs to.

Type: str

engine_url#

URL registered for the engine.

Type: str

version#

Version of the engine.

Type: str

labels#

Labels for the engine.

Type: list of str instances

last_reported_time#

The last time the engine contacted StreamSets Platform.

Type: str

start_up_time#

The time the engine was started.

Type: str

edge#

Whether this is an Edge engine or not.

Type: bool

cpu_load#

The percent utilization of the CPU by the engine.

Type: float

memory_used#

The amount of memory used by the engine in MB.

Type: float

total_memory#

The total amount of memory configured for the engine in MB.

Type: float

running_pipelines#

The total number of pipelines running on the engine.

Type: int

responding#

Whether the engine is responding or not.

Type: bool

engine_type#

The type of engine.

Type: str

max_cpu_load#

The percentage limit on CPU load configured for this engine.

Type: int

max_memory_used#

The percentage limit on memory configured for this engine.

Type: int

max_pipelines_running#

The limit on the number of concurrent pipelines for this engine.

Type: int

deployment_id#

The ID of the deployment that the engine belongs to.

Type: str

directories#

The directories of the engine.

Type: dict

external_libraries#

The External Library of the engine.

Type: streamsets.sdk.sch_models.ExternalLibrary

memory_used_mb#

The memory used in MB by the engine.

Type: float

resources#

The External Resource of the engine.

Type: streamsets.sdk.sch_models.ExternalResource

responding#

Whether the engine is responding.

Type: bool

running_pipelines#

The running pipelines of the engine.

Type: list of streamsets.sdk.sch_models.Pipeline instances

running_pipelines_count#

The running pipelines count of the engine.

Type: int

total_memory_mb#

The total memory in MB of the engine.

Type: int

user_stage_libraries#

The user stage libraries of the engine.

Type: dict

add_external_libraries(stage_library, *libraries)[source]#

Add external libraries to the engine.

Parameters

stage_library (str) – The stage library that includes the stage requiring the external library.
*libraries – One or more file instances to add to an engine, in binary format.

add_resources(*resources)[source]#

Add resource files to the engine.

Parameters: *resources – One or more file instances to add to an engine, in binary format.

delete_external_libraries(*libraries)[source]#

Delete external libraries from the engine.

Parameters: *libraries – One or more streamsets.sdk.sch_models.ExternalLibrary instances to delete from the engine.

delete_resources(*resources)[source]#

Delete resource files on the engine.

Parameters: *resources – One or more streamsets.sdk.sch_models.ExternalResource instances to delete from the engine.

property external_libraries#

Get the external libraries for the engine.

Returns: A list of streamsets.sdk.sch_models.ExternalLibrary instances.

get_logs(ending_offset=- 1)[source]#

Retrieve the logs for the engine.

Parameters: ending_offset (int, optional) – The offset to capture logs up until. Default: -1
Returns: A list of dict instances, one dictionary per log line.

get_thread_dump()[source]#

Generate a thread dump for the engine.

Returns: A list of dict instances, one dictionary per thread.

refresh()[source]#: Retrieve the latest state of the Engine from Platform, and update the in-memory representation.

property resources#

Get the external resources for the engine.

Returns: A list of streamsets.sdk.sch_models.ExternalResource instances.

class streamsets.sdk.sch_models.Engines(control_hub)[source]#

Collection of streamsets.sdk.sch_models.Engine instances.

Parameters: control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

Environments#

class streamsets.sdk.sch_models.AWSEnvironment(environment, control_hub=None, **kwargs)[source]#

Model for AWS Environment.

access_key_id#

AWS access key id to access AWS account. Required when credential_type is AWS_STATIC_KEYS.

Type: str

aws_tags#

AWS tags to apply to all provisioned resources in this AWS deployment.

Type: dict

credentials#

Credentials of the environment.

Type: dict

credential_type#

Credential Type of the environment.

Type: str

default_instance_profile#

aarn of default instance profile created as an environment prerequisite.

Type: dict

region#

AWS region where VPC is located in case of AWS environment.

Type: str

role_arn#

Role AR of the cross-account role created as an environment prerequisite in user’s AWS account. Required when credential_type is AWS_CROSS_ACCOUNT_ROLE.

Type: str

secret_access_key#

AWS secret access key to access AWS account. Required when credential_type is AWS_STATIC_KEYS.

Type: str

subnet_ids#

Range of Subnet IDs in the VPC.

Type: str

security_group_id#

Security group ID.

Type: str

vpc_id#

Id of the Amazon VPC created as an environment prerequisite in user’s AWS account.

Type: str

static get_ui_value_mappings()[source]#: This method returns a map for the values shown in the UI and internal constants to be used.

is_complete()[source]#: Checks if all required fields are set in this environment.

class streamsets.sdk.sch_models.AzureEnvironment(environment, control_hub=None, **kwargs)[source]#

Model for Azure Environment.

azure_tags#

Azure tags for this environment.

Type: dict

credential_type#

Credential type of the environment.

Type: str

client_id#

Client ID of the environment.

Type: str

client_secret#

Client Secret of the environment.

Type: str

credentials#

Credentials of the environment.

Type: dict

default_managed_identity#

default managed identity.

Type: str

default_resource_group#

default resource group.

Type: str

region#

Azure region where vnet is located.

Type: str

subnet_id#

Subnet ID in the vnet.

Type: str

subscription_id#

Subscription ID in the vnet.

Type: str

security_group_id#

Security group ID.

Type: str

tenet_id#

Tenet ID of the environment.

Type: str

vnet_id#

Id of the vnet.

Type: str

static get_ui_value_mappings()[source]#: This method returns a map for the values shown in the UI and internal constants to be used.

is_complete()[source]#: Checks if all required fields are set in this environment.

class streamsets.sdk.sch_models.Environment(environment, control_hub=None, **kwargs)[source]#

Model for Environment. This is an abstract class.

acl#

Environment ACL.

Type: streamsets.sdk.sch_models.ACL

tags#

Environment tags.

Type: streamsets.sdk.utils.SeekableList of str instances

allow_nightly_engine_builds#

Whether or not the environment allows use of nightly engine build.

Type: bool

created_by#

User that created this environment.

Type: str

created_on#

Time at which this environment was created.

Type: int

environment_id#

Id of the environment.

Type: str

environment_name#

Name of the environment.

Type: str

environment_tags#

Raw environment tags of the environment.

Type: list

environment_type#

Type of the environment.

Type: str

last_modified_by#

User that last modified this environment.

Type: str

last_modified_on#

Time at which this environment was last modified.

Type: int

json_state#

State of the environment - which is stored in json field called state.

Type: str

state#

State of the environment stateDisplayLabel - shown on the UI as state.

Type: str

status#

Status of the environment.

Type: str

status_detail#

Status detail of the environment.

Type: str

class TYPES(value)[source]#: An enumeration.

property acl#

Get environment ACL.

Returns: An instance of streamsets.sdk.sch_models.ACL.

property tags#

Get the tags.

Returns: A streamsets.sdk.utils.SeekableList of str instances.

class streamsets.sdk.sch_models.EnvironmentBuilder(environment, control_hub)[source]#

Class with which to build instances of streamsets.sdk.sch_models.Environment.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_environment_builder().

Parameters

environment (dict) – Python object built from our Swagger CspEnvironmentJson definition.
control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

build(environment_name, **kwargs)[source]#

Define the environment.

Parameters

environment_name (str) – environment name.
allow_nightly_builds (bool, optional) – Default: False.
allow_nightly_engine_builds (bool, optional) – Default: False.
environment_tags (list, optional) – List of tags (strings). Default: None.

Returns

An instance of subclass of streamsets.sdk.sch_models.Environment.

class streamsets.sdk.sch_models.GCPEnvironment(environment, control_hub=None, **kwargs)[source]#

Model for GCP Environment.

credentials#

Credentials of the environment.

Type: dict

account_key_json#

JSON file contents for the Service Account Key. Required when credential_type is GCP_SERVICE_ACCOUNT_KEY.

Type: str

project#

Project where VPC is located.

Type: str

gcp_labels#

GCP labels for this environment.

Type: dict

gcp_tags#

GCP tags to apply to all resources provisioned in GCP account.

Type: str

service_account_email#

Service account ID. Required when credential_type is GCP_SERVICE_ACCOUNT_IMPERSONATION.

Type: str

vpc_id#

Id of the GCP VPC created as an environment prerequisite in user’s GCP account.

Type: str

fetch_available_machine_types(zone_id)[source]#: Returns the available machine types for the given Environment and GCP Project and GCP Zone.

fetch_available_projects()[source]#: Returns the available projects for the given Environment.

fetch_available_regions()[source]#: Returns the available regions for the given Environment and GCP Project.

fetch_available_service_accounts()[source]#: Returns the available service accounts for the given Environment and GCP Project.

fetch_available_vpcs()[source]#: Returns the available Networks for the given Environment and GCP Project. Here it is called vpcs since on UI it is shown as VPC.

fetch_available_zones(region_id)[source]#: Returns the available zones for the given environment and GCP project and GCP region.

static get_ui_value_mappings()[source]#: This method returns a map for the values shown in the UI and internal constants to be used.

is_complete()[source]#: Checks if all required fields are set in this environment.

class streamsets.sdk.sch_models.KubernetesEnvironment(environment, control_hub=None, **kwargs)[source]#

Model for Kubernetes Environment.

agent_event_logs (:obj:`list` of: py:class:`streamsets.sdk.sch_models.KubernetesAgentEvent instances): Agent logs for the environment.

agent_java_options#

Java configuration options set for the Agent.

Type: str

agent_status#

Status of the Agent.

Type: str

agent_status_detail#

Details pertaining to the status of the Agent.

Type: str

agent_version#

Version of the Agent.

Type: str

allow_nightly_engine_builds#

Whether or not the environment allows use of nightly engine build. versions.

Type: bool

created_by#

ID of the user who created the environment.

Type: str

created_on#

Millisecond timestamp of when the environment was created.

Type: int

environment_id#

ID of the environment.

Type: str

environment_name#

Name of the environment.

Type: str

environment_tags#

Tags assigned to the environment.

Type: list of str instances

environment_type (: obj:`str): Type of the environment.

kubernetes_labels#

Kubernetes labels to apply to all resources provisioned in the Kubernetes namespace.

Type: dict

kubernetes_namespace#

Name of the Kubernetes namespace to provision the resources in.

Type: str

last_modified_by#

ID of the user who last modified the environment.

Type: str

last_modified_on#

Millisecond timestamp of the last time the environment was modified.

Type: int

state#

The current state of the environment.

Type: str

property agent_event_logs#

Get the Agent event logs for this environment.

Returns: An instance of streamsets.sdk.sch_models.KubernetesAgentEvents

property agent_java_options#

Get the Agent’s extra java options.

Returns: A str instance of the agent’s extra java options.

property agent_status#

Get the Agent’s status.

Returns: A str instance of the agent’s status.

property agent_status_detail#

Get the details of the Agent’s status

Returns: A str instance of the agent’s status detail.

apply_agent_yaml_command()[source]#

Gets the installation script to be run for this environment.

Returns: An str instance of the installation command.

property kubernetes_labels#

Kubernetes labels for the environment.

Returns: value pairs of labels.
Return type: A (dict) of key

class streamsets.sdk.sch_models.SelfManagedEnvironment(environment, control_hub=None, **kwargs)[source]#: Model for Self Managed Environment.

Transformers#

class streamsets.sdk.sch_models.Transformer(*args, **kwargs)[source]#

Model for Transformer.

execution_mode#

Type: str

id#

Transformer id.

Type: str

labels#

Labels for Transformer.

Type: list

last_validated_on#

Last validated time for Transformer.

Type: str

reported_labels#

Reported labels for Transformer.

Type: list

url#

Transformer’s url.

Type: str

version#

Transformer’s version.

Type: str

accessible#

Whether the Transformer instance is accessible.

Type: bool

attributes#

Transformer’s attributes.

Type: dict

attributes_updated_on#

When the Transformer’s attributes were last updated on.

Type: str

authentication_token_generated_on#

When the Transformer authentication token was generated.

Type: str

registered_by#

Who registered the Transformer.

Type: str

acl#

Transformer ACL.

Type: streamsets.sdk.sch_models.ACL

property accessible#: Returns a bool for whether the Transformer instance is accessible.

property acl#

Get Transformer ACL.

Returns: An instance of streamsets.sdk.sch_models.ACL.

property attributes#: Returns a dict of Transformer attributes.

property attributes_updated_on#: Returns when the Transformer attributes was updated.

property authentication_token_generated_on#: Returns when the Transformer authentication token was generated.

property registered_by#: Returns who the Transformer was registered by.

Groups#

class streamsets.sdk.sch_models.Group(*args, **kwargs)[source]#

Model for Group.

Parameters

group (dict) – A Python object representation of Group.
roles (dict) – A mapping of role IDs to role labels.
control_hub (streamsets.sdk.ControlHub, optional) – An instance of Control Hub. Default: None

created_by#

Creator of this group.

Type: str

created_on#

Creation time of this group.

Type: str

display_name#

Display name of this group.

Type: str

group_id#

ID of this group.

Type: str

last_modified_by#

Group last modified by.

Type: str

last_modified_on#

Group last modification time.

Type: str

organization#

Organization this group belongs to.

Type: str

roles#

A set of role labels.

Type: set

users#

Users that are a member of this group.

Type: streamsets.sdk.util.SeekableList of streamsets.sdk.sch_models.User instances

property roles#: Get the roles for the group.

property users#: Get the users that are members of the group.

class streamsets.sdk.sch_models.Groups(control_hub, roles, organization)[source]#

Collection of streamsets.sdk.sch_models.Group instances.

Parameters

control_hub – An instance of streamsets.sdk.sch.ControlHub.
roles (dict) – A mapping of role IDs to role labels.
organization (str) – Organization ID.

class streamsets.sdk.sch_models.GroupBuilder(group, roles, control_hub=None)[source]#

Class with which to build instances of streamsets.sdk.sch_models.Group.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_group_builder().

Parameters

group (dict) – Python object built from our Swagger GroupJson definition.
roles (dict) – A mapping of role IDs to role labels.
control_hub (streamsets.sdk.sch.ControlHub, optional) – ControlHub object. Default: None

build(display_name, group_id=None, ldap_groups=None)[source]#

Build the group.

Parameters

display_name (str) – Group display name.
group_id (str, optional) – Group ID. Can only include letters, numbers, underscores and periods. Default: None.
ldap_groups (list, optional) – List of LDAP groups (strings).

Returns

An instance of streamsets.sdk.sch_models.Group.

Jobs#

class streamsets.sdk.sch_models.Job(*args, **kwargs)[source]#

Model for Job.

archived#

Flag that indicates if this job is archived.

Type: bool

commit_id#

Pipeline commit id.

Type: str

commit_label#

Pipeline commit label.

Type: str

created_by#

User that created this job.

Type: str

created_on#

Time at which this job was created.

Type: int

data_collector_labels#

Labels of the data collectors.

Type: list

delete_after_completion#

Flag that indicates if this job should be deleted after completion.

Type: bool

description#

Job description.

Type: str

destroyer#

Job destroyer.

Type: str

enable_failover#

Flag that indicates if failover is enabled.

Type: bool

enable_time_series_analysis#

Flag that indicates if time series is enabled.

Type: bool

execution_mode#

True for Edge and False for SDC.

Type: bool

job_deleted#

Flag that indicates if this job is deleted.

Type: bool

job_id#

Id of the job.

Type: str

job_name#

Name of the job.

Type: str

last_modified_by#

User that last modified this job.

Type: str

last_modified_on#

Time at which this job was last modified.

Type: int

number_of_instances#

Number of instances.

Type: int

pipeline_force_stop_timeout#

Timeout for Pipeline force stop.

Type: int

pipeline_id#

Id of the pipeline that is running the job.

Type: str

pipeline_name#

Name of the pipeline that is running the job.

Type: str

pipeline_rule_id#

Rule Id of the pipeline that is running the job.

Type: str

read_policy#

Read Policy of the job.

Type: streamsets.sdk.sch_models.ProtectionPolicy

runtime_parameters#

Run-time parameters of the job.

Type: str

static_parameters#

List of parameters that cannot be overriden.

Type: list

statistics_refresh_interval_in_millisecs#

Refresh interval for statistics in milliseconds.

Type: int

status#

Status of the job.

Type: string

history#

Status and run count history for the Job.

Type: streamsets.sdk.utils.SeekableList) of (streamsets.sdk.sch_models.JobStatus

template_run_history_list#

List of job template run history.

Type: list

write_policy#

Write Policy of the job.

Type: streamsets.sdk.sch_models.ProtectionPolicy

property acl#

Get job ACL.

Returns: An instance of streamsets.sdk.sch_models.ACL.

add_tag(*tags)[source]#

Add a tag

Parameters: *tags – One or more instances of str

property commit#

Get pipeline commit of the job.

Returns: An instance of streamsets.sdk.sch_models.PipelineCommit.

property committed_offsets#

Get the committed offsets for a given job id.

Returns: An instance of streamsets.sdk.sch_models.JobCommittedOffset.

get_run_logs()[source]#

Retrieve the logs for the last run of the job.

Returns: A list of dict instances, one dictionary per log line.

get_snowflake_generated_queries()[source]#

Retrieve the Snowflake generated queries of the last run of the job.

Returns: A list of dict instances, one dictionary per query.

property history#: Status and run count history for the Job.

property job_history#: Status and run count history for the Job.

property job_sequence#

Get the Job Sequence that this job is a part of.

Returns: An instance of streamsets.sdk.sch_models.JobSequence or None.

property latest_committed_offsets#

Get the latest committed offsets for a given job id.

Returns: A (dict) object.

property metrics#

The metrics from all runs of a Job.

Returns: A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.JobMetrics instances.

property pipeline#: Get the pipeline object corresponding to this job.

property realtime_summary#

Get the Realtime Summary of the Job.

Returns

An instance of streamsets.sdk.sdc_models.PipelineMetrics or: streamsets.sdk.st_models.PipelineMetrics.

remove_tag(*tags)[source]#

Remove a tag

Parameters: *tags – One or more instances of str

property status#: Current job status.

property system_job#

Get the sytem Job for this job if exists.

Returns: An instance of streamsets.sdk.sch_models.Job.

property tag#

Get pipeline tag of the job.

Returns: An instance of streamsets.sdk.sch_models.PipelineTag.

property tags#

Get the job tags.

Returns: A streamsets.sdk.utils.SeekableList of instances of streamsets.sdk.sch_models.Tag.

time_series_metrics(metric_type, time_filter_condition='LAST_5M', **kwargs)[source]#

Get historic time series metrics for the job.

Parameters

metric_type (str) – metric type in {‘Record Count Time Series’, ‘Record Throughput Time Series’, ‘Batch Throughput Time Series’, ‘Stage Batch Processing Timer seconds’}.
time_filter_condition (str, optional) – Default: 'LAST_5M'.

Returns

An instance of streamsets.sdk.sch_models.JobTimeSeriesMetrics.

class streamsets.sdk.sch_models.Jobs(control_hub)[source]#

Collection of streamsets.sdk.sch_models.Job instances.

Parameters: control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

count(status)[source]#

Get job counts by status.

Parameters: status (str) – Status of the jobs in {‘ACTIVE’, ‘INACTIVE’, ‘ACTIVATING’, ‘DEACTIVATING’, ‘INACTIVE_ERROR’, ‘ACTIVE_GREEN’, ‘ACTIVE_RED’, ‘’}
Returns: An instance of int indicating the count of jobs with specified status.

class streamsets.sdk.sch_models.JobBuilder(job, control_hub)[source]#

Class with which to build instances of streamsets.sdk.sch_models.Job.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_job_builder().

Parameters: job (dict) – Python object built from our Swagger JobJson definition.

build(job_name, pipeline, job_template=False, runtime_parameters=None, pipeline_commit=None, pipeline_tag=None, pipeline_commit_or_tag=None, tags=None)[source]#

Build the job.

Parameters

job_name (str) – Name of the job.
pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline object.
job_template (boolean, optional) – Indicate if it is a Job Template. Default: False.
runtime_parameters (dict, optional) – Runtime Parameters for the Job or Job Template. Default: None.
pipeline_commit (streamsets.sdk.sch_models.PipelineCommit) – Default: ``None`, which resolves to the latest pipeline commit.
pipeline_tag (streamsets.sdk.sch_models.PipelineTag) – Default: ``None`, which resolves to the latest pipeline tag.
pipeline_commit_or_tag (str, optional) – Default: None, which resolves to the latest pipeline commit.
tags (list, optional) – Job tags. Default: None.

Returns

An instance of streamsets.sdk.sch_models.Job.

class streamsets.sdk.sch_models.JobMetrics(*args, **kwargs)[source]#

Model for job metrics.

error_count#

The number of error records generated by this run of the Job.

Type: int

error_records_per_sec#

The number of error records per second generated by this run of the Job.

Type: float

input_count#

The number of records ingested by this run of the Job.

Type: int

input_records_per_sec#

The number of records per second ingested by this run of the Job.

Type: float

output_count#

The number of records output by this run of the Job.

Type: int

output_records_per_sec#

The number of records output per second by the run of the Job.

Type: float

pipeline_version#

The version of the pipeline that was used in this Job run.

Type: str

run_count#

The count corresponding to this Job run.

Type: int

sdc_id#

The ID of the SDC instance on which this Job run was executed.

Type: str

stage_errors_count#

The number of stage error records generated by this run of the Job.

Type: int

stage_error_records_per_sec#

The number of stage error records generated per second by this run of the job.

Type: float

total_error_count#

The total number of both error records and stage errors generated by this run of the job.

Type: int

Parameters: metrics (dict) – Metrics counts in JSON format.

class streamsets.sdk.sch_models.JobOffset(*args, **kwargs)[source]#

Model for offset.

Parameters: offset (dict) – Offset in JSON format.

class streamsets.sdk.sch_models.JobRunEvent(*args, **kwargs)[source]#

Model for an event in a Job Run.

Parameters: event (dict) – Job Run Event in JSON format.

class streamsets.sdk.sch_models.JobStatus(*args, **kwargs)[source]#

Model for Job Status.

color#

Job status color.

Type: str

run_history#

(streamsets.sdk.utils.JobRunHistoryEvent): History of a particular job run.

Type: streamsets.sdk.utils.SeekableList

offsets#

(streamsets.sdk.utils.JobPipelineOffset): Offsets after the job run.

Type: streamsets.sdk.utils.SeekableList

Parameters

status (dict) – Job status in JSON format.
control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

property color#: Get the color of the Job Status.

property offsets#: Get the offsets of the Job Status.

property run_history#: Get the run history of the Job Status.

class streamsets.sdk.sch_models.JobTimeSeriesMetric(*args, **kwargs)[source]#

Model for job metrics.

name#

Name of measurement.

Type: str

values#

Timeseries data.

Type: list

time_series#

Timeseries data with timestamp as key and metric value as value.

Type: dict

Parameters: metric (dict) – Metrics in JSON format.

property time_series#: Get the time series (dict).

class streamsets.sdk.sch_models.JobTimeSeriesMetrics(*args, **kwargs)[source]#

Model for job metrics.

input_records#

Appears when queried for ‘Record Count Time Series’ or ‘Record Throughput Time Series’.

Type: streamsets.sdk.sch_models.JobTimeSeriesMetric

output_records#

Appears when queried for ‘Record Count Time Series’ or ‘Record Throughput Time Series’.

Type: streamsets.sdk.sch_models.JobTimeSeriesMetric

error_records#

Appears when queried for ‘Record Count Time Series’ or ‘Record Throughput Time Series’.

Type: streamsets.sdk.sch_models.JobTimeSeriesMetric

batch_counter#

Appears when queried for ‘Batch Throughput Time Series’.

Type: streamsets.sdk.sch_models.JobTimeSeriesMetric

batch_processing_timer#

Appears when queried for ‘Batch Processing Timer seconds’.

Type: streamsets.sdk.sch_models.JobTimeSeriesMetric

Parameters: metrics (dict) – Metrics in JSON format.

class streamsets.sdk.sch_models.RuntimeParameters(runtime_parameters, job)[source]#

Wrapper for ControlHub job runtime parameters.

Parameters

runtime_parameters (str) – Runtime parameter.
job (streamsets.sdk.sch_models.Job) – Job object.

Job Sequences#

class streamsets.sdk.sch_models.JobSequence(*args, **kwargs)[source]#

Model for Job Sequence

Parameters

job_sequence (dict) – The job sequence Swagger JSON.
control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

id#

Job Sequence ID.

Type: str

name#

Job Sequence name.

Type: str

description#

Job Sequence description.

Type: str

organization#

Organization that the Job Sequence is a part of.

Type: str

created_by#

Who created the Job Sequence.

Type: str

create_time#

When the Job Sequence was created.

Type: str

last_modified_by#

Who the Job Sequence was last modified by.

Type: str

last_modified_time#

When the Job Sequence was last modified.

Type: str

start_time#

Start time of the Job Sequence.

Type: str

end_time#

End time of the Job Sequence.

Type: str

time_zone#

Job Sequence timezone.

Type: str

cron_tab_mask#

Cron tab mask of the Job Sequence.

Type: str

steps#

Job Sequence steps.

Type: str

status#

Job Sequence status.

Type: str

class JobSequenceStatusType(value)[source]#: An enumeration.

class LogLevel(value)[source]#: An enumeration.

class LogType(value)[source]#: An enumeration.

add_step_with_jobs(jobs, parallel_jobs=False, ignore_error=True)[source]#

Generate one or more Steps by providing new Job objects to be added to the Job Sequence.

Parameters

jobs (list of streamsets.sdk.sch_models.Job) – Jobs to be added to the Job Sequence as part of the step.
parallel_jobs (bool) – Whether the passed in Jobs will be a part of the same step or not. Default: False.
ignore_error (bool) – Whether to ignore Job errors or not. Default: True.

delete_history_logs(history_logs)[source]#

Delete history logs from the Job Sequence.

Parameters: history_logs (list of streamsets.sdk.sch_models.JobSequenceHistoryLog) – History Logs to delete.

get_history_log(log_type=None, log_level=None, last_run_only=None, run_id=None, from_date=None, to_date=None)[source]#

Get the history log of the Job Sequence.

Args:

log_type (str, optional): Accepted values are “SEQUENCE_START”, “SCHEDULER_TRIGGER_ERROR”,
“STEP_START”. Default: None.

log_level (str, optional): Accepted values are “INFO”, “WARN”, “ERROR”. Default: None. last_run_only (bool, optional): Whether to return History Log for the last run only. Default: None. run_id (str, optional): The desired Run ID to get the History Log for. Default: None. from_date (str, optional): The starting date from which we’d like to see the History Log for. Default: None. to_date (str, optional): The end date till which we’d like to see the History Log for. Default: None.

Returns: A SeekableList of streamsets.sdk.sch_models.JobSequenceHistoryLog objects.

property history_logs#

Get the history logs of the Job Sequence.

Returns: An instance of streamsets.sdk.sch_models.JobSequenceHistoryLogs.

mark_job_as_finished(job)[source]#

Send an event in Control Hub indicating that a job has finished.

Parameters: job (streamsets.sdk.sch_models.Job) – Job to mark as finished.
Returns: An instance of streamsets.sdk.sch_api.Command.

move_step(step, target_step_number, swap=False)[source]#

Move one Step to another index within the Job Sequence.

Parameters

step (streamsets.sdk.sch_models.Step) – The step to be moved.
target_step_number (int) – The step number to move the step to.
swap (bool) – Whether to swap the step with the step in the given index, if False, this will move
Default (the step to the index and shift the steps to the right.) – False.

remove_step(step)[source]#

Remove a Step from a Job Sequence.

Parameters: step – A streamsets.sdk.sch_models.Step instance.

property run_ids#

Get all the run IDs of the Job Sequence.

Returns: An list of int.

property steps#

The Steps that are part of the Job Sequence.

Returns: A (streamsets.sdk.utils.SeekableList) of (streamsets.sdk.sch_models.Step) objects.

class streamsets.sdk.sch_models.JobSequences(control_hub)[source]#

Collection of streamsets.sdk.sch_models.JobSequence instances.

Parameters: control_hub (streamsets.sdk.sch.ControlHub) – An instance of the ControlHub.

class streamsets.sdk.sch_models.JobSequenceBuilder(job_sequence, control_hub=None)[source]#

Class with which to build instances of streamsets.sdk.sch_models.JobSequence.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_job_sequence_builder().

Parameters: job_sequence (dict) – Python object built from our Swagger PipelineJson definition.

add_start_condition(start_time=0, end_time=0, time_zone='UTC', crontab_mask='0/1 * 1/1 * ? *')[source]#

Add a start condition.

Parameters

start_time (int, optional) – Unix timestamp representing the start time for the Job Sequence. Default: 0.
end_time (int, optional) – Unix timestamp representing the end time for the Job Sequence. Default: 0.
time_zone (str, optional) – Time Zone of the Job Sequence. Default: UTC.
crontab_mask (str, optional) – Cron Tab Mask of the Job Sequence. Default: None.

build(name='Job Sequence', description=None)[source]#

Build the Job Sequence.

Parameters

name (str, optional) – Name of the Job Sequence. Default: Job Sequence.
description (str, optional) – End time for the Job Sequence. Default: None.

Returns

A streamsets.sdk.sch_models.JobSequence instances.

class streamsets.sdk.sch_models.JobSequenceHistoryLog(*args, **kwargs)[source]#

Wrapper class for representing JobSequence History Log.

Parameters: history_log (dict) – JSON representation of a History Log.

class streamsets.sdk.sch_models.Step(*args, **kwargs)[source]#

Model for Job Step.

id#

Step ID.

Type: str

job_id#

Job ID.

Type: str

sequence_id#

Job Sequence ID.

Type: str

job_name#

Job Sequence name.

Type: str

organization#

Organization that the Job Sequence is a part of.

Type: str

status#

Job Sequence status.

Type: str

step_number#

Job Sequence status.

Type: str

add_jobs(jobs, ignore_error=True)[source]#

List of Jobs to add to the Step.

Parameters

jobs (list of streamsets.sdk.sch_models.Job) – Job to add to the step.
ignore_error (bool) – Whether to ignore Job errors or not. Default: True.

create_finish_condition(condition_type, job, move_job_to_error=False, crontab_mask=None, end_time=None, timezone=None)[source]#

Create a Finish Conditions in the Step.

Parameters

condition_type (str) – The Condition Type. Accepted Values are ‘END_TIME’ & ‘CRON’.
job (streamsets.sdk.sch_models.Job) – A (streamsets.sdk.sch_models.Job) object.
move_job_to_error (bool, , optional) – Whether to Move Job to Error.
crontab_mask (str, , optional) – The Cron expression for the Finish Condition. Only set when condition_type is ‘CRON’
end_time (str, , optional) – The End Time for the Finish Condition. Only set when condition_type is ‘END_TIME’
timezone (str, , optional) – The Timezone for the Finish Condition. Only set when condition_type is ‘END_TIME’

delete_finish_condition(finish_condition)[source]#

Delete the Finish Conditions from the Step.

Parameters: finish_condition (streamsets.sdk.sch_models.FinishCondition) – FinishCondition instance.

property finish_conditions#

The Finish Conditions that are part of the Step.

Returns

A (streamsets.sdk.utils.SeekableList) of: (streamsets.sdk.sch_models.FinishCondition) objects.

property name#

Get the name of the Step.

Returns: A (str) object.

remove_jobs(*jobs)[source]#

Removes Jobs from Step.

Parameters: jobs – One or more streamsets.sdk.sch_models.Job instances.

property step_jobs#

The Jobs that are part of the Step.

Returns: A (streamsets.sdk.utils.SeekableList) of (streamsets.sdk.sch_models.Job) objects.

update_finish_condition(finish_condition)[source]#

Update the Step’s Finish Conditions.

Parameters: finish_condition (streamsets.sdk.sch_models.FinishCondition) – FinishCondition instance.

Organizations#

class streamsets.sdk.sch_models.Organization(*args, **kwargs)[source]#

Model for Organization.

Parameters

organization (str) – Organization Id.
organization_admin_user (str, optional) – Default: None.
api_client (streamsets.sdk.sch_api.ApiClient, optional) – Default: None.

default_user_password_expiry_time_in_days#

Default expiry time of the user password for the organization.

Type: str

configuration#

Configuration for the organization.

Type: streamsets.sdk.models.Configuration

admin_user_id#

Organization admin’s user ID.

Type: str

created_by#

Who the Organization was created by.

Type: str

saml_intergration_enabled#

Whether SAML integration is enabled.

Type: bool

property configuration#: Get the streamsets.sdk.models.Configuration instance for the organization.

property default_user_password_expiry_time_in_days#: Get the default expiry time of the user password for the organization.

class streamsets.sdk.sch_models.Organizations(control_hub)[source]#: Collection of streamsets.sdk.sch_models.Organization instances.

class streamsets.sdk.sch_models.OrganizationBuilder(organization, organization_admin_user)[source]#

Class with which to build instances of streamsets.sdk.sch_models.Organization.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_organization_builder().

Parameters: organization (dict) – Python object built from our Swagger UserJson definition.

build(id, name, admin_user_id, admin_user_display_name, admin_user_email_address, admin_user_ldap_user_name=None)[source]#

Build the organization.

Parameters

id (str) – Organization ID.
name (str) – Organization name.
admin_user_id (str) – User Id of the admin of this organization.
admin_user_display_name (str) – User display name of admin of this organization.
admin_user_email_address (str) – User email address of admin of this organization.
admin_user_ldap_user_name (str, optional) – LDAP username. Default: None.

Returns

An instance of streamsets.sdk.sch_models.Organization.

Pipelines#

class streamsets.sdk.sch_models.Pipeline(*args, **kwargs)[source]#

Model for Pipeline.

Parameters

builder (streamsets.sdk.sch_models.PipelineBuilder) – Pipeline Builder object.
control_hub (streamsets.sdk.sch.ControlHub, optional) – ControlHub object. Default: None.
engine_id (str) – The ID of the authoring engine for this pipeline.
library_definitions (dict, optional) – Library Definition in JSON format. Default: None.
pipeline (dict) – Pipeline in JSON format.
pipeline_definition (dict) – Pipeline Definition in JSON format.
rules_definition (dict) – Rules Definition in JSON format.

library_definitions#

Pipeline’s Library Definitions.

Type: dict

pipeline_definition#

Pipeline’s definitions.

Type: dict

commits#

Pipeline’s commits.

Type: list of streamsets.sdk.sch_models.PipelineCommit instances

tags#

Pipeline’s tags.

Type: list of streamsets.sdk.sch_models.Tag instances

configuration#

Pipeline’s configuration.

Type: streamsets.sdk.models.Configuration

acl#

Pipeline’s ACL.

Type: streamsets.sdk.sch_models.ACL

stages#: Pipeline’s Stages.

error_stage#: Pipeline’s Error Stage.

stats_aggregator_stage#: Pipeline’s States Aggregator Stage.

parameters#

Pipeline’s parameters.

Type: str

name#

Pipeline’s name.

Type: str

description#

Pipeline’s description.

Type: str

labels#

Pipeline’s labels.

Type: str

engine_id#

Pipeline’s SDC ID.

Type: str

property acl#

Get pipeline ACL.

Returns: An instance of streamsets.sdk.sch_models.ACL.

add_fragment(fragment, parameter_name_prefix=None)[source]#

Add a fragment to the pipeline.

Parameters

fragment (py:obj:streamsets.sdk.sch_models.Pipeline) – Fragment to add.
parameter_name_prefix (str, optional) – Prefix name for the parameters of fragment. Default: None.

Returns

An instance of streamsets.sdk.sdc_models.Stage.

add_label(*labels)[source]#

Add a label

Parameters: *labels – One or more instances of str

add_stage(label=None, name=None, type=None, library=None)[source]#

Add a stage to the pipeline.

When specifying a stage, either label or name must be used. type and library may also be used to select a particular stage if ambiguities exist. If type and/or library are omitted, the first stage definition matching the given label or name will be used.

Parameters

label (str, optional) – Stage label to use when selecting stage from definitions. Default: None.
name (str, optional) – Stage name to use when selecting stage from definitions. Default: None.
type (str, optional) – Stage type to use when selecting stage from definitions (e.g. origin, destination, processor, executor). Default: None.
library (str, optional) – Stage library to use when selecting stage from definitions. Default: None.

Returns

An instance of streamsets.sdk.sch_models.SchSdcStage: or streamsets.sdk.sch_models.SchStStage.

property commits#

Get commits for this pipeline.

Returns: A streamsets.sdk.utils.SeekableList of instances of streamsets.sdk.sch_models.PipelineCommit.

property configuration#

Get pipeline’s configuration.

Returns: An instance of streamsets.sdk.sch_models.Configuration.

property description#: Get the description of the Pipeline.

property error_stage#: Get the error stage of the Pipeline.

get_jobs_using_pipeline()[source]#

Get the jobs that are running on this pipeline.

Returns: An instance of streamsets.sdk.utils.SeekableList containing instances of streamsets.sdk.sch_models.Job that run on this pipeline.

property labels#

Get the pipeline labels.

Returns: A streamsets.sdk.utils.SeekableList of instances of streamsets.sdk.sch_models.PipelineLabel.

property library_definitions#: Get the Pipeline’s (dict) Library Definitions.

property name#: Get the name of the Pipeline.

property parameters#

Get the pipeline parameters.

Returns: A dict like, streamsets.sdk.sch_models.PipelineParameters object of parameter key-value pairs.

property pipeline_definition#: Get the Pipeline’s (dict) Pipeline Definitions.

remove_label(*labels)[source]#

Remove a label

Parameters: *labels – One or more instances of str

remove_stages(*stages)[source]#

Remove one or more stages from a Pipeline.

Parameters

stages (streamsets.sdk.sdc_models.Stage or streamsets.sdk.st_models.Stage) –
objects (One or more stage) –

property sdc_id#: Get the SDC ID of the Pipeline.

property stages#: Get the stages of the Pipeline.

property stats_aggregator_stage#: Get the stats aggregator stage of the Pipeline.

property tags#

Get tags for this pipeline.

Returns: A streamsets.sdk.utils.SeekableList of instances of streamsets.sdk.sch_models.PipelineTag.

class streamsets.sdk.sch_models.Pipelines(control_hub, organization)[source]#

Collection of streamsets.sdk.sch_models.Pipeline instances.

Parameters

control_hub – An instance of streamsets.sdk.sch.ControlHub.
organization (str) – Organization Id.

class streamsets.sdk.sch_models.PipelineBuilder(pipeline, data_collector_pipeline_builder, control_hub=None, fragment=False, engine_start_up_time=0)[source]#

Class with which to build instances of streamsets.sdk.sch_models.Pipeline.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_pipeline_builder().

Parameters

pipeline (dict) – Python object built from our Swagger PipelineJson definition.
data_collector_pipeline_builder (streamsets.sdk.sdc_models.PipelineBuilder) – Data Collector Pipeline Builder object.
control_hub (streamsets.sdk.sch.ControlHub) – Default: None.
fragment (boolean, optional) – Specify if a fragment builder. Default: False.
engine_start_up_time (int, optional) – Engine’s start up time. Default: 0.

add_stage(label=None, name=None, type=None, library=None)[source]#

Add a stage to the pipeline.

Parameters

label (str, optional) – Stage label to use when selecting stage from definitions. Default: None.
name (str, optional) – Stage name to use when selecting stage from definitions. Default: None.
type (str, optional) – Stage type to use when selecting stage from definitions (e.g. origin, destination, processor, executor). Default: None.
library (str, optional) – Stage library to use when selecting stage from definitions. Default: None.

Returns

An instance of streamsets.sdk.sch_models.SchSdcStage.

build(title='Pipeline', description='', labels=None, build_from_imported=False, **kwargs)[source]#

Build the pipeline.

Parameters

title (str, optional) – Title of the pipeline.
description (str, optional) – Description of the pipeline. Default: ````.
labels (list, optional) – List of pipeline labels of type str. Default: None.
build_from_imported (boolean, optional) – Whether we want to build a pipeline from an imported pipeline. Default: False

Returns

py:class`streamsets.sdk.sch_models.Pipeline`.

Return type

An instance of

import_pipeline(pipeline, commit_id_regeneration=True, regenerate_id=True, **kwargs)[source]#

Import a pipeline into the PipelineBuilder to use as a starting point based off of an existing Pipeline.

Parameters

pipeline( – py:class`streamsets.sdk.sch_models.Pipeline`): Pipeline object.
commit_id_regeneration (bool, optional) – Whether to use the imported pipeline’s commit ID. When set to False, the imported pipeline will be edited as-is without a fresh pipeline being created. Default: True
regenerate_id (bool, optional) – Whether to use the imported pipeline’s pipeline_id. When set to False, the imported pipeline’s pipeline_id will be used. Default: True

Returns

An instance of streamsets.sdk.sdc_models.PipelineBuilder.

remove_stage(stage)[source]#

Remove a stage from the pipeline builder.

Parameters: stage (streamsets.sdk.sdc_models.Stage) – Stage to disconnect.

class streamsets.sdk.sch_models.PipelineCommit(*args, **kwargs)[source]#

Model for pipeline commit.

Parameters

pipeline_commit (dict) – Pipeline commit in JSON format.
control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

pipeline#

The commit’s pipeline.

Type: streamsets.sdk.sch_models.Pipeline

property pipeline#: Get the commits pipeline.

class streamsets.sdk.sch_models.PipelineLabel(*args, **kwargs)[source]#

Model for pipeline label.

Parameters: pipeline_label (dict) – Pipeline label in JSON format.

label#

The pipeline label.

Type: str

property label#: Get the pipeline label.

class streamsets.sdk.sch_models.PipelineLabels(control_hub, organization)[source]#

Collection of streamsets.sdk.sch_models.PipelineLabel instances.

Parameters

control_hub – An instance of streamsets.sdk.sch.ControlHub.
organization (str) – Organization Id.

class streamsets.sdk.sch_models.PipelineParameters(pipeline)[source]#

Parameters for pipelines.

Parameters: pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline Instance.

update(parameters_dict)[source]#

Update existing parameters. Works similar to Python dictionary update.

Parameters: parameters_dict (dict) – Dictionary of key-value pairs to be used as parameters.

class streamsets.sdk.sch_models.StPipelineBuilder(pipeline, transformer_pipeline_builder, control_hub=None, fragment=False, engine_start_up_time=0)[source]#

Class with which to build instances of streamsets.sdk.sch_models.Pipeline.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_pipeline_builder().

Parameters

pipeline (dict) – Python object built from our Swagger PipelineJson definition.
transformer_pipeline_builder (streamsets.sdk.sdc_models.PipelineBuilder) – Transformer Pipeline Builder object.
control_hub (streamsets.sdk.sch.ControlHub) – Default: None.
fragment (boolean, optional) – Specify if a fragment builder. Default: False.
engine_start_up_time (int, optional) – Engine’s start up time. Default: 0.

add_stage(label=None, name=None, type=None, library=None)[source]#

Add a stage to the pipeline.

Parameters

label (str, optional) – Transformer stage label to use when selecting stage from definitions. Default: None.
name (str, optional) – Transformer stage name to use when selecting stage from definitions. Default: None.
type (str, optional) – Transformer stage type to use when selecting stage from definitions (e.g. origin, destination, processor, executor). Default: None.
library (str, optional) – Transformer stage library to use when selecting stage from definitions. Default: None.

Returns

An instance of streamsets.sdk.sch_models.SchStStage.

build(title='Pipeline', description='', build_from_imported=False, **kwargs)[source]#

Build the pipeline.

Parameters

title (str) – Title of the pipeline.
description (str, optional) – Description of the pipeline. Default: ````.
build_from_imported (boolean, optional) – Whether we want to build a pipeline from an imported pipeline. Default: False

Returns

py:class`streamsets.sdk.sch_models.Pipeline`.

Return type

An instance of

import_pipeline(pipeline, commit_id_regeneration=True, regenerate_id=True, **kwargs)[source]#

Import a pipeline into the PipelineBuilder to use as a starting point based off of an existing Pipeline.

Parameters

pipeline( – py:class`streamsets.sdk.sch_models.Pipeline`): Pipeline object.
commit_id_regeneration (bool, optional) – Whether to use the imported pipeline’s commit ID. When set to False, the imported pipeline will be edited as-is without a fresh pipeline being created. Default: True
regenerate_id (bool, optional) – Whether to use the imported pipeline’s pipeline_id. When set to False, the imported pipeline’s pipeline_id will be used. Default: True

Returns

An instance of streamsets.sdk.sdc_models.PipelineBuilder.

remove_stage(stage)[source]#

Remove a stage from the pipeline builder.

Parameters: stage (streamsets.sdk.sdc_models.Stage) – Stage to disconnect.

Protection Methods#

class streamsets.sdk.sch_models.ProtectionMethod(stage)[source]#

Protection Method Model.

Parameters: stage (dict) – JSON representation of a stage.

class streamsets.sdk.sch_models.ProtectionMethodBuilder(pipeline_builder)[source]#

Class with which to build instances of streamsets.sdk.sch_models.ProtectionMethod.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_protection_method_builder().

Parameters: pipeline_builder (streamsets.sdk.sch_models.PipelineBuilder) – Pipeline Builder object.

Protection Policies#

class streamsets.sdk.sch_models.ProtectionPolicy(*args, **kwargs)[source]#

Model for Protection Policy.

Parameters

protection_policy (dict) – JSON representation of Protection Policy.
procedures (list) – A list of streamsets.sdk.sch_models.PolicyProcedure instances, Default: None.

procedures#

Procedures for the Protection Policy.

Type: PolicyProcedure

default_setting#

Sefault Setting for the Protection Policy.

Type: str

enactment#

Type: str

sampling#

Sampling for the Protection Policy.

Type: str

class streamsets.sdk.sch_models.ProtectionPolicies(control_hub)[source]#: Collection of streamsets.sdk.sch_models.ProtectionPolicy instances.

class streamsets.sdk.sch_models.ProtectionPolicyBuilder(control_hub, protection_policy, policy_procedure)[source]#

Class with which to build instances of streamsets.sdk.sch_models.ProtectionPolicy.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_protection_policy_builder().

Parameters

protection_policy (dict) – Python object defining a protection policy.
policy_procedure (dict) – Python object defining a policy procedure.

class streamsets.sdk.sch_models.PolicyProcedure(*args, **kwargs)[source]#

Model for Policy Procedure.

Parameters: policy_procedure (dict) – JSON representation of Policy Procedure.

ProvisioningAgents#

class streamsets.sdk.sch_models.Deployment(deployment, control_hub=None, **kwargs)[source]#

Model for Deployment. This is an abstract class.

acl#

The Deployment ACL instance.

Type: streamsets.sdk.sch_models.ACL

created_by#

User that created this deployment.

Type: str

created_on#

Time at which this deployment was created.

Type: int

deployment_events#

Name of the deployment.

Type: DeploymentEvents

deployment_id#

Id of the deployment.

Type: str

deployment_name#

Name of the deployment.

Type: str

deployment_tags#

Raw deployment tags of the deployment.

Type: list

deployment_type#

Type of the deployment.

Type: str

desired_instances#

The Deployment desired number of instances.

Type: int

engine_configuration#

The Deployment Engine Configuration.

Type: DeploymentEngineConfiguration

engine_type#

The Deployment Engine type.

Type: str

engine_version#

The Deployment Engine Version.

Type: str

environment_id#

Enabled environment where engine will be deployed.

Type: list

last_modified_by#

User that last modified this deployment.

Type: str

last_modified_on#

Time at which this deployment was last modified.

Type: int

network_tags#

The Deployment network tags.

Type: list

scala_binary_version#

scala Binary Version.

Type: str

organization#

The Deployment organization.

Type: str

json_state#

State of the deployment - this is json field called state.

Type: str

state#

State of the deployment - displayed as state on UI.

Type: str

status#

Status of the deployment.

Type: str

status_detail#

Status detail of the deployment.

Type: str

registered_engines#

Registered Engines of the deployment.

Type: RegisteredEngines

tags#

Tags of the deployment.

Type: a streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Tag instances

class ENGINE_TYPES(value)[source]#: An enumeration.

class TYPES(value)[source]#: An enumeration.

property acl#

Get deployment ACL.

Returns: An instance of streamsets.sdk.sch_models.ACL.

property deployment_events#

Get the events for a deployment.

Returns: An instance of streamsets.sdk.sch_models.DeploymentEvents

property engine_configuration#: The engine configuration of deployment engine. :returns: An instance of streamsets.sdk.sch_models.DeploymentEngineConfiguration.

property registered_engines#

Get the registered engines.

Returns: An instance of streamsets.sdk.sch_models.RegisteredEngines

property tags#

Get the tags.

Returns: A streamsets.sdk.utils.SeekableList of str instances.

class streamsets.sdk.sch_models.Deployments(control_hub, organization)[source]#

Collection of streamsets.sdk.sch_models.Deployment instances.

Parameters

control_hub – An instance of streamsets.sdk.sch.ControlHub.
organization (str) – Organization Id.

class streamsets.sdk.sch_models.DeploymentBuilder(deployment, control_hub)[source]#

Class with which to build instances of streamsets.sdk.sch_models.Deployment.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_deployment_builder().

Parameters

deployment (dict) – Python object built from our Swagger CspDeploymentJson definition.
control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

Define the deployment.

Parameters

deployment_name (str) – deployment name.
engine_type (str) – Type of engine to deploy.
engine_version (str) – Version of engine to deploy.
environment (streamsets.sdk.sch_models.Environment) – The environment instance.
external_resource_location (str, optional) – External Resource Location URL. Default: None.
engine_labels (list, optional) – List of labels (strings). Default: None.
max_cpu_load (float or int, optional) – Max CPU Load (percent). Default: 80.0.
max_memory_used (float or int, optional) – Max Memory Used (percent). Default: 100.0.
max_pipelines_running (int, optional) – Max Running Pipeline Count. Default: 1000000.
deployment_tags (list, optional) – List of tags (strings). Default: None.
engine_build (str, optional) – Build of engine to deploy. Default: None.
scala_binary_version (str, optional) – Scala binary version required in case of engine_type=’TF’ Default: None.

Returns

An instance of subclass of streamsets.sdk.sch_models.Deployment.

class streamsets.sdk.sch_models.ProvisioningAgent(*args, **kwargs)[source]#

Model for Provisioning Agent.

Parameters

provisioning_agent (dict) – A Python object representation of Provisioning Agent.
control_hub – An instance of streamsets.sdk.sch.ControlHub.

deployments#

ProvisioningAgent deployments.

Type: streamsets.sdk.sch_models.LegacyDeployments

acl#

ProvisioningAgent ACL

Type: streamsets.sdk.sch_models.ACL

property acl#

Get the ACL of a Provisioning Agent.

Returns: An instance of streamsets.sdk.sch_models.ACL.

property deployments#

Get the deployments associated with the Provisioning Agent.

Returns: A list of streamsets.sdk.sch_models.LegacyDeployment instances.

class streamsets.sdk.sch_models.ProvisioningAgents(control_hub, organization)[source]#

Collection of streamsets.sdk.sch_models.ProvisioningAgent instances.

Parameters

control_hub – An instance of streamsets.sdk.sch.ControlHub.
organization (str) – Organization Id.

Roles#

class streamsets.sdk.sch_models.Roles(values, entity, role_label_to_id)[source]#

Wrapper class over the list of Roles.

Parameters

values (list) – List of roles.
entity (streamsets.sdk.sch_models.Group) – (streamsets.sdk.sch_models.User): Group or User object.
role_label_to_id (dict) – Role label to Role ID mapping.

append(value)[source]#

The method appends a role or an iterable of roles

Parameters: value (str, iterable) – Role string or an iterable of roles
Returns: None
Raises: TypeError – This exception is raised if the value argument is not a str or iterable of str.

clear()[source]#: Remove all items from list.

extend(value)[source]#: Extend list by appending elements from the iterable.

insert(ind, value)[source]#: Insert object before index.

pop(value=- 1)[source]#

Remove and return item at index (default last).

Raises IndexError if list is empty or index is out of range.

remove(value)[source]#

The method removes a role or an iterable of roles

Parameters: value (str, iterable) – Role string or an iterable of roles
Returns: None
Raises: TypeError – This exception is raised if the value argument is not a str or iterable of str.

reverse()[source]#: Reverse IN PLACE.

sort()[source]#

Sort the list in ascending order and return None.

The sort is in-place (i.e. the list itself is modified) and stable (i.e. the order of two equal elements is maintained).

If a key function is given, apply it once to each list item and sort them, ascending or descending, according to their function values.

The reverse flag can be set to sort in descending order.

SAQL Searches#

class streamsets.sdk.sch_models.SAQLSearch(*args, **kwargs)[source]#

Model for SAQL Searches.

Parameters

saql_search_json (dict) – The SAQL Search JSON.
control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

id#

ID of the SAQL Search.

Type: str

orgId#

ID of the Organization.

Type: str

type#

Type of SAQL search.

Type: str

name#

Name of SAQL search.

Type: str

mode#

Mode of SAQL search.

Type: str

query#

SAQL search query.

Type: str

creator#

Creator of SAQL search.

Type: str

createTime#

Time of creation of SAQL search.

Type: str

lastModifiedBy#

Last author to modify SAQL search.

Type: str

lastModifiedOn#

Last Time of modification of SAQL search.

Type: str

favorite#

Represents whether an SAQL Search is marked as a favorite or not.

Type: boolean

Returns: An instance of streamsets.sdk.sch_models.SAQLSearch.

class JobType(value)[source]#: An enumeration.

class ModeType(value)[source]#: An enumeration.

class PipelineType(value)[source]#: An enumeration.

class streamsets.sdk.sch_models.SAQLSearches(control_hub, saql_search_type)[source]#

Collection of streamsets.sdk.sch_models.SAQLSearch instances.

Parameters

control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.
saql_search_type (streamsets.sdk.sch_models.SAQLSearch.PipelineType) –

(streamsets.sdk.sch_models.SAQLSearch.JobType) Enum of the SAQL search type,
limited to: 'PIPELINE', ‘FRAGMENT’`,``’JOB_INSTANCE’, ``'JOB_TEMPLATE' and 'JOB_DRAFT_RUN'.

class streamsets.sdk.sch_models.SAQLSearchBuilder(saql_search, control_hub, mode='BASIC')[source]#

Class with which to build instances of streamsets.sdk.sch_models.SAQLSearch.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_saql_search_builder().

Parameters

saql_search (dict) – Python object built from expected SAQL JSON structure
control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.
mode (str, optional) – mode of SAQL Search. Default: 'BASIC'

add_filter(property_name='name', property_operator='contains', property_value='', property_condition_combiner='AND')[source]#

Adds a filter to the SAQL Search Query.

Parameters

property_name (str, optional) – Default: 'name'
property_operator (str, optional) – Default: 'contains'
property_value (str, optional) – Default: ''
property_condition_combiner (str, optional) – Default: 'AND'

build(name)[source]#

Builder for SAQL Search.

Parameters: name (str) – Name of SAQL Search.
Returns: An instance of streamsets.sdk.sch_models.SAQLSearch.

Scheduler#

class streamsets.sdk.sch_models.ScheduledTask(*args, **kwargs)[source]#

Model for Scheduled Task.

Parameters

task (dict) – JSON representation of task.
control_hub (streamsets.sdk.sch.ControlHub) – ControlHub instance.

runs#

Scheduled Task Runs.

Type: streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.ScheduledTaskRun

audits (:py:obj:`streamsets.sdk.utils.SeekableList` of: streamsets.sdk.sch_models.ScheduledTaskAudit): Scheduled Task Audits.

acl#

ACL of a Scheduled Task.

Type: streamsets.sdk.sch_models.ACL

job_id#

Job ID for which this task is scheduled.

Type: str

property acl#

Get the ACL of a Scheduled Task.

Returns: An instance of streamsets.sdk.sch_models.ACL.

property audits#

Get Scheduled Task Audits.

Returns: A streamsets.sdk.utils.SeekableList of inherited instances of streamsets.sdk.sch_models.ScheduledTaskAudit.

property job_id#

Job ID for which this task is scheduled.

Returns: An instance of str.

property runs#

Get Scheduled Task Runs.

Returns: A streamsets.sdk.utils.SeekableList of inherited instances of streamsets.sdk.sch_models.ScheduledTaskRun.

class streamsets.sdk.sch_models.ScheduledTaskAudit(*args, **kwargs)[source]#

Scheduled Task Audit.

Parameters: run (dict) – JSON representation of scheduled task audit.

class streamsets.sdk.sch_models.ScheduledTaskBaseModel(*args, **kwargs)[source]#: Base Model for Scheduled Task related classes.

class streamsets.sdk.sch_models.ScheduledTaskBuilder(job_selection_types, control_hub)[source]#

Builder for Scheduled Task.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_scheduled_task_builder().

Parameters

job_selection_types (dict) – JSON representation of job selection types.
control_hub (streamsets.sdk.sch.ControlHub) – ControlHub instance.

build(task_object, action='START', name=None, description=None, cron_expression='0 0 1/1 * ? *', time_zone='UTC', status='RUNNING', start_time=None, end_time=None, missed_execution_handling='IGNORE')[source]#

Builder for Scheduled Task.

Parameters

task_object (streamsets.sdk.sch_models.Job) – (streamsets.sdk.sch_models.ReportDefinition): Job or ReportDefinition object.
action (str, optional) – One of the {‘START’, ‘STOP’, ‘UPGRADE’} actions. Default: START.
name (str, optional) – Name of the task. Default: None.
description (str, optional) – Description of the task. Default: None.
crontab_mask (str, optional) – Schedule in cron syntax. Default: "0 0 1/1 * ? *". (Daily at 12).
time_zone (str, optional) – Time zone. Default: "UTC".
status (str, optional) – One of the {‘RUNNING’, ‘PAUSED’} statuses. Default: RUNNING.
start_time (str, optional) – Start time of task. Default: None.
end_time (str, optional) – End time of task. Default: None.
missed_trigger_handling (str, optional) – One of {‘IGNORE’, ‘RUN ALL’, ‘RUN ONCE’}. Default: IGNORE.

Returns

An instance of streamsets.sdk.sch_models.ScheduledTask.

class streamsets.sdk.sch_models.ScheduledTaskRun(*args, **kwargs)[source]#

Scheduled Task Run.

Parameters: run (dict) – JSON representation if scheduled task run.

class streamsets.sdk.sch_models.ScheduledTasks(control_hub)[source]#: Collection of streamsets.sdk.sch_models.ScheduledTask instances.

SeekableList#

class streamsets.sdk.utils.SeekableList(iterable=(), /)[source]#

get(**kwargs)[source]#

Retrieve the first instance that matches the supplied arguments.

Parameters: **kwargs – Optional arguments to be passed to filter the results offline.
Returns: The first instance from the group that matches the supplied arguments.

get_all(**kwargs)[source]#

Retrieve all instances that match the supplied arguments.

Parameters: **kwargs – Optional arguments to be passed to filter the results offline.
Returns: A streamsets.sdk.utils.SeekableList of results that match the supplied arguments.

Snapshot#

class streamsets.sdk.sch_models.Snapshot(*args, **kwargs)[source]#

Model for Snapshot.

Parameters

snapshot (dict) – Python object representation of the snapshot.
draft_run (streamsets.sdk.sch.DraftRun) – DraftRun object.

batch_number#

The number of the batch that the snapshot captured.

Type: int

batches#

A list of streamsets.sdk.sdc_models.Batch instances.

Type: list

id#

The snapshot’s ID.

Type: str

name#

The snapshot’s name.

Type: str

pipeline_id#

The ID of the pipeline that the snapshot belongs to.

Type: str

time_stamp#

The creation date of the snapshot as a unix timestamp.

Type: int

Stage#

class streamsets.sdk.sch_models.SchSdcStage(stage, pipeline=None, output_streams=None, supported_connection_types=None)[source]#

add_output(*other_stages, event_lane=False, allow_empty_stages=False, output_lane_index=None)#

Connect output of this stage to another stage.

The __rshift__ operator (>>) has been overloaded to invoke this method.

Parameters

other_stages (streamsets.sdk.sdc_models.Stage) – Stage object.
event_lane (boolean, optional) – Default: False.
allow_empty_stages (bool, optional) – Disable the check for empty stages. Default: False.
output_lane_index (int, optional) – The index of the output lane to attach an output to. Default: None.

Returns

This stage as an instance of streamsets.sdk.sdc_models.Stage).

connect_inputs(stages=None, event_lane=False, target_stage_output_lane_index=None)#

Connect inputs of this stage.

Parameters

stages – A list of streamsets.sdk.sdc_models.Stage objects to connect as inputs. Default: None.
event_lane (boolean, optional) – Default: False.
target_stage_output_lane_index (int, optional) – Index of the target stage’s output lane to attach to. Default: None.

connect_outputs(stages=None, event_lane=False, output_lane_index=None)#

Connect outputs of this stage.

Parameters

stages – A list of streamsets.sdk.sdc_models.Stage objects to connect as outputs. Default: None.
event_lane (boolean, optional) – Default: False.
output_lane_index (int, optional) – The index of THIS stage’s output lane to attach an output to. Default: None.

copy_inputs(stage, override=False)#

Copies inputs of the given stage.

Parameters

stage – The streamsets.sdk.sdc_models.Stage object whose inputs we’d like to copy.
override (bool, optional) – Whether to completely wipe and override the current stage’s input_lanes with the stage that has been passed.

copy_outputs(stage)#

Copies outputs of the given stage.

Parameters: stage – The streamsets.sdk.sdc_models.Stage object whose outputs we’d like to copy.

property description#

The stage’s description.

Type: str

disconnect_input_lanes(stages=None, all_stages=False)#

Disconnect input lanes of this stage from the lanes of the stages that are provided.

Example:
stage_one.disconnect_inputs(stages=[stage_two, stage_three]) Will disconnect the incoming input lanes from stage_two and stage_three to stage_one.

Parameters

stages – A list of streamsets.sdk.sdc_models.Stage objects to disconnect. Default: None.
all_stages (bool, optional) – Disconnect all incoming input lanes to this stage. Default: False.

disconnect_output_lanes(stages=None, all_stages=False, output_lane_index=None)#

Disconnect output lanes of this stage from the lanes of the stages that are provided.

Example:
stage_one.disconnect_outputs(stages=[stage_two, stage_three]) Will disconnect the outgoing output lanes FROM stage_one to stage_two and stage_three.

Parameters

stages – A list of streamsets.sdk.sdc_models.Stage objects to disconnect. Default: None.
all_stages (bool, optional) – Disconnect all outgoing output lanes to this stage. Default: False.
output_lane_index (int, optional) – Index of this stage’s output lane to detach. Default: None.

property event_lanes#

Get the stage’s list of event lanes.

Returns: A list of event lanes.

property input_lanes#

Get the stage’s list of input lanes.

Returns: A list of input lanes.

property label#

The stage’s label.

Type: str

property library#

Get the stage’s library.

Returns: The stage library as a str.

property output_lanes#

Get the stage’s list of output lanes.

Returns: A list of output lanes.

set_attributes(**attributes)#

Set one or more stage attributes.

Parameters: **attributes – Attributes to set.
Returns: This stage as an instance of streamsets.sdk.sdc_models.Stage.

property stage_on_record_error#: The stage’s on record error configuration value.

property stage_record_preconditions#: The stage’s record preconditions configuration value.

property stage_required_fields#: The stage’s required fields configuration value.

use_connection(connection)#

Specify a Connection instance to use for this stage.

Parameters: connection – One streamsets.sdk.sch_models.Connection instance. Only one Connection per stage is currently supported.

class streamsets.sdk.sch_models.SchStStage(stage, pipeline=None, output_streams=None, supported_connection_types=None)[source]#

add_output(*other_stages, event_lane=False, output_lane_index=None)#

Connect output of this stage to another stage.

The __rshift__ operator (>>) has been overloaded to invoke this method.

Parameters

other_stages (streamsets.sdk.st_models.Stage) – Stage object.
event_lane (boolean, optional) – Default: False.
output_lane_index (int, optional) – The index of the output lane to attach an output to. Default: None.

Returns

This stage as an instance of streamsets.sdk.st_models.Stage).

connect_inputs(stages=None, event_lane=False, target_stage_output_lane_index=None)#

Connect inputs of this stage.

Parameters

stages – A list of streamsets.sdk.st_models.Stage objects to connect as inputs. Default: None.
event_lane (boolean, optional) – Default: False.
target_stage_output_lane_index (int, optional) – Index of the target stage’s output lane to attach to. Default: None.

connect_outputs(stages=None, event_lane=False, output_lane_index=None)#

Connect outputs of this stage.

Parameters

stages – A list of streamsets.sdk.st_models.Stage objects to connect as outputs. Default: None.
event_lane (boolean, optional) – Default: False.
output_lane_index (int, optional) – The index of THIS stage’s output lane to attach an output to. Default: None.

copy_inputs(stage, override=False)#

Copies inputs of the given stage.

Parameters

stage – The streamsets.sdk.sdc_models.Stage object whose inputs we’d like to copy.
override (bool, optional) – Whether to completely wipe and override the current stage’s input_lanes with the stage that has been passed.

copy_outputs(stage)#

Copies outputs of the given stage.

Parameters: stage – The streamsets.sdk.sdc_models.Stage object whose outputs we’d like to copy.

disconnect_input_lanes(stages=None, all_stages=False)#

Disconnect input lanes of this stage from the lanes of the stages that are provided.

Example:
stage_one.disconnect_inputs(stages=[stage_two, stage_three]) Will disconnect the incoming input lanes from stage_two and stage_three to stage_one.

Parameters

stages – A list of streamsets.sdk.st_models.Stage objects to disconnect. Default: None.
all_stages (bool, optional) – Disconnect all incoming input lanes to this stage. Default: False.

disconnect_output_lanes(stages=None, all_stages=False, output_lane_index=None)#

Disconnect output lanes of this stage from the lanes of the stages that are provided.

Example:
stage_one.disconnect_outputs(stages=[stage_two, stage_three]) Will disconnect the outgoing output lanes FROM stage_one to stage_two and stage_three. stage_one.disconnect_outputs(output_lane_index=0) Will disconnect the first output lane of this stage.

Parameters

stages – A list of streamsets.sdk.st_models.Stage objects to disconnect. Default: None.
all_stages (bool, optional) – Disconnect all outgoing output lanes to this stage. Default: False.
output_lane_index (int, optional) – Index of this stage’s output lane to detach. Default: None.

property event_lanes#

Get the stage’s list of event lanes.

Returns: A list of event lanes.

property input_lanes#

Get the stage’s list of input lanes.

Returns: A list of input lanes.

property label#

The stage’s label.

Type: str

property library#

Get the stage’s library.

Returns: The stage library as a str.

property output_lanes#

Get the stage’s list of output lanes.

Returns: A list of output lanes.

set_attributes(**attributes)#

Set one or more stage attributes.

Parameters: **attributes – Attributes to set.
Returns: This stage as an instance of streamsets.sdk.st_models.Stage.

property stage_on_record_error#: The stage’s on record error configuration value.

property stage_record_preconditions#: The stage’s record preconditions configuration value.

property stage_required_fields#: The stage’s required fields configuration value.

use_connection(connection)#

Specify a Connection instance to use for this stage.

Parameters: connection – One streamsets.sdk.sch_models.Connection instance. Only one Connection per stage is currently supported.

Subscriptions#

class streamsets.sdk.sch_models.Subscription(*args, **kwargs)[source]#

Subscription.

Parameters

subscription (dict) – JSON representation of Subscription.
control_hub (streamsets.sdk.sch.ControlHub) – ControlHub instance.

events ( A :py:obj:`streamsets.sdk.utils.SeekableList` of: streamsets.sdk.sch_models.SubscriptionEvent instances): The Subscription’s events.

action#

Action of a Subscription.

Type: streamsets.sdk.sch_models.SubscriptionAction

acl#

ACL of a Subscription.

Type: streamsets.sdk.sch_models.ACL

property acl#

Get the ACL of an Event Subscription.

Returns: An instance of streamsets.sdk.sch_models.ACL.

property action#: Action of the Subscription.

property events#: Events of the Subscription.

class streamsets.sdk.sch_models.Subscriptions(control_hub)[source]#

Collection of streamsets.sdk.sch_models.Subscription instances.

Parameters: control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

class streamsets.sdk.sch_models.SubscriptionAction(*args, **kwargs)[source]#

Action to take when the Subscription is triggered.

Parameters: action (dict) – JSON representation of an Action for a Subscription.

class streamsets.sdk.sch_models.SubscriptionAudit(*args, **kwargs)[source]#

Model for subscription audit.

Parameters: audit (dict) – JSON representation of a subscription audit.

class streamsets.sdk.sch_models.SubscriptionBuilder(subscription, control_hub)[source]#

Builder for Subscription.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_subscription_builder().

Parameters

subscription (dict) – JSON representation of event subscription.
control_hub (streamsets.sdk.sch.ControlHub) – ControlHub instance.

add_event(event_type, filter='')[source]#

Add event to the Subscription.

Parameters

event_type (str) – Type of event in {‘Job Status Change’, ‘Data SLA Triggered’, ‘Pipeline Committed’, ‘Pipeline Status Change’, ‘Report Generated’, ‘Data Collector not Responding’}.
filter (str, optional) – Filter to be applied on event. Default: "".

build(name, description=None)[source]#

Builder for Scheduled Task.

Parameters

name (str) – Name of Subscription.
description (str, optional) – Description of subscription. Default: None.

Returns

An instance of streamsets.sdk.sch_models.Subscription.

import_subscription(subscription)[source]#

Import an existing Subscription into the builder to update it.

Parameters: subscription (streamsets.sdk.sch_models.Subscription) – Subscription instance.

remove_event(event_type)[source]#

Remove event from the subscription.

Parameters: event_type (str) – Type of event in {‘Job Status Change’, ‘Data SLA Triggered’, ‘Pipeline Committed’, ‘Pipeline Status Change’, ‘Report Generated’, ‘Data Collector not Responding’}.
Returns: An instance of streamsets.sdk.sch_models.SubscriptionEvent.

set_email_action(recipients, subject=None, body=None, external_smtp_url=None, external_smtp_port=None)[source]#

Set the Email action.

Parameters

recipients (list) – List of email addresses.
subject (str, optional) – Subject of the email. Default: None.
body (str, optional) – Body of the email. Default: None.
external_smtp_url (str, optional) – SMTP URL to route. Default: None.
external_smtp_port (str, optional) – SMTP Port to route. Default: None.

set_webhook_action(uri, method='GET', content_type=None, payload=None, auth_type=None, username=None, password=None, timeout=30000, headers=None, bearer_token=None, api_key_key=None, api_key_value=None, api_key_location=None, token_url=None, client_id=None, client_secret=None, scope=None, location=None, resources=None, audiences=None)[source]#

Set the Webhook action.

Parameters

uri (str) – URI for the Webhook.
method (str, optional) – HTTP method to use. Default: 'GET'.
content_type (str, optional) – Content Type of the request. Default: None.
payload (str, optional) – Payload of the request. Default: None.
auth_type (str, optional) – 'basic' or None. Default: None.
username (str, optional) – Username for the authentication. Default: None.
password (str, optional) – Password for the authentication. Default: None.
timeout (int, optional) – Timeout for the Webhook action. Default: 30000.
headers (dict, optional) – Headers to be sent to the Webhook call. Default: None.
bearer_token (str, optional) – Bearer token for the authentication. Default: None.
token_url – (str, optional): Token URL for the authentication. Default: None.
api_key_key (str, optional) – API key name for the authentication. Default: None.
api_key_value (str, optional) – API key value for the authentication. Default: None.
api_key_location (str, optional) – Where to send the api key values. Default: None.
audiences – (list, optional): Audiences for OAuth2 auth method. Default: None.
resources – (list, optional): Resources for OAuth2 auth method. Default: None.
location – (str, optional): Location of OAuth2 credentials (body or header). Default: None.
scope – (str, optional): Scope for OAuth2 auth method. Default: None.
client_secret – (str, optional): Client secret for OAuth2 auth method. Default: None.
client_id – (str, optional): Client id for OAuth2 auth method. Default: None.

class streamsets.sdk.sch_models.SubscriptionEvent(*args, **kwargs)[source]#

An Event of a Subscription.

Parameters

event (dict) – JSON representation of Events of a Subscription.
control_hub (streamsets.sdk.sch.ControlHub) – ControlHub instance.

Tags#

class streamsets.sdk.sch_models.Tag(*args, **kwargs)[source]#

Model for tag.

Parameters: tag (dict) – tag in JSON format.

id#

The full ID of the tag, in 'tag:organization' format.

Type: str

organization#

The ID of the organization that the tag belongs to.

Type: str

parent_id#

ID of the parent tag if this tag is a child.

Type: str

tag#

The tag’s label.

Type: str

property tag#: Get the Tag.

Topologies#

class streamsets.sdk.sch_models.DataSla(*args, **kwargs)[source]#

Model for DataSla.

Parameters

data_sla (dict) – JSON representation of SLA.
control_hub (streamsets.sdk.sch.ControlHub) – An instance of the Control Hub.

class streamsets.sdk.sch_models.DataSlaBuilder(data_sla, control_hub)[source]#

Class with which to build instances of streamsets.sdk.sch_models.DataSla.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_data_sla_builder().

Parameters

data_sla (dict) – Python object built from our Swagger DataSlaJson definition.
control_hub (streamsets.sdk.sch.ControlHub) – An instance of the Control Hub.

build(topology, label, job, alert_text, qos_parameter='THROUGHPUT_RATE', function_type='Max', min_max_value=100, enabled=True)[source]#

Build the Data Sla.

Parameters

topology (streamsets.sdk.sch_models.Topology) – Topology object.
label (str) – Label for the SLA.
job (list) – List of streamsets.sdk.sch_models.Job objects.
alert_text (str) – Alert text.
qos_parameter (str, optional) – paramter in {‘THROUGHPUT_RATE’, ‘ERROR_RATE’}. Default: 'THROUGHPUT_RATE'.
function_type (str, optional) – paramter in {‘Max’, ‘Min’}. Default: 'Max'.
min_max_value (str, optional) – Default: 100.
enabled (boolean, optional) – Default: True.

Returns

An instance of streamsets.sdk.sch_models.DataSla.

class streamsets.sdk.sch_models.Topology(*args, **kwargs)[source]#

Model for Topology.

Parameters: topology (dict) – JSON representation of Topology.

acl#

Topology ACL.

Type: streamsets.sdk.sch_models.ACL

commit_id#

Pipeline commit id.

Type: str

commit_message#

Commit Message.

Type: str

commit_time#

Time at which commit was made.

Type: int

committed_by#

User that made the commit.

Type: str

data_slas#

Data SLAs currently part of the topology.

Type: streamsets.sdk.utils.SeekableList) of (streamsets.sdk.sch_models.DataSla

default_topology#

Default Topology.

Type: bool

description#

Topology description.

Type: str

draft#

Indicates whether this topology is a draft.

Type: bool

jobs#

Jobs currently part of the topology.

Type: streamsets.sdk.utils.SeekableList) of (streamsets.sdk.sch_models.Job

last_modified_by#

User that last modified this topology.

Type: str

last_modified_on#

Time at which this topology was last modified.

Type: int

new_pipeline_version_available#

Whether any job in the topology has a new pipeline version to be updated to.

Type: bool

nodes#

Nodes currently part of the topology.

Type: streamsets.sdk.utils.SeekableList) of (streamsets.sdk.sch_models.TopologyNode

organization#

Id of the organization.

Type: str

parent_version#

Version of the parent topology.

Type: str

topology_definition#

Definition of the topology.

Type: str

topology_id#

Id of the topology.

Type: str

topology_name#

Name of the topology.

Type: str

validation_issues#

Any validation issues that exist for this Topology.

Type: dict

version#

Version of this topology.

Type: str

acknowledge_job_errors()[source]#

Acknowledge all errors for the jobs in a topology.

Returns: An instance of streamsets.sdk.sch_api.Command.

property acl#

Get the ACL of a Topology.

Returns: An instance of streamsets.sdk.sch_models.ACL.

activate_data_sla(*data_slas)[source]#

Activate Data SLAs.

Parameters: *data_slas – One or more instances of streamsets.sdk.sch_models.DataSla.
Returns: An instance of streamsets.sdk.sch_api.Command.

add_data_sla(data_sla)[source]#

Add SLA.

Parameters: data_sla (streamsets.sdk.sch_models.DataSla) – Data SLA object.
Returns: An instance of streamsets.sdk.sch_api.Command.

auto_discover_connections()[source]#

Auto discover connecting systems between nodes in a Topology.

Returns: An instance of streamsets.sdk.sch_api.Command.

auto_fix()[source]#

Auto-fix a topology by rectifying invalid or removed jobs, outdated jobs, etc.

Returns: An instance of streamsets.sdk.sch_api.Command.

deactivate_data_sla(*data_slas)[source]#

Deactivate Data SLAs.

Parameters: *data_slas – One or more instances of streamsets.sdk.sch_models.DataSla.
Returns: An instance of streamsets.sdk.sch_api.Command.

delete_data_sla(*data_slas)[source]#

Delete Data SLAs.

Parameters: *data_slas – One or more instances of streamsets.sdk.sch_models.DataSla.
Returns: An instance of streamsets.sdk.sch_api.Command.

property jobs#

Get the jobs that are contained within the Topology.

Returns: A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Job instances.

property new_pipeline_version_available#

Determine if a new pipeline version is available for any jobs in the Topology.

Returns: A (bool) value.

property nodes#

Get the job and system nodes that make up the Topology.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.TopologyNode: instances.

start_all_jobs()[source]#

Start all jobs of a topology.

Returns: An instance of streamsets.sdk.sch_api.Command.

stop_all_jobs(force=False)[source]#

Stop all jobs of a topology.

Parameters: force (bool, optional) – Force topology jobs to stop. Default: False.
Returns: An instance of streamsets.sdk.sch_api.Command.

update_jobs_to_latest_change()[source]#

Upgrade a topology’s job(s) to the latest corresponding pipeline change.

Returns: An instance of streamsets.sdk.sch_api.Command.

property validation_issues#

Get any validation issues that are detected for the Topology.

Returns: A (list) of validation issues in JSON format.

class streamsets.sdk.sch_models.Topologies(control_hub)[source]#: Collection of streamsets.sdk.sch_models.Topology instances. :param control_hub: An instance of the ControlHub. :type control_hub: streamsets.sdk.sch.ControlHub

class streamsets.sdk.sch_models.TopologyBuilder(topology, control_hub=None)[source]#

Class with which to build instances of streamsets.sdk.sch_models.Topology.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_topology_builder().

topology_nodes#

(streamsets.sdk.sch_models.TopologyNode): Nodes currently part of the topology held by the TopologyBuilder.

Type: streamsets.sdk.utils.SeekableList

Parameters

topology (dict) – Python object built from our Swagger TopologyJson definition.
control_hub (streamsets.sdk.ControlHub, optional) – Control Hub instance. Default: None

add_job(job)[source]#

Add a job node to the Topology being built.

Parameters: job (streamsets.sdk.sch_models.Job) – An instance of a job to be added.

add_system(name)[source]#

Add a system node to the Topology being built.

Parameters: name (str) – The name of the system to add to the topology.

build(topology_name=None, description=None)[source]#

Build the topology.

Parameters

topology_name (str, optional) – Name of the topology. This parameter is required when building a new topology.
description (str, optional) – Description of the topology. Default: None.

Returns

An instance of streamsets.sdk.sch_models.Topology.

delete_node(topology_node)[source]#

Delete a system or job node from the topology.

Parameters: topology_node (streamsets.sdk.sch_models.TopologyNode) – An instance of a TopologyNode to delete from the topology.

import_topology(topology)[source]#

Import an existing topology to be used in the builder.

Parameters: topology (streamsets.sdk.sch_models.Topology) – An existing Topology instance to modify.

property topology_nodes#

Get all of the nodes currently part of the topology held by the TopologyBuilder.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.TopologyNode: instances.

class streamsets.sdk.sch_models.TopologyNode(*args, **kwargs)[source]#

Model for a node within a Topology.

Parameters: topology_node_json (dict) – JSON representation of a Topology Node.

node_type#

The type of this node, i.e. SYSTEM, JOB, etc.

Type: str

instance_name#

The name of this node instance.

Type: str

stage_name#

The name of the stage in this node.

Type: str

stage_version#

The version of the stage in this node.

Type: str

job_id#

The ID of the job in this node.

Type: str

pipeline_id#

The pipeline ID associated with the job in this node.

Type: str

pipeline_commit_id#

The commit ID of the pipeline.

Type: str

pipeline_version#

The version of the pipeline.

Type: str

input_lanes#

A list of str representing the input lanes for this node.

Type: list

output_lanes#

A list of str representing the output lanes for this node.

Type: list

name#

Name of the Topology Node.

Type: str

Users#

class streamsets.sdk.sch_models.User(*args, **kwargs)[source]#

Model for User.

Parameters

user (dict) – JSON representation of User.
roles (dict) – A mapping of role IDs to role labels.
control_hub (streamsets.sdk.ControlHub, optional) – An instance of Control Hub. Default: None

active#

Whether the user is active or not.

Type: bool

created_by#

Creator of this user.

Type: str

created_on#

Creation time of this user.

Type: str

display_name#

Display name of this user.

Type: str

email_address#

Email address of this user.

Type: str

id#

Id of this user.

Type: str

groups#

Groups this user belongs to.

Type: streamsets.sdk.util.SeekableList of streamsets.sdk.sch_models.Group instances

last_modified_by#

User last modified by.

Type: str

last_modified_on#

User last modification time.

Type: str

organization#

Organization ID that the user is part of.

Type: str

password_expires_on#

User’s password expiration time.

Type: str

password_system_generated#

Whether User’s password is system generated or not.

Type: bool

roles#

A set of role labels.

Type: set

saml_user_name#

SAML username of user.

Type: str

status#

The status of the user.

Type: str

property groups#: Get the group memberships for the user.

property roles#: Get the roles for the user.

property status#: Get the status of the user. Values that can be returned are either ‘ACTIVE’ or ‘DEACTIVATED’.

class streamsets.sdk.sch_models.Users(control_hub, roles, organization)[source]#

Collection of streamsets.sdk.sch_models.User instances.

Parameters

control_hub – An instance of streamsets.sdk.sch.ControlHub.
roles (dict) – A mapping of role IDs to role labels.
organization (str) – Organization ID.

class streamsets.sdk.sch_models.UserBuilder(user, roles, control_hub=None)[source]#

Class with which to build instances of streamsets.sdk.sch_models.User.

Instead of instantiating this class directly, most users should use: streamsets.sdk.sch.ControlHub.get_user_builder().

Parameters

user (dict) – Python object built from our Swagger UserJson definition.
roles (dict) – A mapping of role IDs to role labels.
control_hub – An instance of streamsets.sdk.sch.ControlHub. Default: None.

build(email_address)[source]#

Build the user.

Parameters: email_address (str) – User Email Address.
Returns: An instance of streamsets.sdk.sch_models.User.

Common#

Models used by StreamSets Data Collector and StreamSets Control Hub:

class streamsets.sdk.models.Configuration(configuration=None, compatibility_map=None, property_key='name', property_value='value', update_callable=None, update_callable_kwargs=None, id_to_remap=None)[source]#

Abstraction for configurations.

This class enables easy access to and modification of data stored as a list of dictionaries. A Configuration is stored in the form:

[{"name" : "<name_1>","value" : "<value_1>"}, {"name" : "<name_2>", value" : "<value_2>"},...]

However, the passed in configuration parameter can be a list of Configurations such as:

[[{"name" : "<name_1>","value" : "<value_1>"}, {"name" : "<name_2>", value" : "<value_2>"},...],
[{"name" : "<name_3>","value" : "<value_3>"}, {"name" : "<name_4>", value" : "<value_4>"},...],...]

Parameters

compatibility_map (dict, optional) – A dictionary mapping values used for backwards compatibility.
configuration (list) – List of configurations (see above for format).
property_key (str, optional) – The dictionary entry denoting the property key. Default: name
property_value (str, optional) – The dictionary entry denoting the property value. Default: value
update_callable (optional) – A callable to which self._data will be passed as part of __setitem__.
update_callable_kwargs (dict, optional) – A dictionary of kwargs to pass (along with a body) to the callable.
id_to_remap (dict, optional) – A dictionary mapping configuration IDs to human-readable container keys. Example: {‘custom_region’:’googleCloudConfig.customRegion’, … }

get(key, default=None)[source]#: Return the value of key or, if not in the configuration, the default value.

items()[source]#

Gets the configuration’s items.

Returns: A new view of the configuration’s items ((key, value) pairs).

update(configs)[source]#

Update instance with a collection of configurations.

Parameters: configs (dict) – Dictionary of configurations to use.

Exceptions#

Common exceptions.

exception streamsets.sdk.exceptions.BadRequestError(response)[source]#: Bad request error (HTTP 400).

exception streamsets.sdk.exceptions.ConnectError(response)[source]#: Pipeline connect error.

exception streamsets.sdk.exceptions.ConnectionError(code, message)[source]#: Connection Catalog errors.

exception streamsets.sdk.exceptions.EnginelessError(message)[source]#: Publishing an engineless designed pipeline error

exception streamsets.sdk.exceptions.InternalServerError(response)[source]#: Internal server error.

exception streamsets.sdk.exceptions.InvalidCredentialsError(message)[source]#: Invalid credentials error.

exception streamsets.sdk.exceptions.InvalidError(response)[source]#: Invalid stage config error.

exception streamsets.sdk.exceptions.InvalidVersionError(input)[source]#: The Version number could not be parsed.

exception streamsets.sdk.exceptions.JobInactiveError(message)[source]#: Job status changed into INACTIVE_ERROR.

exception streamsets.sdk.exceptions.JobRunnerError(code, message)[source]#: JobRunner errors.

exception streamsets.sdk.exceptions.LegacyDeploymentInactiveError(message)[source]#: Legacy deployment status changed into INACTIVE_ERROR.

exception streamsets.sdk.exceptions.MultipleIssuesError(errors)[source]#: Multiple errors were returned.

exception streamsets.sdk.exceptions.RunError(response)[source]#: Pipeline running error.

exception streamsets.sdk.exceptions.RunningError(response)[source]#: Pipeline run error.

exception streamsets.sdk.exceptions.ServiceDefinitionNotFound(message)[source]#: ServiceDefinition errors.

exception streamsets.sdk.exceptions.StartError(response)[source]#: Pipeline start error.

exception streamsets.sdk.exceptions.StartingError(response)[source]#: Pipeline starting error.

exception streamsets.sdk.exceptions.StatusError(response)[source]#: Parent class for pipeline status errors.

exception streamsets.sdk.exceptions.TopologyIssuesError(issues)[source]#: Topology has some issues.

exception streamsets.sdk.exceptions.UnprocessableEntityError(response)[source]#: Unprocessable Entity Error (HTTP 422).

exception streamsets.sdk.exceptions.UnsupportedMethodError(message)[source]#: An unsupported method was called.

exception streamsets.sdk.exceptions.ValidationError(issues)[source]#: Validation issues.

Platform SDK for Python Version 6.5.0

StreamSets Control Hub

Section Contents

StreamSets Control Hub#

Main interface#

Models#

ACLs#

Alerts#

Api Credentials#

Classifiers#

Classification Rules#

Connections#

DataCollectors#

Deployments#

DraftRun#

DraftRuns#

Engines#

Environments#

Transformers#

Groups#

Jobs#

Job Sequences#

Organizations#

Pipelines#

Protection Methods#

Protection Policies#

ProvisioningAgents#

Roles#

SAQL Searches#

Scheduler#

SeekableList#

Snapshot#

Stage#

Subscriptions#

Tags#

Topologies#

Users#

Common#

Exceptions#