StreamSets Control Hub#

Main interface#

This is the main entry point used by users when interacting with SCH instances.

class streamsets.sdk.ControlHub(credential_id, token, use_websocket_tunneling=True, **kwargs)[source]#

Class to interact with StreamSets Control Hub.

Parameters
  • credential_id (str) – ControlHub credential ID.

  • token (str) – ControlHub token.

  • use_websocket_tunneling (bool, optional) – use websocket tunneling. Default: True.

version#

Version of ControlHub.

Type

str

ldap_enabled#

Indication if LDAP is enabled or not.

Type

bool

organization_global_configuration#

Organization’s Global Configuration instance.

Type

:py:class:`streamsets.sdk.models.Configuration`

pipeline_labels#

Pipeline Labels.

Type

streamsets.sdk.sch_models.PipelineLabels

users#

Organization’s Users.

Type

streamsets.sdk.sch_models.Users

login_audits#

ControlHub Login Audits.

Type

streamsets.sdk.sch_models.LoginAudits

action_audits#

ControlHub Action Audits.

Type

streamsets.sdk.sch_models.ActionAudits

connection_audits#

ControlHub Connection Audits.

Type

streamsets.sdk.sch_models.ConnectionAudits

groups#

ControlHub Groups.

Type

streamsets.sdk.sch_models.Groups

data_collectors#

Data Collector instances registered under ControlHub.

Type

Returns a streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.DataCollector instances.

transformers#

Transformer instances registered under ControlHub.

Type

Returns a streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Transformer instances.

provisioning_agents (

py:class:`streamsets.sdk.sch_models.ProvisioningAgents): Provisioning Agents registered to the ControlHub instance.

legacy_deployments#

LegacyDeployments instances.

Type

streamsets.sdk.sch_models.LegacyDeployments

organizations#

All the organizations that the current user belongs to.

Type

streamsets.sdk.sch_models.Organizations

api_credentials#

ControlHub Api Credentials.

Type

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.ApiCredential

pipelines#

ControlHub Pipelines.

Type

streamsets.sdk.sch_models.Pipelines

draft_runs#

ControlHub Draft Runs.

Type

streamsets.sdk.sch_models.DraftRuns

jobs#

ControlHub Jobs.

Type

streamsets.sdk.sch_models.Jobs

data_protector_enabled (

bool): Whether Data Protector is enabled for the current organization.

connection_tags#

ControlHub Connection Tags.

Type

streamsets.sdk.sch_models.ConnectionTags

alerts#

ControlHub Alerts.

Type

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Alert

protection_policies#

ControlHub Protection Policies.

Type

streamsets.sdk.sch_models.ProtectionPolicies

scheduled_tasks#

ControlHub Scheduled Tasks.

Type

streamsets.sdk.sch_models.ScheduledTasks

subscriptions#

ControlHub Subscriptions.

Type

streamsets.sdk.sch_models.Subscriptions

subscription_audits#

ControlHub Subscription audits.

Type

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.SubscriptionAudit

topologies#

ControlHub Topologies.

Type

streamsets.sdk.sch_models.Topologies

connections#

ControlHub Connections.

Type

streamsets.sdk.sch_models.Connections

environments#

ControlHub Environments.

Type

streamsets.sdk.sch_models.Environments

engine_versions#

ControlHub Deployment Engine Configuration.

Type

streamsets.sdk.sch_models.DeploymentEngineConfiguration

deployments#

ControlHub Deployments.

Type

streamsets.sdk.sch_models.Deployments

engines#

ControlHub Engines.

Type

streamsets.sdk.sch_models.Engines

saql_saved_searches_pipeline#

ControlHub SAQL Searches for type Pipeline.

Type

streamsets.sdk.sch_models.SAQLSearches

saql_saved_searches_fragment#

ControlHub SAQL Searches for type Fragment.

Type

streamsets.sdk.sch_models.SAQLSearches

saql_saved_searches_job_instance#

ControlHub SAQL Searches for type Job Instance.

Type

streamsets.sdk.sch_models.SAQLSearches

saql_saved_searches_job_template#

ControlHub SAQL Searches for type Job Template.

Type

streamsets.sdk.sch_models.SAQLSearches

saql_saved_searches_draft_run#

ControlHub SAQL Searches for type Draft Run.

Type

streamsets.sdk.sch_models.SAQLSearches

acknowledge_event_subscription_error(subscription)[source]#

Acknowledge an error on given Event Subscription.

Parameters

subscription (streamsets.sdk.sch_models.Subscription) – A Subscription instance.

Returns

An instance of streamsets.sdk.sch_api.Command.

acknowledge_job_error(*jobs)[source]#

Acknowledge errors for one or more jobs.

Parameters

*jobs – One or more instances of streamsets.sdk.sch_models.Job.

acknowledge_legacy_deployment_error(*deployments)[source]#

Acknowledge errors for one or more deployments.

Parameters

*deployments – One or more instances of streamsets.sdk.sch_models.LegacyDeployment.

property action_audits#

Action Audits.

Returns

An instance of streamsets.sdk.sch_models.ActionAudits.

activate_api_credential(api_credential)[source]#

Activate an api credential.

Parameters

api_credential (streamsets.sdk.sch_models.ApiCredential) – ApiCredential object.

Returns

An instance of streamsets.sdk.sch_api.Command.

activate_engine(engine)[source]#

Activate an engine.

Parameters

engine (streamsets.sdk.sch_models.Engine) – Engine instance.

activate_environment(*environments, timeout_sec=300)[source]#

Activate environments.

Parameters

*environments – One or more instances of streamsets.sdk.sch_models.Environment.

Returns

An instance of streamsets.sdk.sch_api.Command or None.

activate_provisioning_agent(provisioning_agent)[source]#

Activate provisioning agent.

Parameters

provisioning_agent (streamets.sdk.sch_models.ProvisioningAgent) –

Returns

An instance of streamsets.sdk.sch_api.Command.

activate_user(*users)[source]#

Activate users for all given User IDs.

Parameters

*users – One or more instance of streamsets.sdk.sch_models.User.

Returns

An instance of streamsets.sdk.aster_api.Command.

add_api_credential(api_credential)[source]#
Add an api credential. Some api credential attributes are updated by ControlHub such as

created_by.

Parameters

api_credential (streamsets.sdk.sch_models.ApiCredential) – An API credential instance, built via the streamsets.sdk.sch_models.ApiCredentialBuilder.build() method.

Returns

An instance of streamsets.sdk.sch_api.Command.

add_classification_rule(classification_rule, commit=False)[source]#

Add a classification rule.

Parameters
add_connection(connection)[source]#

Add a connection.

Parameters

connection (streamsets.sdk.sch_models.Connection) – Connection object.

Returns

An instance of streamsets.sdk.sch_api.Command.

add_deployment(deployment)[source]#

Add a deployment.

Parameters

deployment (streamsets.sdk.sch_models.Deployment) – Deployment object.

Returns

An instance of streamsets.sdk.sch_api.Command.

add_environment(environment)[source]#

Add an environment.

Parameters

environment (streamsets.sdk.sch_models.Environment) – Environment object.

Returns

An instance of streamsets.sdk.sch_api.Command.

add_group(group)[source]#

Add a group.

Parameters

group (streamsets.sdk.sch_models.Group) – Group object.

Returns

An instance of streamsets.sdk.sch_api.Command.

add_job(job)[source]#

Add a job.

Parameters

job (streamsets.sdk.sch_models.Job) – Job object.

Returns

An instance of streamsets.sdk.sch_api.Command.

add_legacy_deployment(deployment)[source]#

Add a legacy deployment.

Parameters

deployment (streamsets.sdk.sch_models.LegacyDeployment) – Deployment object.

Returns

An instance of streamsets.sdk.sch_api.Command.

add_organization(organization)[source]#

Add an organization.

Parameters

organization (streamsets.sdk.sch_models.Organization) – Organization object.

Returns

An instance of streamsets.sdk.sch_api.Command.

add_protection_policy(protection_policy)[source]#

Add a protection policy.

Parameters

protection_policy (streamsets.sdk.sch_models.ProtectionPolicy) – Protection Policy object.

add_report_definition(report_definition)[source]#

Add Report Definition to ControlHub.

Parameters

report_definition (streamsets.sdk.sch_models.ReportDefinition) – Report Definition instance.

Returns

An instance of streamsets.sdk.sch_api.Command.

add_scheduled_task(task)[source]#

Add the scheduled task to Control Hub.

Parameters

task (streamsets.sdk.sch_models.ScheduledTask) – Scheduled task object.

Returns

An instance of streamsets.sdk.sch_api.Command.

Raises

Exception – Thrown if publishing the task was unsuccessful

add_subscription(subscription)[source]#

Add Subscription to ControlHub.

Parameters

subscription (streamsets.sdk.sch_models.Subscription) – A Subscription instance.

Returns

An instance of streamsets.sdk.sch_api.Command.

property alerts#

Alerts.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Alert.

property api_credentials#

Api Credentials.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.ApiCredential.

assign_administrator_role(user)[source]#

Designate the specified user as an Organization Administrator.

Parameters

user (streamsets.sdk.sch_models.User) – User object.

Returns

An instance of streamsets.sdk.aster_api.Command.

balance_engines(*engines)[source]#

Balance all jobs running on given Engine.

Parameters

*engines – One or more instances of streamsets.sdk.sch_models.Engine.

balance_job(*jobs)[source]#

Balance one or more jobs.

Parameters

*jobs – One or more instances of streamsets.sdk.sch_models.Job.

check_snowflake_account_validation(token)[source]#

Check the status of a snowflake account validation process.

Parameters

token (str) – Token of the validation process.

Returns

An instance of streamsets.sdk.sch_api.Command.

clone_deployment(deployment, name=None, deployment_tags=None, engine_labels=None, engine_version=None)[source]#

Clone a deployment.

Parameters
  • deployment (streamsets.sdk.sch_models.Deployment) – Deployment object.

  • name (str, optional) – Name of the new deployment. Default: None.

  • deployment_tags (list, optional) – Tags to add to the cloned deployment. Default: `None.

  • engine_labels (list, optional) – Labels to assign to engines belonging to this deployment. Default: None.

  • engine_version (str, optional) – Engine Version ID Default: None.

Returns

An instance of streamsets.sdk.sch_models.Deployment.

property connection_audits#

Connection Audits.

Returns

An instance of streamsets.sdk.sch_models.ConnectionAudits.

property connection_tags#

Connection Tags.

Returns

An instance of streamsets.sdk.sch_models.ConnectionTags.

property connections#

Connections.

Returns

An instance of streamsets.sdk.sch_models.Connections.

create_components(component_type, number_of_components=1, active=True)[source]#

Create components.

Parameters
  • component_type (str) – Component type.

  • number_of_components (int, optional) – Default: 1.

  • active (bool, optional) – Default: True.

Returns

An instance of streamsets.sdk.sch_api.CreateComponentsCommand.

property data_collectors#

Data Collectors registered to the ControlHub instance.

Returns

Returns a streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.DataCollector instances.

property data_protector_enabled#

Whether Data Protector is enabled for the current organization.

Type

bool

deactivate_api_credential(api_credential)[source]#

Deactivate an api credential.

Parameters

api_credential (streamsets.sdk.sch_models.ApiCredential) – ApiCredential object.

Returns

An instance of streamsets.sdk.sch_api.Command.

deactivate_engine(engine)[source]#

Deactivate an engine.

Parameters

engine (streamsets.sdk.sch_models.Engine) – Engine instance.

deactivate_environment(*environments)[source]#

Deactivate environments.

Parameters

*environments – One or more instances of streamsets.sdk.sch_models.Environment.

Returns

An instance of streamsets.sdk.sch_api.Command.

deactivate_provisioning_agent(provisioning_agent)[source]#

Deactivate provisioning agent.

Parameters

provisioning_agent (streamets.sdk.sch_models.ProvisioningAgent) –

Returns

An instance of streamsets.sdk.sch_api.Command.

deactivate_user(*users)[source]#

Deactivate Users for all given User IDs.

Parameters

*users – One or more instances of streamsets.sdk.sch_models.User.

Returns

An instance of streamsets.sdk.aster_api.Command.

delete_and_unregister_engine(engine)[source]#

Delete and Unregister an engine.

Parameters

engine (streamsets.sdk.sch_models.Engine) – Engine instance.

delete_api_credentials(*api_credentials)[source]#

Delete api_credentials.

Parameters

*api_credentials – One or more instances of streamsets.sdk.sch_models.ApiCredential.

delete_connection(*connections)[source]#

Delete connections.

Parameters

*connections – One or more instances of streamsets.sdk.sch_models.Connection.

delete_deployment(*deployments)[source]#

Delete deployments.

Parameters

*deployments (streamsets.sdk.sch_models.Deployment) – One or more deployments.

delete_draft_run(*draft_runs)[source]#

Remove a draft run.

Parameters

*draft_runs – One or more instances of streamsets.sdk.sch_models.DraftRun.

delete_engine(engine)[source]#

Delete an engine.

Parameters

engine (streamsets.sdk.sch_models.Engine) – Engine instance.

delete_environment(*environments)[source]#

Delete environments.

Parameters

*environments – One or more instances of streamsets.sdk.sch_models.Environment.

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_group(*groups)[source]#

Delete groups.

Parameters

*groups – One or more instances of streamsets.sdk.sch_models.Group.

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_job(*jobs)[source]#

Delete one or more jobs.

Parameters

*jobs – One or more instances of streamsets.sdk.sch_models.Job.

delete_job_sequences(*job_sequences)[source]#

Delete Job Sequences.

Parameters

*job_sequences – One or more instances of streamsets.sdk.sch_models.JobSequence

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_legacy_deployment(*deployments)[source]#

Delete deployments.

Parameters

*deployments – One or more instances of streamsets.sdk.sch_models.LegacyDeployment.

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_pipeline(pipeline, only_selected_version=False)[source]#

Delete a pipeline.

Parameters
Returns

An instance of streamsets.sdk.sch_api.Command.

delete_pipeline_labels(*pipeline_labels)[source]#

Delete pipeline labels.

Parameters

*pipeline_labels – One or more instances of streamsets.sdk.sch_models.PipelineLabel.

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_provisioning_agent(provisioning_agent)[source]#

Delete provisioning agent.

Parameters

provisioning_agent (streamets.sdk.sch_models.ProvisioningAgent) –

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_provisioning_agent_token(provisioning_agent)[source]#

Delete provisioning agent token.

Parameters

provisioning_agent (streamets.sdk.sch_models.ProvisioningAgent) –

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_report_definition(report_definition)[source]#

Delete an existing Report Definition.

Parameters

report_definition (streamsets.sdk.sch_models.ReportDefinition) – Report Definition instance.

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_scheduled_tasks(*scheduled_tasks)[source]#

Delete given scheduled tasks.

Parameters

*scheduled_tasks – One or more instances of streamsets.sdk.sch_models.ScheduledTask

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_snowflake_pipeline_defaults()[source]#

Delete the Snowflake pipeline defaults for this user.

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_snowflake_user_credentials()[source]#

Delete the Snowflake user credentials.

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_subscription(subscription)[source]#

Delete an exisiting Subscription.

Parameters

subscription (streamsets.sdk.sch_models.Subscription) – A Subscription instance.

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_topology(topology, only_selected_version=False)[source]#

Delete a topology.

Parameters
Returns

An instance of streamsets.sdk.sch_api.Command.

delete_unregistered_auth_tokens(executor_type)[source]#

Delete auth tokens for engines that have been unregistered.

Parameters

executor_type (str) – The executor type. Acceptable values are ‘DATACOLLECTOR’ or ‘TRANSFORMER’.

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_user(*users, deactivate=False)[source]#

Delete users. Deactivate users before deleting if configured.

Parameters
Returns

An instance of streamsets.sdk.aster_api.Command.

property deployments#

Deployments.

Returns

An instance of streamsets.sdk.sch_models.Deployments.

disable_job_sequence(job_sequence)[source]#

Disable the Job Sequence.

Parameters

job_sequence – An instance of streamsets.sdk.sch_models.JobSequence

Returns

An instance of streamsets.sdk.sch_api.Command.

property draft_runs#

Draft Runs.

Returns

An instance of streamsets.sdk.sch_models.DraftRuns.

duplicate_job(job, name=None, description=None, number_of_copies=1)[source]#

Duplicate an existing job.

Parameters
  • job (streamsets.sdk.sch_models.Job) – Job object.

  • name (str, optional) – Name of the new job(s). Default: None. If not specified, name of the job with ' copy' appended to the end will be used.

  • description (str, optional) – Description for new job(s). Default: None.

  • number_of_copies (int, optional) – Number of copies. Default: 1.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Job.

duplicate_pipeline(pipeline, name=None, description='New Pipeline', number_of_copies=1)[source]#

Duplicate an existing pipeline.

Parameters
  • pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline object.

  • name (str, optional) – Name of the new pipeline(s). Default: None.

  • description (str, optional) – Description for new pipeline(s). Default: 'New Pipeline'.

  • number_of_copies (int, optional) – Number of copies. Default: 1.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Pipeline.

edit_job(job)[source]#

Edit a job.

Parameters

job (streamsets.sdk.sch_models.Job) – Job object.

Returns

An instance of streamsets.sdk.sch_models.Job.

enable_job_sequence(job_sequence)[source]#

Enable the Job Sequence.

Parameters

job_sequence – An instance of streamsets.sdk.sch_models.JobSequence

Returns

An instance of streamsets.sdk.sch_api.Command.

property engine_versions#
Deployment engine version. The engine_versions and engine_configuration (Deployment) properties

have the same object structure represented by DeploymentEngineConfiguration.

Returns

An instance of streamsets.sdk.sch_models.DeploymentEngineConfiguration.

property environments#

environments.

Returns

An instance of streamsets.sdk.sch_models.Environments.

export_jobs(jobs)[source]#

Export jobs to a compressed archive.

Parameters

jobs (list) – A list of streamsets.sdk.sch_models.Job instances.

Returns

An instance of type bytes indicating the content of zip file with job json files.

export_pipelines(pipelines, fragments=False, include_plain_text_credentials=False)[source]#

Export pipelines.

Parameters
  • pipelines (list) – A list of streamsets.sdk.sch_models.Pipeline instances.

  • fragments (bool) – Indicates if exporting fragments is needed.

  • include_plain_text_credentials (bool) – Indicates if plain text credentials should be included.

Returns

An instance of type bytes indicating the content of zip file with pipeline json files.

export_protection_policies(protection_policies)[source]#

Export protection policies to a compressed archive.

Parameters
Returns

An instance of type bytes indicating the content of zip file with protection policy json files.

export_topologies(topologies)[source]#

Export topologies.

Parameters

topologies (list) – A list of streamsets.sdk.sch_models.Topology instances.

Returns

An instance of type bytes indicating the content of zip file with pipeline json files.

get_admin_tool(base_url, username, password)[source]#

Get ControlHub admin tool.

Returns

An instance of streamsets.sdk.sch_models.AdminTool.

get_api_credential_builder()[source]#

Get api credential Builder.

Returns

An instance of streamsets.sdk.sch_models.ApiCredentialBuilder.

get_classification_rule_builder()[source]#

Get a classification rule builder instance with which a classification rule can be created.

Returns

An instance of streamsets.sdk.sch_models.ClassificationRuleBuilder.

get_components(component_type_id, offset=None, len_=None, order_by='LAST_VALIDATED_ON', order='ASC')[source]#

Get components.

Parameters
  • component_type_id (str) – Component type id.

  • offset (str, optional) – Default: None.

  • len (str, optional) – Default: None.

  • order_by (str, optional) – Default: 'LAST_VALIDATED_ON'.

  • order (str, optional) – Default: 'ASC'.

get_connection_builder()[source]#

Get a connection builder instance with which a connection can be created.

Returns

An instance of streamsets.sdk.sch_models.ConnectionBuilder.

get_current_job_status(job)[source]#

Returns the current job status for given job id.

Parameters

job (streamsets.sdk.sch_models.Job) – Job object.

get_data_sla_builder()[source]#

Get a Data SLA builder instance with which a Data SLA can be created.

Returns

An instance of streamsets.sdk.sch_models.DataSlaBuilder.

get_deployment_builder(deployment_type='SELF')[source]#

Get a deployment builder instance with which a deployment can be created.

Parameters

( (deployment_type) – obj: str, optional) Valid values are ‘AWS’, ‘GCP’ and ‘SELF’. Default 'SELF'

Returns

An instance of streamsets.sdk.sch_models.DeploymentBuilder.

get_engine_labels(engine)[source]#

Returns all labels assigned to an engine.

Parameters

engine (streamsets.sdk.sch_models.Engine) – Engine instance.

Returns

A list of engine assigned labels.

get_environment_builder(environment_type='SELF')[source]#

Get an environment builder instance with which an environment can be created.

Parameters

( (environment_type) – obj: str, optional) Valid values are ‘AWS’, ‘GCP’, ‘KUBERNETES’ and ‘SELF’. Default 'SELF'

Returns

An instance of streamsets.sdk.sch_models.EnvironmentBuilder.

get_group_builder()[source]#

Get a group builder instance with which a group can be created.

Returns

An instance of streamsets.sdk.sch_models.GroupBuilder.

get_job_builder()[source]#

Get a job builder instance with which a job can be created.

Returns

An instance of streamsets.sdk.sch_models.JobBuilder.

get_job_sequence_builder()[source]#

Get a job sequence builder instance with which a job sequence can be created.

Returns

An instance of streamsets.sdk.sch_models.JobSequenceBuilder.

get_kubernetes_apply_agent_yaml_command(environment)[source]#

Get install script for a Kubernetes environment.

Parameters

environment (streamsets.sdk.sch_models.Environment) – Environment object.

Returns

An instance of str.

get_kubernetes_environment_yaml(environment)[source]#

Get the YAML file for a Kubernetes environment.

Parameters

environment (streamsets.sdk.sch_models.Environment) – Environment for which the YAML file should be fetched.

Returns

An instance of str.

get_legacy_deployment_builder()[source]#

Get a deployment builder instance with which a legacy deployment can be created.

Returns

An instance of streamsets.sdk.sch_models.LegacyDeploymentBuilder.

get_organization_builder()[source]#

Get an organization builder instance with which an organization can be created.

Returns

An instance of streamsets.sdk.sch_models.OrganizationBuilder.

get_pipeline_builder(engine_type=None, engine_id=None, fragment=False, **kwargs)[source]#

Get a pipeline builder instance with which a pipeline can be created.

Parameters
  • engine_type (str) – The type of pipeline that will be created. The options are 'data_collector', 'snowflake' or 'transformer'.

  • engine_id (str) – The ID of the Data Collector or Transformer in which to author the pipeline if not using Transformer for Snowflake.

  • fragment (boolean, optional) – Specify if a fragment builder. Default: False.

Returns

An instance of streamsets.sdk.sch_models.PipelineBuilder or streamsets.sdk.sch_models.StPipelineBuilder.

get_protection_method_builder()[source]#

Get a protection method builder instance with which a protection method can be created.

Returns

An instance of streamsets.sdk.sch_models.ProtectionMethodBuilder.

get_protection_policy_builder()[source]#

Get a protection policy builder instance with which a protection policy can be created.

Returns

An instance of streamsets.sdk.sch_models.ProtectionPolicyBuilder.

get_saql_search_builder(saql_search_type, mode='BASIC')[source]#

Get SAQL Search Builder.

saql_search_type (str): Type of SAQL search object, limited to``’PIPELINE’`` or 'FRAGMENT',

'JOB_INSTANCE', 'JOB_TEMPLATE' and 'JOB_DRAFT_RUN'.

mode (str, optional): mode of SAQL Search. Default: 'BASIC'

Returns

An instance of streamsets.sdk.sch_models.SAQLSearchBuilder.

get_scheduled_task_builder()[source]#

Get a scheduled task builder instance with which a scheduled task can be created.

Returns

An instance of streamsets.sdk.sch_models.ScheduledTaskBuilder.

get_self_managed_deployment_install_script(deployment, install_mechanism='DEFAULT', java_version=None)[source]#

Get install script for a Self Managed deployment.

Parameters
  • deployment (streamsets.sdk.sch_models.Deployment) – deployment object.

  • install_mechanism (str, optional) – Possible values for install are “DEFAULT”, “BACKGROUND” and “FOREGROUND”. Default: DEFAULT

  • java_version (str, optional) – Java Development Kit Version to be used. Example: “8”. Default: None.

Returns

An instance of streamsets.sdk.sch_api.Command.

get_snowflake_pipeline_defaults()[source]#

Get the Snowflake pipeline defaults for this user (if it exists).

Returns

A dict of Snowflake pipeline defaults.

get_snowflake_user_credentials()[source]#

Get the Snowflake user credentials (if they exist). They will be redacted.

Returns

A dict of Snowflake user credentials (redacted).

get_subscription_builder()[source]#

Get Event Subscription Builder.

Returns

An instance of streamsets.sdk.sch_models.SubscriptionBuilder.

get_topology_builder()[source]#

Get a topology builder instance with which a topology can be created.

Returns

An instance of streamsets.sdk.sch_models.TopologyBuilder.

get_user_builder()[source]#

Get a user builder instance with which a user can be created.

Returns

An instance of streamsets.sdk.sch_models.UserBuilder.

property groups#

Groups.

Returns

An instance of streamsets.sdk.sch_models.Groups.

import_jobs(archive, pipeline=True, number_of_instances=False, labels=False, runtime_parameters=False, **kwargs)[source]#

Import jobs from archived zip directory.

Parameters
  • archive (file) – file containing the jobs.

  • pipeline (boolean, optional) – Indicate if pipeline should be imported. Default: True.

  • number_of_instances (boolean, optional) – Indicate if number of instances should be imported. Default: False.

  • labels (boolean, optional) – Indicate if labels should be imported. Default: False.

  • runtime_parameters (boolean, optional) – Indicate if runtime parameters should be imported. Default: False.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Job.

import_pipeline(pipeline, commit_message, name=None, data_collector_instance=None)[source]#

Import pipeline from json file.

Parameters
  • pipeline (dict) – A python dict representation of ControlHub Pipeline.

  • commit_message (str) – Commit message.

  • name (str, optional) – Name of the pipeline. If left out, pipeline name from JSON object will be used. Default None.

  • data_collector_instance (streamsets.sdk.sch_models.DataCollector) – If excluded, system sdc will be used. Default None.

Returns

An instance of streamsets.sdk.sch_models.Pipeline.

import_pipelines_from_archive(archive, commit_message, fragments=False)[source]#

Import pipelines from archived zip directory.

Parameters
  • archive (file) – file containing the pipelines.

  • commit_message (str) – Commit message.

  • fragments (bool, optional) – Indicates if pipeline contains fragments.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Pipeline.

import_protection_policies(policies_archive)[source]#

Import protection policies from a compressed archive.

Parameters

policies_archive (file) – file containing the protection policies.

Returns

A py:class:streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.ProtectionPolicy.

import_topologies(archive, import_number_of_instances=False, import_labels=False, import_runtime_parameters=False, **kwargs)[source]#

Import topologies from archived zip directory.

Parameters
  • archive (file) – file containing the topologies.

  • import_number_of_instances (boolean, optional) – Indicate if number of instances should be imported. Default: False.

  • import_labels (boolean, optional) – Indicate if labels should be imported. Default: False.

  • import_runtime_parameters (boolean, optional) – Indicate if runtime parameters should be imported. Default: False.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Topology.

invite_user(user)[source]#

Invite a user to the StreamSets Platform.

Parameters

user (streamsets.sdk.sch_models.User) – User object.

Returns

An instance of streamsets.sdk.sch_api.Command.

property job_sequences#

Job Sequences.

Returns

An instance of streamsets.sdk.sch_models.JobSequences.

property jobs#

Jobs.

Returns

An instance of streamsets.sdk.sch_models.Jobs.

kill_scheduled_tasks(*scheduled_tasks)[source]#

Kill given scheduled tasks.

Parameters

*scheduled_tasks – One or more instances of streamsets.sdk.sch_models.ScheduledTask

Returns

An instance of streamsets.sdk.sch_api.Command.

property ldap_enabled#

Indication if LDAP is enabled or not.

Returns

An instance of boolean.

property legacy_deployments#

Deployments.

Returns

An instance of streamsets.sdk.sch_models.LegacyDeployments.

lock_deployment(deployment)[source]#

Lock deployment.

Parameters

deployment – An instance of streamsets.sdk.sch_models.Deployment.

property login_audits#

Login Audits.

Returns

An instance of streamsets.sdk.sch_models.LoginAudits.

mark_saql_search_as_favorite(saql_search)[source]#

Marks a saql search as a favorite. In other words, it stars an existing search query.

Parameters

saql_search (streamsets.sdk.sch_models.SAQLSearch) – The SAQL Search.

Returns

An instance of streamsets.sdk.sch_api.Command.

property metering_and_usage#

Metering and usage for the Organization. By default, this will return a report for the last 30 days of metering data. A report for a custom time window can be retrieved by indexing this object with a slice that contains a datetime object for the start (required) and stop (optional, defaults to datetime.now()).

ex. metering_and_usage[datetime(2022, 1, 1):datetime(2022, 1, 14)]

metering_and_usage[datetime.now() - timedelta(7):]

Returns

An instance of streamsets.sdk.sch_models.MeteringUsage

property organizations#

Organizations.

Returns

An instance of streamsets.sdk.sch_models.Organizations.

pause_scheduled_tasks(*scheduled_tasks)[source]#

Pause given scheduled tasks.

Parameters

*scheduled_tasks – One or more instances of streamsets.sdk.sch_models.ScheduledTask

Returns

An instance of streamsets.sdk.sch_api.Command.

property pipeline_labels#

Pipeline labels.

Returns

An instance of streamsets.sdk.sch_models.PipelineLabels.

property pipelines#

Pipelines.

Returns

An instance of streamsets.sdk.sch_models.Pipelines.

preview_classification_rule(classification_rule, parameter_data, data_collector=None)[source]#

Dynamic preview of a classification rule.

Parameters
Returns

An instance of streamsets.sdk.sdc_api.PreviewCommand.

property protection_policies#

Protection policies.

Returns

An instance of streamsets.sdk.sch_models.ProtectionPolicies.

property provisioning_agents#

Provisioning Agents registered to the ControlHub instance.

Returns

An instance of streamsets.sdk.sch_models.ProvisioningAgents.

publish_job_sequence(job_sequence)[source]#

Publishes an Job Sequence to ControlHub.

Parameters

job_sequence – An instance of streamsets.sdk.sch_models.JobSequence

Returns

An instance of streamsets.sdk.sch_api.Command.

publish_pipeline(pipeline, commit_message='New pipeline', draft=False, validate=False)[source]#

Publish a pipeline.

Parameters
  • pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline object.

  • commit_message (str, optional) – Default: 'New pipeline'.

  • draft (boolean, optional) – Default: False.

  • validate (boolean, optional) – Default: False.

Returns

An instance of streamsets.sdk.sch_api.Command.

publish_scheduled_task(task)[source]#

Send the scheduled task to Control Hub. DEPRECATED

Parameters

task (streamsets.sdk.sch_models.ScheduledTask) – Scheduled task object.

Returns

An instance of streamsets.sdk.sch_api.Command.

publish_topology(topology, commit_message=None)[source]#

Publish a topology.

Parameters
  • topology (streamsets.sdk.sch_models.Topology) – Topology object to publish.

  • commit_message (str, optional) – Commit message to supply with the Topology. Default: None

Returns

An instance of streamsets.sdk.sch_api.Command.

regenerate_api_credential_auth_token(api_credential)[source]#

Regenerate the auth token for an api credential.

Parameters

api_credential (streamsets.sdk.sch_models.ApiCredential) – ApiCredential object.

Returns

An instance of streamsets.sdk.sch_api.Command.

remove_administrator_role(user)[source]#

Remove the Organization Administrator status from the specified user.

Parameters

user (streamsets.sdk.sch_models.User) – User object.

Returns

An instance of streamsets.sdk.aster_api.Command.

Remove a saql search.

Parameters

saql_search (streamsets.sdk.sch_models.SAQLSearch) – The SAQL Search object.

Returns

An instance of streamsets.sdk.sch_api.Command.

rename_api_credential(api_credential)[source]#

Rename an api credential.

Parameters

api_credential (streamsets.sdk.sch_models.ApiCredential) – ApiCredential object.

Returns

An instance of streamsets.sdk.sch_api.Command.

property report_definitions#

Report Definitions.

Returns

An instance of streamsets.sdk.sch_models.ReportDefinitions.

reset_origin(*jobs)[source]#

Reset all pipeline offsets for given jobs.

Parameters

*jobs – One or more instances of streamsets.sdk.sch_models.Job.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Job.

restart_engines(*engines)[source]#

Restart the given engine(s).

Parameters

*engines – One or more instances of streamsets.sdk.sch_models.DataCollector or streamsets.sdk.sch_models.Transformer.

Returns

An instance of streamsets.sdk.sch_api.Command.

resume_scheduled_tasks(*scheduled_tasks)[source]#

Resume given scheduled tasks.

Parameters

*scheduled_tasks – One or more instances of streamsets.sdk.sch_models.ScheduledTask

Returns

An instance of streamsets.sdk.sch_api.Command.

run_job_sequence(job_sequence)[source]#

Run the Job Sequence.

Parameters

job_sequence – An instance of streamsets.sdk.sch_models.JobSequence

Returns

An instance of streamsets.sdk.sch_api.Command.

run_pipeline_preview(pipeline, batches=1, batch_size=10, skip_targets=True, skip_lifecycle_events=True, end_stage=None, only_schema=None, timeout=120000, test_origin=False, read_policy=None, write_policy=None, executor=None, **kwargs)[source]#

Run pipeline preview.

Parameters
  • pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline object.

  • batches (int, optional) – Number of batches. Default: 1.

  • batch_size (int, optional) – Batch size. Default: 10.

  • skip_targets (bool, optional) – Skip targets. Default: True.

  • skip_lifecycle_events (bool, optional) – Skip life cycle events. Default: True.

  • end_stage (str, optional) – End stage. Default None.

  • only_schema (bool, optional) – Only schema. Default: None.

  • timeout (int, optional) – Timeout. Default: 120000.

  • test_origin (bool, optional) – Test origin. Default: False.

  • read_policy (streamsets.sdk.sch_models.ProtectionPolicy) – Read Policy for preview. If not provided, uses default read policy if one available. Default: None.

  • write_policy (streamsets.sdk.sch_models.ProtectionPolicy) – Write Policy for preview. If not provided, uses default write policy if one available. Default: None.

  • executor (streamsets.sdk.sch_models.DataCollector, optional) – The Data Collector in which to preview the pipeline. If omitted, ControlHub’s first executor SDC will be used. Default: None.

Returns

An instance of streamsets.sdk.sdc_api.PreviewCommand.

run_snowflake_pipeline_preview(pipeline_id, rev=0, batches=1, batch_size=10, skip_targets=True, end_stage=None, only_schema=None, timeout=2000, test_origin=False, stage_outputs_to_override_json=None, **kwargs)[source]#

Run Snowflake pipeline preview.

Parameters
  • pipeline_id (str) – The pipeline instance’s ID.

  • rev (int, optional) – Pipeline revision. Default: 0.

  • batches (int, optional) – Number of batches. Default: 1.

  • batch_size (int, optional) – Batch size. Default: 10.

  • skip_targets (bool, optional) – Skip targets. Default: True.

  • end_stage (str, optional) – End stage. Default: None.

  • only_schema (bool, optional) – Only schema. Default: None.

  • timeout (int, optional) – Timeout. Default: 2000.

  • test_origin (bool, optional) – Test origin. Default: False

  • stage_outputs_to_override_json (str, optional) – Stage outputs to override. Default: None.

  • wait (bool, optional) – Wait for pipeline preview to finish. Default: True.

Returns

An instance of streamsets.sdk.sdc_api.PreviewCommand.

property saql_saved_searches_draft_run#

Get SAQL Searches for type Draft Run.

Returns

An instance of streamsets.sdk.sch_models.SAQLSearches.

property saql_saved_searches_fragment#

Get SAQL Searches for type Fragment.

Returns

An instance of streamsets.sdk.sch_models.SAQLSearches.

property saql_saved_searches_job_instance#

Get SAQL Searches for type Job Instance.

Returns

An instance of streamsets.sdk.sch_models.SAQLSearches.

property saql_saved_searches_job_template#

Get SAQL Searches for type Job Template.

Returns

An instance of streamsets.sdk.sch_models.SAQLSearches.

property saql_saved_searches_pipeline#

Get SAQL Searches for type Pipeline.

Returns

An instance of streamsets.sdk.sch_models.SAQLSearches.

Save a saql search query.

Parameters

saql_search (streamsets.sdk.sch_models.SAQLSearch) – The SAQL Search object.

Returns

An instance of streamsets.sdk.sch_api.Command.

scale_legacy_deployment(deployment, num_instances)[source]#

Scale up/down active deployment.

Parameters
  • deployment (streamsets.sdk.sch_models.LegacyDeployment) – Deployment object.

  • num_instances (int) – Number of sdc instances.

Returns

An instance of streamsets.sdk.sch_api.Command.

property scheduled_tasks#

Scheduled Tasks.

Returns

An instance of streamsets.sdk.sch_models.ScheduledTasks.

set_default_read_protection_policy(protection_policy)[source]#

Set a default read protection policy.

Parameters

protection_policy

:param (streamsets.sdk.sch_models.ProtectionPolicy): Protection :param Policy object to be set as the default read policy.:

Returns

An updated instance of streamsets.sdk.sch_models.ProtectionPolicy.

set_default_write_protection_policy(protection_policy)[source]#

Set a default write protection policy.

Parameters

protection_policy

:param (streamsets.sdk.sch_models.ProtectionPolicy): Protection :param Policy object to be set as the default write policy.:

Returns

An updated instance of streamsets.sdk.sch_models.ProtectionPolicy.

shutdown_engines(*engines)[source]#

Call shutdown on the given engine(s).

Parameters

*engines – One or more instances of streamsets.sdk.sch_models.DataCollector or streamsets.sdk.sch_models.Transformer.

Returns

An instance of streamsets.sdk.sch_api.Command.

start_deployment(*deployments)[source]#

Start Deployments.

Parameters

*deployments – One or more instances of streamsets.sdk.sch_models.Deployment.

start_draft_run(pipeline, reset_origin=False, runtime_parameters=None)[source]#

Start a draft run.

Parameters
  • pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline object. (Must be in draft mode)

  • reset_origin (boolean, optional) – Default: False.

  • runtime_parameters (dict, optional) – Pipeline runtime parameters. Default: None.

Returns

An instance of streamsets.sdk.sch_api.Command.

start_job(*jobs, wait=True, **kwargs)[source]#

Start one or more jobs.

Parameters
  • *jobs – One or more instances of streamsets.sdk.sch_models.Job.

  • wait (bool, optional) – Wait for pipelines to reach RUNNING status before returning. Default: True.

  • **kwargs – Arbitrary keyword arguments.

Returns

An instance of streamsets.sdk.sch_api.StartJobsCommand.

start_job_template(job_template, name=None, description=None, attach_to_template=True, delete_after_completion=False, instance_name_suffix='COUNTER', number_of_instances=1, parameter_name=None, raw_job_tags=None, runtime_parameters=None, wait_for_data_collectors=False, inherit_permissions=False, wait=True, asynchronous=False)[source]#

Start Job instances from a Job Template.

Parameters
  • job_template (streamsets.sdk.sch_models.Job) – A Job instance with the property job_template set to True.

  • name (str, optional) – Name of the new job(s). Default: None. If not specified, name of the job template with ' copy' appended to the end will be used.

  • description (str, optional) – Description for new job(s). Default: None.

  • attach_to_template (bool, optional) – Default: True.

  • delete_after_completion (bool, optional) – Default: False.

  • instance_name_suffix (str, optional) – Suffix to be used for Job names in {‘COUNTER’, ‘TIME_STAMP’, ‘PARAM_VALUE’}. Default: COUNTER.

  • number_of_instances (int, optional) – Number of instances to be started using given parameters. Default: 1.

  • parameter_name (str, optional) – Specified when instance_name_suffix is ‘PARAM_VALUE’. Default: None.

  • raw_job_tags (list, optional) – Default: None.

  • runtime_parameters (dict) or (list) – Runtime Parameters to be used in the jobs. If a dict is specified, number_of_instances jobs will be started. If a list is specified, number_of_instances is ignored and job instances will be started using the elements of the list as Runtime Parameters for each job. If left out, Runtime Parameters from Job Template will be used. Default: None.

  • wait_for_data_collectors (bool, optional) – Wait for the created job instance(s) to be assigned an engine before proceeding. Default: False.

  • inherit_permissions (bool, optional) – Parameter to determine if the user wants to inherit the ACL from the template instead of getting the default ACL for it. Default: False.

  • wait (bool, optional) – Wait for jobs to reach ACTIVE status before returning. Default: True.

  • asynchronous (bool, optional) – Whether to start and create the jobs asynchronously or not. Default: False.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Job instances.

start_legacy_deployment(deployment, **kwargs)[source]#

Start Deployment.

Parameters
  • deployment (streamsets.sdk.sch_models.LegacyDeployment) – Deployment instance.

  • wait (bool, optional) – Wait for deployment to start. Default: True.

  • wait_for_statuses (list, optional) – Deployment statuses to wait on. Default: ['ACTIVE'].

Returns

An instance of streamsets.sdk.sch_api.DeploymentStartStopCommand.

stop_deployment(*deployments)[source]#

Stop Deployments.

Parameters

*deployments – One or more instances of streamsets.sdk.sch_models.Deployment.

stop_draft_run(*draft_runs, force=False, timeout_sec=300)[source]#

Stop a draft run.

Parameters
  • *draft_runs – One or more instances of streamsets.sdk.sch_models.DraftRuns.

  • force (bool, optional) – Force draft run to stop. Default: False.

  • timeout_sec (int, optional) – Timeout in secs. Default: 300.

Returns

An instance of streamsets.sdk.sch_api.StopJobsCommand.

stop_job(*jobs, force=False, timeout_sec=300)[source]#

Stop one or more jobs.

Parameters
  • *jobs – One or more instances of streamsets.sdk.sch_models.Job.

  • force (bool, optional) – Force job to stop. Default: False.

  • timeout_sec (int, optional) – Timeout in secs. Default: 300.

Returns

An instance of streamsets.sdk.sch_api.StopJobsCommand.

stop_legacy_deployment(deployment, wait_for_statuses=['INACTIVE'])[source]#

Stop Deployment.

Parameters
  • deployment (streamsets.sdk.sch_models.LegacyDeployment) – Deployment instance.

  • wait_for_statuses (list, optional) – List of statuses to wait for. Default: ['INACTIVE'].

Returns

An instance of streamsets.sdk.sch_api.DeploymentStartStopCommand.

stop_test_pipeline_run(start_pipeline_command)[source]#

Stop the test run of pipeline.

Parameters

start_pipeline_command (streamsets.sdk.sdc_api.StartPipelineCommand) –

Returns

An instance of streamsets.sdk.sdc_api.StopPipelineCommand.

property subscription_audits#

Subscription audits.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.SubscriptionAudit.

property subscriptions#

Event Subscriptions.

Returns

An instance of streamsets.sdk.sch_models.Subscriptions.

sync_job(*jobs)[source]#

Sync one or more jobs.

Parameters

*jobs – One or more instances of streamsets.sdk.sch_models.Job.

test_pipeline_run(pipeline, reset_origin=False, parameters=None)[source]#

Test run a pipeline.

Parameters
  • pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline object.

  • reset_origin (boolean, optional) – Default: False.

  • parameters (dict, optional) – Pipeline parameters. Default: None.

Returns

An instance of streamsets.sdk.sdc_api.StartPipelineCommand.

property topologies#

Topologies.

Returns

An instance of streamsets.sdk.sch_models.Topologies.

property transformers#

Transformers registered to the ControlHub instance.

Returns

Returns a streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Transformer instances.

unlock_deployment(deployment)[source]#

Unlock deployment.

Parameters

deployment – An instance of streamsets.sdk.sch_models.Deployment.

update_connection(connection)[source]#

Update a connection.

Parameters

connection (streamsets.sdk.sch_models.Connection) – Connection object.

Returns

An instance of streamsets.sdk.sch_api.Command.

update_deployment(deployment)[source]#

Update a deployment.

Parameters

deployment (streamsets.sdk.sch_models.Deployment) – deployment object.

Returns

An instance of streamsets.sdk.sch_api.Command.

update_engine_labels(engine)[source]#

Update an engine’s labels.

Parameters

engine (streamsets.sdk.sch_models.Engine) – Engine instance.

Returns

An instance of streamsets.sdk.sch_models.Engine.

update_engine_resource_thresholds(engine, max_cpu_load=None, max_memory_used=None, max_pipelines_running=None)[source]#

Updates engine resource thresholds.

Parameters
  • engine (streamsets.sdk.sch_models.Engine) – Engine instance.

  • max_cpu_load (float, optional) – Max CPU load in percentage. Default: None.

  • max_memory_used (int, optional) – Max memory used in MB. Default: None.

  • max_pipelines_running (int, optional) – Max pipelines running. Default: None.

Returns

An instance of streamsets.sdk.sch_api.Command.

update_environment(environment, timeout_sec=300)[source]#

Update an environment.

Parameters

environment (streamsets.sdk.sch_models.Environment) – environment object.

Returns

An instance of streamsets.sdk.sch_api.Command.

update_group(group)[source]#

Update a group.

Parameters

group (streamsets.sdk.sch_models.Group) – Group object.

Returns

An instance of streamsets.sdk.sch_api.Command.

update_job(job)[source]#

Update a job.

Parameters

job (streamsets.sdk.sch_models.Job) – Job object.

Returns

An instance of streamsets.sdk.sch_models.Job.

update_job_sequence_metadata(job_sequence)[source]#

Update Job Sequence metadata.

Parameters

job_sequence – An instance of streamsets.sdk.sch_models.JobSequence

Returns

An instance of streamsets.sdk.sch_api.Command.

update_legacy_deployment(deployment)[source]#

Update a deployment.

Parameters

deployment (streamsets.sdk.sch_models.LegacyDeployment) – Deployment object.

Returns

An instance of streamsets.sdk.sch_api.Command.

update_organization(organization)[source]#

Update an organization.

Parameters

organization (streamsets.sdk.sch_models.Organization) – Organization instance.

Returns

An instance of streamsets.sdk.sch_api.Command.

update_pipelines_with_different_fragment_version(pipelines, from_fragment_version, to_fragment_version)[source]#

Update pipelines with latest pipeline fragment commit version.

Parameters
Returns

An instance of streamsets.sdk.sch_api.Command

update_report_definition(report_definition)[source]#

Update an existing Report Definition.

Parameters

report_definition (streamsets.sdk.sch_models.ReportDefinition) – Report Definition instance.

Returns

An instance of streamsets.sdk.sch_api.Command.

Update a saql search query.

Parameters

saql_search (streamsets.sdk.sch_models.SAQLSearch) – The SAQL Search object.

Returns

An instance of streamsets.sdk.sch_api.Command.

update_scheduled_task(task)[source]#

Update an existing scheduled task in Control Hub.

Parameters

task (streamsets.sdk.sch_models.ScheduleTask) – Scheduled task object.

Returns

An instance of py:class:streamsets.sdk.sch_api.Command.

update_snowflake_pipeline_defaults(account_url=None, database=None, warehouse=None, schema=None, role=None)[source]#

Create or update the Snowflake pipeline defaults for this user.

Parameters
  • account_url (str, optional) – Snowflake account url. Default: None

  • database (str, optional) – Snowflake database to query against. Default: None

  • warehouse (str, optional) – Snowflake warehouse. Default: None

  • schema (str, optional) – Schema used. Default: None

  • role (str, optional) – Role used. Default: None

Returns

An instance of streamsets.sdk.sch_api.Command.

update_snowflake_user_credentials(username, snowflake_login_type, password=None, private_key=None, role=None)[source]#

Create or update the Snowflake user credentials.

Parameters
  • username (str) – Snowflake account username.

  • snowflake_login_type (str) – Snowflake login type to use. Options are password and privateKey.

  • password (str, optional) – Snowflake account password. Default: None

  • private_key (str, optional) – Snowflake account private key. Default: None

  • role (str, optional) – Snowflake role of the account. Default: None

Returns

An instance of streamsets.sdk.sch_api.Command.

update_subscription(subscription)[source]#

Update an existing Subscription.

Parameters

subscription (streamsets.sdk.sch_models.Subscription) – A Subscription instance.

Returns

An instance of streamsets.sdk.sch_api.Command.

update_user(user)[source]#
Update a user. Some user attributes are updated by ControlHub such as

last_modified_by, last_modified_on.

Parameters

user (streamsets.sdk.sch_models.User) – User object.

Returns

An instance of streamsets.sdk.sch_api.Command.

upgrade_job(*jobs)[source]#

Upgrade job(s) to latest pipeline version.

Parameters

*jobs – One or more instances of streamsets.sdk.sch_models.Job.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Job.

upload_offset(job, offset_file=None, offset_json=None)[source]#

Upload offset for given job.

Parameters
  • job (streamsets.sdk.sch_models.Job) – Job object.

  • offset_file (file, optional) – File containing the offsets. Default: None. Exactly one of offset_file, offset_json should specified.

  • offset_json (dict, optional) – Contents of offset. Default: None. Exactly one of offset_file, offset_json should specified.

Returns

An instance of streamsets.sdk.sch_models.Job.

property users#

Users.

Returns

An instance of streamsets.sdk.sch_models.Users.

validate_pipeline(pipeline)[source]#

Validate pipeline. Only Transformer for Snowflake pipelines are supported at this time.

Parameters

pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline instance.

Raises

streamsets.sdk.exceptions.ValidationError – If validation fails.

validate_snowflake_account(account_url, mode, snowflake_login_type, username, password=None, private_key=None)[source]#

Start a snowflake account validation process.

Parameters
  • account_url (str) – Snowflake url where the account should exist.

  • mode (str) – Mode of the account. Possible values are “EDIT”, “PUBLISHED” and “CURRENT_CRED”.

  • snowflake_login_type (str) – Login type for the snowflake account. Possible values are “PASSWORD” and “PRIVATE_KEY”.

  • username (str) – Username of the snowflake account.

  • password (str, optional) – Password of the snowflake account. Default: None.

  • private_key (str, optional) – Private key of the snowflake account. Default: None.

Returns

An instance of streamsets.sdk.sch_api.Command.

verify_connection(connection, library=None)[source]#

Verify connection.

Parameters
Returns

An instance of streamsets.sdk.sch_models.ConnectionVerificationResult.

wait_for_job_metric(job, metric, value, timeout_sec=200)[source]#

Block until a job’s realtime summary reaches the desired value for the desired metric.

Parameters
  • job (streamsets.sdk.sch_models.Job) – The job instance.

  • metric (str) – The desired metric (e.g. 'output_record_count' or 'input_record_count').

  • value (int) – The desired value to wait for.

  • timeout (int, optional) – Timeout to wait for metric to reach value, in seconds. Default: streamsets.sdk.sch.DEFAULT_WAIT_FOR_METRIC_TIMEOUT.

Raises

TimeoutError – If timeout passes without metric reaching value.

wait_for_job_metrics_record_count(job, count, timeout_sec=200)[source]#

Block until a job’s metrics reaches the desired count.

Parameters
  • job (streamsets.sdk.sch_models.Job) – The job instance.

  • count (int) – The desired value to wait for.

  • timeout_sec (int, optional) – Timeout to wait for metric to reach count, in seconds. Default: streamsets.sdk.sch.DEFAULT_WAIT_FOR_METRIC_TIMEOUT.

Raises

TimeoutError – If timeout_sec passes without metric reaching value.

wait_for_job_sequence_status(job_sequence, expected_status, timeout_sec=300)[source]#

Block until the job sequence reaches the desired status.

Parameters
  • job_sequence (streamsets.sdk.sch_models.JobSequence) – The JobSequence instance.

  • expected_status (str) – The desired status to wait for.

  • timeout_sec (int, optional) – Timeout to wait for job_sequence to reach expected_status, in seconds. Default: streamsets.sdk.sch.DEFAULT_WAIT_FOR_STATUS_TIMEOUT.

Raises
wait_for_job_status(job, status, check_failures=False, timeout_sec=300)[source]#

Block until a job reaches the desired status.

Parameters
  • job (streamsets.sdk.sch_models.Job) – The job instance.

  • status (str) – The desired status to wait for.

  • check_failures (bool, optional) – Flag to check for job exceptions. Default: False

  • timeout_sec (int, optional) – Timeout to wait for job to reach status, in seconds. Default: streamsets.sdk.sch.DEFAULT_WAIT_FOR_STATUS_TIMEOUT.

Raises

TimeoutError – If timeout_sec passes without job reaching status.

Models#

These models wrap and provide useful functionality for interacting with common SCH abstractions.

ACLs#

class streamsets.sdk.sch_models.ACL(*args, **kwargs)[source]#

Represents an ACL.

Parameters
resource_id#

Resource ID of the ACL.

Type

str

resource_owner#

Resource owner of the ACL.

Type

str

resource_created_time#

Creation time of the resource.

Type

str

resource_type#

Resource type of the ACL.

Type

str

last_modified_by#

Who the resource was last modified by.

Type

str

last_modified_on#

Last modified time of the resource.

Type

str

permissions#

A Collection of Permissions.

Type

streamsets.sdk.sch_models.Permissions

permission_builder#

A Permission Builder instance.

Type

streamsets.sdk.sch_models.ACLPermissionBuilder

add_permission(permission)[source]#

Add new permission to the ACL.

Parameters

permission (streamsets.sdk.sch_models.Permission) – A permission object.

Returns

An instance of streamsets.sdk.sch_api.Command

property permission_builder#

Get a permission builder instance with which a pipeline can be created.

Returns

An instance of streamsets.sdk.sch_models.ACLPermissionBuilder.

remove_permission(permission)[source]#

Remove a permission from ACL.

Parameters

permission (streamsets.sdk.sch_models.Permission) – A permission object.

Returns

An instance of streamsets.sdk.sch_api.Command

class streamsets.sdk.sch_models.ACLPermissionBuilder(permission, acl)[source]#

Class to help build the ACL permission.

Parameters
build(subject_id, subject_type, actions)[source]#

Method to help build the ACL permission.

Parameters
  • subject_id (str) – ID of the USER or GROUP.

  • subject_type (str) – Type of the subject. Accepted Values are ‘USER’, ‘GROUP’.

  • actions (list) – A list of actions of type str e.g. [‘READ’, ‘WRITE’, ‘EXECUTE’].

Returns

An instance of streamsets.sdk.sch_models.Permission.

class streamsets.sdk.sch_models.Permission(*args, **kwargs)[source]#

A container for a permission.

Parameters
  • permission (dict) – A Python object representation of a permission.

  • resource_type (str) – String representing the type of resource e.g. ‘JOB’, ‘PIPELINE’.

  • api_client (streamsets.sdk.sch_api.ApiClient) – An instance of ApiClient.

resource_id#

Id of the resource e.g. Pipeline or Job.

Type

str

subject_id#

Id of the subject e.g. user id 'admin@admin'.

Type

str

subject_type#

Type of the subject e.g. 'USER'.

Type

str

last_modified_by#

User who last modified this permission e.g. 'admin@admin'.

Type

str

last_modified_on#

Timestamp at which this permission was last modified e.g. 1550785079811.

Type

int

class VALID_ACTIONS(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#

Alerts#

class streamsets.sdk.sch_models.Alert(*args, **kwargs)[source]#

Model for Alerts.

message#

The Alert’s message text.

Type

str

alert_status#

The status of the Alert.

Type

str

Parameters
acknowledge()[source]#

Acknowledge an active Alert.

delete()[source]#

Delete an acknowledged Alert.

Api Credentials#

class streamsets.sdk.sch_models.ApiCredential(*args, **kwargs)[source]#

Model for ApiCredential.

Parameters
  • api_credential (dict) – A Python object representation of ApiCredential.

  • control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object. Default: None

active#

Whether the API Credential is active or not.

Type

bool

auth_token#

Auth token to be used while making an API call.

Type

str

created_by#

User that created this API Credential.

Type

str

created_for#

The user who should use the API Credential.

Type

str

credential_id#

Credential ID to be used while making an API call.

Type

str

name#

Name of the API Credential.

Type

str

class streamsets.sdk.sch_models.ApiCredentials(control_hub)[source]#

Collection of streamsets.sdk.sch_models.ApiCredential instances.

Parameters

control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

class streamsets.sdk.sch_models.ApiCredentialBuilder(api_credential, control_hub)[source]#

Class with which to build instances of streamsets.sdk.sch_models.ApiCredential.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_api_credential_builder().

Parameters
  • api_credential (dict) – Python object built from our Swagger ApiCredentialJson definition.

  • control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

build(name, user_id=None)[source]#

Build the ApiCredential.

Parameters
  • name (str) – ApiCredential name.

  • user_id (str, optional) – User ID for whom the credentials should be created, can only be used as an Org-Admin.

Returns

An instance of streamsets.sdk.sch_models.ApiCredential.

Classifiers#

class streamsets.sdk.sch_models.Classifier(*args, **kwargs)[source]#

Classifier model.

patterns#

Classifier patterns.

Type

str

Parameters

classifier (dict) – A Python dict representation of classifier.

Classification Rules#

class streamsets.sdk.sch_models.ClassificationRule(*args, **kwargs)[source]#

Classification Rule Model.

Parameters
class streamsets.sdk.sch_models.ClassificationRuleBuilder(classification_rule, classifier)[source]#

Class with which to build instances of streamsets.sdk.sch_models.ClassificationRule.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_classification_rule_builder().

Parameters
  • classification_rule (dict) – Python object defining a classification rule.

  • classifier (dict) – Python object defining a classifier.

add_classifier(patterns=None, match_with=None, regular_expression_type='RE2/J', case_sensitive=False)[source]#

Add classifier to the classification rule.

Parameters
  • patterns (list, optional) – List of strings of patterns. Default: None.

  • match_with (str, optional) – Default: None.

  • regular_expression_type (str, optional) – Default: 'RE2/J'.

  • case_sensitive (bool, optional) – Default: False.

Returns

An instance of streamsets.sdk.sch_models.Classifier.

build(name, category, score)[source]#

Build the classification rule.

Parameters
  • name (str) – Classification Rule name.

  • category (str) – Classification Rule category.

  • score (float) – Classification Rule score.

Returns

An instance of streamsets.sdk.sch_models.ClassificationRule.

Connections#

class streamsets.sdk.sch_models.Connection(*args, **kwargs)[source]#

Model for connection.

Parameters
connection_definition#

The Connection Definition JSON.

Type

dict

library_definition#

The Library Definition JSON.

Type

dict

pipeline_commits (:py:class:`streamsets.sdk.utils.SeekableList` of

streamsets.sdk.sch_models.PipelineCommit instances): Pipeline commits using this connection.

acl#

Update Connection ACL.

Type

streamsets.sdk.sch_models.ACL

tags#

Connection tags.

Type

streamsets.sdk.utils.SeekableList of instances of streamsets.sdk.sch_models.Tag

property acl#

Get Connection ACL.

Returns

An instance of streamsets.sdk.sch_models.ACL.

add_tag(*tags)[source]#

Add a tag

Parameters

*tags – One or more instances of str

property pipeline_commits#

Get the pipeline commits using this connection.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.PipelineCommit

instances.

remove_tag(*tags)[source]#

Remove a tag

Parameters

*tags – One or more instances of str

property tags#

Get the connection tags.

Returns

A streamsets.sdk.utils.SeekableList of instances of streamsets.sdk.sch_models.Tag.

class streamsets.sdk.sch_models.Connections(control_hub, organization)[source]#

Collection of streamsets.sdk.sch_models.Connection instances.

Parameters
class streamsets.sdk.sch_models.ConnectionBuilder(connection, control_hub)[source]#

Class with which to build instances of streamsets.sdk.sch_models.Connection.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_connection_builder().

Parameters
  • connection (dict) – Python object built from our Swagger ConnectionJson definition.

  • control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

build(title, connection_type, authoring_data_collector, tags=None)[source]#

Define the connection.

Parameters
  • title (str) – Connection title.

  • connection_type (str) – Type of connection. The options are ‘STREAMSETS_AWS_EMR_CLUSTER’, ‘STREAMSETS_MYSQL’, ‘STREAMSETS_SNOWFLAKE’, ‘STREAMSETS_COAP_CLIENT’, ‘STREAMSETS_OPC_UA_CLIENT’, ‘STREAMSETS_GOOGLE_PUB_SUB’, ‘STREAMSETS_MQTT’, ‘STREAMSETS_POSTGRES’, ‘STREAMSETS_GOOGLE_CLOUD_STORAGE’, ‘STREAMSETS_AWS_REDSHIFT’, ‘STREAMSETS_GOOGLE_BIG_QUERY’, ‘STREAMSETS_ORACLE’, ‘STREAMSETS_AWS_S3’, ‘STREAMSETS_REMOTE_FILE’, ‘STREAMSETS_SQLSERVER’, ‘STREAMSETS_AWS_SQS’, ‘STREAMSETS_SNOWPIPE’, and ‘STREAMSETS_JDBC’.

  • authoring_data_collector (streamsets.sdk.sch.DataCollector) – Authoring Data Collector.

  • tags (list, optional) – List of tags (strings). Default: None.

Returns

An instance of streamsets.sdk.sch_models.Connection.

class streamsets.sdk.sch_models.ConnectionVerificationResult(*args, **kwargs)[source]#

Model for connection verification result.

Parameters

connection_preview_json (dict) – dynamic preview API response JSON.

issue_count#

The count of the number of issues for the connection verification result.

Type

int

issue_message#

The message provided for the connection verification result.

Type

str

property issue_count#

The count of the number of issues for the connection verification result.

Returns

A int that represents the number of issues.

property issue_message#

The message provided for the connection verification result.

Returns

A str message detailing the response for the connection verification result.

DataCollectors#

class streamsets.sdk.sch_models.DataCollector(*args, **kwargs)[source]#

Model for Data Collector.

execution_mode#

True for Edge and False for SDC.

Type

bool

id#

Data Collectort id.

Type

str

labels#

Labels for Data Collector.

Type

list

last_validated_on#

Last validated time for Data Collector.

Type

str

reported_labels#

Reported labels for Data Collector.

Type

list

url#

Data Collector’s url.

Type

str

accessible#

Whether the Data Collector instance is accessible.

Type

bool

responding#

Whether the Data Collector instance is responding.

Type

bool

attributes#

Data Collector’s attributes.

Type

dict

attributes_updated_on#

When the Data Collector’s attributes were last updated on.

Type

str

authentication_token_generated_on#

When the Data Collector authentication token was generated.

Type

str

jobs#

The Data Collector’s jobs.

Type

streamsets.sdk.sch_models.Job

job_ids#

Data Collector’s job ids.

Type

str

registered_by#

Who registered the Data Collector.

Type

str

pipelines_committed#

Pipelines that have been committed.

Type

list of (str objects)

resource_thresholds#

DataCollector resource thresholds.

Type

str

acl#

DataCollector ACL.

Type

streamsets.sdk.sch_models.ACL

property accessible#

Returns a bool for whether the Data Collector instance is accessible.

property acl#

Get DataCollector ACL.

Returns

An instance of streamsets.sdk.sch_models.ACL.

property attributes#

Returns a dict of Data Collector attributes.

property attributes_updated_on#

Returns when the Data Collector attributes was updated.

property authentication_token_generated_on#

Returns when the Data Collector authentication token was generated.

property job_ids#

Returns the Data Collector Job ids.

property jobs#

Returns the Data Collector streamsets.sdk.sch_models.Job instances.

property pipelines_committed#

Pipelines that have been committed.

Returns

A list of Job IDs (str objects).

property registered_by#

Returns who the Data Collector was registered by.

property resource_thresholds#

Return DataCollector resource thresholds.

Returns

A dict of DataCollector thresholds named as “max_memory_used”, “max_cpu_load”

and “max_pipelines_running”

property responding#

Returns a bool for whether the Data Collector instance is responding.

Deployments#

class streamsets.sdk.sch_models.AzureVMDeployment(deployment, control_hub=None, **kwargs)[source]#

Model for Azure VM Deployment.

attach_public_ip#
Type

bool

azure_tags#

Azure tags to apply to all provisioned resources in this Azure deployment.

Type

dict

desired_instances#

Number of engine instances to deploy.

Type

int

key_pair_name#

SSH key pair.

Type

str

managed_identity#
Type

str

public_ssh_key#
Type

str

resource_group#
Type

str

ssh_key_source#

SSH key source.

Type

str

vm_size#
Type

str

zones#
Type

list

static get_ui_value_mappings()[source]#

This method returns a map for the values shown in the UI and internal constants to be used.

is_complete()[source]#

Checks if all required fields are set in this deployment.

class streamsets.sdk.sch_models.Deployment(deployment, control_hub=None, **kwargs)[source]#

Model for Deployment. This is an abstract class.

acl#

The Deployment ACL instance.

Type

streamsets.sdk.sch_models.ACL

created_by#

User that created this deployment.

Type

str

created_on#

Time at which this deployment was created.

Type

int

deployment_events#

Name of the deployment.

Type

DeploymentEvents

deployment_id#

Id of the deployment.

Type

str

deployment_name#

Name of the deployment.

Type

str

deployment_tags#

Raw deployment tags of the deployment.

Type

list

deployment_type#

Type of the deployment.

Type

str

desired_instances#

The Deployment desired number of instances.

Type

int

engine_configuration#

The Deployment Engine Configuration.

Type

DeploymentEngineConfiguration

engine_type#

The Deployment Engine type.

Type

str

engine_version#

The Deployment Engine Version.

Type

str

environment_id#

Enabled environment where engine will be deployed.

Type

list

last_modified_by#

User that last modified this deployment.

Type

str

last_modified_on#

Time at which this deployment was last modified.

Type

int

network_tags#

The Deployment network tags.

Type

list

scala_binary_version#

scala Binary Version.

Type

str

organization#

The Deployment organization.

Type

str

json_state#

State of the deployment - this is json field called state.

Type

str

state#

State of the deployment - displayed as state on UI.

Type

str

status#

Status of the deployment.

Type

str

status_detail#

Status detail of the deployment.

Type

str

registered_engines#

Registered Engines of the deployment.

Type

RegisteredEngines

tags#

Tags of the deployment.

Type

a streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Tag instances

class ENGINE_TYPES(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
class TYPES(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
property acl#

Get deployment ACL.

Returns

An instance of streamsets.sdk.sch_models.ACL.

property deployment_events#

Get the events for a deployment.

Returns

An instance of streamsets.sdk.sch_models.DeploymentEvents

property engine_configuration#

The engine configuration of deployment engine. :returns: An instance of streamsets.sdk.sch_models.DeploymentEngineConfiguration.

property registered_engines#

Get the registered engines.

Returns

An instance of streamsets.sdk.sch_models.RegisteredEngines

property tags#

Get the tags.

Returns

A streamsets.sdk.utils.SeekableList of str instances.

class streamsets.sdk.sch_models.DeploymentBuilder(deployment, control_hub)[source]#

Class with which to build instances of streamsets.sdk.sch_models.Deployment.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_deployment_builder().

Parameters
  • deployment (dict) – Python object built from our Swagger CspDeploymentJson definition.

  • control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

build(deployment_name, engine_type, engine_version, environment, external_resource_location=None, engine_labels=None, max_cpu_load=80.0, max_memory_used=100.0, max_pipelines_running=1000000, deployment_tags=None, engine_build=None, scala_binary_version=None, **kwargs)[source]#

Define the deployment.

Parameters
  • deployment_name (str) – deployment name.

  • engine_type (str) – Type of engine to deploy.

  • engine_version (str) – Version of engine to deploy.

  • environment (streamsets.sdk.sch_models.Environment) – The environment instance.

  • external_resource_location (str, optional) – External Resource Location URL. Default: None.

  • engine_labels (list, optional) – List of labels (strings). Default: None.

  • max_cpu_load (float or int, optional) – Max CPU Load (percent). Default: 80.0.

  • max_memory_used (float or int, optional) – Max Memory Used (percent). Default: 100.0.

  • max_pipelines_running (int, optional) – Max Running Pipeline Count. Default: 1000000.

  • deployment_tags (list, optional) – List of tags (strings). Default: None.

  • engine_build (str, optional) – Build of engine to deploy. Default: None.

  • scala_binary_version (str, optional) – Scala binary version required in case of engine_type=’TF’ Default: None.

Returns

An instance of subclass of streamsets.sdk.sch_models.Deployment.

class streamsets.sdk.sch_models.DeploymentEngineAdvancedConfiguration(deployment_engine_advanced_config, deployment=None)[source]#

Model for advanced configuration in Deployment engine configuration.

data_collector_configuration#

The Data Collector Configuration.

Type

dict

transformer_configuration#

The Transformer Configuration.

Type

dict

credential_stores#
Type

dict

proxy_properties#

Proxy Properties for the Deployment engine configuration.

Type

dict

security_policy#

Security Policy for the Deployment engine configuration.

Type

dict

class streamsets.sdk.sch_models.DeploymentEngineConfiguration(*args, **kwargs)[source]#

Model for Deployment engine configuration.

advanced_configuration#
Type

str

aws_image_id#
Type

str

azure_image_id#
Type

str

created_by#

User that created this deployment.

Type

str

created_on#

Time at which this deployment was created.

Type

int

download_url#
Type

list

engine_type#

engine type.

Type

list

engine_version#

Engine version to deploy.

Type

list

id#

User that created this deployment.

Type

str

gcp_image_id#
Type

str

last_modified_by#

User that last modified this deployment.

Type

str

last_modified_on#

Time at which this deployment was last modified.

Type

int

scala_binary_version#

scala Binary Version.

Type

str

java_configuration#

Java Configuration instance.

Type

DeploymentEngineJavaConfiguration

stage_libs#

DeploymentStageLibraries instance.

Type

DeploymentStageLibraries

class streamsets.sdk.sch_models.DeploymentEngineConfigurations(control_hub)[source]#

Collection of streamsets.sdk.sch_models.DeploymentEngineConfiguration instances.

Parameters

control_hub – An instance of streamsets.sdk.sch.ControlHub.

class streamsets.sdk.sch_models.DeploymentEngineJavaConfiguration(deployment_engine_java_configuration, deployment=None)[source]#

Model for Deployment engine Java configuration.

id#
Type

str

java_options#
Type

str

java_memory_strategy (

object:ABSOLUTE/PERCENTAGE)

maximum_java_heap_size_in_mb#
Type

long

minimum_java_heap_size_in_mb#
Type

long

maximum_java_heap_size_in_percent#
Type

int

minimum_java_heap_size_in_percent#
Type

int

class streamsets.sdk.sch_models.DeploymentStageLibraries(values, deployment_config)[source]#

Wrapper class for the list of stage libraries in a deployment.

Parameters
append(value)[source]#

Append object to the end of the list.

extend(libs)[source]#

Extend list by appending elements from the iterable.

remove(value)[source]#

Remove first occurrence of value.

Raises ValueError if the value is not present.

class streamsets.sdk.sch_models.EC2Deployment(deployment, control_hub=None, **kwargs)[source]#

Model for EC2 Deployment.

ec2_instance_type#

Type EC2 engine instance.

Type

str

desired_instances#

No. of EC2 engine instances.

Type

int

instance_profile#

arn of instance profile created as an environment prerequisite.

Type

str

key_pair_name#

SSH key pair.

Type

str

aws_tags#

AWS tags to apply to all provisioned resources in this AWS deployment.

Type

dict

ssh_key_source#

SSH key source.

Type

str

tracking_url#
Type

str

is_complete()[source]#

Checks if all required fields are set in this deployment.

class streamsets.sdk.sch_models.GCEDeployment(deployment, control_hub=None, **kwargs)[source]#

Model for GCE Deployment.

block_project_ssh_keys#

Blocks the use of project-wide public SSH keys to access the provisioned VM instances.

Type

bool

desired_instances#

Number of engine instances to deploy.

Type

int

gcp_labels#

GCP labels to apply to all provisioned resources in your GCP project.

Type

dict

instance_service_account#

Instance service account.

Type

str

machine_type#

machine type to use for the provisioned VM instances.

Type

str

public_ssh_key#

full contents of public SSH key associated with each VM instance.

Type

str

region#

Region to provision VM instances in.

Type

str

tracking_url#

Tracking URL.

Type

str

zone#

GCP zone

Type

list

is_complete()[source]#

Checks if all required fields are set in this deployment.

class streamsets.sdk.sch_models.KubernetesDeployment(deployment, control_hub=None, **kwargs)[source]#

Model for Kubernetes Deployment.

kubernetes_labels#

Kubernetes labels to apply to all provisioned resources in your Kubernetes project.

Type

dict

is_complete()[source]#

Checks if all required fields are set in this deployment.

property kubernetes_labels#

Kubernetes labels for the deployment.

Returns

value pairs of labels.

Return type

A (dict) of key

class streamsets.sdk.sch_models.SelfManagedDeployment(deployment, control_hub=None, **kwargs)[source]#

Model for self managed Deployment.

class InstallMechanism(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
install_script(install_mechanism='DEFAULT', java_version=None)[source]#

Gets the installation script to be run for this deployment.

Parameters
  • install_mechanism (str, optional) – Possible values for install are “DEFAULT”, “BACKGROUND” and “FOREGROUND”. Default: DEFAULT.

  • java_version (str, optional) – Java Development Kit Version to be used. Example: “8”. Default: None.

Returns

An str instance of the installation command.

is_complete()[source]#

Checks if all required fields are set in this deployment.

DraftRun#

class streamsets.sdk.sch_models.DraftRun(*args, **kwargs)[source]#

Model for DraftRun.

commit_id#

Pipeline commit id.

Type

str

commit_label#

Pipeline commit label.

Type

str

created_by#

User that created this job.

Type

str

created_on#

Time at which this job was created.

Type

int

data_collector_labels#

Labels of the data collectors.

Type

list

delete_after_completion#

Flag that indicates if this job should be deleted after completion.

Type

bool

description#

Job description.

Type

str

enable_failover#

Flag that indicates if failover is enabled.

Type

bool

enable_time_series_analysis#

Flag that indicates if time series is enabled.

Type

bool

execution_mode#

True for Edge and False for SDC.

Type

bool

job_id#

Id of the job.

Type

str

job_name#

Name of the job.

Type

str

last_modified_by#

User that last modified this job.

Type

str

last_modified_on#

Time at which this job was last modified.

Type

int

number_of_instances#

Number of instances.

Type

int

pipeline_force_stop_timeout#

Timeout for Pipeline force stop.

Type

int

pipeline_id#

Id of the pipeline that is running the job.

Type

str

pipeline_name#

Name of the pipeline that is running the job.

Type

str

pipeline_rule_id#

Rule Id of the pipeline that is running the job.

Type

str

read_policy#

Read Policy of the job.

Type

streamsets.sdk.sch_models.ProtectionPolicy

runtime_parameters#

Run-time parameters of the job.

Type

str

static_parameters#

List of parameters wjat cannot be overriden.

Type

list

statistics_refresh_interval_in_millisecs#

Refresh interval for statistics in milliseconds.

Type

int

status#

Status of the job.

Type

string

snapshots#

Snapshots belonging to the draft run.

Type

streamsets.sdk.utils.SeekableList) of (streamsets.sdk.sch_models.Snapshot

capture_snapshot()[source]#

Generate a new snapshot.

Returns

An instance of streamsets.sdk.sch_api.Command.

get_logs(ending_offset=- 1)[source]#

Retrieve the logs from the engine.

Parameters

ending_offset (int, optional) – The offset to capture logs up until. Default: -1

Returns

A list of dict instances, one dictionary per log line.

remove_snapshot(snapshot)[source]#

Remove a snapshot.

Parameters

snapshot (streamsets.sdk.sch_models.Snapshot) – Snapshot object.

Returns

An instance of streamsets.sdk.sch_api.Command.

property snapshots#

Snapshots belonging to the draft run.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Snapshot instances.

DraftRuns#

class streamsets.sdk.sch_models.DraftRuns(control_hub)[source]#

Collection of streamsets.sdk.sch_models.DraftRun instances.

Parameters

control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

Engines#

class streamsets.sdk.sch_models.Engine(*args, **kwargs)[source]#

Model for Platform Engines.

acl#

The engine ACL instance.

Type

streamsets.sdk.sch_models.ACL

configuration#

The engine Configuration instance.

Type

streamsets.sdk.models.Configuration

id#

ID of the engine.

Type

str

organization#

Organization that the engine belongs to.

Type

str

engine_url#

URL registered for the engine.

Type

str

version#

Version of the engine.

Type

str

labels#

Labels for the engine.

Type

list of str instances

last_reported_time#

The last time the engine contacted StreamSets Platform.

Type

str

start_up_time#

The time the engine was started.

Type

str

edge#

Whether this is an Edge engine or not.

Type

bool

cpu_load#

The percent utilization of the CPU by the engine.

Type

float

memory_used#

The amount of memory used by the engine in MB.

Type

float

total_memory#

The total amount of memory configured for the engine in MB.

Type

float

running_pipelines#

The total number of pipelines running on the engine.

Type

int

responding#

Whether the engine is responding or not.

Type

bool

engine_type#

The type of engine.

Type

str

max_cpu_load#

The percentage limit on CPU load configured for this engine.

Type

int

max_memory_used#

The percentage limit on memory configured for this engine.

Type

int

max_pipelines_running#

The limit on the number of concurrent pipelines for this engine.

Type

int

deployment_id#

The ID of the deployment that the engine belongs to.

Type

str

directories#

The directories of the engine.

Type

dict

external_libraries#

The External Library of the engine.

Type

streamsets.sdk.sch_models.ExternalLibrary

memory_used_mb#

The memory used in MB by the engine.

Type

float

resources#

The External Resource of the engine.

Type

streamsets.sdk.sch_models.ExternalResource

responding#

Whether the engine is responding.

Type

bool

running_pipelines#

The running pipelines of the engine.

Type

list of streamsets.sdk.sch_models.Pipeline instances

running_pipelines_count#

The running pipelines count of the engine.

Type

int

total_memory_mb#

The total memory in MB of the engine.

Type

int

user_stage_libraries#

The user stage libraries of the engine.

Type

dict

add_external_libraries(stage_library, *libraries)[source]#

Add external libraries to the engine.

Parameters
  • stage_library (str) – The stage library that includes the stage requiring the external library.

  • *libraries – One or more file instances to add to an engine, in binary format.

add_resources(*resources)[source]#

Add resource files to the engine.

Parameters

*resources – One or more file instances to add to an engine, in binary format.

delete_external_libraries(*libraries)[source]#

Delete external libraries from the engine.

Parameters

*libraries – One or more streamsets.sdk.sch_models.ExternalLibrary instances to delete from the engine.

delete_resources(*resources)[source]#

Delete resource files on the engine.

Parameters

*resources – One or more streamsets.sdk.sch_models.ExternalResource instances to delete from the engine.

property external_libraries#

Get the external libraries for the engine.

Returns

A list of streamsets.sdk.sch_models.ExternalLibrary instances.

get_logs(ending_offset=- 1)[source]#

Retrieve the logs for the engine.

Parameters

ending_offset (int, optional) – The offset to capture logs up until. Default: -1

Returns

A list of dict instances, one dictionary per log line.

get_thread_dump()[source]#

Generate a thread dump for the engine.

Returns

A list of dict instances, one dictionary per thread.

refresh()[source]#

Retrieve the latest state of the Engine from Platform, and update the in-memory representation.

property resources#

Get the external resources for the engine.

Returns

A list of streamsets.sdk.sch_models.ExternalResource instances.

class streamsets.sdk.sch_models.Engines(control_hub)[source]#

Collection of streamsets.sdk.sch_models.Engine instances.

Parameters

control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

Environments#

class streamsets.sdk.sch_models.AWSEnvironment(environment, control_hub=None, **kwargs)[source]#

Model for AWS Environment.

access_key_id#

AWS access key id to access AWS account. Required when credential_type is AWS_STATIC_KEYS.

Type

str

aws_tags#

AWS tags to apply to all provisioned resources in this AWS deployment.

Type

dict

credentials#

Credentials of the environment.

Type

dict

credential_type#

Credential Type of the environment.

Type

str

default_instance_profile#

aarn of default instance profile created as an environment prerequisite.

Type

dict

region#

AWS region where VPC is located in case of AWS environment.

Type

str

role_arn#

Role AR of the cross-account role created as an environment prerequisite in user’s AWS account. Required when credential_type is AWS_CROSS_ACCOUNT_ROLE.

Type

str

secret_access_key#

AWS secret access key to access AWS account. Required when credential_type is AWS_STATIC_KEYS.

Type

str

subnet_ids#

Range of Subnet IDs in the VPC.

Type

str

security_group_id#

Security group ID.

Type

str

vpc_id#

Id of the Amazon VPC created as an environment prerequisite in user’s AWS account.

Type

str

static get_ui_value_mappings()[source]#

This method returns a map for the values shown in the UI and internal constants to be used.

is_complete()[source]#

Checks if all required fields are set in this environment.

class streamsets.sdk.sch_models.AzureEnvironment(environment, control_hub=None, **kwargs)[source]#

Model for Azure Environment.

azure_tags#

Azure tags for this environment.

Type

dict

credential_type#

Credential type of the environment.

Type

str

client_id#

Client ID of the environment.

Type

str

client_secret#

Client Secret of the environment.

Type

str

credentials#

Credentials of the environment.

Type

dict

default_managed_identity#

default managed identity.

Type

str

default_resource_group#

default resource group.

Type

str

region#

Azure region where vnet is located.

Type

str

subnet_id#

Subnet ID in the vnet.

Type

str

subscription_id#

Subscription ID in the vnet.

Type

str

security_group_id#

Security group ID.

Type

str

tenet_id#

Tenet ID of the environment.

Type

str

vnet_id#

Id of the vnet.

Type

str

static get_ui_value_mappings()[source]#

This method returns a map for the values shown in the UI and internal constants to be used.

is_complete()[source]#

Checks if all required fields are set in this environment.

class streamsets.sdk.sch_models.Environment(environment, control_hub=None, **kwargs)[source]#

Model for Environment. This is an abstract class.

acl#

Environment ACL.

Type

streamsets.sdk.sch_models.ACL

tags#

Environment tags.

Type

streamsets.sdk.utils.SeekableList of str instances

allow_nightly_engine_builds#

Whether or not the environment allows use of nightly engine build.

Type

bool

created_by#

User that created this environment.

Type

str

created_on#

Time at which this environment was created.

Type

int

environment_id#

Id of the environment.

Type

str

environment_name#

Name of the environment.

Type

str

environment_tags#

Raw environment tags of the environment.

Type

list

environment_type#

Type of the environment.

Type

str

last_modified_by#

User that last modified this environment.

Type

str

last_modified_on#

Time at which this environment was last modified.

Type

int

json_state#

State of the environment - which is stored in json field called state.

Type

str

state#

State of the environment stateDisplayLabel - shown on the UI as state.

Type

str

status#

Status of the environment.

Type

str

status_detail#

Status detail of the environment.

Type

str

class TYPES(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
property acl#

Get environment ACL.

Returns

An instance of streamsets.sdk.sch_models.ACL.

property tags#

Get the tags.

Returns

A streamsets.sdk.utils.SeekableList of str instances.

class streamsets.sdk.sch_models.EnvironmentBuilder(environment, control_hub)[source]#

Class with which to build instances of streamsets.sdk.sch_models.Environment.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_environment_builder().

Parameters
  • environment (dict) – Python object built from our Swagger CspEnvironmentJson definition.

  • control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

build(environment_name, **kwargs)[source]#

Define the environment.

Parameters
  • environment_name (str) – environment name.

  • allow_nightly_builds (bool, optional) – Default: False.

  • allow_nightly_engine_builds (bool, optional) – Default: False.

  • environment_tags (list, optional) – List of tags (strings). Default: None.

Returns

An instance of subclass of streamsets.sdk.sch_models.Environment.

class streamsets.sdk.sch_models.GCPEnvironment(environment, control_hub=None, **kwargs)[source]#

Model for GCP Environment.

credentials#

Credentials of the environment.

Type

dict

account_key_json#

JSON file contents for the Service Account Key. Required when credential_type is GCP_SERVICE_ACCOUNT_KEY.

Type

str

project#

Project where VPC is located.

Type

str

gcp_labels#

GCP labels for this environment.

Type

dict

gcp_tags#

GCP tags to apply to all resources provisioned in GCP account.

Type

str

service_account_email#

Service account ID. Required when credential_type is GCP_SERVICE_ACCOUNT_IMPERSONATION.

Type

str

vpc_id#

Id of the GCP VPC created as an environment prerequisite in user’s GCP account.

Type

str

fetch_available_machine_types(zone_id)[source]#

Returns the available machine types for the given Environment and GCP Project and GCP Zone.

fetch_available_projects()[source]#

Returns the available projects for the given Environment.

fetch_available_regions()[source]#

Returns the available regions for the given Environment and GCP Project.

fetch_available_service_accounts()[source]#

Returns the available service accounts for the given Environment and GCP Project.

fetch_available_vpcs()[source]#

Returns the available Networks for the given Environment and GCP Project. Here it is called vpcs since on UI it is shown as VPC.

fetch_available_zones(region_id)[source]#

Returns the available zones for the given environment and GCP project and GCP region.

static get_ui_value_mappings()[source]#

This method returns a map for the values shown in the UI and internal constants to be used.

is_complete()[source]#

Checks if all required fields are set in this environment.

class streamsets.sdk.sch_models.KubernetesEnvironment(environment, control_hub=None, **kwargs)[source]#

Model for Kubernetes Environment.

agent_event_logs (:obj:`list` of

py:class:`streamsets.sdk.sch_models.KubernetesAgentEvent instances): Agent logs for the environment.

agent_java_options#

Java configuration options set for the Agent.

Type

str

agent_status#

Status of the Agent.

Type

str

agent_status_detail#

Details pertaining to the status of the Agent.

Type

str

agent_version#

Version of the Agent.

Type

str

allow_nightly_engine_builds#

Whether or not the environment allows use of nightly engine build. versions.

Type

bool

created_by#

ID of the user who created the environment.

Type

str

created_on#

Millisecond timestamp of when the environment was created.

Type

int

environment_id#

ID of the environment.

Type

str

environment_name#

Name of the environment.

Type

str

environment_tags#

Tags assigned to the environment.

Type

list of str instances

environment_type (

obj:`str): Type of the environment.

kubernetes_labels#

Kubernetes labels to apply to all resources provisioned in the Kubernetes namespace.

Type

dict

kubernetes_namespace#

Name of the Kubernetes namespace to provision the resources in.

Type

str

last_modified_by#

ID of the user who last modified the environment.

Type

str

last_modified_on#

Millisecond timestamp of the last time the environment was modified.

Type

int

state#

The current state of the environment.

Type

str

property agent_event_logs#

Get the Agent event logs for this environment.

Returns

An instance of streamsets.sdk.sch_models.KubernetesAgentEvents

property agent_java_options#

Get the Agent’s extra java options.

Returns

A str instance of the agent’s extra java options.

property agent_status#

Get the Agent’s status.

Returns

A str instance of the agent’s status.

property agent_status_detail#

Get the details of the Agent’s status

Returns

A str instance of the agent’s status detail.

apply_agent_yaml_command()[source]#

Gets the installation script to be run for this environment.

Returns

An str instance of the installation command.

property kubernetes_labels#

Kubernetes labels for the environment.

Returns

value pairs of labels.

Return type

A (dict) of key

class streamsets.sdk.sch_models.SelfManagedEnvironment(environment, control_hub=None, **kwargs)[source]#

Model for Self Managed Environment.

Transformers#

class streamsets.sdk.sch_models.Transformer(*args, **kwargs)[source]#

Model for Transformer.

execution_mode#
Type

str

id#

Transformer id.

Type

str

labels#

Labels for Transformer.

Type

list

last_validated_on#

Last validated time for Transformer.

Type

str

reported_labels#

Reported labels for Transformer.

Type

list

url#

Transformer’s url.

Type

str

version#

Transformer’s version.

Type

str

accessible#

Whether the Transformer instance is accessible.

Type

bool

attributes#

Transformer’s attributes.

Type

dict

attributes_updated_on#

When the Transformer’s attributes were last updated on.

Type

str

authentication_token_generated_on#

When the Transformer authentication token was generated.

Type

str

registered_by#

Who registered the Transformer.

Type

str

acl#

Transformer ACL.

Type

streamsets.sdk.sch_models.ACL

property accessible#

Returns a bool for whether the Transformer instance is accessible.

property acl#

Get Transformer ACL.

Returns

An instance of streamsets.sdk.sch_models.ACL.

property attributes#

Returns a dict of Transformer attributes.

property attributes_updated_on#

Returns when the Transformer attributes was updated.

property authentication_token_generated_on#

Returns when the Transformer authentication token was generated.

property registered_by#

Returns who the Transformer was registered by.

Groups#

class streamsets.sdk.sch_models.Group(*args, **kwargs)[source]#

Model for Group.

Parameters
  • group (dict) – A Python object representation of Group.

  • roles (dict) – A mapping of role IDs to role labels.

  • control_hub (streamsets.sdk.ControlHub, optional) – An instance of Control Hub. Default: None

created_by#

Creator of this group.

Type

str

created_on#

Creation time of this group.

Type

str

display_name#

Display name of this group.

Type

str

group_id#

ID of this group.

Type

str

last_modified_by#

Group last modified by.

Type

str

last_modified_on#

Group last modification time.

Type

str

organization#

Organization this group belongs to.

Type

str

roles#

A set of role labels.

Type

set

users#

Users that are a member of this group.

Type

streamsets.sdk.util.SeekableList of streamsets.sdk.sch_models.User instances

property roles#

Get the roles for the group.

property users#

Get the users that are members of the group.

class streamsets.sdk.sch_models.Groups(control_hub, roles, organization)[source]#

Collection of streamsets.sdk.sch_models.Group instances.

Parameters
  • control_hub – An instance of streamsets.sdk.sch.ControlHub.

  • roles (dict) – A mapping of role IDs to role labels.

  • organization (str) – Organization ID.

class streamsets.sdk.sch_models.GroupBuilder(group, roles, control_hub=None)[source]#

Class with which to build instances of streamsets.sdk.sch_models.Group.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_group_builder().

Parameters
  • group (dict) – Python object built from our Swagger GroupJson definition.

  • roles (dict) – A mapping of role IDs to role labels.

  • control_hub (streamsets.sdk.sch.ControlHub, optional) – ControlHub object. Default: None

build(display_name, group_id=None, ldap_groups=None)[source]#

Build the group.

Parameters
  • display_name (str) – Group display name.

  • group_id (str, optional) – Group ID. Can only include letters, numbers, underscores and periods. Default: None.

  • ldap_groups (list, optional) – List of LDAP groups (strings).

Returns

An instance of streamsets.sdk.sch_models.Group.

Jobs#

class streamsets.sdk.sch_models.Job(*args, **kwargs)[source]#

Model for Job.

archived#

Flag that indicates if this job is archived.

Type

bool

commit_id#

Pipeline commit id.

Type

str

commit_label#

Pipeline commit label.

Type

str

created_by#

User that created this job.

Type

str

created_on#

Time at which this job was created.

Type

int

data_collector_labels#

Labels of the data collectors.

Type

list

delete_after_completion#

Flag that indicates if this job should be deleted after completion.

Type

bool

description#

Job description.

Type

str

destroyer#

Job destroyer.

Type

str

enable_failover#

Flag that indicates if failover is enabled.

Type

bool

enable_time_series_analysis#

Flag that indicates if time series is enabled.

Type

bool

execution_mode#

True for Edge and False for SDC.

Type

bool

job_deleted#

Flag that indicates if this job is deleted.

Type

bool

job_id#

Id of the job.

Type

str

job_name#

Name of the job.

Type

str

last_modified_by#

User that last modified this job.

Type

str

last_modified_on#

Time at which this job was last modified.

Type

int

number_of_instances#

Number of instances.

Type

int

pipeline_force_stop_timeout#

Timeout for Pipeline force stop.

Type

int

pipeline_id#

Id of the pipeline that is running the job.

Type

str

pipeline_name#

Name of the pipeline that is running the job.

Type

str

pipeline_rule_id#

Rule Id of the pipeline that is running the job.

Type

str

read_policy#

Read Policy of the job.

Type

streamsets.sdk.sch_models.ProtectionPolicy

runtime_parameters#

Run-time parameters of the job.

Type

str

static_parameters#

List of parameters that cannot be overriden.

Type

list

statistics_refresh_interval_in_millisecs#

Refresh interval for statistics in milliseconds.

Type

int

status#

Status of the job.

Type

string

history#

Status and run count history for the Job.

Type

streamsets.sdk.utils.SeekableList) of (streamsets.sdk.sch_models.JobStatus

template_run_history_list#

List of job template run history.

Type

list

write_policy#

Write Policy of the job.

Type

streamsets.sdk.sch_models.ProtectionPolicy

property acl#

Get job ACL.

Returns

An instance of streamsets.sdk.sch_models.ACL.

add_tag(*tags)[source]#

Add a tag

Parameters

*tags – One or more instances of str

property commit#

Get pipeline commit of the job.

Returns

An instance of streamsets.sdk.sch_models.PipelineCommit.

property committed_offsets#

Get the committed offsets for a given job id.

Returns

An instance of streamsets.sdk.sch_models.JobCommittedOffset.

get_run_logs()[source]#

Retrieve the logs for the last run of the job.

Returns

A list of dict instances, one dictionary per log line.

get_snowflake_generated_queries()[source]#

Retrieve the Snowflake generated queries of the last run of the job.

Returns

A list of dict instances, one dictionary per query.

property history#

Status and run count history for the Job.

property job_sequence#

Get the Job Sequence that this job is a part of.

Returns

An instance of streamsets.sdk.sch_models.JobSequence or None.

property latest_committed_offsets#

Get the latest committed offsets for a given job id.

Returns

A (dict) object.

property metrics#

The metrics from all runs of a Job.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.JobMetrics instances.

property pipeline#

Get the pipeline object corresponding to this job.

remove_tag(*tags)[source]#

Remove a tag

Parameters

*tags – One or more instances of str

property status#

Current job status.

property system_job#

Get the sytem Job for this job if exists.

Returns

An instance of streamsets.sdk.sch_models.Job.

property tag#

Get pipeline tag of the job.

Returns

An instance of streamsets.sdk.sch_models.PipelineTag.

property tags#

Get the job tags.

Returns

A streamsets.sdk.utils.SeekableList of instances of streamsets.sdk.sch_models.Tag.

time_series_metrics(metric_type, time_filter_condition='LAST_5M', **kwargs)[source]#

Get historic time series metrics for the job.

Parameters
  • metric_type (str) – metric type in {‘Record Count Time Series’, ‘Record Throughput Time Series’, ‘Batch Throughput Time Series’, ‘Stage Batch Processing Timer seconds’}.

  • time_filter_condition (str, optional) – Default: 'LAST_5M'.

Returns

An instance of streamsets.sdk.sch_models.JobTimeSeriesMetrics.

class streamsets.sdk.sch_models.Jobs(control_hub)[source]#

Collection of streamsets.sdk.sch_models.Job instances.

Parameters

control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

count(status)[source]#

Get job counts by status.

Parameters

status (str) – Status of the jobs in {‘ACTIVE’, ‘INACTIVE’, ‘ACTIVATING’, ‘DEACTIVATING’, ‘INACTIVE_ERROR’, ‘ACTIVE_GREEN’, ‘ACTIVE_RED’, ‘’}

Returns

An instance of int indicating the count of jobs with specified status.

class streamsets.sdk.sch_models.JobBuilder(job, control_hub)[source]#

Class with which to build instances of streamsets.sdk.sch_models.Job.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_job_builder().

Parameters

job (dict) – Python object built from our Swagger JobJson definition.

build(job_name, pipeline, job_template=False, runtime_parameters=None, pipeline_commit=None, pipeline_tag=None, pipeline_commit_or_tag=None, tags=None)[source]#

Build the job.

Parameters
  • job_name (str) – Name of the job.

  • pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline object.

  • job_template (boolean, optional) – Indicate if it is a Job Template. Default: False.

  • runtime_parameters (dict, optional) – Runtime Parameters for the Job or Job Template. Default: None.

  • pipeline_commit (streamsets.sdk.sch_models.PipelineCommit) – Default: ``None`, which resolves to the latest pipeline commit.

  • pipeline_tag (streamsets.sdk.sch_models.PipelineTag) – Default: ``None`, which resolves to the latest pipeline tag.

  • pipeline_commit_or_tag (str, optional) – Default: None, which resolves to the latest pipeline commit.

  • tags (list, optional) – Job tags. Default: None.

Returns

An instance of streamsets.sdk.sch_models.Job.

class streamsets.sdk.sch_models.JobMetrics(*args, **kwargs)[source]#

Model for job metrics.

error_count#

The number of error records generated by this run of the Job.

Type

int

error_records_per_sec#

The number of error records per second generated by this run of the Job.

Type

float

input_count#

The number of records ingested by this run of the Job.

Type

int

input_records_per_sec#

The number of records per second ingested by this run of the Job.

Type

float

output_count#

The number of records output by this run of the Job.

Type

int

output_records_per_sec#

The number of records output per second by the run of the Job.

Type

float

pipeline_version#

The version of the pipeline that was used in this Job run.

Type

str

run_count#

The count corresponding to this Job run.

Type

int

sdc_id#

The ID of the SDC instance on which this Job run was executed.

Type

str

stage_errors_count#

The number of stage error records generated by this run of the Job.

Type

int

stage_error_records_per_sec#

The number of stage error records generated per second by this run of the job.

Type

float

total_error_count#

The total number of both error records and stage errors generated by this run of the job.

Type

int

Parameters

metrics (dict) – Metrics counts in JSON format.

class streamsets.sdk.sch_models.JobOffset(*args, **kwargs)[source]#

Model for offset.

Parameters

offset (dict) – Offset in JSON format.

class streamsets.sdk.sch_models.JobRunEvent(*args, **kwargs)[source]#

Model for an event in a Job Run.

Parameters

event (dict) – Job Run Event in JSON format.

class streamsets.sdk.sch_models.JobStatus(*args, **kwargs)[source]#

Model for Job Status.

color#

Job status color.

Type

str

run_history#

(streamsets.sdk.utils.JobRunHistoryEvent): History of a particular job run.

Type

streamsets.sdk.utils.SeekableList

offsets#

(streamsets.sdk.utils.JobPipelineOffset): Offsets after the job run.

Type

streamsets.sdk.utils.SeekableList

Parameters
property color#

Get the color of the Job Status.

property offsets#

Get the offsets of the Job Status.

property run_history#

Get the run history of the Job Status.

class streamsets.sdk.sch_models.JobTimeSeriesMetric(*args, **kwargs)[source]#

Model for job metrics.

name#

Name of measurement.

Type

str

values#

Timeseries data.

Type

list

time_series#

Timeseries data with timestamp as key and metric value as value.

Type

dict

Parameters

metric (dict) – Metrics in JSON format.

property time_series#

Get the time series (dict).

class streamsets.sdk.sch_models.JobTimeSeriesMetrics(*args, **kwargs)[source]#

Model for job metrics.

input_records#

Appears when queried for ‘Record Count Time Series’ or ‘Record Throughput Time Series’.

Type

streamsets.sdk.sch_models.JobTimeSeriesMetric

output_records#

Appears when queried for ‘Record Count Time Series’ or ‘Record Throughput Time Series’.

Type

streamsets.sdk.sch_models.JobTimeSeriesMetric

error_records#

Appears when queried for ‘Record Count Time Series’ or ‘Record Throughput Time Series’.

Type

streamsets.sdk.sch_models.JobTimeSeriesMetric

batch_counter#

Appears when queried for ‘Batch Throughput Time Series’.

Type

streamsets.sdk.sch_models.JobTimeSeriesMetric

batch_processing_timer#

Appears when queried for ‘Batch Processing Timer seconds’.

Type

streamsets.sdk.sch_models.JobTimeSeriesMetric

Parameters

metrics (dict) – Metrics in JSON format.

class streamsets.sdk.sch_models.RuntimeParameters(runtime_parameters, job)[source]#

Wrapper for ControlHub job runtime parameters.

Parameters

Job Sequences#

class streamsets.sdk.sch_models.JobSequence(*args, **kwargs)[source]#

Model for Job Sequence

Parameters
id#

Job Sequence ID.

Type

str

name#

Job Sequence name.

Type

str

description#

Job Sequence description.

Type

str

organization#

Organization that the Job Sequence is a part of.

Type

str

created_by#

Who created the Job Sequence.

Type

str

create_time#

When the Job Sequence was created.

Type

str

last_modified_by#

Who the Job Sequence was last modified by.

Type

str

last_modified_time#

When the Job Sequence was last modified.

Type

str

start_time#

Start time of the Job Sequence.

Type

str

end_time#

End time of the Job Sequence.

Type

str

time_zone#

Job Sequence timezone.

Type

str

cron_tab_mask#

Cron tab mask of the Job Sequence.

Type

str

steps#

Job Sequence steps.

Type

str

status#

Job Sequence status.

Type

str

class JobSequenceStatusType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
class LogLevel(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
class LogType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
add_step_with_jobs(jobs, parallel_jobs=False, ignore_error=True)[source]#

Generate one or more Steps by providing new Job objects to be added to the Job Sequence.

Parameters
  • jobs (list of streamsets.sdk.sch_models.Job) – Jobs to be added to the Job Sequence as part of the step.

  • parallel_jobs (bool) – Whether the passed in Jobs will be a part of the same step or not. Default: False.

  • ignore_error (bool) – Whether to ignore Job errors or not. Default: True.

delete_history_logs(history_logs)[source]#

Delete history logs from the Job Sequence.

Parameters

history_logs (list of streamsets.sdk.sch_models.JobSequenceHistoryLog) – History Logs to delete.

get_history_log(log_type=None, log_level=None, last_run_only=None, run_id=None, from_date=None, to_date=None)[source]#

Get the history log of the Job Sequence.

Args:
log_type (str, optional): Accepted values are “SEQUENCE_START”, “SCHEDULER_TRIGGER_ERROR”,

“STEP_START”. Default: None.

log_level (str, optional): Accepted values are “INFO”, “WARN”, “ERROR”. Default: None. last_run_only (bool, optional): Whether to return History Log for the last run only. Default: None. run_id (str, optional): The desired Run ID to get the History Log for. Default: None. from_date (str, optional): The starting date from which we’d like to see the History Log for. Default: None. to_date (str, optional): The end date till which we’d like to see the History Log for. Default: None.

Returns

A SeekableList of streamsets.sdk.sch_models.JobSequenceHistoryLog objects.

property history_logs#

Get the history logs of the Job Sequence.

Returns

An instance of streamsets.sdk.sch_models.JobSequenceHistoryLogs.

mark_job_as_finished(job)[source]#

Send an event in Control Hub indicating that a job has finished.

Parameters

job (streamsets.sdk.sch_models.Job) – Job to mark as finished.

Returns

An instance of streamsets.sdk.sch_api.Command.

move_step(step, target_step_number, swap=False)[source]#

Move one Step to another index within the Job Sequence.

Parameters
  • step (streamsets.sdk.sch_models.Step) – The step to be moved.

  • target_step_number (int) – The step number to move the step to.

  • swap (bool) – Whether to swap the step with the step in the given index, if False, this will move

  • Default (the step to the index and shift the steps to the right.) – False.

remove_step(step)[source]#

Remove a Step from a Job Sequence.

Parameters

step – A streamsets.sdk.sch_models.Step instance.

property run_ids#

Get all the run IDs of the Job Sequence.

Returns

An list of int.

property steps#

The Steps that are part of the Job Sequence.

Returns

A (streamsets.sdk.utils.SeekableList) of (streamsets.sdk.sch_models.Step) objects.

class streamsets.sdk.sch_models.JobSequences(control_hub)[source]#

Collection of streamsets.sdk.sch_models.JobSequence instances.

Parameters

control_hub (streamsets.sdk.sch.ControlHub) – An instance of the ControlHub.

class streamsets.sdk.sch_models.JobSequenceBuilder(job_sequence, control_hub=None)[source]#

Class with which to build instances of streamsets.sdk.sch_models.JobSequence.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_job_sequence_builder().

Parameters

job_sequence (dict) – Python object built from our Swagger PipelineJson definition.

add_start_condition(start_time=0, end_time=0, time_zone='UTC', crontab_mask='0/1 * 1/1 * ? *')[source]#

Add a start condition.

Parameters
  • start_time (int, optional) – Unix timestamp representing the start time for the Job Sequence. Default: 0.

  • end_time (int, optional) – Unix timestamp representing the end time for the Job Sequence. Default: 0.

  • time_zone (str, optional) – Time Zone of the Job Sequence. Default: UTC.

  • crontab_mask (str, optional) – Cron Tab Mask of the Job Sequence. Default: None.

build(name='Job Sequence', description=None)[source]#

Build the Job Sequence.

Parameters
  • name (str, optional) – Name of the Job Sequence. Default: Job Sequence.

  • description (str, optional) – End time for the Job Sequence. Default: None.

Returns

A streamsets.sdk.sch_models.JobSequence instances.

class streamsets.sdk.sch_models.JobSequenceHistoryLog(*args, **kwargs)[source]#

Wrapper class for representing JobSequence History Log.

Parameters

history_log (dict) – JSON representation of a History Log.

class streamsets.sdk.sch_models.Step(*args, **kwargs)[source]#

Model for Job Step.

id#

Step ID.

Type

str

job_id#

Job ID.

Type

str

sequence_id#

Job Sequence ID.

Type

str

job_name#

Job Sequence name.

Type

str

organization#

Organization that the Job Sequence is a part of.

Type

str

status#

Job Sequence status.

Type

str

step_number#

Job Sequence status.

Type

str

add_jobs(jobs, ignore_error=True)[source]#

List of Jobs to add to the Step.

Parameters
  • jobs (list of streamsets.sdk.sch_models.Job) – Job to add to the step.

  • ignore_error (bool) – Whether to ignore Job errors or not. Default: True.

remove_jobs(*jobs)[source]#

Removes Jobs from Step.

Parameters

jobs – One or more streamsets.sdk.sch_models.Job instances.

property step_jobs#

The Jobs that are part of the Step.

Returns

A (streamsets.sdk.utils.SeekableList) of (streamsets.sdk.sch_models.Job) objects.

Organizations#

class streamsets.sdk.sch_models.Organization(*args, **kwargs)[source]#

Model for Organization.

Parameters
  • organization (str) – Organization Id.

  • organization_admin_user (str, optional) – Default: None.

  • api_client (streamsets.sdk.sch_api.ApiClient, optional) – Default: None.

default_user_password_expiry_time_in_days#

Default expiry time of the user password for the organization.

Type

str

configuration#

Configuration for the organization.

Type

streamsets.sdk.models.Configuration

admin_user_id#

Organization admin’s user ID.

Type

str

created_by#

Who the Organization was created by.

Type

str

saml_intergration_enabled#

Whether SAML integration is enabled.

Type

bool

property configuration#

Get the streamsets.sdk.models.Configuration instance for the organization.

property default_user_password_expiry_time_in_days#

Get the default expiry time of the user password for the organization.

class streamsets.sdk.sch_models.Organizations(control_hub)[source]#

Collection of streamsets.sdk.sch_models.Organization instances.

class streamsets.sdk.sch_models.OrganizationBuilder(organization, organization_admin_user)[source]#

Class with which to build instances of streamsets.sdk.sch_models.Organization.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_organization_builder().

Parameters

organization (dict) – Python object built from our Swagger UserJson definition.

build(id, name, admin_user_id, admin_user_display_name, admin_user_email_address, admin_user_ldap_user_name=None)[source]#

Build the organization.

Parameters
  • id (str) – Organization ID.

  • name (str) – Organization name.

  • admin_user_id (str) – User Id of the admin of this organization.

  • admin_user_display_name (str) – User display name of admin of this organization.

  • admin_user_email_address (str) – User email address of admin of this organization.

  • admin_user_ldap_user_name (str, optional) – LDAP username. Default: None.

Returns

An instance of streamsets.sdk.sch_models.Organization.

Pipelines#

class streamsets.sdk.sch_models.Pipeline(*args, **kwargs)[source]#

Model for Pipeline.

Parameters
  • builder (streamsets.sdk.sch_models.PipelineBuilder) – Pipeline Builder object.

  • control_hub (streamsets.sdk.sch.ControlHub, optional) – ControlHub object. Default: None.

  • engine_id (str) – The ID of the authoring engine for this pipeline.

  • library_definitions (dict, optional) – Library Definition in JSON format. Default: None.

  • pipeline (dict) – Pipeline in JSON format.

  • pipeline_definition (dict) – Pipeline Definition in JSON format.

  • rules_definition (dict) – Rules Definition in JSON format.

library_definitions#

Pipeline’s Library Definitions.

Type

dict

pipeline_definition#

Pipeline’s definitions.

Type

dict

commits#

Pipeline’s commits.

Type

list of streamsets.sdk.sch_models.PipelineCommit instances

tags#

Pipeline’s tags.

Type

list of streamsets.sdk.sch_models.Tag instances

configuration#

Pipeline’s configuration.

Type

streamsets.sdk.models.Configuration

acl#

Pipeline’s ACL.

Type

streamsets.sdk.sch_models.ACL

stages#

Pipeline’s Stages.

error_stage#

Pipeline’s Error Stage.

stats_aggregator_stage#

Pipeline’s States Aggregator Stage.

parameters#

Pipeline’s parameters.

Type

str

name#

Pipeline’s name.

Type

str

description#

Pipeline’s description.

Type

str

labels#

Pipeline’s labels.

Type

str

engine_id#

Pipeline’s SDC ID.

Type

str

property acl#

Get pipeline ACL.

Returns

An instance of streamsets.sdk.sch_models.ACL.

add_fragment(fragment, parameter_name_prefix=None)[source]#

Add a fragment to the pipeline.

Parameters
  • fragment (py:obj:streamsets.sdk.sch_models.Pipeline) – Fragment to add.

  • parameter_name_prefix (str, optional) – Prefix name for the parameters of fragment. Default: None.

Returns

An instance of streamsets.sdk.sdc_models.Stage.

add_label(*labels)[source]#

Add a label

Parameters

*labels – One or more instances of str

add_stage(label=None, name=None, type=None, library=None)[source]#

Add a stage to the pipeline.

When specifying a stage, either label or name must be used. type and library may also be used to select a particular stage if ambiguities exist. If type and/or library are omitted, the first stage definition matching the given label or name will be used.

Parameters
  • label (str, optional) – Stage label to use when selecting stage from definitions. Default: None.

  • name (str, optional) – Stage name to use when selecting stage from definitions. Default: None.

  • type (str, optional) – Stage type to use when selecting stage from definitions (e.g. origin, destination, processor, executor). Default: None.

  • library (str, optional) – Stage library to use when selecting stage from definitions. Default: None.

Returns

An instance of streamsets.sdk.sch_models.SchSdcStage

or streamsets.sdk.sch_models.SchStStage.

property commits#

Get commits for this pipeline.

Returns

A streamsets.sdk.utils.SeekableList of instances of streamsets.sdk.sch_models.PipelineCommit.

property configuration#

Get pipeline’s configuration.

Returns

An instance of streamsets.sdk.sch_models.Configuration.

property description#

Get the description of the Pipeline.

property error_stage#

Get the error stage of the Pipeline.

get_jobs_using_pipeline()[source]#

Get the jobs that are running on this pipeline.

Returns

An instance of streamsets.sdk.utils.SeekableList containing instances of streamsets.sdk.sch_models.Job that run on this pipeline.

property labels#

Get the pipeline labels.

Returns

A streamsets.sdk.utils.SeekableList of instances of streamsets.sdk.sch_models.PipelineLabel.

property library_definitions#

Get the Pipeline’s (dict) Library Definitions.

property name#

Get the name of the Pipeline.

property parameters#

Get the pipeline parameters.

Returns

A dict like, streamsets.sdk.sch_models.PipelineParameters object of parameter key-value pairs.

property pipeline_definition#

Get the Pipeline’s (dict) Pipeline Definitions.

remove_label(*labels)[source]#

Remove a label

Parameters

*labels – One or more instances of str

remove_stages(*stages)[source]#

Remove one or more stages from a Pipeline.

Parameters
  • stages (streamsets.sdk.sdc_models.Stage or streamsets.sdk.st_models.Stage) –

  • objects (One or more stage) –

property sdc_id#

Get the SDC ID of the Pipeline.

property stages#

Get the stages of the Pipeline.

property stats_aggregator_stage#

Get the stats aggregator stage of the Pipeline.

property tags#

Get tags for this pipeline.

Returns

A streamsets.sdk.utils.SeekableList of instances of streamsets.sdk.sch_models.PipelineTag.

class streamsets.sdk.sch_models.Pipelines(control_hub, organization)[source]#

Collection of streamsets.sdk.sch_models.Pipeline instances.

Parameters
class streamsets.sdk.sch_models.PipelineBuilder(pipeline, data_collector_pipeline_builder, control_hub=None, fragment=False, engine_start_up_time=0)[source]#

Class with which to build instances of streamsets.sdk.sch_models.Pipeline.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_pipeline_builder().

Parameters
  • pipeline (dict) – Python object built from our Swagger PipelineJson definition.

  • data_collector_pipeline_builder (streamsets.sdk.sdc_models.PipelineBuilder) – Data Collector Pipeline Builder object.

  • control_hub (streamsets.sdk.sch.ControlHub) – Default: None.

  • fragment (boolean, optional) – Specify if a fragment builder. Default: False.

  • engine_start_up_time (int, optional) – Engine’s start up time. Default: 0.

add_stage(label=None, name=None, type=None, library=None)[source]#

Add a stage to the pipeline.

When specifying a stage, either label or name must be used. type and library may also be used to select a particular stage if ambiguities exist. If type and/or library are omitted, the first stage definition matching the given label or name will be used.

Parameters
  • label (str, optional) – Stage label to use when selecting stage from definitions. Default: None.

  • name (str, optional) – Stage name to use when selecting stage from definitions. Default: None.

  • type (str, optional) – Stage type to use when selecting stage from definitions (e.g. origin, destination, processor, executor). Default: None.

  • library (str, optional) – Stage library to use when selecting stage from definitions. Default: None.

Returns

An instance of streamsets.sdk.sch_models.SchSdcStage.

build(title='Pipeline', description='', labels=None, build_from_imported=False, **kwargs)[source]#

Build the pipeline.

Parameters
  • title (str, optional) – Title of the pipeline.

  • description (str, optional) – Description of the pipeline. Default: ````.

  • labels (list, optional) – List of pipeline labels of type str. Default: None.

  • build_from_imported (boolean, optional) – Whether we want to build a pipeline from an imported pipeline. Default: False

Returns

py:class`streamsets.sdk.sch_models.Pipeline`.

Return type

An instance of

import_pipeline(pipeline, commit_id_regeneration=True, regenerate_id=True, **kwargs)[source]#

Import a pipeline into the PipelineBuilder to use as a starting point based off of an existing Pipeline.

Parameters
  • pipeline( – py:class`streamsets.sdk.sch_models.Pipeline`): Pipeline object.

  • commit_id_regeneration (bool, optional) – Whether to use the imported pipeline’s commit ID. When set to False, the imported pipeline will be edited as-is without a fresh pipeline being created. Default: True

  • regenerate_id (bool, optional) – Whether to use the imported pipeline’s pipeline_id. When set to False, the imported pipeline’s pipeline_id will be used. Default: True

Returns

An instance of streamsets.sdk.sdc_models.PipelineBuilder.

remove_stage(stage)[source]#

Remove a stage from the pipeline builder.

Parameters

stage (streamsets.sdk.sdc_models.Stage) – Stage to disconnect.

class streamsets.sdk.sch_models.PipelineCommit(*args, **kwargs)[source]#

Model for pipeline commit.

Parameters
pipeline#

The commit’s pipeline.

Type

streamsets.sdk.sch_models.Pipeline

property pipeline#

Get the commits pipeline.

class streamsets.sdk.sch_models.PipelineLabel(*args, **kwargs)[source]#

Model for pipeline label.

Parameters

pipeline_label (dict) – Pipeline label in JSON format.

label#

The pipeline label.

Type

str

property label#

Get the pipeline label.

class streamsets.sdk.sch_models.PipelineLabels(control_hub, organization)[source]#

Collection of streamsets.sdk.sch_models.PipelineLabel instances.

Parameters
class streamsets.sdk.sch_models.PipelineParameters(pipeline)[source]#

Parameters for pipelines.

Parameters

pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline Instance.

update(parameters_dict)[source]#

Update existing parameters. Works similar to Python dictionary update.

Parameters

parameters_dict (dict) – Dictionary of key-value pairs to be used as parameters.

class streamsets.sdk.sch_models.StPipelineBuilder(pipeline, transformer_pipeline_builder, control_hub=None, fragment=False, engine_start_up_time=0)[source]#

Class with which to build instances of streamsets.sdk.sch_models.Pipeline.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_pipeline_builder().

Parameters
  • pipeline (dict) – Python object built from our Swagger PipelineJson definition.

  • transformer_pipeline_builder (streamsets.sdk.sdc_models.PipelineBuilder) – Transformer Pipeline Builder object.

  • control_hub (streamsets.sdk.sch.ControlHub) – Default: None.

  • fragment (boolean, optional) – Specify if a fragment builder. Default: False.

  • engine_start_up_time (int, optional) – Engine’s start up time. Default: 0.

add_stage(label=None, name=None, type=None, library=None)[source]#

Add a stage to the pipeline.

When specifying a stage, either label or name must be used. type and library may also be used to select a particular stage if ambiguities exist. If type and/or library are omitted, the first stage definition matching the given label or name will be used.

Parameters
  • label (str, optional) – Transformer stage label to use when selecting stage from definitions. Default: None.

  • name (str, optional) – Transformer stage name to use when selecting stage from definitions. Default: None.

  • type (str, optional) – Transformer stage type to use when selecting stage from definitions (e.g. origin, destination, processor, executor). Default: None.

  • library (str, optional) – Transformer stage library to use when selecting stage from definitions. Default: None.

Returns

An instance of streamsets.sdk.sch_models.SchStStage.

build(title='Pipeline', description='', build_from_imported=False, **kwargs)[source]#

Build the pipeline.

Parameters
  • title (str) – Title of the pipeline.

  • description (str, optional) – Description of the pipeline. Default: ````.

  • build_from_imported (boolean, optional) – Whether we want to build a pipeline from an imported pipeline. Default: False

Returns

py:class`streamsets.sdk.sch_models.Pipeline`.

Return type

An instance of

import_pipeline(pipeline, commit_id_regeneration=True, regenerate_id=True, **kwargs)[source]#

Import a pipeline into the PipelineBuilder to use as a starting point based off of an existing Pipeline.

Parameters
  • pipeline( – py:class`streamsets.sdk.sch_models.Pipeline`): Pipeline object.

  • commit_id_regeneration (bool, optional) – Whether to use the imported pipeline’s commit ID. When set to False, the imported pipeline will be edited as-is without a fresh pipeline being created. Default: True

  • regenerate_id (bool, optional) – Whether to use the imported pipeline’s pipeline_id. When set to False, the imported pipeline’s pipeline_id will be used. Default: True

Returns

An instance of streamsets.sdk.sdc_models.PipelineBuilder.

remove_stage(stage)[source]#

Remove a stage from the pipeline builder.

Parameters

stage (streamsets.sdk.sdc_models.Stage) – Stage to disconnect.

Protection Methods#

class streamsets.sdk.sch_models.ProtectionMethod(stage)[source]#

Protection Method Model.

Parameters

stage (dict) – JSON representation of a stage.

class streamsets.sdk.sch_models.ProtectionMethodBuilder(pipeline_builder)[source]#

Class with which to build instances of streamsets.sdk.sch_models.ProtectionMethod.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_protection_method_builder().

Parameters

pipeline_builder (streamsets.sdk.sch_models.PipelineBuilder) – Pipeline Builder object.

Protection Policies#

class streamsets.sdk.sch_models.ProtectionPolicy(*args, **kwargs)[source]#

Model for Protection Policy.

Parameters
procedures#

Procedures for the Protection Policy.

Type

PolicyProcedure

default_setting#

Sefault Setting for the Protection Policy.

Type

str

enactment#
Type

str

sampling#

Sampling for the Protection Policy.

Type

str

class streamsets.sdk.sch_models.ProtectionPolicies(control_hub)[source]#

Collection of streamsets.sdk.sch_models.ProtectionPolicy instances.

class streamsets.sdk.sch_models.ProtectionPolicyBuilder(control_hub, protection_policy, policy_procedure)[source]#

Class with which to build instances of streamsets.sdk.sch_models.ProtectionPolicy.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_protection_policy_builder().

Parameters
  • protection_policy (dict) – Python object defining a protection policy.

  • policy_procedure (dict) – Python object defining a policy procedure.

class streamsets.sdk.sch_models.PolicyProcedure(*args, **kwargs)[source]#

Model for Policy Procedure.

Parameters

policy_procedure (dict) – JSON representation of Policy Procedure.

ProvisioningAgents#

class streamsets.sdk.sch_models.Deployment(deployment, control_hub=None, **kwargs)[source]#

Model for Deployment. This is an abstract class.

acl#

The Deployment ACL instance.

Type

streamsets.sdk.sch_models.ACL

created_by#

User that created this deployment.

Type

str

created_on#

Time at which this deployment was created.

Type

int

deployment_events#

Name of the deployment.

Type

DeploymentEvents

deployment_id#

Id of the deployment.

Type

str

deployment_name#

Name of the deployment.

Type

str

deployment_tags#

Raw deployment tags of the deployment.

Type

list

deployment_type#

Type of the deployment.

Type

str

desired_instances#

The Deployment desired number of instances.

Type

int

engine_configuration#

The Deployment Engine Configuration.

Type

DeploymentEngineConfiguration

engine_type#

The Deployment Engine type.

Type

str

engine_version#

The Deployment Engine Version.

Type

str

environment_id#

Enabled environment where engine will be deployed.

Type

list

last_modified_by#

User that last modified this deployment.

Type

str

last_modified_on#

Time at which this deployment was last modified.

Type

int

network_tags#

The Deployment network tags.

Type

list

scala_binary_version#

scala Binary Version.

Type

str

organization#

The Deployment organization.

Type

str

json_state#

State of the deployment - this is json field called state.

Type

str

state#

State of the deployment - displayed as state on UI.

Type

str

status#

Status of the deployment.

Type

str

status_detail#

Status detail of the deployment.

Type

str

registered_engines#

Registered Engines of the deployment.

Type

RegisteredEngines

tags#

Tags of the deployment.

Type

a streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Tag instances

class ENGINE_TYPES(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
class TYPES(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
property acl#

Get deployment ACL.

Returns

An instance of streamsets.sdk.sch_models.ACL.

property deployment_events#

Get the events for a deployment.

Returns

An instance of streamsets.sdk.sch_models.DeploymentEvents

property engine_configuration#

The engine configuration of deployment engine. :returns: An instance of streamsets.sdk.sch_models.DeploymentEngineConfiguration.

property registered_engines#

Get the registered engines.

Returns

An instance of streamsets.sdk.sch_models.RegisteredEngines

property tags#

Get the tags.

Returns

A streamsets.sdk.utils.SeekableList of str instances.

class streamsets.sdk.sch_models.Deployments(control_hub, organization)[source]#

Collection of streamsets.sdk.sch_models.Deployment instances.

Parameters
class streamsets.sdk.sch_models.DeploymentBuilder(deployment, control_hub)[source]#

Class with which to build instances of streamsets.sdk.sch_models.Deployment.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_deployment_builder().

Parameters
  • deployment (dict) – Python object built from our Swagger CspDeploymentJson definition.

  • control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

build(deployment_name, engine_type, engine_version, environment, external_resource_location=None, engine_labels=None, max_cpu_load=80.0, max_memory_used=100.0, max_pipelines_running=1000000, deployment_tags=None, engine_build=None, scala_binary_version=None, **kwargs)[source]#

Define the deployment.

Parameters
  • deployment_name (str) – deployment name.

  • engine_type (str) – Type of engine to deploy.

  • engine_version (str) – Version of engine to deploy.

  • environment (streamsets.sdk.sch_models.Environment) – The environment instance.

  • external_resource_location (str, optional) – External Resource Location URL. Default: None.

  • engine_labels (list, optional) – List of labels (strings). Default: None.

  • max_cpu_load (float or int, optional) – Max CPU Load (percent). Default: 80.0.

  • max_memory_used (float or int, optional) – Max Memory Used (percent). Default: 100.0.

  • max_pipelines_running (int, optional) – Max Running Pipeline Count. Default: 1000000.

  • deployment_tags (list, optional) – List of tags (strings). Default: None.

  • engine_build (str, optional) – Build of engine to deploy. Default: None.

  • scala_binary_version (str, optional) – Scala binary version required in case of engine_type=’TF’ Default: None.

Returns

An instance of subclass of streamsets.sdk.sch_models.Deployment.

class streamsets.sdk.sch_models.ProvisioningAgent(*args, **kwargs)[source]#

Model for Provisioning Agent.

Parameters
  • provisioning_agent (dict) – A Python object representation of Provisioning Agent.

  • control_hub – An instance of streamsets.sdk.sch.ControlHub.

deployments#

ProvisioningAgent deployments.

Type

streamsets.sdk.sch_models.LegacyDeployments

acl#

ProvisioningAgent ACL

Type

streamsets.sdk.sch_models.ACL

property acl#

Get the ACL of a Provisioning Agent.

Returns

An instance of streamsets.sdk.sch_models.ACL.

property deployments#

Get the deployments associated with the Provisioning Agent.

Returns

A list of streamsets.sdk.sch_models.LegacyDeployment instances.

class streamsets.sdk.sch_models.ProvisioningAgents(control_hub, organization)[source]#

Collection of streamsets.sdk.sch_models.ProvisioningAgent instances.

Parameters

Roles#

class streamsets.sdk.sch_models.Roles(values, entity, role_label_to_id)[source]#

Wrapper class over the list of Roles.

Parameters
append(value)[source]#

The method appends a role or an iterable of roles

Parameters

value (str, iterable) – Role string or an iterable of roles

Returns

None

Raises

TypeError – This exception is raised if the value argument is not a str or iterable of str.

clear()[source]#

Remove all items from list.

extend(value)[source]#

Extend list by appending elements from the iterable.

insert(ind, value)[source]#

Insert object before index.

pop(value=- 1)[source]#

Remove and return item at index (default last).

Raises IndexError if list is empty or index is out of range.

remove(value)[source]#

The method removes a role or an iterable of roles

Parameters

value (str, iterable) – Role string or an iterable of roles

Returns

None

Raises

TypeError – This exception is raised if the value argument is not a str or iterable of str.

reverse()[source]#

Reverse IN PLACE.

sort()[source]#

Sort the list in ascending order and return None.

The sort is in-place (i.e. the list itself is modified) and stable (i.e. the order of two equal elements is maintained).

If a key function is given, apply it once to each list item and sort them, ascending or descending, according to their function values.

The reverse flag can be set to sort in descending order.

SAQL Searches#

class streamsets.sdk.sch_models.SAQLSearch(*args, **kwargs)[source]#

Model for SAQL Searches.

Parameters
id#

ID of the SAQL Search.

Type

str

orgId#

ID of the Organization.

Type

str

type#

Type of SAQL search.

Type

str

name#

Name of SAQL search.

Type

str

mode#

Mode of SAQL search.

Type

str

query#

SAQL search query.

Type

str

creator#

Creator of SAQL search.

Type

str

createTime#

Time of creation of SAQL search.

Type

str

lastModifiedBy#

Last author to modify SAQL search.

Type

str

lastModifiedOn#

Last Time of modification of SAQL search.

Type

str

favorite#

Represents whether an SAQL Search is marked as a favorite or not.

Type

boolean

Returns

An instance of streamsets.sdk.sch_models.SAQLSearch.

class JobType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
class ModeType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
class PipelineType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
class streamsets.sdk.sch_models.SAQLSearches(control_hub, saql_search_type)[source]#

Collection of streamsets.sdk.sch_models.SAQLSearch instances.

Parameters
class streamsets.sdk.sch_models.SAQLSearchBuilder(saql_search, control_hub, mode='BASIC')[source]#

Class with which to build instances of streamsets.sdk.sch_models.SAQLSearch.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_saql_search_builder().

Parameters
  • saql_search (dict) – Python object built from expected SAQL JSON structure

  • control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

  • mode (str, optional) – mode of SAQL Search. Default: 'BASIC'

add_filter(property_name='name', property_operator='contains', property_value='', property_condition_combiner='AND')[source]#

Adds a filter to the SAQL Search Query.

Parameters
  • property_name (str, optional) – Default: 'name'

  • property_operator (str, optional) – Default: 'contains'

  • property_value (str, optional) – Default: ''

  • property_condition_combiner (str, optional) – Default: 'AND'

build(name)[source]#

Builder for SAQL Search.

Parameters

name (str) – Name of SAQL Search.

Returns

An instance of streamsets.sdk.sch_models.SAQLSearch.

Scheduler#

class streamsets.sdk.sch_models.ScheduledTask(*args, **kwargs)[source]#

Model for Scheduled Task.

Parameters
runs#

Scheduled Task Runs.

Type

streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.ScheduledTaskRun

audits (:py:obj:`streamsets.sdk.utils.SeekableList` of

streamsets.sdk.sch_models.ScheduledTaskAudit): Scheduled Task Audits.

acl#

ACL of a Scheduled Task.

Type

streamsets.sdk.sch_models.ACL

job_id#

Job ID for which this task is scheduled.

Type

str

property acl#

Get the ACL of a Scheduled Task.

Returns

An instance of streamsets.sdk.sch_models.ACL.

property audits#

Get Scheduled Task Audits.

Returns

A streamsets.sdk.utils.SeekableList of inherited instances of streamsets.sdk.sch_models.ScheduledTaskAudit.

property job_id#

Job ID for which this task is scheduled.

Returns

An instance of str.

property runs#

Get Scheduled Task Runs.

Returns

A streamsets.sdk.utils.SeekableList of inherited instances of streamsets.sdk.sch_models.ScheduledTaskRun.

class streamsets.sdk.sch_models.ScheduledTaskAudit(*args, **kwargs)[source]#

Scheduled Task Audit.

Parameters

run (dict) – JSON representation of scheduled task audit.

class streamsets.sdk.sch_models.ScheduledTaskBaseModel(*args, **kwargs)[source]#

Base Model for Scheduled Task related classes.

class streamsets.sdk.sch_models.ScheduledTaskBuilder(job_selection_types, control_hub)[source]#

Builder for Scheduled Task.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_scheduled_task_builder().

Parameters
  • job_selection_types (dict) – JSON representation of job selection types.

  • control_hub (streamsets.sdk.sch.ControlHub) – ControlHub instance.

build(task_object, action='START', name=None, description=None, cron_expression='0 0 1/1 * ? *', time_zone='UTC', status='RUNNING', start_time=None, end_time=None, missed_execution_handling='IGNORE')[source]#

Builder for Scheduled Task.

Parameters
  • task_object (streamsets.sdk.sch_models.Job) – (streamsets.sdk.sch_models.ReportDefinition): Job or ReportDefinition object.

  • action (str, optional) – One of the {‘START’, ‘STOP’, ‘UPGRADE’} actions. Default: START.

  • name (str, optional) – Name of the task. Default: None.

  • description (str, optional) – Description of the task. Default: None.

  • crontab_mask (str, optional) – Schedule in cron syntax. Default: "0 0 1/1 * ? *". (Daily at 12).

  • time_zone (str, optional) – Time zone. Default: "UTC".

  • status (str, optional) – One of the {‘RUNNING’, ‘PAUSED’} statuses. Default: RUNNING.

  • start_time (str, optional) – Start time of task. Default: None.

  • end_time (str, optional) – End time of task. Default: None.

  • missed_trigger_handling (str, optional) – One of {‘IGNORE’, ‘RUN ALL’, ‘RUN ONCE’}. Default: IGNORE.

Returns

An instance of streamsets.sdk.sch_models.ScheduledTask.

class streamsets.sdk.sch_models.ScheduledTaskRun(*args, **kwargs)[source]#

Scheduled Task Run.

Parameters

run (dict) – JSON representation if scheduled task run.

class streamsets.sdk.sch_models.ScheduledTasks(control_hub)[source]#

Collection of streamsets.sdk.sch_models.ScheduledTask instances.

SeekableList#

class streamsets.sdk.utils.SeekableList(iterable=(), /)[source]#
get(**kwargs)[source]#

Retrieve the first instance that matches the supplied arguments.

Parameters

**kwargs – Optional arguments to be passed to filter the results offline.

Returns

The first instance from the group that matches the supplied arguments.

get_all(**kwargs)[source]#

Retrieve all instances that match the supplied arguments.

Parameters

**kwargs – Optional arguments to be passed to filter the results offline.

Returns

A streamsets.sdk.utils.SeekableList of results that match the supplied arguments.

Snapshot#

class streamsets.sdk.sch_models.Snapshot(*args, **kwargs)[source]#

Model for Snapshot.

Parameters
  • snapshot (dict) – Python object representation of the snapshot.

  • draft_run (streamsets.sdk.sch.DraftRun) – DraftRun object.

batch_number#

The number of the batch that the snapshot captured.

Type

int

batches#

A list of streamsets.sdk.sdc_models.Batch instances.

Type

list

id#

The snapshot’s ID.

Type

str

name#

The snapshot’s name.

Type

str

pipeline_id#

The ID of the pipeline that the snapshot belongs to.

Type

str

time_stamp#

The creation date of the snapshot as a unix timestamp.

Type

int

Stage#

class streamsets.sdk.sch_models.SchSdcStage(stage, pipeline=None, output_streams=None, supported_connection_types=None)[source]#
add_output(*other_stages, event_lane=False, allow_empty_stages=False, output_lane_index=None)#

Connect output of this stage to another stage.

The __rshift__ operator (>>) has been overloaded to invoke this method.

Parameters
  • other_stages (streamsets.sdk.sdc_models.Stage) – Stage object.

  • event_lane (boolean, optional) – Default: False.

  • allow_empty_stages (bool, optional) – Disable the check for empty stages. Default: False.

  • output_lane_index (int, optional) – The index of the output lane to attach an output to. Default: None.

Returns

This stage as an instance of streamsets.sdk.sdc_models.Stage).

connect_inputs(stages=None, event_lane=False, target_stage_output_lane_index=None)#

Connect inputs of this stage.

Parameters
  • stages – A list of streamsets.sdk.sdc_models.Stage objects to connect as inputs. Default: None.

  • event_lane (boolean, optional) – Default: False.

  • target_stage_output_lane_index (int, optional) – Index of the target stage’s output lane to attach to. Default: None.

connect_outputs(stages=None, event_lane=False, output_lane_index=None)#

Connect outputs of this stage.

Parameters
  • stages – A list of streamsets.sdk.sdc_models.Stage objects to connect as outputs. Default: None.

  • event_lane (boolean, optional) – Default: False.

  • output_lane_index (int, optional) – The index of THIS stage’s output lane to attach an output to. Default: None.

copy_inputs(stage, override=False)#

Copies inputs of the given stage.

Parameters
  • stage – The streamsets.sdk.sdc_models.Stage object whose inputs we’d like to copy.

  • override (bool, optional) – Whether to completely wipe and override the current stage’s input_lanes with the stage that has been passed.

copy_outputs(stage)#

Copies outputs of the given stage.

Parameters

stage – The streamsets.sdk.sdc_models.Stage object whose outputs we’d like to copy.

property description#

The stage’s description.

Type

str

disconnect_input_lanes(stages=None, all_stages=False)#

Disconnect input lanes of this stage from the lanes of the stages that are provided.

Example:

stage_one.disconnect_inputs(stages=[stage_two, stage_three]) Will disconnect the incoming input lanes from stage_two and stage_three to stage_one.

Parameters
  • stages – A list of streamsets.sdk.sdc_models.Stage objects to disconnect. Default: None.

  • all_stages (bool, optional) – Disconnect all incoming input lanes to this stage. Default: False.

disconnect_output_lanes(stages=None, all_stages=False, output_lane_index=None)#

Disconnect output lanes of this stage from the lanes of the stages that are provided.

Example:

stage_one.disconnect_outputs(stages=[stage_two, stage_three]) Will disconnect the outgoing output lanes FROM stage_one to stage_two and stage_three.

Parameters
  • stages – A list of streamsets.sdk.sdc_models.Stage objects to disconnect. Default: None.

  • all_stages (bool, optional) – Disconnect all outgoing output lanes to this stage. Default: False.

  • output_lane_index (int, optional) – Index of this stage’s output lane to detach. Default: None.

property event_lanes#

Get the stage’s list of event lanes.

Returns

A list of event lanes.

property input_lanes#

Get the stage’s list of input lanes.

Returns

A list of input lanes.

property label#

The stage’s label.

Type

str

property library#

Get the stage’s library.

Returns

The stage library as a str.

property output_lanes#

Get the stage’s list of output lanes.

Returns

A list of output lanes.

set_attributes(**attributes)#

Set one or more stage attributes.

Parameters

**attributes – Attributes to set.

Returns

This stage as an instance of streamsets.sdk.sdc_models.Stage.

property stage_on_record_error#

The stage’s on record error configuration value.

property stage_record_preconditions#

The stage’s record preconditions configuration value.

property stage_required_fields#

The stage’s required fields configuration value.

use_connection(connection)#

Specify a Connection instance to use for this stage.

Parameters

connection – One streamsets.sdk.sch_models.Connection instance. Only one Connection per stage is currently supported.

class streamsets.sdk.sch_models.SchStStage(stage, pipeline=None, output_streams=None, supported_connection_types=None)[source]#
add_output(*other_stages, event_lane=False, output_lane_index=None)#

Connect output of this stage to another stage.

The __rshift__ operator (>>) has been overloaded to invoke this method.

Parameters
  • other_stages (streamsets.sdk.st_models.Stage) – Stage object.

  • event_lane (boolean, optional) – Default: False.

  • output_lane_index (int, optional) – The index of the output lane to attach an output to. Default: None.

Returns

This stage as an instance of streamsets.sdk.st_models.Stage).

connect_inputs(stages=None, event_lane=False, target_stage_output_lane_index=None)#

Connect inputs of this stage.

Parameters
  • stages – A list of streamsets.sdk.st_models.Stage objects to connect as inputs. Default: None.

  • event_lane (boolean, optional) – Default: False.

  • target_stage_output_lane_index (int, optional) – Index of the target stage’s output lane to attach to. Default: None.

connect_outputs(stages=None, event_lane=False, output_lane_index=None)#

Connect outputs of this stage.

Parameters
  • stages – A list of streamsets.sdk.st_models.Stage objects to connect as outputs. Default: None.

  • event_lane (boolean, optional) – Default: False.

  • output_lane_index (int, optional) – The index of THIS stage’s output lane to attach an output to. Default: None.

copy_inputs(stage, override=False)#

Copies inputs of the given stage.

Parameters
  • stage – The streamsets.sdk.sdc_models.Stage object whose inputs we’d like to copy.

  • override (bool, optional) – Whether to completely wipe and override the current stage’s input_lanes with the stage that has been passed.

copy_outputs(stage)#

Copies outputs of the given stage.

Parameters

stage – The streamsets.sdk.sdc_models.Stage object whose outputs we’d like to copy.

disconnect_input_lanes(stages=None, all_stages=False)#

Disconnect input lanes of this stage from the lanes of the stages that are provided.

Example:

stage_one.disconnect_inputs(stages=[stage_two, stage_three]) Will disconnect the incoming input lanes from stage_two and stage_three to stage_one.

Parameters
  • stages – A list of streamsets.sdk.st_models.Stage objects to disconnect. Default: None.

  • all_stages (bool, optional) – Disconnect all incoming input lanes to this stage. Default: False.

disconnect_output_lanes(stages=None, all_stages=False, output_lane_index=None)#

Disconnect output lanes of this stage from the lanes of the stages that are provided.

Example:

stage_one.disconnect_outputs(stages=[stage_two, stage_three]) Will disconnect the outgoing output lanes FROM stage_one to stage_two and stage_three. stage_one.disconnect_outputs(output_lane_index=0) Will disconnect the first output lane of this stage.

Parameters
  • stages – A list of streamsets.sdk.st_models.Stage objects to disconnect. Default: None.

  • all_stages (bool, optional) – Disconnect all outgoing output lanes to this stage. Default: False.

  • output_lane_index (int, optional) – Index of this stage’s output lane to detach. Default: None.

property event_lanes#

Get the stage’s list of event lanes.

Returns

A list of event lanes.

property input_lanes#

Get the stage’s list of input lanes.

Returns

A list of input lanes.

property label#

The stage’s label.

Type

str

property library#

Get the stage’s library.

Returns

The stage library as a str.

property output_lanes#

Get the stage’s list of output lanes.

Returns

A list of output lanes.

set_attributes(**attributes)#

Set one or more stage attributes.

Parameters

**attributes – Attributes to set.

Returns

This stage as an instance of streamsets.sdk.st_models.Stage.

property stage_on_record_error#

The stage’s on record error configuration value.

property stage_record_preconditions#

The stage’s record preconditions configuration value.

property stage_required_fields#

The stage’s required fields configuration value.

use_connection(connection)#

Specify a Connection instance to use for this stage.

Parameters

connection – One streamsets.sdk.sch_models.Connection instance. Only one Connection per stage is currently supported.

Subscriptions#

class streamsets.sdk.sch_models.Subscription(*args, **kwargs)[source]#

Subscription.

Parameters
events ( A :py:obj:`streamsets.sdk.utils.SeekableList` of

streamsets.sdk.sch_models.SubscriptionEvent instances): The Subscription’s events.

action#

Action of a Subscription.

Type

streamsets.sdk.sch_models.SubscriptionAction

acl#

ACL of a Subscription.

Type

streamsets.sdk.sch_models.ACL

property acl#

Get the ACL of an Event Subscription.

Returns

An instance of streamsets.sdk.sch_models.ACL.

property action#

Action of the Subscription.

property events#

Events of the Subscription.

class streamsets.sdk.sch_models.Subscriptions(control_hub)[source]#

Collection of streamsets.sdk.sch_models.Subscription instances.

Parameters

control_hub (streamsets.sdk.sch.ControlHub) – ControlHub object.

class streamsets.sdk.sch_models.SubscriptionAction(*args, **kwargs)[source]#

Action to take when the Subscription is triggered.

Parameters

action (dict) – JSON representation of an Action for a Subscription.

class streamsets.sdk.sch_models.SubscriptionAudit(*args, **kwargs)[source]#

Model for subscription audit.

Parameters

audit (dict) – JSON representation of a subscription audit.

class streamsets.sdk.sch_models.SubscriptionBuilder(subscription, control_hub)[source]#

Builder for Subscription.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_subscription_builder().

Parameters
add_event(event_type, filter='')[source]#

Add event to the Subscription.

Parameters
  • event_type (str) – Type of event in {‘Job Status Change’, ‘Data SLA Triggered’, ‘Pipeline Committed’, ‘Pipeline Status Change’, ‘Report Generated’, ‘Data Collector not Responding’}.

  • filter (str, optional) – Filter to be applied on event. Default: "".

build(name, description=None)[source]#

Builder for Scheduled Task.

Parameters
  • name (str) – Name of Subscription.

  • description (str, optional) – Description of subscription. Default: None.

Returns

An instance of streamsets.sdk.sch_models.Subscription.

import_subscription(subscription)[source]#

Import an existing Subscription into the builder to update it.

Parameters

subscription (streamsets.sdk.sch_models.Subscription) – Subscription instance.

remove_event(event_type)[source]#

Remove event from the subscription.

Parameters

event_type (str) – Type of event in {‘Job Status Change’, ‘Data SLA Triggered’, ‘Pipeline Committed’, ‘Pipeline Status Change’, ‘Report Generated’, ‘Data Collector not Responding’}.

Returns

An instance of streamsets.sdk.sch_models.SubscriptionEvent.

set_email_action(recipients, subject=None, body=None, external_smtp_url=None, external_smtp_port=None)[source]#

Set the Email action.

Parameters
  • recipients (list) – List of email addresses.

  • subject (str, optional) – Subject of the email. Default: None.

  • body (str, optional) – Body of the email. Default: None.

  • external_smtp_url (str, optional) – SMTP URL to route. Default: None.

  • external_smtp_port (str, optional) – SMTP Port to route. Default: None.

set_webhook_action(uri, method='GET', content_type=None, payload=None, auth_type=None, username=None, password=None, timeout=30000, headers=None, bearer_token=None, api_key_key=None, api_key_value=None, api_key_location=None, token_url=None, client_id=None, client_secret=None, scope=None, location=None, resources=None, audiences=None)[source]#

Set the Webhook action.

Parameters
  • uri (str) – URI for the Webhook.

  • method (str, optional) – HTTP method to use. Default: 'GET'.

  • content_type (str, optional) – Content Type of the request. Default: None.

  • payload (str, optional) – Payload of the request. Default: None.

  • auth_type (str, optional) – 'basic' or None. Default: None.

  • username (str, optional) – Username for the authentication. Default: None.

  • password (str, optional) – Password for the authentication. Default: None.

  • timeout (int, optional) – Timeout for the Webhook action. Default: 30000.

  • headers (dict, optional) – Headers to be sent to the Webhook call. Default: None.

  • bearer_token (str, optional) – Bearer token for the authentication. Default: None.

  • token_url – (str, optional): Token URL for the authentication. Default: None.

  • api_key_key (str, optional) – API key name for the authentication. Default: None.

  • api_key_value (str, optional) – API key value for the authentication. Default: None.

  • api_key_location (str, optional) – Where to send the api key values. Default: None.

  • audiences – (list, optional): Audiences for OAuth2 auth method. Default: None.

  • resources – (list, optional): Resources for OAuth2 auth method. Default: None.

  • location – (str, optional): Location of OAuth2 credentials (body or header). Default: None.

  • scope – (str, optional): Scope for OAuth2 auth method. Default: None.

  • client_secret – (str, optional): Client secret for OAuth2 auth method. Default: None.

  • client_id – (str, optional): Client id for OAuth2 auth method. Default: None.

class streamsets.sdk.sch_models.SubscriptionEvent(*args, **kwargs)[source]#

An Event of a Subscription.

Parameters

Tags#

class streamsets.sdk.sch_models.Tag(*args, **kwargs)[source]#

Model for tag.

Parameters

tag (dict) – tag in JSON format.

id#

The full ID of the tag, in 'tag:organization' format.

Type

str

organization#

The ID of the organization that the tag belongs to.

Type

str

parent_id#

ID of the parent tag if this tag is a child.

Type

str

tag#

The tag’s label.

Type

str

property tag#

Get the Tag.

Topologies#

class streamsets.sdk.sch_models.DataSla(*args, **kwargs)[source]#

Model for DataSla.

Parameters
class streamsets.sdk.sch_models.DataSlaBuilder(data_sla, control_hub)[source]#

Class with which to build instances of streamsets.sdk.sch_models.DataSla.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_data_sla_builder().

Parameters
  • data_sla (dict) – Python object built from our Swagger DataSlaJson definition.

  • control_hub (streamsets.sdk.sch.ControlHub) – An instance of the Control Hub.

build(topology, label, job, alert_text, qos_parameter='THROUGHPUT_RATE', function_type='Max', min_max_value=100, enabled=True)[source]#

Build the Data Sla.

Parameters
  • topology (streamsets.sdk.sch_models.Topology) – Topology object.

  • label (str) – Label for the SLA.

  • job (list) – List of streamsets.sdk.sch_models.Job objects.

  • alert_text (str) – Alert text.

  • qos_parameter (str, optional) – paramter in {‘THROUGHPUT_RATE’, ‘ERROR_RATE’}. Default: 'THROUGHPUT_RATE'.

  • function_type (str, optional) – paramter in {‘Max’, ‘Min’}. Default: 'Max'.

  • min_max_value (str, optional) – Default: 100.

  • enabled (boolean, optional) – Default: True.

Returns

An instance of streamsets.sdk.sch_models.DataSla.

class streamsets.sdk.sch_models.Topology(*args, **kwargs)[source]#

Model for Topology.

Parameters

topology (dict) – JSON representation of Topology.

acl#

Topology ACL.

Type

streamsets.sdk.sch_models.ACL

commit_id#

Pipeline commit id.

Type

str

commit_message#

Commit Message.

Type

str

commit_time#

Time at which commit was made.

Type

int

committed_by#

User that made the commit.

Type

str

data_slas#

Data SLAs currently part of the topology.

Type

streamsets.sdk.utils.SeekableList) of (streamsets.sdk.sch_models.DataSla

default_topology#

Default Topology.

Type

bool

description#

Topology description.

Type

str

draft#

Indicates whether this topology is a draft.

Type

bool

jobs#

Jobs currently part of the topology.

Type

streamsets.sdk.utils.SeekableList) of (streamsets.sdk.sch_models.Job

last_modified_by#

User that last modified this topology.

Type

str

last_modified_on#

Time at which this topology was last modified.

Type

int

new_pipeline_version_available#

Whether any job in the topology has a new pipeline version to be updated to.

Type

bool

nodes#

Nodes currently part of the topology.

Type

streamsets.sdk.utils.SeekableList) of (streamsets.sdk.sch_models.TopologyNode

organization#

Id of the organization.

Type

str

parent_version#

Version of the parent topology.

Type

str

topology_definition#

Definition of the topology.

Type

str

topology_id#

Id of the topology.

Type

str

topology_name#

Name of the topology.

Type

str

validation_issues#

Any validation issues that exist for this Topology.

Type

dict

version#

Version of this topology.

Type

str

acknowledge_job_errors()[source]#

Acknowledge all errors for the jobs in a topology.

Returns

An instance of streamsets.sdk.sch_api.Command.

property acl#

Get the ACL of a Topology.

Returns

An instance of streamsets.sdk.sch_models.ACL.

activate_data_sla(*data_slas)[source]#

Activate Data SLAs.

Parameters

*data_slas – One or more instances of streamsets.sdk.sch_models.DataSla.

Returns

An instance of streamsets.sdk.sch_api.Command.

add_data_sla(data_sla)[source]#

Add SLA.

Parameters

data_sla (streamsets.sdk.sch_models.DataSla) – Data SLA object.

Returns

An instance of streamsets.sdk.sch_api.Command.

auto_discover_connections()[source]#

Auto discover connecting systems between nodes in a Topology.

Returns

An instance of streamsets.sdk.sch_api.Command.

auto_fix()[source]#

Auto-fix a topology by rectifying invalid or removed jobs, outdated jobs, etc.

Returns

An instance of streamsets.sdk.sch_api.Command.

deactivate_data_sla(*data_slas)[source]#

Deactivate Data SLAs.

Parameters

*data_slas – One or more instances of streamsets.sdk.sch_models.DataSla.

Returns

An instance of streamsets.sdk.sch_api.Command.

delete_data_sla(*data_slas)[source]#

Delete Data SLAs.

Parameters

*data_slas – One or more instances of streamsets.sdk.sch_models.DataSla.

Returns

An instance of streamsets.sdk.sch_api.Command.

property jobs#

Get the jobs that are contained within the Topology.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Job instances.

property new_pipeline_version_available#

Determine if a new pipeline version is available for any jobs in the Topology.

Returns

A (bool) value.

property nodes#

Get the job and system nodes that make up the Topology.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.TopologyNode

instances.

start_all_jobs()[source]#

Start all jobs of a topology.

Returns

An instance of streamsets.sdk.sch_api.Command.

stop_all_jobs(force=False)[source]#

Stop all jobs of a topology.

Parameters

force (bool, optional) – Force topology jobs to stop. Default: False.

Returns

An instance of streamsets.sdk.sch_api.Command.

update_jobs_to_latest_change()[source]#

Upgrade a topology’s job(s) to the latest corresponding pipeline change.

Returns

An instance of streamsets.sdk.sch_api.Command.

property validation_issues#

Get any validation issues that are detected for the Topology.

Returns

A (list) of validation issues in JSON format.

class streamsets.sdk.sch_models.Topologies(control_hub)[source]#

Collection of streamsets.sdk.sch_models.Topology instances. :param control_hub: An instance of the ControlHub. :type control_hub: streamsets.sdk.sch.ControlHub

class streamsets.sdk.sch_models.TopologyBuilder(topology, control_hub=None)[source]#

Class with which to build instances of streamsets.sdk.sch_models.Topology.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_topology_builder().

topology_nodes#

(streamsets.sdk.sch_models.TopologyNode): Nodes currently part of the topology held by the TopologyBuilder.

Type

streamsets.sdk.utils.SeekableList

Parameters
  • topology (dict) – Python object built from our Swagger TopologyJson definition.

  • control_hub (streamsets.sdk.ControlHub, optional) – Control Hub instance. Default: None

add_job(job)[source]#

Add a job node to the Topology being built.

Parameters

job (streamsets.sdk.sch_models.Job) – An instance of a job to be added.

add_system(name)[source]#

Add a system node to the Topology being built.

Parameters

name (str) – The name of the system to add to the topology.

build(topology_name=None, description=None)[source]#

Build the topology.

Parameters
  • topology_name (str, optional) – Name of the topology. This parameter is required when building a new topology.

  • description (str, optional) – Description of the topology. Default: None.

Returns

An instance of streamsets.sdk.sch_models.Topology.

delete_node(topology_node)[source]#

Delete a system or job node from the topology.

Parameters

topology_node (streamsets.sdk.sch_models.TopologyNode) – An instance of a TopologyNode to delete from the topology.

import_topology(topology)[source]#

Import an existing topology to be used in the builder.

Parameters

topology (streamsets.sdk.sch_models.Topology) – An existing Topology instance to modify.

property topology_nodes#

Get all of the nodes currently part of the topology held by the TopologyBuilder.

Returns

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.TopologyNode

instances.

class streamsets.sdk.sch_models.TopologyNode(*args, **kwargs)[source]#

Model for a node within a Topology.

Parameters

topology_node_json (dict) – JSON representation of a Topology Node.

node_type#

The type of this node, i.e. SYSTEM, JOB, etc.

Type

str

instance_name#

The name of this node instance.

Type

str

stage_name#

The name of the stage in this node.

Type

str

stage_version#

The version of the stage in this node.

Type

str

job_id#

The ID of the job in this node.

Type

str

pipeline_id#

The pipeline ID associated with the job in this node.

Type

str

pipeline_commit_id#

The commit ID of the pipeline.

Type

str

pipeline_version#

The version of the pipeline.

Type

str

input_lanes#

A list of str representing the input lanes for this node.

Type

list

output_lanes#

A list of str representing the output lanes for this node.

Type

list

name#

Name of the Topology Node.

Type

str

Users#

class streamsets.sdk.sch_models.User(*args, **kwargs)[source]#

Model for User.

Parameters
  • user (dict) – JSON representation of User.

  • roles (dict) – A mapping of role IDs to role labels.

  • control_hub (streamsets.sdk.ControlHub, optional) – An instance of Control Hub. Default: None

active#

Whether the user is active or not.

Type

bool

created_by#

Creator of this user.

Type

str

created_on#

Creation time of this user.

Type

str

display_name#

Display name of this user.

Type

str

email_address#

Email address of this user.

Type

str

id#

Id of this user.

Type

str

groups#

Groups this user belongs to.

Type

streamsets.sdk.util.SeekableList of streamsets.sdk.sch_models.Group instances

last_modified_by#

User last modified by.

Type

str

last_modified_on#

User last modification time.

Type

str

organization#

Organization ID that the user is part of.

Type

str

password_expires_on#

User’s password expiration time.

Type

str

password_system_generated#

Whether User’s password is system generated or not.

Type

bool

roles#

A set of role labels.

Type

set

saml_user_name#

SAML username of user.

Type

str

status#

The status of the user.

Type

str

property groups#

Get the group memberships for the user.

property roles#

Get the roles for the user.

property status#

Get the status of the user. Values that can be returned are either ‘ACTIVE’ or ‘DEACTIVATED’.

class streamsets.sdk.sch_models.Users(control_hub, roles, organization)[source]#

Collection of streamsets.sdk.sch_models.User instances.

Parameters
  • control_hub – An instance of streamsets.sdk.sch.ControlHub.

  • roles (dict) – A mapping of role IDs to role labels.

  • organization (str) – Organization ID.

class streamsets.sdk.sch_models.UserBuilder(user, roles, control_hub=None)[source]#

Class with which to build instances of streamsets.sdk.sch_models.User.

Instead of instantiating this class directly, most users should use

streamsets.sdk.sch.ControlHub.get_user_builder().

Parameters
  • user (dict) – Python object built from our Swagger UserJson definition.

  • roles (dict) – A mapping of role IDs to role labels.

  • control_hub – An instance of streamsets.sdk.sch.ControlHub. Default: None.

build(email_address)[source]#

Build the user.

Parameters

email_address (str) – User Email Address.

Returns

An instance of streamsets.sdk.sch_models.User.

Common#

Models used by StreamSets Data Collector and StreamSets Control Hub:

class streamsets.sdk.models.Configuration(configuration=None, compatibility_map=None, property_key='name', property_value='value', update_callable=None, update_callable_kwargs=None, id_to_remap=None)[source]#

Abstraction for configurations.

This class enables easy access to and modification of data stored as a list of dictionaries. A Configuration is stored in the form:

[{"name" : "<name_1>","value" : "<value_1>"}, {"name" : "<name_2>", value" : "<value_2>"},...]

However, the passed in configuration parameter can be a list of Configurations such as:

[[{"name" : "<name_1>","value" : "<value_1>"}, {"name" : "<name_2>", value" : "<value_2>"},...],
[{"name" : "<name_3>","value" : "<value_3>"}, {"name" : "<name_4>", value" : "<value_4>"},...],...]
Parameters
  • compatibility_map (dict, optional) – A dictionary mapping values used for backwards compatibility.

  • configuration (list) – List of configurations (see above for format).

  • property_key (str, optional) – The dictionary entry denoting the property key. Default: name

  • property_value (str, optional) – The dictionary entry denoting the property value. Default: value

  • update_callable (optional) – A callable to which self._data will be passed as part of __setitem__.

  • update_callable_kwargs (dict, optional) – A dictionary of kwargs to pass (along with a body) to the callable.

  • id_to_remap (dict, optional) – A dictionary mapping configuration IDs to human-readable container keys. Example: {‘custom_region’:’googleCloudConfig.customRegion’, … }

get(key, default=None)[source]#

Return the value of key or, if not in the configuration, the default value.

items()[source]#

Gets the configuration’s items.

Returns

A new view of the configuration’s items ((key, value) pairs).

update(configs)[source]#

Update instance with a collection of configurations.

Parameters

configs (dict) – Dictionary of configurations to use.

Exceptions#

Common exceptions.

exception streamsets.sdk.exceptions.BadRequestError(response)[source]#

Bad request error (HTTP 400).

exception streamsets.sdk.exceptions.ConnectError(response)[source]#

Pipeline connect error.

exception streamsets.sdk.exceptions.ConnectionError(code, message)[source]#

Connection Catalog errors.

exception streamsets.sdk.exceptions.EnginelessError(message)[source]#

Publishing an engineless designed pipeline error

exception streamsets.sdk.exceptions.InternalServerError(response)[source]#

Internal server error.

exception streamsets.sdk.exceptions.InvalidCredentialsError(message)[source]#

Invalid credentials error.

exception streamsets.sdk.exceptions.InvalidError(response)[source]#

Invalid stage config error.

exception streamsets.sdk.exceptions.InvalidVersionError(input)[source]#

The Version number could not be parsed.

exception streamsets.sdk.exceptions.JobInactiveError(message)[source]#

Job status changed into INACTIVE_ERROR.

exception streamsets.sdk.exceptions.JobRunnerError(code, message)[source]#

JobRunner errors.

exception streamsets.sdk.exceptions.LegacyDeploymentInactiveError(message)[source]#

Legacy deployment status changed into INACTIVE_ERROR.

exception streamsets.sdk.exceptions.MultipleIssuesError(errors)[source]#

Multiple errors were returned.

exception streamsets.sdk.exceptions.RunError(response)[source]#

Pipeline running error.

exception streamsets.sdk.exceptions.RunningError(response)[source]#

Pipeline run error.

exception streamsets.sdk.exceptions.ServiceDefinitionNotFound(message)[source]#

ServiceDefinition errors.

exception streamsets.sdk.exceptions.StartError(response)[source]#

Pipeline start error.

exception streamsets.sdk.exceptions.StartingError(response)[source]#

Pipeline starting error.

exception streamsets.sdk.exceptions.StatusError(response)[source]#

Parent class for pipeline status errors.

exception streamsets.sdk.exceptions.TopologyIssuesError(issues)[source]#

Topology has some issues.

exception streamsets.sdk.exceptions.UnprocessableEntityError(response)[source]#

Unprocessable Entity Error (HTTP 422).

exception streamsets.sdk.exceptions.UnsupportedMethodError(message)[source]#

An unsupported method was called.

exception streamsets.sdk.exceptions.ValidationError(issues)[source]#

Validation issues.