StreamSets Control Hub
Section Contents
StreamSets Control Hub#
Main interface#
This is the main entry point used by users when interacting with SCH instances.
- class streamsets.sdk.ControlHub(credential_id, token, use_websocket_tunneling=True, **kwargs)[source]#
Class to interact with StreamSets Control Hub.
- Parameters
credential_id (
str
) – ControlHub credential ID.token (
str
) – ControlHub token.use_websocket_tunneling (
bool
, optional) – use websocket tunneling. Default:True
.
- version#
Version of ControlHub.
- Type
str
- ldap_enabled#
Indication if LDAP is enabled or not.
- Type
bool
- organization_global_configuration#
Organization’s Global Configuration instance.
- Type
:py:class:`streamsets.sdk.models.Configuration`
- pipeline_labels#
Pipeline Labels.
- users#
Organization’s Users.
- login_audits#
ControlHub Login Audits.
- Type
streamsets.sdk.sch_models.LoginAudits
- action_audits#
ControlHub Action Audits.
- Type
streamsets.sdk.sch_models.ActionAudits
- connection_audits#
ControlHub Connection Audits.
- Type
streamsets.sdk.sch_models.ConnectionAudits
- groups#
ControlHub Groups.
- data_collectors#
Data Collector instances registered under ControlHub.
- Type
Returns a
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.DataCollector
instances.
- transformers#
Transformer instances registered under ControlHub.
- Type
Returns a
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Transformer
instances.
- provisioning_agents (
py:class:`streamsets.sdk.sch_models.ProvisioningAgents): Provisioning Agents registered to the ControlHub instance.
- legacy_deployments#
LegacyDeployments instances.
- Type
streamsets.sdk.sch_models.LegacyDeployments
- organizations#
All the organizations that the current user belongs to.
- api_credentials#
ControlHub Api Credentials.
- pipelines#
ControlHub Pipelines.
- draft_runs#
ControlHub Draft Runs.
- jobs#
ControlHub Jobs.
- data_protector_enabled (
bool
): Whether Data Protector is enabled for the current organization.
- connection_tags#
ControlHub Connection Tags.
- Type
streamsets.sdk.sch_models.ConnectionTags
- alerts#
ControlHub Alerts.
- protection_policies#
ControlHub Protection Policies.
- scheduled_tasks#
ControlHub Scheduled Tasks.
- subscriptions#
ControlHub Subscriptions.
- subscription_audits#
ControlHub Subscription audits.
- topologies#
ControlHub Topologies.
- connections#
ControlHub Connections.
- environments#
ControlHub Environments.
- Type
streamsets.sdk.sch_models.Environments
- engine_versions#
ControlHub Deployment Engine Configuration.
- deployments#
ControlHub Deployments.
- engines#
ControlHub Engines.
- saql_saved_searches_pipeline#
ControlHub SAQL Searches for type Pipeline.
- saql_saved_searches_fragment#
ControlHub SAQL Searches for type Fragment.
- saql_saved_searches_job_instance#
ControlHub SAQL Searches for type Job Instance.
- saql_saved_searches_job_template#
ControlHub SAQL Searches for type Job Template.
- saql_saved_searches_draft_run#
ControlHub SAQL Searches for type Draft Run.
- acknowledge_event_subscription_error(subscription)[source]#
Acknowledge an error on given Event Subscription.
- Parameters
subscription (
streamsets.sdk.sch_models.Subscription
) – A Subscription instance.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- acknowledge_job_error(*jobs)[source]#
Acknowledge errors for one or more jobs.
- Parameters
*jobs – One or more instances of
streamsets.sdk.sch_models.Job
.
- acknowledge_legacy_deployment_error(*deployments)[source]#
Acknowledge errors for one or more deployments.
- Parameters
*deployments – One or more instances of
streamsets.sdk.sch_models.LegacyDeployment
.
- property action_audits#
Action Audits.
- Returns
An instance of
streamsets.sdk.sch_models.ActionAudits
.
- activate_api_credential(api_credential)[source]#
Activate an api credential.
- Parameters
api_credential (
streamsets.sdk.sch_models.ApiCredential
) – ApiCredential object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- activate_engine(engine)[source]#
Activate an engine.
- Parameters
engine (
streamsets.sdk.sch_models.Engine
) – Engine instance.
- activate_environment(*environments, timeout_sec=300)[source]#
Activate environments.
- Parameters
*environments – One or more instances of
streamsets.sdk.sch_models.Environment
.- Returns
An instance of
streamsets.sdk.sch_api.Command
or None.
- activate_provisioning_agent(provisioning_agent)[source]#
Activate provisioning agent.
- Parameters
provisioning_agent (
streamets.sdk.sch_models.ProvisioningAgent
) –- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- activate_user(*users)[source]#
Activate users for all given User IDs.
- Parameters
*users – One or more instance of
streamsets.sdk.sch_models.User
.- Returns
An instance of
streamsets.sdk.aster_api.Command
.
- add_api_credential(api_credential)[source]#
- Add an api credential. Some api credential attributes are updated by ControlHub such as
created_by.
- Parameters
api_credential (
streamsets.sdk.sch_models.ApiCredential
) – An API credential instance, built via thestreamsets.sdk.sch_models.ApiCredentialBuilder.build()
method.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- add_classification_rule(classification_rule, commit=False)[source]#
Add a classification rule.
- Parameters
classification_rule (
streamsets.sdk.sch_models.ClassificationRule
) – Classification Rule object.commit (
bool
, optional) – Whether to commit the rule after adding it. Default:False
.
- add_connection(connection)[source]#
Add a connection.
- Parameters
connection (
streamsets.sdk.sch_models.Connection
) – Connection object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- add_deployment(deployment)[source]#
Add a deployment.
- Parameters
deployment (
streamsets.sdk.sch_models.Deployment
) – Deployment object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- add_environment(environment)[source]#
Add an environment.
- Parameters
environment (
streamsets.sdk.sch_models.Environment
) – Environment object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- add_group(group)[source]#
Add a group.
- Parameters
group (
streamsets.sdk.sch_models.Group
) – Group object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- add_job(job)[source]#
Add a job.
- Parameters
job (
streamsets.sdk.sch_models.Job
) – Job object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- add_legacy_deployment(deployment)[source]#
Add a legacy deployment.
- Parameters
deployment (
streamsets.sdk.sch_models.LegacyDeployment
) – Deployment object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- add_organization(organization)[source]#
Add an organization.
- Parameters
organization (
streamsets.sdk.sch_models.Organization
) – Organization object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- add_protection_policy(protection_policy)[source]#
Add a protection policy.
- Parameters
protection_policy (
streamsets.sdk.sch_models.ProtectionPolicy
) – Protection Policy object.
- add_report_definition(report_definition)[source]#
Add Report Definition to ControlHub.
- Parameters
report_definition (
streamsets.sdk.sch_models.ReportDefinition
) – Report Definition instance.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- add_scheduled_task(task)[source]#
Add the scheduled task to Control Hub.
- Parameters
task (
streamsets.sdk.sch_models.ScheduledTask
) – Scheduled task object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.- Raises
Exception – Thrown if publishing the task was unsuccessful
- add_subscription(subscription)[source]#
Add Subscription to ControlHub.
- Parameters
subscription (
streamsets.sdk.sch_models.Subscription
) – A Subscription instance.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- property alerts#
Alerts.
- property api_credentials#
Api Credentials.
- assign_administrator_role(user)[source]#
Designate the specified user as an Organization Administrator.
- Parameters
user (
streamsets.sdk.sch_models.User
) – User object.- Returns
An instance of
streamsets.sdk.aster_api.Command
.
- balance_engines(*engines)[source]#
Balance all jobs running on given Engine.
- Parameters
*engines – One or more instances of
streamsets.sdk.sch_models.Engine
.
- balance_job(*jobs)[source]#
Balance one or more jobs.
- Parameters
*jobs – One or more instances of
streamsets.sdk.sch_models.Job
.
- check_snowflake_account_validation(token)[source]#
Check the status of a snowflake account validation process.
- Parameters
token (
str
) – Token of the validation process.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- clone_deployment(deployment, name=None, deployment_tags=None, engine_labels=None, engine_version=None)[source]#
Clone a deployment.
- Parameters
deployment (
streamsets.sdk.sch_models.Deployment
) – Deployment object.name (
str
, optional) – Name of the new deployment. Default:None
.deployment_tags (
list
, optional) – Tags to add to the cloned deployment. Default:`None
.engine_labels (
list
, optional) – Labels to assign to engines belonging to this deployment. Default:None
.engine_version (
str
, optional) – Engine Version ID Default:None
.
- Returns
An instance of
streamsets.sdk.sch_models.Deployment
.
- property connection_audits#
Connection Audits.
- Returns
An instance of
streamsets.sdk.sch_models.ConnectionAudits
.
- property connection_tags#
Connection Tags.
- Returns
An instance of
streamsets.sdk.sch_models.ConnectionTags
.
- property connections#
Connections.
- Returns
An instance of
streamsets.sdk.sch_models.Connections
.
- create_components(component_type, number_of_components=1, active=True)[source]#
Create components.
- Parameters
component_type (
str
) – Component type.number_of_components (
int
, optional) – Default:1
.active (
bool
, optional) – Default:True
.
- Returns
An instance of
streamsets.sdk.sch_api.CreateComponentsCommand
.
- property data_collectors#
Data Collectors registered to the ControlHub instance.
- Returns
Returns a
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.DataCollector
instances.
- property data_protector_enabled#
Whether Data Protector is enabled for the current organization.
- Type
bool
- deactivate_api_credential(api_credential)[source]#
Deactivate an api credential.
- Parameters
api_credential (
streamsets.sdk.sch_models.ApiCredential
) – ApiCredential object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- deactivate_engine(engine)[source]#
Deactivate an engine.
- Parameters
engine (
streamsets.sdk.sch_models.Engine
) – Engine instance.
- deactivate_environment(*environments)[source]#
Deactivate environments.
- Parameters
*environments – One or more instances of
streamsets.sdk.sch_models.Environment
.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- deactivate_provisioning_agent(provisioning_agent)[source]#
Deactivate provisioning agent.
- Parameters
provisioning_agent (
streamets.sdk.sch_models.ProvisioningAgent
) –- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- deactivate_user(*users)[source]#
Deactivate Users for all given User IDs.
- Parameters
*users – One or more instances of
streamsets.sdk.sch_models.User
.- Returns
An instance of
streamsets.sdk.aster_api.Command
.
- delete_and_unregister_engine(engine)[source]#
Delete and Unregister an engine.
- Parameters
engine (
streamsets.sdk.sch_models.Engine
) – Engine instance.
- delete_api_credentials(*api_credentials)[source]#
Delete api_credentials.
- Parameters
*api_credentials – One or more instances of
streamsets.sdk.sch_models.ApiCredential
.
- delete_connection(*connections)[source]#
Delete connections.
- Parameters
*connections – One or more instances of
streamsets.sdk.sch_models.Connection
.
- delete_deployment(*deployments)[source]#
Delete deployments.
- Parameters
*deployments (
streamsets.sdk.sch_models.Deployment
) – One or more deployments.
- delete_draft_run(*draft_runs)[source]#
Remove a draft run.
- Parameters
*draft_runs – One or more instances of
streamsets.sdk.sch_models.DraftRun
.
- delete_engine(engine)[source]#
Delete an engine.
- Parameters
engine (
streamsets.sdk.sch_models.Engine
) – Engine instance.
- delete_environment(*environments)[source]#
Delete environments.
- Parameters
*environments – One or more instances of
streamsets.sdk.sch_models.Environment
.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_group(*groups)[source]#
Delete groups.
- Parameters
*groups – One or more instances of
streamsets.sdk.sch_models.Group
.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_job(*jobs)[source]#
Delete one or more jobs.
- Parameters
*jobs – One or more instances of
streamsets.sdk.sch_models.Job
.
- delete_job_sequences(*job_sequences)[source]#
Delete Job Sequences.
- Parameters
*job_sequences – One or more instances of
streamsets.sdk.sch_models.JobSequence
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_legacy_deployment(*deployments)[source]#
Delete deployments.
- Parameters
*deployments – One or more instances of
streamsets.sdk.sch_models.LegacyDeployment
.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_pipeline(pipeline, only_selected_version=False)[source]#
Delete a pipeline.
- Parameters
pipeline (
streamsets.sdk.sch_models.Pipeline
) – Pipeline object.only_selected_version (
boolean
) – Delete only current commit.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_pipeline_labels(*pipeline_labels)[source]#
Delete pipeline labels.
- Parameters
*pipeline_labels – One or more instances of
streamsets.sdk.sch_models.PipelineLabel
.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_provisioning_agent(provisioning_agent)[source]#
Delete provisioning agent.
- Parameters
provisioning_agent (
streamets.sdk.sch_models.ProvisioningAgent
) –- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_provisioning_agent_token(provisioning_agent)[source]#
Delete provisioning agent token.
- Parameters
provisioning_agent (
streamets.sdk.sch_models.ProvisioningAgent
) –- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_report_definition(report_definition)[source]#
Delete an existing Report Definition.
- Parameters
report_definition (
streamsets.sdk.sch_models.ReportDefinition
) – Report Definition instance.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_scheduled_tasks(*scheduled_tasks)[source]#
Delete given scheduled tasks.
- Parameters
*scheduled_tasks – One or more instances of
streamsets.sdk.sch_models.ScheduledTask
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_snowflake_pipeline_defaults()[source]#
Delete the Snowflake pipeline defaults for this user.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_snowflake_user_credentials()[source]#
Delete the Snowflake user credentials.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_subscription(subscription)[source]#
Delete an exisiting Subscription.
- Parameters
subscription (
streamsets.sdk.sch_models.Subscription
) – A Subscription instance.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_topology(topology, only_selected_version=False)[source]#
Delete a topology.
- Parameters
topology (
streamsets.sdk.sch_models.Topology
) – Topology object.only_selected_version (
boolean
) – Delete only current commit.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_unregistered_auth_tokens(executor_type)[source]#
Delete auth tokens for engines that have been unregistered.
- Parameters
executor_type (
str
) – The executor type. Acceptable values are ‘DATACOLLECTOR’ or ‘TRANSFORMER’.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_user(*users, deactivate=False)[source]#
Delete users. Deactivate users before deleting if configured.
- Parameters
*users – One or more instances of
streamsets.sdk.sch_models.User
.deactivate (
bool
, optional) – Default:False
.
- Returns
An instance of
streamsets.sdk.aster_api.Command
.
- property deployments#
Deployments.
- Returns
An instance of
streamsets.sdk.sch_models.Deployments
.
- disable_job_sequence(job_sequence)[source]#
Disable the Job Sequence.
- Parameters
job_sequence – An instance of
streamsets.sdk.sch_models.JobSequence
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- property draft_runs#
Draft Runs.
- Returns
An instance of
streamsets.sdk.sch_models.DraftRuns
.
- duplicate_job(job, name=None, description=None, number_of_copies=1)[source]#
Duplicate an existing job.
- Parameters
job (
streamsets.sdk.sch_models.Job
) – Job object.name (
str
, optional) – Name of the new job(s). Default:None
. If not specified, name of the job with' copy'
appended to the end will be used.description (
str
, optional) – Description for new job(s). Default:None
.number_of_copies (
int
, optional) – Number of copies. Default:1
.
- Returns
A
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Job
.
- duplicate_pipeline(pipeline, name=None, description='New Pipeline', number_of_copies=1)[source]#
Duplicate an existing pipeline.
- Parameters
pipeline (
streamsets.sdk.sch_models.Pipeline
) – Pipeline object.name (
str
, optional) – Name of the new pipeline(s). Default:None
.description (
str
, optional) – Description for new pipeline(s). Default:'New Pipeline'
.number_of_copies (
int
, optional) – Number of copies. Default:1
.
- Returns
A
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Pipeline
.
- edit_job(job)[source]#
Edit a job.
- Parameters
job (
streamsets.sdk.sch_models.Job
) – Job object.- Returns
An instance of
streamsets.sdk.sch_models.Job
.
- enable_job_sequence(job_sequence)[source]#
Enable the Job Sequence.
- Parameters
job_sequence – An instance of
streamsets.sdk.sch_models.JobSequence
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- property engine_versions#
- Deployment engine version. The engine_versions and engine_configuration (Deployment) properties
have the same object structure represented by DeploymentEngineConfiguration.
- Returns
An instance of
streamsets.sdk.sch_models.DeploymentEngineConfiguration
.
- property environments#
environments.
- Returns
An instance of
streamsets.sdk.sch_models.Environments
.
- export_jobs(jobs)[source]#
Export jobs to a compressed archive.
- Parameters
jobs (
list
) – A list ofstreamsets.sdk.sch_models.Job
instances.- Returns
An instance of type
bytes
indicating the content of zip file with job json files.
- export_pipelines(pipelines, fragments=False, include_plain_text_credentials=False)[source]#
Export pipelines.
- Parameters
pipelines (
list
) – A list ofstreamsets.sdk.sch_models.Pipeline
instances.fragments (
bool
) – Indicates if exporting fragments is needed.include_plain_text_credentials (
bool
) – Indicates if plain text credentials should be included.
- Returns
An instance of type
bytes
indicating the content of zip file with pipeline json files.
- export_protection_policies(protection_policies)[source]#
Export protection policies to a compressed archive.
- Parameters
protection_policies (
list
) – A list ofstreamsets.sdk.sch_models.ProtectionPolicy
instances. –
- Returns
An instance of type
bytes
indicating the content of zip file with protection policy json files.
- export_topologies(topologies)[source]#
Export topologies.
- Parameters
topologies (
list
) – A list ofstreamsets.sdk.sch_models.Topology
instances.- Returns
An instance of type
bytes
indicating the content of zip file with pipeline json files.
- get_admin_tool(base_url, username, password)[source]#
Get ControlHub admin tool.
- Returns
An instance of
streamsets.sdk.sch_models.AdminTool
.
- get_api_credential_builder()[source]#
Get api credential Builder.
- Returns
An instance of
streamsets.sdk.sch_models.ApiCredentialBuilder
.
- get_classification_rule_builder()[source]#
Get a classification rule builder instance with which a classification rule can be created.
- Returns
An instance of
streamsets.sdk.sch_models.ClassificationRuleBuilder
.
- get_components(component_type_id, offset=None, len_=None, order_by='LAST_VALIDATED_ON', order='ASC')[source]#
Get components.
- Parameters
component_type_id (
str
) – Component type id.offset (
str
, optional) – Default:None
.len (
str
, optional) – Default:None
.order_by (
str
, optional) – Default:'LAST_VALIDATED_ON'
.order (
str
, optional) – Default:'ASC'
.
- get_connection_builder()[source]#
Get a connection builder instance with which a connection can be created.
- Returns
An instance of
streamsets.sdk.sch_models.ConnectionBuilder
.
- get_current_job_status(job)[source]#
Returns the current job status for given job id.
- Parameters
job (
streamsets.sdk.sch_models.Job
) – Job object.
- get_data_sla_builder()[source]#
Get a Data SLA builder instance with which a Data SLA can be created.
- Returns
An instance of
streamsets.sdk.sch_models.DataSlaBuilder
.
- get_deployment_builder(deployment_type='SELF')[source]#
Get a deployment builder instance with which a deployment can be created.
- Parameters
( (deployment_type) – obj: str, optional) Valid values are ‘AWS’, ‘GCP’ and ‘SELF’. Default
'SELF'
- Returns
An instance of
streamsets.sdk.sch_models.DeploymentBuilder
.
- get_engine_labels(engine)[source]#
Returns all labels assigned to an engine.
- Parameters
engine (
streamsets.sdk.sch_models.Engine
) – Engine instance.- Returns
A
list
of engine assigned labels.
- get_environment_builder(environment_type='SELF')[source]#
Get an environment builder instance with which an environment can be created.
- Parameters
( (environment_type) – obj: str, optional) Valid values are ‘AWS’, ‘GCP’, ‘KUBERNETES’ and ‘SELF’. Default
'SELF'
- Returns
An instance of
streamsets.sdk.sch_models.EnvironmentBuilder
.
- get_group_builder()[source]#
Get a group builder instance with which a group can be created.
- Returns
An instance of
streamsets.sdk.sch_models.GroupBuilder
.
- get_job_builder()[source]#
Get a job builder instance with which a job can be created.
- Returns
An instance of
streamsets.sdk.sch_models.JobBuilder
.
- get_job_sequence_builder()[source]#
Get a job sequence builder instance with which a job sequence can be created.
- Returns
An instance of
streamsets.sdk.sch_models.JobSequenceBuilder
.
- get_kubernetes_apply_agent_yaml_command(environment)[source]#
Get install script for a Kubernetes environment.
- Parameters
environment (
streamsets.sdk.sch_models.Environment
) – Environment object.- Returns
An instance of
str
.
- get_kubernetes_environment_yaml(environment)[source]#
Get the YAML file for a Kubernetes environment.
- Parameters
environment (
streamsets.sdk.sch_models.Environment
) – Environment for which the YAML file should be fetched.- Returns
An instance of
str
.
- get_legacy_deployment_builder()[source]#
Get a deployment builder instance with which a legacy deployment can be created.
- Returns
An instance of
streamsets.sdk.sch_models.LegacyDeploymentBuilder
.
- get_organization_builder()[source]#
Get an organization builder instance with which an organization can be created.
- Returns
An instance of
streamsets.sdk.sch_models.OrganizationBuilder
.
- get_pipeline_builder(engine_type=None, engine_id=None, fragment=False, **kwargs)[source]#
Get a pipeline builder instance with which a pipeline can be created.
- Parameters
engine_type (
str
) – The type of pipeline that will be created. The options are'data_collector'
,'snowflake'
or'transformer'
.engine_id (
str
) – The ID of the Data Collector or Transformer in which to author the pipeline if not using Transformer for Snowflake.fragment (
boolean
, optional) – Specify if a fragment builder. Default:False
.
- Returns
An instance of
streamsets.sdk.sch_models.PipelineBuilder
orstreamsets.sdk.sch_models.StPipelineBuilder
.
- get_protection_method_builder()[source]#
Get a protection method builder instance with which a protection method can be created.
- Returns
An instance of
streamsets.sdk.sch_models.ProtectionMethodBuilder
.
- get_protection_policy_builder()[source]#
Get a protection policy builder instance with which a protection policy can be created.
- Returns
An instance of
streamsets.sdk.sch_models.ProtectionPolicyBuilder
.
- get_saql_search_builder(saql_search_type, mode='BASIC')[source]#
Get SAQL Search Builder.
- saql_search_type (
str
): Type of SAQL search object, limited to``’PIPELINE’`` or'FRAGMENT'
, 'JOB_INSTANCE'
,'JOB_TEMPLATE'
and'JOB_DRAFT_RUN'
.
mode (
str
, optional): mode of SAQL Search. Default:'BASIC'
- Returns
An instance of
streamsets.sdk.sch_models.SAQLSearchBuilder
.
- saql_search_type (
- get_scheduled_task_builder()[source]#
Get a scheduled task builder instance with which a scheduled task can be created.
- Returns
An instance of
streamsets.sdk.sch_models.ScheduledTaskBuilder
.
- get_self_managed_deployment_install_script(deployment, install_mechanism='DEFAULT', java_version=None)[source]#
Get install script for a Self Managed deployment.
- Parameters
deployment (
streamsets.sdk.sch_models.Deployment
) – deployment object.install_mechanism (
str
, optional) – Possible values for install are “DEFAULT”, “BACKGROUND” and “FOREGROUND”. Default:DEFAULT
java_version (
str
, optional) – Java Development Kit Version to be used. Example: “8”. Default:None
.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- get_snowflake_pipeline_defaults()[source]#
Get the Snowflake pipeline defaults for this user (if it exists).
- Returns
A
dict
of Snowflake pipeline defaults.
- get_snowflake_user_credentials()[source]#
Get the Snowflake user credentials (if they exist). They will be redacted.
- Returns
A
dict
of Snowflake user credentials (redacted).
- get_subscription_builder()[source]#
Get Event Subscription Builder.
- Returns
An instance of
streamsets.sdk.sch_models.SubscriptionBuilder
.
- get_topology_builder()[source]#
Get a topology builder instance with which a topology can be created.
- Returns
An instance of
streamsets.sdk.sch_models.TopologyBuilder
.
- get_user_builder()[source]#
Get a user builder instance with which a user can be created.
- Returns
An instance of
streamsets.sdk.sch_models.UserBuilder
.
- property groups#
Groups.
- Returns
An instance of
streamsets.sdk.sch_models.Groups
.
- import_jobs(archive, pipeline=True, number_of_instances=False, labels=False, runtime_parameters=False, **kwargs)[source]#
Import jobs from archived zip directory.
- Parameters
archive (
file
) – file containing the jobs.pipeline (
boolean
, optional) – Indicate if pipeline should be imported. Default:True
.number_of_instances (
boolean
, optional) – Indicate if number of instances should be imported. Default:False
.labels (
boolean
, optional) – Indicate if labels should be imported. Default:False
.runtime_parameters (
boolean
, optional) – Indicate if runtime parameters should be imported. Default:False
.
- Returns
A
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Job
.
- import_pipeline(pipeline, commit_message, name=None, data_collector_instance=None)[source]#
Import pipeline from json file.
- Parameters
pipeline (
dict
) – A python dict representation of ControlHub Pipeline.commit_message (
str
) – Commit message.name (
str
, optional) – Name of the pipeline. If left out, pipeline name from JSON object will be used. DefaultNone
.data_collector_instance (
streamsets.sdk.sch_models.DataCollector
) – If excluded, system sdc will be used. DefaultNone
.
- Returns
An instance of
streamsets.sdk.sch_models.Pipeline
.
- import_pipelines_from_archive(archive, commit_message, fragments=False)[source]#
Import pipelines from archived zip directory.
- Parameters
archive (
file
) – file containing the pipelines.commit_message (
str
) – Commit message.fragments (
bool
, optional) – Indicates if pipeline contains fragments.
- Returns
A
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Pipeline
.
- import_protection_policies(policies_archive)[source]#
Import protection policies from a compressed archive.
- Parameters
policies_archive (
file
) – file containing the protection policies.- Returns
A py:class:streamsets.sdk.utils.SeekableList of
streamsets.sdk.sch_models.ProtectionPolicy
.
- import_topologies(archive, import_number_of_instances=False, import_labels=False, import_runtime_parameters=False, **kwargs)[source]#
Import topologies from archived zip directory.
- Parameters
archive (
file
) – file containing the topologies.import_number_of_instances (
boolean
, optional) – Indicate if number of instances should be imported. Default:False
.import_labels (
boolean
, optional) – Indicate if labels should be imported. Default:False
.import_runtime_parameters (
boolean
, optional) – Indicate if runtime parameters should be imported. Default:False
.
- Returns
A
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Topology
.
- invite_user(user)[source]#
Invite a user to the StreamSets Platform.
- Parameters
user (
streamsets.sdk.sch_models.User
) – User object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- property job_sequences#
Job Sequences.
- Returns
An instance of
streamsets.sdk.sch_models.JobSequences
.
- property jobs#
Jobs.
- Returns
An instance of
streamsets.sdk.sch_models.Jobs
.
- kill_scheduled_tasks(*scheduled_tasks)[source]#
Kill given scheduled tasks.
- Parameters
*scheduled_tasks – One or more instances of
streamsets.sdk.sch_models.ScheduledTask
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- property ldap_enabled#
Indication if LDAP is enabled or not.
- Returns
An instance of
boolean
.
- property legacy_deployments#
Deployments.
- Returns
An instance of
streamsets.sdk.sch_models.LegacyDeployments
.
- lock_deployment(deployment)[source]#
Lock deployment.
- Parameters
deployment – An instance of
streamsets.sdk.sch_models.Deployment
.
- property login_audits#
Login Audits.
- Returns
An instance of
streamsets.sdk.sch_models.LoginAudits
.
- mark_saql_search_as_favorite(saql_search)[source]#
Marks a saql search as a favorite. In other words, it stars an existing search query.
- Parameters
saql_search (
streamsets.sdk.sch_models.SAQLSearch
) – The SAQL Search.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- property metering_and_usage#
Metering and usage for the Organization. By default, this will return a report for the last 30 days of metering data. A report for a custom time window can be retrieved by indexing this object with a slice that contains a datetime object for the start (required) and stop (optional, defaults to datetime.now()).
- ex. metering_and_usage[datetime(2022, 1, 1):datetime(2022, 1, 14)]
metering_and_usage[datetime.now() - timedelta(7):]
- Returns
An instance of
streamsets.sdk.sch_models.MeteringUsage
- property organizations#
Organizations.
- Returns
An instance of
streamsets.sdk.sch_models.Organizations
.
- pause_scheduled_tasks(*scheduled_tasks)[source]#
Pause given scheduled tasks.
- Parameters
*scheduled_tasks – One or more instances of
streamsets.sdk.sch_models.ScheduledTask
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- property pipeline_labels#
Pipeline labels.
- Returns
An instance of
streamsets.sdk.sch_models.PipelineLabels
.
- property pipelines#
Pipelines.
- Returns
An instance of
streamsets.sdk.sch_models.Pipelines
.
- preview_classification_rule(classification_rule, parameter_data, data_collector=None)[source]#
Dynamic preview of a classification rule.
- Parameters
classification_rule (
streamsets.sdk.sch_models.ClassificationRule
) – Classification Rule object.parameter_data (
dict
) – A python dict representation of raw JSON parameters required for preview.data_collector (
streamsets.sdk.sch_models.DataCollector
, optional) – The Data Collector in which to preview the pipeline. If omitted, ControlHub’s first executor SDC will be used. Default:None
.
- Returns
An instance of
streamsets.sdk.sdc_api.PreviewCommand
.
- property protection_policies#
Protection policies.
- Returns
An instance of
streamsets.sdk.sch_models.ProtectionPolicies
.
- property provisioning_agents#
Provisioning Agents registered to the ControlHub instance.
- Returns
An instance of
streamsets.sdk.sch_models.ProvisioningAgents
.
- publish_job_sequence(job_sequence)[source]#
Publishes an Job Sequence to ControlHub.
- Parameters
job_sequence – An instance of
streamsets.sdk.sch_models.JobSequence
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- publish_pipeline(pipeline, commit_message='New pipeline', draft=False, validate=False)[source]#
Publish a pipeline.
- Parameters
pipeline (
streamsets.sdk.sch_models.Pipeline
) – Pipeline object.commit_message (
str
, optional) – Default:'New pipeline'
.draft (
boolean
, optional) – Default:False
.validate (
boolean
, optional) – Default:False
.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- publish_scheduled_task(task)[source]#
Send the scheduled task to Control Hub. DEPRECATED
- Parameters
task (
streamsets.sdk.sch_models.ScheduledTask
) – Scheduled task object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- publish_topology(topology, commit_message=None)[source]#
Publish a topology.
- Parameters
topology (
streamsets.sdk.sch_models.Topology
) – Topology object to publish.commit_message (
str
, optional) – Commit message to supply with the Topology. Default:None
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- regenerate_api_credential_auth_token(api_credential)[source]#
Regenerate the auth token for an api credential.
- Parameters
api_credential (
streamsets.sdk.sch_models.ApiCredential
) – ApiCredential object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- remove_administrator_role(user)[source]#
Remove the Organization Administrator status from the specified user.
- Parameters
user (
streamsets.sdk.sch_models.User
) – User object.- Returns
An instance of
streamsets.sdk.aster_api.Command
.
- remove_saql_search(saql_search)[source]#
Remove a saql search.
- Parameters
saql_search (
streamsets.sdk.sch_models.SAQLSearch
) – The SAQL Search object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- rename_api_credential(api_credential)[source]#
Rename an api credential.
- Parameters
api_credential (
streamsets.sdk.sch_models.ApiCredential
) – ApiCredential object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- property report_definitions#
Report Definitions.
- Returns
An instance of
streamsets.sdk.sch_models.ReportDefinitions
.
- reset_origin(*jobs)[source]#
Reset all pipeline offsets for given jobs.
- Parameters
*jobs – One or more instances of
streamsets.sdk.sch_models.Job
.- Returns
A
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Job
.
- restart_engines(*engines)[source]#
Restart the given engine(s).
- Parameters
*engines – One or more instances of
streamsets.sdk.sch_models.DataCollector
orstreamsets.sdk.sch_models.Transformer
.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- resume_scheduled_tasks(*scheduled_tasks)[source]#
Resume given scheduled tasks.
- Parameters
*scheduled_tasks – One or more instances of
streamsets.sdk.sch_models.ScheduledTask
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- run_job_sequence(job_sequence)[source]#
Run the Job Sequence.
- Parameters
job_sequence – An instance of
streamsets.sdk.sch_models.JobSequence
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- run_pipeline_preview(pipeline, batches=1, batch_size=10, skip_targets=True, skip_lifecycle_events=True, end_stage=None, only_schema=None, timeout=120000, test_origin=False, read_policy=None, write_policy=None, executor=None, **kwargs)[source]#
Run pipeline preview.
- Parameters
pipeline (
streamsets.sdk.sch_models.Pipeline
) – Pipeline object.batches (
int
, optional) – Number of batches. Default:1
.batch_size (
int
, optional) – Batch size. Default:10
.skip_targets (
bool
, optional) – Skip targets. Default:True
.skip_lifecycle_events (
bool
, optional) – Skip life cycle events. Default:True
.end_stage (
str
, optional) – End stage. DefaultNone
.only_schema (
bool
, optional) – Only schema. Default:None
.timeout (
int
, optional) – Timeout. Default:120000
.test_origin (
bool
, optional) – Test origin. Default:False
.read_policy (
streamsets.sdk.sch_models.ProtectionPolicy
) – Read Policy for preview. If not provided, uses default read policy if one available. Default:None
.write_policy (
streamsets.sdk.sch_models.ProtectionPolicy
) – Write Policy for preview. If not provided, uses default write policy if one available. Default:None
.executor (
streamsets.sdk.sch_models.DataCollector
, optional) – The Data Collector in which to preview the pipeline. If omitted, ControlHub’s first executor SDC will be used. Default:None
.
- Returns
An instance of
streamsets.sdk.sdc_api.PreviewCommand
.
- run_snowflake_pipeline_preview(pipeline_id, rev=0, batches=1, batch_size=10, skip_targets=True, end_stage=None, only_schema=None, timeout=2000, test_origin=False, stage_outputs_to_override_json=None, **kwargs)[source]#
Run Snowflake pipeline preview.
- Parameters
pipeline_id (
str
) – The pipeline instance’s ID.rev (
int
, optional) – Pipeline revision. Default:0
.batches (
int
, optional) – Number of batches. Default:1
.batch_size (
int
, optional) – Batch size. Default:10
.skip_targets (
bool
, optional) – Skip targets. Default:True
.end_stage (
str
, optional) – End stage. Default:None
.only_schema (
bool
, optional) – Only schema. Default:None
.timeout (
int
, optional) – Timeout. Default:2000
.test_origin (
bool
, optional) – Test origin. Default:False
stage_outputs_to_override_json (
str
, optional) – Stage outputs to override. Default:None
.wait (
bool
, optional) – Wait for pipeline preview to finish. Default:True
.
- Returns
An instance of
streamsets.sdk.sdc_api.PreviewCommand
.
- property saql_saved_searches_draft_run#
Get SAQL Searches for type Draft Run.
- Returns
An instance of
streamsets.sdk.sch_models.SAQLSearches
.
- property saql_saved_searches_fragment#
Get SAQL Searches for type Fragment.
- Returns
An instance of
streamsets.sdk.sch_models.SAQLSearches
.
- property saql_saved_searches_job_instance#
Get SAQL Searches for type Job Instance.
- Returns
An instance of
streamsets.sdk.sch_models.SAQLSearches
.
- property saql_saved_searches_job_template#
Get SAQL Searches for type Job Template.
- Returns
An instance of
streamsets.sdk.sch_models.SAQLSearches
.
- property saql_saved_searches_pipeline#
Get SAQL Searches for type Pipeline.
- Returns
An instance of
streamsets.sdk.sch_models.SAQLSearches
.
- save_saql_search(saql_search)[source]#
Save a saql search query.
- Parameters
saql_search (
streamsets.sdk.sch_models.SAQLSearch
) – The SAQL Search object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- scale_legacy_deployment(deployment, num_instances)[source]#
Scale up/down active deployment.
- Parameters
deployment (
streamsets.sdk.sch_models.LegacyDeployment
) – Deployment object.num_instances (
int
) – Number of sdc instances.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- property scheduled_tasks#
Scheduled Tasks.
- Returns
An instance of
streamsets.sdk.sch_models.ScheduledTasks
.
- set_default_read_protection_policy(protection_policy)[source]#
Set a default read protection policy.
- Parameters
protection_policy –
:param (
streamsets.sdk.sch_models.ProtectionPolicy
): Protection :param Policy object to be set as the default read policy.:- Returns
An updated instance of
streamsets.sdk.sch_models.ProtectionPolicy
.
- set_default_write_protection_policy(protection_policy)[source]#
Set a default write protection policy.
- Parameters
protection_policy –
:param (
streamsets.sdk.sch_models.ProtectionPolicy
): Protection :param Policy object to be set as the default write policy.:- Returns
An updated instance of
streamsets.sdk.sch_models.ProtectionPolicy
.
- shutdown_engines(*engines)[source]#
Call shutdown on the given engine(s).
- Parameters
*engines – One or more instances of
streamsets.sdk.sch_models.DataCollector
orstreamsets.sdk.sch_models.Transformer
.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- start_deployment(*deployments)[source]#
Start Deployments.
- Parameters
*deployments – One or more instances of
streamsets.sdk.sch_models.Deployment
.
- start_draft_run(pipeline, reset_origin=False, runtime_parameters=None)[source]#
Start a draft run.
- Parameters
pipeline (
streamsets.sdk.sch_models.Pipeline
) – Pipeline object. (Must be in draft mode)reset_origin (
boolean
, optional) – Default:False
.runtime_parameters (
dict
, optional) – Pipeline runtime parameters. Default:None
.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- start_job(*jobs, wait=True, **kwargs)[source]#
Start one or more jobs.
- Parameters
*jobs – One or more instances of
streamsets.sdk.sch_models.Job
.wait (
bool
, optional) – Wait for pipelines to reach RUNNING status before returning. Default:True
.**kwargs – Arbitrary keyword arguments.
- Returns
An instance of
streamsets.sdk.sch_api.StartJobsCommand
.
- start_job_template(job_template, name=None, description=None, attach_to_template=True, delete_after_completion=False, instance_name_suffix='COUNTER', number_of_instances=1, parameter_name=None, raw_job_tags=None, runtime_parameters=None, wait_for_data_collectors=False, inherit_permissions=False, wait=True, asynchronous=False)[source]#
Start Job instances from a Job Template.
- Parameters
job_template (
streamsets.sdk.sch_models.Job
) – A Job instance with the property job_template set toTrue
.name (
str
, optional) – Name of the new job(s). Default:None
. If not specified, name of the job template with' copy'
appended to the end will be used.description (
str
, optional) – Description for new job(s). Default:None
.attach_to_template (
bool
, optional) – Default:True
.delete_after_completion (
bool
, optional) – Default:False
.instance_name_suffix (
str
, optional) – Suffix to be used for Job names in {‘COUNTER’, ‘TIME_STAMP’, ‘PARAM_VALUE’}. Default:COUNTER
.number_of_instances (
int
, optional) – Number of instances to be started using given parameters. Default:1
.parameter_name (
str
, optional) – Specified when instance_name_suffix is ‘PARAM_VALUE’. Default:None
.raw_job_tags (
list
, optional) – Default:None
.runtime_parameters (
dict
) or (list
) – Runtime Parameters to be used in the jobs. If a dict is specified,number_of_instances
jobs will be started. If a list is specified,number_of_instances
is ignored and job instances will be started using the elements of the list as Runtime Parameters for each job. If left out, Runtime Parameters from Job Template will be used. Default:None
.wait_for_data_collectors (
bool
, optional) – Wait for the created job instance(s) to be assigned an engine before proceeding. Default:False
.inherit_permissions (
bool
, optional) – Parameter to determine if the user wants to inherit the ACL from the template instead of getting the default ACL for it. Default:False
.wait (
bool
, optional) – Wait for jobs to reach ACTIVE status before returning. Default:True
.asynchronous (
bool
, optional) – Whether to start and create the jobs asynchronously or not. Default:False
.
- Returns
A
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Job
instances.
- start_legacy_deployment(deployment, **kwargs)[source]#
Start Deployment.
- Parameters
deployment (
streamsets.sdk.sch_models.LegacyDeployment
) – Deployment instance.wait (
bool
, optional) – Wait for deployment to start. Default:True
.wait_for_statuses (
list
, optional) – Deployment statuses to wait on. Default:['ACTIVE']
.
- Returns
An instance of
streamsets.sdk.sch_api.DeploymentStartStopCommand
.
- stop_deployment(*deployments)[source]#
Stop Deployments.
- Parameters
*deployments – One or more instances of
streamsets.sdk.sch_models.Deployment
.
- stop_draft_run(*draft_runs, force=False, timeout_sec=300)[source]#
Stop a draft run.
- Parameters
*draft_runs – One or more instances of
streamsets.sdk.sch_models.DraftRuns
.force (
bool
, optional) – Force draft run to stop. Default:False
.timeout_sec (
int
, optional) – Timeout in secs. Default:300
.
- Returns
An instance of
streamsets.sdk.sch_api.StopJobsCommand
.
- stop_job(*jobs, force=False, timeout_sec=300)[source]#
Stop one or more jobs.
- Parameters
*jobs – One or more instances of
streamsets.sdk.sch_models.Job
.force (
bool
, optional) – Force job to stop. Default:False
.timeout_sec (
int
, optional) – Timeout in secs. Default:300
.
- Returns
An instance of
streamsets.sdk.sch_api.StopJobsCommand
.
- stop_legacy_deployment(deployment, wait_for_statuses=['INACTIVE'])[source]#
Stop Deployment.
- Parameters
deployment (
streamsets.sdk.sch_models.LegacyDeployment
) – Deployment instance.wait_for_statuses (
list
, optional) – List of statuses to wait for. Default:['INACTIVE']
.
- Returns
An instance of
streamsets.sdk.sch_api.DeploymentStartStopCommand
.
- stop_test_pipeline_run(start_pipeline_command)[source]#
Stop the test run of pipeline.
- Parameters
start_pipeline_command (
streamsets.sdk.sdc_api.StartPipelineCommand
) –- Returns
An instance of
streamsets.sdk.sdc_api.StopPipelineCommand
.
- property subscription_audits#
Subscription audits.
- property subscriptions#
Event Subscriptions.
- Returns
An instance of
streamsets.sdk.sch_models.Subscriptions
.
- sync_job(*jobs)[source]#
Sync one or more jobs.
- Parameters
*jobs – One or more instances of
streamsets.sdk.sch_models.Job
.
- test_pipeline_run(pipeline, reset_origin=False, parameters=None)[source]#
Test run a pipeline.
- Parameters
pipeline (
streamsets.sdk.sch_models.Pipeline
) – Pipeline object.reset_origin (
boolean
, optional) – Default:False
.parameters (
dict
, optional) – Pipeline parameters. Default:None
.
- Returns
An instance of
streamsets.sdk.sdc_api.StartPipelineCommand
.
- property topologies#
Topologies.
- Returns
An instance of
streamsets.sdk.sch_models.Topologies
.
- property transformers#
Transformers registered to the ControlHub instance.
- Returns
Returns a
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Transformer
instances.
- unlock_deployment(deployment)[source]#
Unlock deployment.
- Parameters
deployment – An instance of
streamsets.sdk.sch_models.Deployment
.
- update_connection(connection)[source]#
Update a connection.
- Parameters
connection (
streamsets.sdk.sch_models.Connection
) – Connection object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- update_deployment(deployment)[source]#
Update a deployment.
- Parameters
deployment (
streamsets.sdk.sch_models.Deployment
) – deployment object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- update_engine_labels(engine)[source]#
Update an engine’s labels.
- Parameters
engine (
streamsets.sdk.sch_models.Engine
) – Engine instance.- Returns
An instance of
streamsets.sdk.sch_models.Engine
.
- update_engine_resource_thresholds(engine, max_cpu_load=None, max_memory_used=None, max_pipelines_running=None)[source]#
Updates engine resource thresholds.
- Parameters
engine (
streamsets.sdk.sch_models.Engine
) – Engine instance.max_cpu_load (
float
, optional) – Max CPU load in percentage. Default:None
.max_memory_used (
int
, optional) – Max memory used in MB. Default:None
.max_pipelines_running (
int
, optional) – Max pipelines running. Default:None
.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- update_environment(environment, timeout_sec=300)[source]#
Update an environment.
- Parameters
environment (
streamsets.sdk.sch_models.Environment
) – environment object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- update_group(group)[source]#
Update a group.
- Parameters
group (
streamsets.sdk.sch_models.Group
) – Group object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- update_job(job)[source]#
Update a job.
- Parameters
job (
streamsets.sdk.sch_models.Job
) – Job object.- Returns
An instance of
streamsets.sdk.sch_models.Job
.
- update_job_sequence_metadata(job_sequence)[source]#
Update Job Sequence metadata.
- Parameters
job_sequence – An instance of
streamsets.sdk.sch_models.JobSequence
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- update_legacy_deployment(deployment)[source]#
Update a deployment.
- Parameters
deployment (
streamsets.sdk.sch_models.LegacyDeployment
) – Deployment object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- update_organization(organization)[source]#
Update an organization.
- Parameters
organization (
streamsets.sdk.sch_models.Organization
) – Organization instance.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- update_pipelines_with_different_fragment_version(pipelines, from_fragment_version, to_fragment_version)[source]#
Update pipelines with latest pipeline fragment commit version.
- Parameters
pipelines (
list
) – List ofstreamsets.sdk.sch_models.Pipeline
instances.from_fragment_version (
streamsets.sdk.sch_models.PipelineCommit
) – commit of fragment from which the pipeline needs to be updated.to_fragment_version (
streamsets.sdk.sch_models.PipelineCommit
) – commit of fragment to which the pipeline needs to be updated.
- Returns
An instance of
streamsets.sdk.sch_api.Command
- update_report_definition(report_definition)[source]#
Update an existing Report Definition.
- Parameters
report_definition (
streamsets.sdk.sch_models.ReportDefinition
) – Report Definition instance.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- update_saql_search(saql_search)[source]#
Update a saql search query.
- Parameters
saql_search (
streamsets.sdk.sch_models.SAQLSearch
) – The SAQL Search object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- update_scheduled_task(task)[source]#
Update an existing scheduled task in Control Hub.
- Parameters
task (
streamsets.sdk.sch_models.ScheduleTask
) – Scheduled task object.- Returns
An instance of py:class:streamsets.sdk.sch_api.Command.
- update_snowflake_pipeline_defaults(account_url=None, database=None, warehouse=None, schema=None, role=None)[source]#
Create or update the Snowflake pipeline defaults for this user.
- Parameters
account_url (
str
, optional) – Snowflake account url. Default:None
database (
str
, optional) – Snowflake database to query against. Default:None
warehouse (
str
, optional) – Snowflake warehouse. Default:None
schema (
str
, optional) – Schema used. Default:None
role (
str
, optional) – Role used. Default:None
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- update_snowflake_user_credentials(username, snowflake_login_type, password=None, private_key=None, role=None)[source]#
Create or update the Snowflake user credentials.
- Parameters
username (
str
) – Snowflake account username.snowflake_login_type (
str
) – Snowflake login type to use. Options arepassword
andprivateKey
.password (
str
, optional) – Snowflake account password. Default:None
private_key (
str
, optional) – Snowflake account private key. Default:None
role (
str
, optional) – Snowflake role of the account. Default:None
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- update_subscription(subscription)[source]#
Update an existing Subscription.
- Parameters
subscription (
streamsets.sdk.sch_models.Subscription
) – A Subscription instance.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- update_user(user)[source]#
- Update a user. Some user attributes are updated by ControlHub such as
last_modified_by, last_modified_on.
- Parameters
user (
streamsets.sdk.sch_models.User
) – User object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- upgrade_job(*jobs)[source]#
Upgrade job(s) to latest pipeline version.
- Parameters
*jobs – One or more instances of
streamsets.sdk.sch_models.Job
.- Returns
A
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Job
.
- upload_offset(job, offset_file=None, offset_json=None)[source]#
Upload offset for given job.
- Parameters
job (
streamsets.sdk.sch_models.Job
) – Job object.offset_file (
file
, optional) – File containing the offsets. Default:None
. Exactly one ofoffset_file
,offset_json
should specified.offset_json (
dict
, optional) – Contents of offset. Default:None
. Exactly one ofoffset_file
,offset_json
should specified.
- Returns
An instance of
streamsets.sdk.sch_models.Job
.
- property users#
Users.
- Returns
An instance of
streamsets.sdk.sch_models.Users
.
- validate_pipeline(pipeline)[source]#
Validate pipeline. Only Transformer for Snowflake pipelines are supported at this time.
- Parameters
pipeline (
streamsets.sdk.sch_models.Pipeline
) – Pipeline instance.- Raises
streamsets.sdk.exceptions.ValidationError – If validation fails.
- validate_snowflake_account(account_url, mode, snowflake_login_type, username, password=None, private_key=None)[source]#
Start a snowflake account validation process.
- Parameters
account_url (
str
) – Snowflake url where the account should exist.mode (
str
) – Mode of the account. Possible values are “EDIT”, “PUBLISHED” and “CURRENT_CRED”.snowflake_login_type (
str
) – Login type for the snowflake account. Possible values are “PASSWORD” and “PRIVATE_KEY”.username (
str
) – Username of the snowflake account.password (
str
, optional) – Password of the snowflake account. Default:None
.private_key (
str
, optional) – Private key of the snowflake account. Default:None
.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- verify_connection(connection, library=None)[source]#
Verify connection.
- Parameters
connection (
streamsets.sdk.sch_models.Connection
) – Connection object.library (:py:`str`, optional) – Specify the library to test against. Default:
None
.
- Returns
An instance of
streamsets.sdk.sch_models.ConnectionVerificationResult
.
- wait_for_job_metric(job, metric, value, timeout_sec=200)[source]#
Block until a job’s realtime summary reaches the desired value for the desired metric.
- Parameters
job (
streamsets.sdk.sch_models.Job
) – The job instance.metric (
str
) – The desired metric (e.g.'output_record_count'
or'input_record_count'
).value (
int
) – The desired value to wait for.timeout (
int
, optional) – Timeout to wait formetric
to reachvalue
, in seconds. Default:streamsets.sdk.sch.DEFAULT_WAIT_FOR_METRIC_TIMEOUT
.
- Raises
TimeoutError – If
timeout
passes withoutmetric
reachingvalue
.
- wait_for_job_metrics_record_count(job, count, timeout_sec=200)[source]#
Block until a job’s metrics reaches the desired count.
- Parameters
job (
streamsets.sdk.sch_models.Job
) – The job instance.count (
int
) – The desired value to wait for.timeout_sec (
int
, optional) – Timeout to wait formetric
to reachcount
, in seconds. Default:streamsets.sdk.sch.DEFAULT_WAIT_FOR_METRIC_TIMEOUT
.
- Raises
TimeoutError – If
timeout_sec
passes withoutmetric
reachingvalue
.
- wait_for_job_sequence_status(job_sequence, expected_status, timeout_sec=300)[source]#
Block until the job sequence reaches the desired status.
- Parameters
job_sequence (
streamsets.sdk.sch_models.JobSequence
) – The JobSequence instance.expected_status (
str
) – The desired status to wait for.timeout_sec (
int
, optional) – Timeout to wait forjob_sequence
to reachexpected_status
, in seconds. Default:streamsets.sdk.sch.DEFAULT_WAIT_FOR_STATUS_TIMEOUT
.
- Raises
TimeoutError – If
timeout_sec
passes withoutjob_sequence
reachingexpected_status
.TypeError – If
job_sequence
is not astreamsets.sdk.sch_models.JobSequence
instance.
- wait_for_job_status(job, status, check_failures=False, timeout_sec=300)[source]#
Block until a job reaches the desired status.
- Parameters
job (
streamsets.sdk.sch_models.Job
) – The job instance.status (
str
) – The desired status to wait for.check_failures (
bool
, optional) – Flag to check for job exceptions. Default: Falsetimeout_sec (
int
, optional) – Timeout to wait forjob
to reachstatus
, in seconds. Default:streamsets.sdk.sch.DEFAULT_WAIT_FOR_STATUS_TIMEOUT
.
- Raises
TimeoutError – If
timeout_sec
passes withoutjob
reachingstatus
.
Models#
These models wrap and provide useful functionality for interacting with common SCH abstractions.
ACLs#
- class streamsets.sdk.sch_models.ACL(*args, **kwargs)[source]#
Represents an ACL.
- Parameters
acl (
dict
) – JSON representation of an ACL.control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub object.
- resource_id#
Resource ID of the ACL.
- Type
str
- resource_owner#
Resource owner of the ACL.
- Type
str
- resource_created_time#
Creation time of the resource.
- Type
str
- resource_type#
Resource type of the ACL.
- Type
str
- last_modified_by#
Who the resource was last modified by.
- Type
str
- last_modified_on#
Last modified time of the resource.
- Type
str
- permissions#
A Collection of Permissions.
- Type
streamsets.sdk.sch_models.Permissions
- permission_builder#
A Permission Builder instance.
- add_permission(permission)[source]#
Add new permission to the ACL.
- Parameters
permission (
streamsets.sdk.sch_models.Permission
) – A permission object.- Returns
An instance of
streamsets.sdk.sch_api.Command
- property permission_builder#
Get a permission builder instance with which a pipeline can be created.
- Returns
An instance of
streamsets.sdk.sch_models.ACLPermissionBuilder
.
- remove_permission(permission)[source]#
Remove a permission from ACL.
- Parameters
permission (
streamsets.sdk.sch_models.Permission
) – A permission object.- Returns
An instance of
streamsets.sdk.sch_api.Command
- class streamsets.sdk.sch_models.ACLPermissionBuilder(permission, acl)[source]#
Class to help build the ACL permission.
- Parameters
permission (
streamsets.sdk.sch_models.Permission
) – A permission object.acl (
streamsets.sdk.sch_models.ACL
) – An ACL object.
- build(subject_id, subject_type, actions)[source]#
Method to help build the ACL permission.
- Parameters
subject_id (
str
) – ID of the USER or GROUP.subject_type (
str
) – Type of the subject. Accepted Values are ‘USER’, ‘GROUP’.actions (
list
) – A list of actions of typestr
e.g. [‘READ’, ‘WRITE’, ‘EXECUTE’].
- Returns
An instance of
streamsets.sdk.sch_models.Permission
.
- class streamsets.sdk.sch_models.Permission(*args, **kwargs)[source]#
A container for a permission.
- Parameters
permission (
dict
) – A Python object representation of a permission.resource_type (
str
) – String representing the type of resource e.g. ‘JOB’, ‘PIPELINE’.api_client (
streamsets.sdk.sch_api.ApiClient
) – An instance of ApiClient.
- resource_id#
Id of the resource e.g. Pipeline or Job.
- Type
str
- subject_id#
Id of the subject e.g. user id
'admin@admin'
.- Type
str
- subject_type#
Type of the subject e.g.
'USER'
.- Type
str
- last_modified_by#
User who last modified this permission e.g.
'admin@admin'
.- Type
str
- last_modified_on#
Timestamp at which this permission was last modified e.g.
1550785079811
.- Type
int
Alerts#
- class streamsets.sdk.sch_models.Alert(*args, **kwargs)[source]#
Model for Alerts.
- message#
The Alert’s message text.
- Type
str
- alert_status#
The status of the Alert.
- Type
str
- Parameters
alert (
dict
) – JSON representation of an Alert.control_hub (
streamsets.sdk.ControlHub
) – Control Hub instance.
Api Credentials#
- class streamsets.sdk.sch_models.ApiCredential(*args, **kwargs)[source]#
Model for ApiCredential.
- Parameters
api_credential (
dict
) – A Python object representation of ApiCredential.control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub object. Default:None
- active#
Whether the API Credential is active or not.
- Type
bool
- auth_token#
Auth token to be used while making an API call.
- Type
str
- created_by#
User that created this API Credential.
- Type
str
- created_for#
The user who should use the API Credential.
- Type
str
- credential_id#
Credential ID to be used while making an API call.
- Type
str
- name#
Name of the API Credential.
- Type
str
- class streamsets.sdk.sch_models.ApiCredentials(control_hub)[source]#
Collection of
streamsets.sdk.sch_models.ApiCredential
instances.- Parameters
control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub object.
- class streamsets.sdk.sch_models.ApiCredentialBuilder(api_credential, control_hub)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.ApiCredential
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_api_credential_builder()
.
- Parameters
api_credential (
dict
) – Python object built from our Swagger ApiCredentialJson definition.control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub object.
- build(name, user_id=None)[source]#
Build the ApiCredential.
- Parameters
name (
str
) – ApiCredential name.user_id (
str
, optional) – User ID for whom the credentials should be created, can only be used as an Org-Admin.
- Returns
An instance of
streamsets.sdk.sch_models.ApiCredential
.
Classifiers#
Classification Rules#
- class streamsets.sdk.sch_models.ClassificationRule(*args, **kwargs)[source]#
Classification Rule Model.
- Parameters
classification_rule (
dict
) – A Python dict representation of classification rule.classifiers (
list
) – A list ofstreamsets.sdk.sch_models.Classifier
instances.
- class streamsets.sdk.sch_models.ClassificationRuleBuilder(classification_rule, classifier)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.ClassificationRule
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_classification_rule_builder()
.
- Parameters
classification_rule (
dict
) – Python object defining a classification rule.classifier (
dict
) – Python object defining a classifier.
- add_classifier(patterns=None, match_with=None, regular_expression_type='RE2/J', case_sensitive=False)[source]#
Add classifier to the classification rule.
- Parameters
patterns (
list
, optional) – List of strings of patterns. Default:None
.match_with (
str
, optional) – Default:None
.regular_expression_type (
str
, optional) – Default:'RE2/J'
.case_sensitive (
bool
, optional) – Default:False
.
- Returns
An instance of
streamsets.sdk.sch_models.Classifier
.
- build(name, category, score)[source]#
Build the classification rule.
- Parameters
name (
str
) – Classification Rule name.category (
str
) – Classification Rule category.score (
float
) – Classification Rule score.
- Returns
An instance of
streamsets.sdk.sch_models.ClassificationRule
.
Connections#
- class streamsets.sdk.sch_models.Connection(*args, **kwargs)[source]#
Model for connection.
- Parameters
connection (
dict
) – A Python object representation of Connection.control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub object.
- connection_definition#
The Connection Definition JSON.
- Type
dict
- library_definition#
The Library Definition JSON.
- Type
dict
- pipeline_commits (:py:class:`streamsets.sdk.utils.SeekableList` of
streamsets.sdk.sch_models.PipelineCommit
instances): Pipeline commits using this connection.
- acl#
Update Connection ACL.
- tags#
Connection tags.
- Type
streamsets.sdk.utils.SeekableList
of instances ofstreamsets.sdk.sch_models.Tag
- property acl#
Get Connection ACL.
- Returns
An instance of
streamsets.sdk.sch_models.ACL
.
- property pipeline_commits#
Get the pipeline commits using this connection.
- Returns
- property tags#
Get the connection tags.
- Returns
A
streamsets.sdk.utils.SeekableList
of instances ofstreamsets.sdk.sch_models.Tag
.
- class streamsets.sdk.sch_models.Connections(control_hub, organization)[source]#
Collection of
streamsets.sdk.sch_models.Connection
instances.- Parameters
control_hub – An instance of
streamsets.sdk.sch.ControlHub
.organization (
str
) – Organization Id.
- class streamsets.sdk.sch_models.ConnectionBuilder(connection, control_hub)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.Connection
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_connection_builder()
.
- Parameters
connection (
dict
) – Python object built from our Swagger ConnectionJson definition.control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub object.
- build(title, connection_type, authoring_data_collector, tags=None)[source]#
Define the connection.
- Parameters
title (
str
) – Connection title.connection_type (
str
) – Type of connection. The options are ‘STREAMSETS_AWS_EMR_CLUSTER’, ‘STREAMSETS_MYSQL’, ‘STREAMSETS_SNOWFLAKE’, ‘STREAMSETS_COAP_CLIENT’, ‘STREAMSETS_OPC_UA_CLIENT’, ‘STREAMSETS_GOOGLE_PUB_SUB’, ‘STREAMSETS_MQTT’, ‘STREAMSETS_POSTGRES’, ‘STREAMSETS_GOOGLE_CLOUD_STORAGE’, ‘STREAMSETS_AWS_REDSHIFT’, ‘STREAMSETS_GOOGLE_BIG_QUERY’, ‘STREAMSETS_ORACLE’, ‘STREAMSETS_AWS_S3’, ‘STREAMSETS_REMOTE_FILE’, ‘STREAMSETS_SQLSERVER’, ‘STREAMSETS_AWS_SQS’, ‘STREAMSETS_SNOWPIPE’, and ‘STREAMSETS_JDBC’.authoring_data_collector (
streamsets.sdk.sch.DataCollector
) – Authoring Data Collector.tags (
list
, optional) – List of tags (strings). Default:None
.
- Returns
An instance of
streamsets.sdk.sch_models.Connection
.
- class streamsets.sdk.sch_models.ConnectionVerificationResult(*args, **kwargs)[source]#
Model for connection verification result.
- Parameters
connection_preview_json (
dict
) – dynamic preview API response JSON.
- issue_count#
The count of the number of issues for the connection verification result.
- Type
int
- issue_message#
The message provided for the connection verification result.
- Type
str
- property issue_count#
The count of the number of issues for the connection verification result.
- Returns
A
int
that represents the number of issues.
- property issue_message#
The message provided for the connection verification result.
- Returns
A
str
message detailing the response for the connection verification result.
DataCollectors#
- class streamsets.sdk.sch_models.DataCollector(*args, **kwargs)[source]#
Model for Data Collector.
- execution_mode#
True
for Edge andFalse
for SDC.- Type
bool
- id#
Data Collectort id.
- Type
str
- labels#
Labels for Data Collector.
- Type
list
- last_validated_on#
Last validated time for Data Collector.
- Type
str
- reported_labels#
Reported labels for Data Collector.
- Type
list
- url#
Data Collector’s url.
- Type
str
- accessible#
Whether the Data Collector instance is accessible.
- Type
bool
- responding#
Whether the Data Collector instance is responding.
- Type
bool
- attributes#
Data Collector’s attributes.
- Type
dict
- attributes_updated_on#
When the Data Collector’s attributes were last updated on.
- Type
str
- authentication_token_generated_on#
When the Data Collector authentication token was generated.
- Type
str
- jobs#
The Data Collector’s jobs.
- job_ids#
Data Collector’s job ids.
- Type
str
- registered_by#
Who registered the Data Collector.
- Type
str
- pipelines_committed#
Pipelines that have been committed.
- Type
list
of (str
objects)
- resource_thresholds#
DataCollector resource thresholds.
- Type
str
- acl#
DataCollector ACL.
- property accessible#
Returns a
bool
for whether the Data Collector instance is accessible.
- property acl#
Get DataCollector ACL.
- Returns
An instance of
streamsets.sdk.sch_models.ACL
.
- property attributes#
Returns a
dict
of Data Collector attributes.
- property attributes_updated_on#
Returns when the Data Collector attributes was updated.
- property authentication_token_generated_on#
Returns when the Data Collector authentication token was generated.
- property job_ids#
Returns the Data Collector Job ids.
- property jobs#
Returns the Data Collector
streamsets.sdk.sch_models.Job
instances.
- property pipelines_committed#
Pipelines that have been committed.
- Returns
A
list
of Job IDs (str
objects).
- property registered_by#
Returns who the Data Collector was registered by.
- property resource_thresholds#
Return DataCollector resource thresholds.
- Returns
- A
dict
of DataCollector thresholds named as “max_memory_used”, “max_cpu_load” and “max_pipelines_running”
- A
- property responding#
Returns a
bool
for whether the Data Collector instance is responding.
Deployments#
- class streamsets.sdk.sch_models.AzureVMDeployment(deployment, control_hub=None, **kwargs)[source]#
Model for Azure VM Deployment.
- attach_public_ip#
- Type
bool
- azure_tags#
Azure tags to apply to all provisioned resources in this Azure deployment.
- Type
dict
- desired_instances#
Number of engine instances to deploy.
- Type
int
- key_pair_name#
SSH key pair.
- Type
str
- managed_identity#
- Type
str
- public_ssh_key#
- Type
str
- resource_group#
- Type
str
- ssh_key_source#
SSH key source.
- Type
str
- vm_size#
- Type
str
- zones#
- Type
list
- class streamsets.sdk.sch_models.Deployment(deployment, control_hub=None, **kwargs)[source]#
Model for Deployment. This is an abstract class.
- acl#
The Deployment ACL instance.
- created_by#
User that created this deployment.
- Type
str
- created_on#
Time at which this deployment was created.
- Type
int
- deployment_events#
Name of the deployment.
- Type
DeploymentEvents
- deployment_id#
Id of the deployment.
- Type
str
- deployment_name#
Name of the deployment.
- Type
str
- deployment_tags#
Raw deployment tags of the deployment.
- Type
list
- deployment_type#
Type of the deployment.
- Type
str
- desired_instances#
The Deployment desired number of instances.
- Type
int
- engine_configuration#
The Deployment Engine Configuration.
- engine_type#
The Deployment Engine type.
- Type
str
- engine_version#
The Deployment Engine Version.
- Type
str
- environment_id#
Enabled environment where engine will be deployed.
- Type
list
- last_modified_by#
User that last modified this deployment.
- Type
str
- last_modified_on#
Time at which this deployment was last modified.
- Type
int
- network_tags#
The Deployment network tags.
- Type
list
- scala_binary_version#
scala Binary Version.
- Type
str
- organization#
The Deployment organization.
- Type
str
- json_state#
State of the deployment - this is json field called state.
- Type
str
- state#
State of the deployment - displayed as state on UI.
- Type
str
- status#
Status of the deployment.
- Type
str
- status_detail#
Status detail of the deployment.
- Type
str
- registered_engines#
Registered Engines of the deployment.
- Type
RegisteredEngines
- tags#
Tags of the deployment.
- Type
a
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Tag
instances
- class ENGINE_TYPES(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
- class TYPES(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
- property acl#
Get deployment ACL.
- Returns
An instance of
streamsets.sdk.sch_models.ACL
.
- property deployment_events#
Get the events for a deployment.
- Returns
An instance of
streamsets.sdk.sch_models.DeploymentEvents
- property engine_configuration#
The engine configuration of deployment engine. :returns: An instance of
streamsets.sdk.sch_models.DeploymentEngineConfiguration
.
- property registered_engines#
Get the registered engines.
- Returns
An instance of
streamsets.sdk.sch_models.RegisteredEngines
- property tags#
Get the tags.
- Returns
A
streamsets.sdk.utils.SeekableList
ofstr
instances.
- class streamsets.sdk.sch_models.DeploymentBuilder(deployment, control_hub)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.Deployment
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_deployment_builder()
.
- Parameters
deployment (
dict
) – Python object built from our Swagger CspDeploymentJson definition.control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub object.
- build(deployment_name, engine_type, engine_version, environment, external_resource_location=None, engine_labels=None, max_cpu_load=80.0, max_memory_used=100.0, max_pipelines_running=1000000, deployment_tags=None, engine_build=None, scala_binary_version=None, **kwargs)[source]#
Define the deployment.
- Parameters
deployment_name (
str
) – deployment name.engine_type (
str
) – Type of engine to deploy.engine_version (
str
) – Version of engine to deploy.environment (
streamsets.sdk.sch_models.Environment
) – The environment instance.external_resource_location (
str
, optional) – External Resource Location URL. Default:None
.engine_labels (
list
, optional) – List of labels (strings). Default:None
.max_cpu_load (
float
orint
, optional) – Max CPU Load (percent). Default:80.0
.max_memory_used (
float
orint
, optional) – Max Memory Used (percent). Default:100.0
.max_pipelines_running (
int
, optional) – Max Running Pipeline Count. Default:1000000
.deployment_tags (
list
, optional) – List of tags (strings). Default:None
.engine_build (
str
, optional) – Build of engine to deploy. Default:None
.scala_binary_version (
str
, optional) – Scala binary version required in case of engine_type=’TF’ Default:None
.
- Returns
An instance of subclass of
streamsets.sdk.sch_models.Deployment
.
- class streamsets.sdk.sch_models.DeploymentEngineAdvancedConfiguration(deployment_engine_advanced_config, deployment=None)[source]#
Model for advanced configuration in Deployment engine configuration.
- data_collector_configuration#
The Data Collector Configuration.
- Type
dict
- transformer_configuration#
The Transformer Configuration.
- Type
dict
- credential_stores#
- Type
dict
- proxy_properties#
Proxy Properties for the Deployment engine configuration.
- Type
dict
- security_policy#
Security Policy for the Deployment engine configuration.
- Type
dict
- class streamsets.sdk.sch_models.DeploymentEngineConfiguration(*args, **kwargs)[source]#
Model for Deployment engine configuration.
- advanced_configuration#
- Type
str
- aws_image_id#
- Type
str
- azure_image_id#
- Type
str
- created_by#
User that created this deployment.
- Type
str
- created_on#
Time at which this deployment was created.
- Type
int
- download_url#
- Type
list
- engine_type#
engine type.
- Type
list
- engine_version#
Engine version to deploy.
- Type
list
- id#
User that created this deployment.
- Type
str
- gcp_image_id#
- Type
str
- last_modified_by#
User that last modified this deployment.
- Type
str
- last_modified_on#
Time at which this deployment was last modified.
- Type
int
- scala_binary_version#
scala Binary Version.
- Type
str
- java_configuration#
Java Configuration instance.
- stage_libs#
DeploymentStageLibraries instance.
- class streamsets.sdk.sch_models.DeploymentEngineConfigurations(control_hub)[source]#
Collection of
streamsets.sdk.sch_models.DeploymentEngineConfiguration
instances.- Parameters
control_hub – An instance of
streamsets.sdk.sch.ControlHub
.
- class streamsets.sdk.sch_models.DeploymentEngineJavaConfiguration(deployment_engine_java_configuration, deployment=None)[source]#
Model for Deployment engine Java configuration.
- id#
- Type
str
- java_options#
- Type
str
- java_memory_strategy (
object:ABSOLUTE/PERCENTAGE)
- maximum_java_heap_size_in_mb#
- Type
long
- minimum_java_heap_size_in_mb#
- Type
long
- maximum_java_heap_size_in_percent#
- Type
int
- minimum_java_heap_size_in_percent#
- Type
int
- class streamsets.sdk.sch_models.DeploymentStageLibraries(values, deployment_config)[source]#
Wrapper class for the list of stage libraries in a deployment.
- Parameters
values (
list
) – A list of stage library names.deployment_config (
streamsets.sdk.sch_models.DeploymentEngineConfiguration
) – The deployment engine configuration object that these stage libraries pertain to.
- class streamsets.sdk.sch_models.EC2Deployment(deployment, control_hub=None, **kwargs)[source]#
Model for EC2 Deployment.
- ec2_instance_type#
Type EC2 engine instance.
- Type
str
- desired_instances#
No. of EC2 engine instances.
- Type
int
- instance_profile#
arn of instance profile created as an environment prerequisite.
- Type
str
- key_pair_name#
SSH key pair.
- Type
str
- aws_tags#
AWS tags to apply to all provisioned resources in this AWS deployment.
- Type
dict
- ssh_key_source#
SSH key source.
- Type
str
- tracking_url#
- Type
str
- class streamsets.sdk.sch_models.GCEDeployment(deployment, control_hub=None, **kwargs)[source]#
Model for GCE Deployment.
- block_project_ssh_keys#
Blocks the use of project-wide public SSH keys to access the provisioned VM instances.
- Type
bool
- desired_instances#
Number of engine instances to deploy.
- Type
int
- gcp_labels#
GCP labels to apply to all provisioned resources in your GCP project.
- Type
dict
- instance_service_account#
Instance service account.
- Type
str
- machine_type#
machine type to use for the provisioned VM instances.
- Type
str
- public_ssh_key#
full contents of public SSH key associated with each VM instance.
- Type
str
- region#
Region to provision VM instances in.
- Type
str
- tracking_url#
Tracking URL.
- Type
str
- zone#
GCP zone
- Type
list
- class streamsets.sdk.sch_models.KubernetesDeployment(deployment, control_hub=None, **kwargs)[source]#
Model for Kubernetes Deployment.
- kubernetes_labels#
Kubernetes labels to apply to all provisioned resources in your Kubernetes project.
- Type
dict
- property kubernetes_labels#
Kubernetes labels for the deployment.
- Returns
value pairs of labels.
- Return type
A (
dict
) of key
- class streamsets.sdk.sch_models.SelfManagedDeployment(deployment, control_hub=None, **kwargs)[source]#
Model for self managed Deployment.
- class InstallMechanism(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
- install_script(install_mechanism='DEFAULT', java_version=None)[source]#
Gets the installation script to be run for this deployment.
- Parameters
install_mechanism (
str
, optional) – Possible values for install are “DEFAULT”, “BACKGROUND” and “FOREGROUND”. Default:DEFAULT
.java_version (
str
, optional) – Java Development Kit Version to be used. Example: “8”. Default:None
.
- Returns
An
str
instance of the installation command.
DraftRun#
- class streamsets.sdk.sch_models.DraftRun(*args, **kwargs)[source]#
Model for DraftRun.
- commit_id#
Pipeline commit id.
- Type
str
- commit_label#
Pipeline commit label.
- Type
str
- created_by#
User that created this job.
- Type
str
- created_on#
Time at which this job was created.
- Type
int
- data_collector_labels#
Labels of the data collectors.
- Type
list
- delete_after_completion#
Flag that indicates if this job should be deleted after completion.
- Type
bool
- description#
Job description.
- Type
str
- enable_failover#
Flag that indicates if failover is enabled.
- Type
bool
- enable_time_series_analysis#
Flag that indicates if time series is enabled.
- Type
bool
- execution_mode#
True for Edge and False for SDC.
- Type
bool
- job_id#
Id of the job.
- Type
str
- job_name#
Name of the job.
- Type
str
- last_modified_by#
User that last modified this job.
- Type
str
- last_modified_on#
Time at which this job was last modified.
- Type
int
- number_of_instances#
Number of instances.
- Type
int
- pipeline_force_stop_timeout#
Timeout for Pipeline force stop.
- Type
int
- pipeline_id#
Id of the pipeline that is running the job.
- Type
str
- pipeline_name#
Name of the pipeline that is running the job.
- Type
str
- pipeline_rule_id#
Rule Id of the pipeline that is running the job.
- Type
str
- read_policy#
Read Policy of the job.
- runtime_parameters#
Run-time parameters of the job.
- Type
str
- static_parameters#
List of parameters wjat cannot be overriden.
- Type
list
- statistics_refresh_interval_in_millisecs#
Refresh interval for statistics in milliseconds.
- Type
int
- status#
Status of the job.
- Type
string
- snapshots#
Snapshots belonging to the draft run.
- capture_snapshot()[source]#
Generate a new snapshot.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- get_logs(ending_offset=- 1)[source]#
Retrieve the logs from the engine.
- Parameters
ending_offset (
int
, optional) – The offset to capture logs up until. Default:-1
- Returns
A
list
ofdict
instances, one dictionary per log line.
- remove_snapshot(snapshot)[source]#
Remove a snapshot.
- Parameters
snapshot (
streamsets.sdk.sch_models.Snapshot
) – Snapshot object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- property snapshots#
Snapshots belonging to the draft run.
- Returns
A
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Snapshot
instances.
DraftRuns#
- class streamsets.sdk.sch_models.DraftRuns(control_hub)[source]#
Collection of
streamsets.sdk.sch_models.DraftRun
instances.- Parameters
control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub object.
Engines#
- class streamsets.sdk.sch_models.Engine(*args, **kwargs)[source]#
Model for Platform Engines.
- acl#
The engine ACL instance.
- configuration#
The engine Configuration instance.
- id#
ID of the engine.
- Type
str
- organization#
Organization that the engine belongs to.
- Type
str
- engine_url#
URL registered for the engine.
- Type
str
- version#
Version of the engine.
- Type
str
- labels#
Labels for the engine.
- Type
list
ofstr
instances
- last_reported_time#
The last time the engine contacted StreamSets Platform.
- Type
str
- start_up_time#
The time the engine was started.
- Type
str
- edge#
Whether this is an Edge engine or not.
- Type
bool
- cpu_load#
The percent utilization of the CPU by the engine.
- Type
float
- memory_used#
The amount of memory used by the engine in MB.
- Type
float
- total_memory#
The total amount of memory configured for the engine in MB.
- Type
float
- running_pipelines#
The total number of pipelines running on the engine.
- Type
int
- responding#
Whether the engine is responding or not.
- Type
bool
- engine_type#
The type of engine.
- Type
str
- max_cpu_load#
The percentage limit on CPU load configured for this engine.
- Type
int
- max_memory_used#
The percentage limit on memory configured for this engine.
- Type
int
- max_pipelines_running#
The limit on the number of concurrent pipelines for this engine.
- Type
int
- deployment_id#
The ID of the deployment that the engine belongs to.
- Type
str
- directories#
The directories of the engine.
- Type
dict
- external_libraries#
The External Library of the engine.
- Type
streamsets.sdk.sch_models.ExternalLibrary
- memory_used_mb#
The memory used in MB by the engine.
- Type
float
- resources#
The External Resource of the engine.
- Type
streamsets.sdk.sch_models.ExternalResource
- responding#
Whether the engine is responding.
- Type
bool
- running_pipelines#
The running pipelines of the engine.
- Type
list
ofstreamsets.sdk.sch_models.Pipeline
instances
- running_pipelines_count#
The running pipelines count of the engine.
- Type
int
- total_memory_mb#
The total memory in MB of the engine.
- Type
int
- user_stage_libraries#
The user stage libraries of the engine.
- Type
dict
- add_external_libraries(stage_library, *libraries)[source]#
Add external libraries to the engine.
- Parameters
stage_library (
str
) – The stage library that includes the stage requiring the external library.*libraries – One or more
file
instances to add to an engine, in binary format.
- add_resources(*resources)[source]#
Add resource files to the engine.
- Parameters
*resources – One or more
file
instances to add to an engine, in binary format.
- delete_external_libraries(*libraries)[source]#
Delete external libraries from the engine.
- Parameters
*libraries – One or more
streamsets.sdk.sch_models.ExternalLibrary
instances to delete from the engine.
- delete_resources(*resources)[source]#
Delete resource files on the engine.
- Parameters
*resources – One or more
streamsets.sdk.sch_models.ExternalResource
instances to delete from the engine.
- property external_libraries#
Get the external libraries for the engine.
- Returns
A
list
ofstreamsets.sdk.sch_models.ExternalLibrary
instances.
- get_logs(ending_offset=- 1)[source]#
Retrieve the logs for the engine.
- Parameters
ending_offset (
int
, optional) – The offset to capture logs up until. Default:-1
- Returns
A
list
ofdict
instances, one dictionary per log line.
- get_thread_dump()[source]#
Generate a thread dump for the engine.
- Returns
A
list
ofdict
instances, one dictionary per thread.
- refresh()[source]#
Retrieve the latest state of the Engine from Platform, and update the in-memory representation.
- property resources#
Get the external resources for the engine.
- Returns
A
list
ofstreamsets.sdk.sch_models.ExternalResource
instances.
- class streamsets.sdk.sch_models.Engines(control_hub)[source]#
Collection of
streamsets.sdk.sch_models.Engine
instances.- Parameters
control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub object.
Environments#
- class streamsets.sdk.sch_models.AWSEnvironment(environment, control_hub=None, **kwargs)[source]#
Model for AWS Environment.
- access_key_id#
AWS access key id to access AWS account. Required when credential_type is AWS_STATIC_KEYS.
- Type
str
- aws_tags#
AWS tags to apply to all provisioned resources in this AWS deployment.
- Type
dict
- credentials#
Credentials of the environment.
- Type
dict
- credential_type#
Credential Type of the environment.
- Type
str
- default_instance_profile#
aarn of default instance profile created as an environment prerequisite.
- Type
dict
- region#
AWS region where VPC is located in case of AWS environment.
- Type
str
- role_arn#
Role AR of the cross-account role created as an environment prerequisite in user’s AWS account. Required when credential_type is AWS_CROSS_ACCOUNT_ROLE.
- Type
str
- secret_access_key#
AWS secret access key to access AWS account. Required when credential_type is AWS_STATIC_KEYS.
- Type
str
- subnet_ids#
Range of Subnet IDs in the VPC.
- Type
str
- security_group_id#
Security group ID.
- Type
str
- vpc_id#
Id of the Amazon VPC created as an environment prerequisite in user’s AWS account.
- Type
str
- class streamsets.sdk.sch_models.AzureEnvironment(environment, control_hub=None, **kwargs)[source]#
Model for Azure Environment.
- azure_tags#
Azure tags for this environment.
- Type
dict
- credential_type#
Credential type of the environment.
- Type
str
- client_id#
Client ID of the environment.
- Type
str
- client_secret#
Client Secret of the environment.
- Type
str
- credentials#
Credentials of the environment.
- Type
dict
- default_managed_identity#
default managed identity.
- Type
str
- default_resource_group#
default resource group.
- Type
str
- region#
Azure region where vnet is located.
- Type
str
- subnet_id#
Subnet ID in the vnet.
- Type
str
- subscription_id#
Subscription ID in the vnet.
- Type
str
- security_group_id#
Security group ID.
- Type
str
- tenet_id#
Tenet ID of the environment.
- Type
str
- vnet_id#
Id of the vnet.
- Type
str
- class streamsets.sdk.sch_models.Environment(environment, control_hub=None, **kwargs)[source]#
Model for Environment. This is an abstract class.
- acl#
Environment ACL.
- tags#
Environment tags.
- Type
streamsets.sdk.utils.SeekableList
ofstr
instances
- allow_nightly_engine_builds#
Whether or not the environment allows use of nightly engine build.
- Type
bool
- created_by#
User that created this environment.
- Type
str
- created_on#
Time at which this environment was created.
- Type
int
- environment_id#
Id of the environment.
- Type
str
- environment_name#
Name of the environment.
- Type
str
- environment_tags#
Raw environment tags of the environment.
- Type
list
- environment_type#
Type of the environment.
- Type
str
- last_modified_by#
User that last modified this environment.
- Type
str
- last_modified_on#
Time at which this environment was last modified.
- Type
int
- json_state#
State of the environment - which is stored in json field called state.
- Type
str
- state#
State of the environment stateDisplayLabel - shown on the UI as state.
- Type
str
- status#
Status of the environment.
- Type
str
- status_detail#
Status detail of the environment.
- Type
str
- class TYPES(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
- property acl#
Get environment ACL.
- Returns
An instance of
streamsets.sdk.sch_models.ACL
.
- property tags#
Get the tags.
- Returns
A
streamsets.sdk.utils.SeekableList
ofstr
instances.
- class streamsets.sdk.sch_models.EnvironmentBuilder(environment, control_hub)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.Environment
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_environment_builder()
.
- Parameters
environment (
dict
) – Python object built from our Swagger CspEnvironmentJson definition.control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub object.
- build(environment_name, **kwargs)[source]#
Define the environment.
- Parameters
environment_name (
str
) – environment name.allow_nightly_builds (
bool
, optional) – Default:False
.allow_nightly_engine_builds (
bool
, optional) – Default:False
.environment_tags (
list
, optional) – List of tags (strings). Default:None
.
- Returns
An instance of subclass of
streamsets.sdk.sch_models.Environment
.
- class streamsets.sdk.sch_models.GCPEnvironment(environment, control_hub=None, **kwargs)[source]#
Model for GCP Environment.
- credentials#
Credentials of the environment.
- Type
dict
- account_key_json#
JSON file contents for the Service Account Key. Required when credential_type is GCP_SERVICE_ACCOUNT_KEY.
- Type
str
- project#
Project where VPC is located.
- Type
str
- gcp_labels#
GCP labels for this environment.
- Type
dict
- gcp_tags#
GCP tags to apply to all resources provisioned in GCP account.
- Type
str
- service_account_email#
Service account ID. Required when credential_type is GCP_SERVICE_ACCOUNT_IMPERSONATION.
- Type
str
- vpc_id#
Id of the GCP VPC created as an environment prerequisite in user’s GCP account.
- Type
str
- fetch_available_machine_types(zone_id)[source]#
Returns the available machine types for the given Environment and GCP Project and GCP Zone.
- fetch_available_regions()[source]#
Returns the available regions for the given Environment and GCP Project.
- fetch_available_service_accounts()[source]#
Returns the available service accounts for the given Environment and GCP Project.
- fetch_available_vpcs()[source]#
Returns the available Networks for the given Environment and GCP Project. Here it is called vpcs since on UI it is shown as VPC.
- fetch_available_zones(region_id)[source]#
Returns the available zones for the given environment and GCP project and GCP region.
- class streamsets.sdk.sch_models.KubernetesEnvironment(environment, control_hub=None, **kwargs)[source]#
Model for Kubernetes Environment.
- agent_event_logs (:obj:`list` of
py:class:`streamsets.sdk.sch_models.KubernetesAgentEvent instances): Agent logs for the environment.
- agent_java_options#
Java configuration options set for the Agent.
- Type
str
- agent_status#
Status of the Agent.
- Type
str
- agent_status_detail#
Details pertaining to the status of the Agent.
- Type
str
- agent_version#
Version of the Agent.
- Type
str
- allow_nightly_engine_builds#
Whether or not the environment allows use of nightly engine build. versions.
- Type
bool
- created_by#
ID of the user who created the environment.
- Type
str
- created_on#
Millisecond timestamp of when the environment was created.
- Type
int
- environment_id#
ID of the environment.
- Type
str
- environment_name#
Name of the environment.
- Type
str
- environment_tags#
Tags assigned to the environment.
- Type
list
ofstr
instances
- environment_type (
obj:`str): Type of the environment.
- kubernetes_labels#
Kubernetes labels to apply to all resources provisioned in the Kubernetes namespace.
- Type
dict
- kubernetes_namespace#
Name of the Kubernetes namespace to provision the resources in.
- Type
str
- last_modified_by#
ID of the user who last modified the environment.
- Type
str
- last_modified_on#
Millisecond timestamp of the last time the environment was modified.
- Type
int
- state#
The current state of the environment.
- Type
str
- property agent_event_logs#
Get the Agent event logs for this environment.
- Returns
An instance of
streamsets.sdk.sch_models.KubernetesAgentEvents
- property agent_java_options#
Get the Agent’s extra java options.
- Returns
A
str
instance of the agent’s extra java options.
- property agent_status#
Get the Agent’s status.
- Returns
A
str
instance of the agent’s status.
- property agent_status_detail#
Get the details of the Agent’s status
- Returns
A
str
instance of the agent’s status detail.
- apply_agent_yaml_command()[source]#
Gets the installation script to be run for this environment.
- Returns
An
str
instance of the installation command.
- property kubernetes_labels#
Kubernetes labels for the environment.
- Returns
value pairs of labels.
- Return type
A (
dict
) of key
Transformers#
- class streamsets.sdk.sch_models.Transformer(*args, **kwargs)[source]#
Model for Transformer.
- execution_mode#
- Type
str
- id#
Transformer id.
- Type
str
- labels#
Labels for Transformer.
- Type
list
- last_validated_on#
Last validated time for Transformer.
- Type
str
- reported_labels#
Reported labels for Transformer.
- Type
list
- url#
Transformer’s url.
- Type
str
- version#
Transformer’s version.
- Type
str
- accessible#
Whether the Transformer instance is accessible.
- Type
bool
- attributes#
Transformer’s attributes.
- Type
dict
- attributes_updated_on#
When the Transformer’s attributes were last updated on.
- Type
str
- authentication_token_generated_on#
When the Transformer authentication token was generated.
- Type
str
- registered_by#
Who registered the Transformer.
- Type
str
- acl#
Transformer ACL.
- property accessible#
Returns a
bool
for whether the Transformer instance is accessible.
- property acl#
Get Transformer ACL.
- Returns
An instance of
streamsets.sdk.sch_models.ACL
.
- property attributes#
Returns a
dict
of Transformer attributes.
- property attributes_updated_on#
Returns when the Transformer attributes was updated.
- property authentication_token_generated_on#
Returns when the Transformer authentication token was generated.
- property registered_by#
Returns who the Transformer was registered by.
Groups#
- class streamsets.sdk.sch_models.Group(*args, **kwargs)[source]#
Model for Group.
- Parameters
group (
dict
) – A Python object representation of Group.roles (
dict
) – A mapping of role IDs to role labels.control_hub (
streamsets.sdk.ControlHub
, optional) – An instance of Control Hub. Default:None
- created_by#
Creator of this group.
- Type
str
- created_on#
Creation time of this group.
- Type
str
- display_name#
Display name of this group.
- Type
str
- group_id#
ID of this group.
- Type
str
- last_modified_by#
Group last modified by.
- Type
str
- last_modified_on#
Group last modification time.
- Type
str
- organization#
Organization this group belongs to.
- Type
str
- roles#
A set of role labels.
- Type
set
- users#
Users that are a member of this group.
- Type
streamsets.sdk.util.SeekableList
ofstreamsets.sdk.sch_models.User
instances
- property roles#
Get the roles for the group.
- property users#
Get the users that are members of the group.
- class streamsets.sdk.sch_models.Groups(control_hub, roles, organization)[source]#
Collection of
streamsets.sdk.sch_models.Group
instances.- Parameters
control_hub – An instance of
streamsets.sdk.sch.ControlHub
.roles (
dict
) – A mapping of role IDs to role labels.organization (
str
) – Organization ID.
- class streamsets.sdk.sch_models.GroupBuilder(group, roles, control_hub=None)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.Group
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_group_builder()
.
- Parameters
group (
dict
) – Python object built from our Swagger GroupJson definition.roles (
dict
) – A mapping of role IDs to role labels.control_hub (
streamsets.sdk.sch.ControlHub
, optional) – ControlHub object. Default:None
- build(display_name, group_id=None, ldap_groups=None)[source]#
Build the group.
- Parameters
display_name (
str
) – Group display name.group_id (
str
, optional) – Group ID. Can only include letters, numbers, underscores and periods. Default:None
.ldap_groups (
list
, optional) – List of LDAP groups (strings).
- Returns
An instance of
streamsets.sdk.sch_models.Group
.
Jobs#
- class streamsets.sdk.sch_models.Job(*args, **kwargs)[source]#
Model for Job.
- archived#
Flag that indicates if this job is archived.
- Type
bool
- commit_id#
Pipeline commit id.
- Type
str
- commit_label#
Pipeline commit label.
- Type
str
- created_by#
User that created this job.
- Type
str
- created_on#
Time at which this job was created.
- Type
int
- data_collector_labels#
Labels of the data collectors.
- Type
list
- delete_after_completion#
Flag that indicates if this job should be deleted after completion.
- Type
bool
- description#
Job description.
- Type
str
- destroyer#
Job destroyer.
- Type
str
- enable_failover#
Flag that indicates if failover is enabled.
- Type
bool
- enable_time_series_analysis#
Flag that indicates if time series is enabled.
- Type
bool
- execution_mode#
True for Edge and False for SDC.
- Type
bool
- job_deleted#
Flag that indicates if this job is deleted.
- Type
bool
- job_id#
Id of the job.
- Type
str
- job_name#
Name of the job.
- Type
str
- last_modified_by#
User that last modified this job.
- Type
str
- last_modified_on#
Time at which this job was last modified.
- Type
int
- number_of_instances#
Number of instances.
- Type
int
- pipeline_force_stop_timeout#
Timeout for Pipeline force stop.
- Type
int
- pipeline_id#
Id of the pipeline that is running the job.
- Type
str
- pipeline_name#
Name of the pipeline that is running the job.
- Type
str
- pipeline_rule_id#
Rule Id of the pipeline that is running the job.
- Type
str
- read_policy#
Read Policy of the job.
- runtime_parameters#
Run-time parameters of the job.
- Type
str
- static_parameters#
List of parameters that cannot be overriden.
- Type
list
- statistics_refresh_interval_in_millisecs#
Refresh interval for statistics in milliseconds.
- Type
int
- status#
Status of the job.
- Type
string
- history#
Status and run count history for the Job.
- template_run_history_list#
List of job template run history.
- Type
list
- write_policy#
Write Policy of the job.
- property acl#
Get job ACL.
- Returns
An instance of
streamsets.sdk.sch_models.ACL
.
- property commit#
Get pipeline commit of the job.
- Returns
An instance of
streamsets.sdk.sch_models.PipelineCommit
.
- property committed_offsets#
Get the committed offsets for a given job id.
- Returns
An instance of
streamsets.sdk.sch_models.JobCommittedOffset
.
- get_run_logs()[source]#
Retrieve the logs for the last run of the job.
- Returns
A
list
ofdict
instances, one dictionary per log line.
- get_snowflake_generated_queries()[source]#
Retrieve the Snowflake generated queries of the last run of the job.
- Returns
A
list
ofdict
instances, one dictionary per query.
- property history#
Status and run count history for the Job.
- property job_sequence#
Get the Job Sequence that this job is a part of.
- Returns
An instance of
streamsets.sdk.sch_models.JobSequence
orNone
.
- property latest_committed_offsets#
Get the latest committed offsets for a given job id.
- Returns
A (
dict
) object.
- property metrics#
The metrics from all runs of a Job.
- Returns
A
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.JobMetrics
instances.
- property pipeline#
Get the pipeline object corresponding to this job.
- property status#
Current job status.
- property system_job#
Get the sytem Job for this job if exists.
- Returns
An instance of
streamsets.sdk.sch_models.Job
.
- property tag#
Get pipeline tag of the job.
- Returns
An instance of
streamsets.sdk.sch_models.PipelineTag
.
- property tags#
Get the job tags.
- Returns
A
streamsets.sdk.utils.SeekableList
of instances ofstreamsets.sdk.sch_models.Tag
.
- time_series_metrics(metric_type, time_filter_condition='LAST_5M', **kwargs)[source]#
Get historic time series metrics for the job.
- Parameters
metric_type (
str
) – metric type in {‘Record Count Time Series’, ‘Record Throughput Time Series’, ‘Batch Throughput Time Series’, ‘Stage Batch Processing Timer seconds’}.time_filter_condition (
str
, optional) – Default:'LAST_5M'
.
- Returns
An instance of
streamsets.sdk.sch_models.JobTimeSeriesMetrics
.
- class streamsets.sdk.sch_models.Jobs(control_hub)[source]#
Collection of
streamsets.sdk.sch_models.Job
instances.- Parameters
control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub object.
- class streamsets.sdk.sch_models.JobBuilder(job, control_hub)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.Job
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_job_builder()
.
- Parameters
job (
dict
) – Python object built from our Swagger JobJson definition.
- build(job_name, pipeline, job_template=False, runtime_parameters=None, pipeline_commit=None, pipeline_tag=None, pipeline_commit_or_tag=None, tags=None)[source]#
Build the job.
- Parameters
job_name (
str
) – Name of the job.pipeline (
streamsets.sdk.sch_models.Pipeline
) – Pipeline object.job_template (
boolean
, optional) – Indicate if it is a Job Template. Default:False
.runtime_parameters (
dict
, optional) – Runtime Parameters for the Job or Job Template. Default:None
.pipeline_commit (
streamsets.sdk.sch_models.PipelineCommit
) – Default: ``None`, which resolves to the latest pipeline commit.pipeline_tag (
streamsets.sdk.sch_models.PipelineTag
) – Default: ``None`, which resolves to the latest pipeline tag.pipeline_commit_or_tag (
str
, optional) – Default:None
, which resolves to the latest pipeline commit.tags (
list
, optional) – Job tags. Default:None
.
- Returns
An instance of
streamsets.sdk.sch_models.Job
.
- class streamsets.sdk.sch_models.JobMetrics(*args, **kwargs)[source]#
Model for job metrics.
- error_count#
The number of error records generated by this run of the Job.
- Type
int
- error_records_per_sec#
The number of error records per second generated by this run of the Job.
- Type
float
- input_count#
The number of records ingested by this run of the Job.
- Type
int
- input_records_per_sec#
The number of records per second ingested by this run of the Job.
- Type
float
- output_count#
The number of records output by this run of the Job.
- Type
int
- output_records_per_sec#
The number of records output per second by the run of the Job.
- Type
float
- pipeline_version#
The version of the pipeline that was used in this Job run.
- Type
str
- run_count#
The count corresponding to this Job run.
- Type
int
- sdc_id#
The ID of the SDC instance on which this Job run was executed.
- Type
str
- stage_errors_count#
The number of stage error records generated by this run of the Job.
- Type
int
- stage_error_records_per_sec#
The number of stage error records generated per second by this run of the job.
- Type
float
- total_error_count#
The total number of both error records and stage errors generated by this run of the job.
- Type
int
- Parameters
metrics (
dict
) – Metrics counts in JSON format.
- class streamsets.sdk.sch_models.JobOffset(*args, **kwargs)[source]#
Model for offset.
- Parameters
offset (
dict
) – Offset in JSON format.
- class streamsets.sdk.sch_models.JobRunEvent(*args, **kwargs)[source]#
Model for an event in a Job Run.
- Parameters
event (
dict
) – Job Run Event in JSON format.
- class streamsets.sdk.sch_models.JobStatus(*args, **kwargs)[source]#
Model for Job Status.
- color#
Job status color.
- Type
str
- run_history#
(
streamsets.sdk.utils.JobRunHistoryEvent
): History of a particular job run.
- offsets#
(
streamsets.sdk.utils.JobPipelineOffset
): Offsets after the job run.
- Parameters
status (
dict
) – Job status in JSON format.control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub object.
- property color#
Get the color of the Job Status.
- property offsets#
Get the offsets of the Job Status.
- property run_history#
Get the run history of the Job Status.
- class streamsets.sdk.sch_models.JobTimeSeriesMetric(*args, **kwargs)[source]#
Model for job metrics.
- name#
Name of measurement.
- Type
str
- values#
Timeseries data.
- Type
list
- time_series#
Timeseries data with timestamp as key and metric value as value.
- Type
dict
- Parameters
metric (
dict
) – Metrics in JSON format.
- property time_series#
Get the time series (
dict
).
- class streamsets.sdk.sch_models.JobTimeSeriesMetrics(*args, **kwargs)[source]#
Model for job metrics.
- input_records#
Appears when queried for ‘Record Count Time Series’ or ‘Record Throughput Time Series’.
- output_records#
Appears when queried for ‘Record Count Time Series’ or ‘Record Throughput Time Series’.
- error_records#
Appears when queried for ‘Record Count Time Series’ or ‘Record Throughput Time Series’.
- batch_counter#
Appears when queried for ‘Batch Throughput Time Series’.
- batch_processing_timer#
Appears when queried for ‘Batch Processing Timer seconds’.
- Parameters
metrics (
dict
) – Metrics in JSON format.
- class streamsets.sdk.sch_models.RuntimeParameters(runtime_parameters, job)[source]#
Wrapper for ControlHub job runtime parameters.
- Parameters
runtime_parameters (
str
) – Runtime parameter.job (
streamsets.sdk.sch_models.Job
) – Job object.
Job Sequences#
- class streamsets.sdk.sch_models.JobSequence(*args, **kwargs)[source]#
Model for Job Sequence
- Parameters
job_sequence (
dict
) – The job sequence Swagger JSON.control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub object.
- id#
Job Sequence ID.
- Type
str
- name#
Job Sequence name.
- Type
str
- description#
Job Sequence description.
- Type
str
- organization#
Organization that the Job Sequence is a part of.
- Type
str
- created_by#
Who created the Job Sequence.
- Type
str
- create_time#
When the Job Sequence was created.
- Type
str
- last_modified_by#
Who the Job Sequence was last modified by.
- Type
str
- last_modified_time#
When the Job Sequence was last modified.
- Type
str
- start_time#
Start time of the Job Sequence.
- Type
str
- end_time#
End time of the Job Sequence.
- Type
str
- time_zone#
Job Sequence timezone.
- Type
str
- cron_tab_mask#
Cron tab mask of the Job Sequence.
- Type
str
- steps#
Job Sequence steps.
- Type
str
- status#
Job Sequence status.
- Type
str
- class JobSequenceStatusType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
- class LogLevel(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
- class LogType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
- add_step_with_jobs(jobs, parallel_jobs=False, ignore_error=True)[source]#
Generate one or more Steps by providing new Job objects to be added to the Job Sequence.
- Parameters
jobs (
list
ofstreamsets.sdk.sch_models.Job
) – Jobs to be added to the Job Sequence as part of the step.parallel_jobs (
bool
) – Whether the passed in Jobs will be a part of the same step or not. Default:False
.ignore_error (
bool
) – Whether to ignore Job errors or not. Default:True
.
- delete_history_logs(history_logs)[source]#
Delete history logs from the Job Sequence.
- Parameters
history_logs (
list
ofstreamsets.sdk.sch_models.JobSequenceHistoryLog
) – History Logs to delete.
- get_history_log(log_type=None, log_level=None, last_run_only=None, run_id=None, from_date=None, to_date=None)[source]#
Get the history log of the Job Sequence.
- Args:
- log_type (
str
, optional): Accepted values are “SEQUENCE_START”, “SCHEDULER_TRIGGER_ERROR”, “STEP_START”. Default: None.
log_level (
str
, optional): Accepted values are “INFO”, “WARN”, “ERROR”. Default: None. last_run_only (bool
, optional): Whether to return History Log for the last run only. Default: None. run_id (str
, optional): The desired Run ID to get the History Log for. Default: None. from_date (str
, optional): The starting date from which we’d like to see the History Log for. Default: None. to_date (str
, optional): The end date till which we’d like to see the History Log for. Default: None.- log_type (
- Returns
A
SeekableList
ofstreamsets.sdk.sch_models.JobSequenceHistoryLog
objects.
- property history_logs#
Get the history logs of the Job Sequence.
- Returns
An instance of
streamsets.sdk.sch_models.JobSequenceHistoryLogs
.
- mark_job_as_finished(job)[source]#
Send an event in Control Hub indicating that a job has finished.
- Parameters
job (
streamsets.sdk.sch_models.Job
) – Job to mark as finished.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- move_step(step, target_step_number, swap=False)[source]#
Move one Step to another index within the Job Sequence.
- Parameters
step (
streamsets.sdk.sch_models.Step
) – The step to be moved.target_step_number (
int
) – The step number to move the step to.swap (
bool
) – Whether to swap the step with the step in the given index, if False, this will moveDefault (the step to the index and shift the steps to the right.) –
False
.
- remove_step(step)[source]#
Remove a Step from a Job Sequence.
- Parameters
step – A
streamsets.sdk.sch_models.Step
instance.
- property run_ids#
Get all the run IDs of the Job Sequence.
- Returns
An
list
ofint
.
- property steps#
The Steps that are part of the Job Sequence.
- Returns
A (
streamsets.sdk.utils.SeekableList
) of (streamsets.sdk.sch_models.Step
) objects.
- class streamsets.sdk.sch_models.JobSequences(control_hub)[source]#
Collection of
streamsets.sdk.sch_models.JobSequence
instances.- Parameters
control_hub (
streamsets.sdk.sch.ControlHub
) – An instance of the ControlHub.
- class streamsets.sdk.sch_models.JobSequenceBuilder(job_sequence, control_hub=None)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.JobSequence
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_job_sequence_builder()
.
- Parameters
job_sequence (
dict
) – Python object built from our Swagger PipelineJson definition.
- add_start_condition(start_time=0, end_time=0, time_zone='UTC', crontab_mask='0/1 * 1/1 * ? *')[source]#
Add a start condition.
- Parameters
start_time (
int
, optional) – Unix timestamp representing the start time for the Job Sequence. Default: 0.end_time (
int
, optional) – Unix timestamp representing the end time for the Job Sequence. Default: 0.time_zone (
str
, optional) – Time Zone of the Job Sequence. Default:UTC
.crontab_mask (
str
, optional) – Cron Tab Mask of the Job Sequence. Default: None.
- build(name='Job Sequence', description=None)[source]#
Build the Job Sequence.
- Parameters
name (
str
, optional) – Name of the Job Sequence. Default:Job Sequence
.description (
str
, optional) – End time for the Job Sequence. Default: None.
- Returns
A
streamsets.sdk.sch_models.JobSequence
instances.
- class streamsets.sdk.sch_models.JobSequenceHistoryLog(*args, **kwargs)[source]#
Wrapper class for representing JobSequence History Log.
- Parameters
history_log (
dict
) – JSON representation of a History Log.
- class streamsets.sdk.sch_models.Step(*args, **kwargs)[source]#
Model for Job Step.
- id#
Step ID.
- Type
str
- job_id#
Job ID.
- Type
str
- sequence_id#
Job Sequence ID.
- Type
str
- job_name#
Job Sequence name.
- Type
str
- organization#
Organization that the Job Sequence is a part of.
- Type
str
- status#
Job Sequence status.
- Type
str
- step_number#
Job Sequence status.
- Type
str
- add_jobs(jobs, ignore_error=True)[source]#
List of Jobs to add to the Step.
- Parameters
jobs (
list
ofstreamsets.sdk.sch_models.Job
) – Job to add to the step.ignore_error (
bool
) – Whether to ignore Job errors or not. Default:True
.
- remove_jobs(*jobs)[source]#
Removes Jobs from Step.
- Parameters
jobs – One or more
streamsets.sdk.sch_models.Job
instances.
- property step_jobs#
The Jobs that are part of the Step.
- Returns
A (
streamsets.sdk.utils.SeekableList
) of (streamsets.sdk.sch_models.Job
) objects.
Organizations#
- class streamsets.sdk.sch_models.Organization(*args, **kwargs)[source]#
Model for Organization.
- Parameters
organization (
str
) – Organization Id.organization_admin_user (
str
, optional) – Default:None
.api_client (
streamsets.sdk.sch_api.ApiClient
, optional) – Default:None
.
- default_user_password_expiry_time_in_days#
Default expiry time of the user password for the organization.
- Type
str
- configuration#
Configuration for the organization.
- admin_user_id#
Organization admin’s user ID.
- Type
str
- created_by#
Who the Organization was created by.
- Type
str
- saml_intergration_enabled#
Whether SAML integration is enabled.
- Type
bool
- property configuration#
Get the
streamsets.sdk.models.Configuration
instance for the organization.
- property default_user_password_expiry_time_in_days#
Get the default expiry time of the user password for the organization.
- class streamsets.sdk.sch_models.Organizations(control_hub)[source]#
Collection of
streamsets.sdk.sch_models.Organization
instances.
- class streamsets.sdk.sch_models.OrganizationBuilder(organization, organization_admin_user)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.Organization
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_organization_builder()
.
- Parameters
organization (
dict
) – Python object built from our Swagger UserJson definition.
- build(id, name, admin_user_id, admin_user_display_name, admin_user_email_address, admin_user_ldap_user_name=None)[source]#
Build the organization.
- Parameters
id (
str
) – Organization ID.name (
str
) – Organization name.admin_user_id (
str
) – User Id of the admin of this organization.admin_user_display_name (
str
) – User display name of admin of this organization.admin_user_email_address (
str
) – User email address of admin of this organization.admin_user_ldap_user_name (
str
, optional) – LDAP username. Default:None
.
- Returns
An instance of
streamsets.sdk.sch_models.Organization
.
Pipelines#
- class streamsets.sdk.sch_models.Pipeline(*args, **kwargs)[source]#
Model for Pipeline.
- Parameters
builder (
streamsets.sdk.sch_models.PipelineBuilder
) – Pipeline Builder object.control_hub (
streamsets.sdk.sch.ControlHub
, optional) – ControlHub object. Default:None
.engine_id (
str
) – The ID of the authoring engine for this pipeline.library_definitions (
dict
, optional) – Library Definition in JSON format. Default:None
.pipeline (
dict
) – Pipeline in JSON format.pipeline_definition (
dict
) – Pipeline Definition in JSON format.rules_definition (
dict
) – Rules Definition in JSON format.
- library_definitions#
Pipeline’s Library Definitions.
- Type
dict
- pipeline_definition#
Pipeline’s definitions.
- Type
dict
- commits#
Pipeline’s commits.
- Type
list of
streamsets.sdk.sch_models.PipelineCommit
instances
- tags#
Pipeline’s tags.
- Type
list of
streamsets.sdk.sch_models.Tag
instances
- configuration#
Pipeline’s configuration.
- acl#
Pipeline’s ACL.
- stages#
Pipeline’s Stages.
- error_stage#
Pipeline’s Error Stage.
- stats_aggregator_stage#
Pipeline’s States Aggregator Stage.
- parameters#
Pipeline’s parameters.
- Type
str
- name#
Pipeline’s name.
- Type
str
- description#
Pipeline’s description.
- Type
str
- labels#
Pipeline’s labels.
- Type
str
- engine_id#
Pipeline’s SDC ID.
- Type
str
- property acl#
Get pipeline ACL.
- Returns
An instance of
streamsets.sdk.sch_models.ACL
.
- add_fragment(fragment, parameter_name_prefix=None)[source]#
Add a fragment to the pipeline.
- Parameters
fragment (py:obj:streamsets.sdk.sch_models.Pipeline) – Fragment to add.
parameter_name_prefix (
str
, optional) – Prefix name for the parameters of fragment. Default:None
.
- Returns
An instance of
streamsets.sdk.sdc_models.Stage
.
- add_stage(label=None, name=None, type=None, library=None)[source]#
Add a stage to the pipeline.
When specifying a stage, either
label
orname
must be used.type
andlibrary
may also be used to select a particular stage if ambiguities exist. Iftype
and/orlibrary
are omitted, the first stage definition matching the givenlabel
orname
will be used.- Parameters
label (
str
, optional) – Stage label to use when selecting stage from definitions. Default:None
.name (
str
, optional) – Stage name to use when selecting stage from definitions. Default:None
.type (
str
, optional) – Stage type to use when selecting stage from definitions (e.g. origin, destination, processor, executor). Default:None
.library (
str
, optional) – Stage library to use when selecting stage from definitions. Default:None
.
- Returns
- An instance of
streamsets.sdk.sch_models.SchSdcStage
- An instance of
- property commits#
Get commits for this pipeline.
- Returns
A
streamsets.sdk.utils.SeekableList
of instances ofstreamsets.sdk.sch_models.PipelineCommit
.
- property configuration#
Get pipeline’s configuration.
- Returns
An instance of
streamsets.sdk.sch_models.Configuration
.
- property description#
Get the description of the Pipeline.
- property error_stage#
Get the error stage of the Pipeline.
- get_jobs_using_pipeline()[source]#
Get the jobs that are running on this pipeline.
- Returns
An instance of
streamsets.sdk.utils.SeekableList
containing instances ofstreamsets.sdk.sch_models.Job
that run on this pipeline.
- property labels#
Get the pipeline labels.
- Returns
A
streamsets.sdk.utils.SeekableList
of instances ofstreamsets.sdk.sch_models.PipelineLabel
.
- property library_definitions#
Get the Pipeline’s (
dict
) Library Definitions.
- property name#
Get the name of the Pipeline.
- property parameters#
Get the pipeline parameters.
- Returns
A dict like,
streamsets.sdk.sch_models.PipelineParameters
object of parameter key-value pairs.
- property pipeline_definition#
Get the Pipeline’s (
dict
) Pipeline Definitions.
- remove_stages(*stages)[source]#
Remove one or more stages from a Pipeline.
- Parameters
stages (
streamsets.sdk.sdc_models.Stage
orstreamsets.sdk.st_models.Stage
) –objects (One or more stage) –
- property sdc_id#
Get the SDC ID of the Pipeline.
- property stages#
Get the stages of the Pipeline.
- property stats_aggregator_stage#
Get the stats aggregator stage of the Pipeline.
- property tags#
Get tags for this pipeline.
- Returns
A
streamsets.sdk.utils.SeekableList
of instances ofstreamsets.sdk.sch_models.PipelineTag
.
- class streamsets.sdk.sch_models.Pipelines(control_hub, organization)[source]#
Collection of
streamsets.sdk.sch_models.Pipeline
instances.- Parameters
control_hub – An instance of
streamsets.sdk.sch.ControlHub
.organization (
str
) – Organization Id.
- class streamsets.sdk.sch_models.PipelineBuilder(pipeline, data_collector_pipeline_builder, control_hub=None, fragment=False, engine_start_up_time=0)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.Pipeline
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_pipeline_builder()
.
- Parameters
pipeline (
dict
) – Python object built from our Swagger PipelineJson definition.data_collector_pipeline_builder (
streamsets.sdk.sdc_models.PipelineBuilder
) – Data Collector Pipeline Builder object.control_hub (
streamsets.sdk.sch.ControlHub
) – Default:None
.fragment (
boolean
, optional) – Specify if a fragment builder. Default:False
.engine_start_up_time (
int
, optional) – Engine’s start up time. Default:0
.
- add_stage(label=None, name=None, type=None, library=None)[source]#
Add a stage to the pipeline.
When specifying a stage, either
label
orname
must be used.type
andlibrary
may also be used to select a particular stage if ambiguities exist. Iftype
and/orlibrary
are omitted, the first stage definition matching the givenlabel
orname
will be used.- Parameters
label (
str
, optional) – Stage label to use when selecting stage from definitions. Default:None
.name (
str
, optional) – Stage name to use when selecting stage from definitions. Default:None
.type (
str
, optional) – Stage type to use when selecting stage from definitions (e.g. origin, destination, processor, executor). Default:None
.library (
str
, optional) – Stage library to use when selecting stage from definitions. Default:None
.
- Returns
An instance of
streamsets.sdk.sch_models.SchSdcStage
.
- build(title='Pipeline', description='', labels=None, build_from_imported=False, **kwargs)[source]#
Build the pipeline.
- Parameters
title (
str
, optional) – Title of the pipeline.description (
str
, optional) – Description of the pipeline. Default: ````.labels (
list
, optional) – List of pipeline labels of typestr
. Default:None
.build_from_imported (
boolean
, optional) – Whether we want to build a pipeline from an imported pipeline. Default:False
- Returns
py:class`streamsets.sdk.sch_models.Pipeline`.
- Return type
An instance of
- import_pipeline(pipeline, commit_id_regeneration=True, regenerate_id=True, **kwargs)[source]#
Import a pipeline into the PipelineBuilder to use as a starting point based off of an existing Pipeline.
- Parameters
pipeline( – py:class`streamsets.sdk.sch_models.Pipeline`): Pipeline object.
commit_id_regeneration (
bool
, optional) – Whether to use the imported pipeline’s commit ID. When set to False, the imported pipeline will be edited as-is without a fresh pipeline being created. Default: Trueregenerate_id (
bool
, optional) – Whether to use the imported pipeline’s pipeline_id. When set to False, the imported pipeline’s pipeline_id will be used. Default: True
- Returns
An instance of
streamsets.sdk.sdc_models.PipelineBuilder
.
- class streamsets.sdk.sch_models.PipelineCommit(*args, **kwargs)[source]#
Model for pipeline commit.
- Parameters
pipeline_commit (
dict
) – Pipeline commit in JSON format.control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub object.
- pipeline#
The commit’s pipeline.
- property pipeline#
Get the commits pipeline.
- class streamsets.sdk.sch_models.PipelineLabel(*args, **kwargs)[source]#
Model for pipeline label.
- Parameters
pipeline_label (
dict
) – Pipeline label in JSON format.
- label#
The pipeline label.
- Type
str
- property label#
Get the pipeline label.
- class streamsets.sdk.sch_models.PipelineLabels(control_hub, organization)[source]#
Collection of
streamsets.sdk.sch_models.PipelineLabel
instances.- Parameters
control_hub – An instance of
streamsets.sdk.sch.ControlHub
.organization (
str
) – Organization Id.
- class streamsets.sdk.sch_models.PipelineParameters(pipeline)[source]#
Parameters for pipelines.
- Parameters
pipeline (
streamsets.sdk.sch_models.Pipeline
) – Pipeline Instance.
- class streamsets.sdk.sch_models.StPipelineBuilder(pipeline, transformer_pipeline_builder, control_hub=None, fragment=False, engine_start_up_time=0)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.Pipeline
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_pipeline_builder()
.
- Parameters
pipeline (
dict
) – Python object built from our Swagger PipelineJson definition.transformer_pipeline_builder (
streamsets.sdk.sdc_models.PipelineBuilder
) – Transformer Pipeline Builder object.control_hub (
streamsets.sdk.sch.ControlHub
) – Default:None
.fragment (
boolean
, optional) – Specify if a fragment builder. Default:False
.engine_start_up_time (
int
, optional) – Engine’s start up time. Default:0
.
- add_stage(label=None, name=None, type=None, library=None)[source]#
Add a stage to the pipeline.
When specifying a stage, either
label
orname
must be used.type
andlibrary
may also be used to select a particular stage if ambiguities exist. Iftype
and/orlibrary
are omitted, the first stage definition matching the givenlabel
orname
will be used.- Parameters
label (
str
, optional) – Transformer stage label to use when selecting stage from definitions. Default:None
.name (
str
, optional) – Transformer stage name to use when selecting stage from definitions. Default:None
.type (
str
, optional) – Transformer stage type to use when selecting stage from definitions (e.g. origin, destination, processor, executor). Default:None
.library (
str
, optional) – Transformer stage library to use when selecting stage from definitions. Default:None
.
- Returns
An instance of
streamsets.sdk.sch_models.SchStStage
.
- build(title='Pipeline', description='', build_from_imported=False, **kwargs)[source]#
Build the pipeline.
- Parameters
- Returns
py:class`streamsets.sdk.sch_models.Pipeline`.
- Return type
An instance of
- import_pipeline(pipeline, commit_id_regeneration=True, regenerate_id=True, **kwargs)[source]#
Import a pipeline into the PipelineBuilder to use as a starting point based off of an existing Pipeline.
- Parameters
pipeline( – py:class`streamsets.sdk.sch_models.Pipeline`): Pipeline object.
commit_id_regeneration (
bool
, optional) – Whether to use the imported pipeline’s commit ID. When set to False, the imported pipeline will be edited as-is without a fresh pipeline being created. Default: Trueregenerate_id (
bool
, optional) – Whether to use the imported pipeline’s pipeline_id. When set to False, the imported pipeline’s pipeline_id will be used. Default: True
- Returns
An instance of
streamsets.sdk.sdc_models.PipelineBuilder
.
Protection Methods#
- class streamsets.sdk.sch_models.ProtectionMethod(stage)[source]#
Protection Method Model.
- Parameters
stage (
dict
) – JSON representation of a stage.
- class streamsets.sdk.sch_models.ProtectionMethodBuilder(pipeline_builder)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.ProtectionMethod
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_protection_method_builder()
.
- Parameters
pipeline_builder (
streamsets.sdk.sch_models.PipelineBuilder
) – Pipeline Builder object.
Protection Policies#
- class streamsets.sdk.sch_models.ProtectionPolicy(*args, **kwargs)[source]#
Model for Protection Policy.
- Parameters
protection_policy (
dict
) – JSON representation of Protection Policy.procedures (
list
) – A list ofstreamsets.sdk.sch_models.PolicyProcedure
instances, Default:None
.
- procedures#
Procedures for the Protection Policy.
- Type
- default_setting#
Sefault Setting for the Protection Policy.
- Type
str
- enactment#
- Type
str
- sampling#
Sampling for the Protection Policy.
- Type
str
- class streamsets.sdk.sch_models.ProtectionPolicies(control_hub)[source]#
Collection of
streamsets.sdk.sch_models.ProtectionPolicy
instances.
- class streamsets.sdk.sch_models.ProtectionPolicyBuilder(control_hub, protection_policy, policy_procedure)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.ProtectionPolicy
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_protection_policy_builder()
.
- Parameters
protection_policy (
dict
) – Python object defining a protection policy.policy_procedure (
dict
) – Python object defining a policy procedure.
ProvisioningAgents#
- class streamsets.sdk.sch_models.Deployment(deployment, control_hub=None, **kwargs)[source]#
Model for Deployment. This is an abstract class.
- acl#
The Deployment ACL instance.
- created_by#
User that created this deployment.
- Type
str
- created_on#
Time at which this deployment was created.
- Type
int
- deployment_events#
Name of the deployment.
- Type
DeploymentEvents
- deployment_id#
Id of the deployment.
- Type
str
- deployment_name#
Name of the deployment.
- Type
str
- deployment_tags#
Raw deployment tags of the deployment.
- Type
list
- deployment_type#
Type of the deployment.
- Type
str
- desired_instances#
The Deployment desired number of instances.
- Type
int
- engine_configuration#
The Deployment Engine Configuration.
- engine_type#
The Deployment Engine type.
- Type
str
- engine_version#
The Deployment Engine Version.
- Type
str
- environment_id#
Enabled environment where engine will be deployed.
- Type
list
- last_modified_by#
User that last modified this deployment.
- Type
str
- last_modified_on#
Time at which this deployment was last modified.
- Type
int
- network_tags#
The Deployment network tags.
- Type
list
- scala_binary_version#
scala Binary Version.
- Type
str
- organization#
The Deployment organization.
- Type
str
- json_state#
State of the deployment - this is json field called state.
- Type
str
- state#
State of the deployment - displayed as state on UI.
- Type
str
- status#
Status of the deployment.
- Type
str
- status_detail#
Status detail of the deployment.
- Type
str
- registered_engines#
Registered Engines of the deployment.
- Type
RegisteredEngines
- tags#
Tags of the deployment.
- Type
a
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Tag
instances
- class ENGINE_TYPES(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
- class TYPES(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
- property acl#
Get deployment ACL.
- Returns
An instance of
streamsets.sdk.sch_models.ACL
.
- property deployment_events#
Get the events for a deployment.
- Returns
An instance of
streamsets.sdk.sch_models.DeploymentEvents
- property engine_configuration#
The engine configuration of deployment engine. :returns: An instance of
streamsets.sdk.sch_models.DeploymentEngineConfiguration
.
- property registered_engines#
Get the registered engines.
- Returns
An instance of
streamsets.sdk.sch_models.RegisteredEngines
- property tags#
Get the tags.
- Returns
A
streamsets.sdk.utils.SeekableList
ofstr
instances.
- class streamsets.sdk.sch_models.Deployments(control_hub, organization)[source]#
Collection of
streamsets.sdk.sch_models.Deployment
instances.- Parameters
control_hub – An instance of
streamsets.sdk.sch.ControlHub
.organization (
str
) – Organization Id.
- class streamsets.sdk.sch_models.DeploymentBuilder(deployment, control_hub)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.Deployment
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_deployment_builder()
.
- Parameters
deployment (
dict
) – Python object built from our Swagger CspDeploymentJson definition.control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub object.
- build(deployment_name, engine_type, engine_version, environment, external_resource_location=None, engine_labels=None, max_cpu_load=80.0, max_memory_used=100.0, max_pipelines_running=1000000, deployment_tags=None, engine_build=None, scala_binary_version=None, **kwargs)[source]#
Define the deployment.
- Parameters
deployment_name (
str
) – deployment name.engine_type (
str
) – Type of engine to deploy.engine_version (
str
) – Version of engine to deploy.environment (
streamsets.sdk.sch_models.Environment
) – The environment instance.external_resource_location (
str
, optional) – External Resource Location URL. Default:None
.engine_labels (
list
, optional) – List of labels (strings). Default:None
.max_cpu_load (
float
orint
, optional) – Max CPU Load (percent). Default:80.0
.max_memory_used (
float
orint
, optional) – Max Memory Used (percent). Default:100.0
.max_pipelines_running (
int
, optional) – Max Running Pipeline Count. Default:1000000
.deployment_tags (
list
, optional) – List of tags (strings). Default:None
.engine_build (
str
, optional) – Build of engine to deploy. Default:None
.scala_binary_version (
str
, optional) – Scala binary version required in case of engine_type=’TF’ Default:None
.
- Returns
An instance of subclass of
streamsets.sdk.sch_models.Deployment
.
- class streamsets.sdk.sch_models.ProvisioningAgent(*args, **kwargs)[source]#
Model for Provisioning Agent.
- Parameters
provisioning_agent (
dict
) – A Python object representation of Provisioning Agent.control_hub – An instance of
streamsets.sdk.sch.ControlHub
.
- deployments#
ProvisioningAgent deployments.
- Type
streamsets.sdk.sch_models.LegacyDeployments
- acl#
ProvisioningAgent ACL
- property acl#
Get the ACL of a Provisioning Agent.
- Returns
An instance of
streamsets.sdk.sch_models.ACL
.
- property deployments#
Get the deployments associated with the Provisioning Agent.
- Returns
A
list
ofstreamsets.sdk.sch_models.LegacyDeployment
instances.
- class streamsets.sdk.sch_models.ProvisioningAgents(control_hub, organization)[source]#
Collection of
streamsets.sdk.sch_models.ProvisioningAgent
instances.- Parameters
control_hub – An instance of
streamsets.sdk.sch.ControlHub
.organization (
str
) – Organization Id.
Roles#
- class streamsets.sdk.sch_models.Roles(values, entity, role_label_to_id)[source]#
Wrapper class over the list of Roles.
- Parameters
values (
list
) – List of roles.entity (
streamsets.sdk.sch_models.Group
) – (streamsets.sdk.sch_models.User
): Group or User object.role_label_to_id (
dict
) – Role label to Role ID mapping.
- append(value)[source]#
The method appends a role or an iterable of roles
- Parameters
value (
str
, iterable) – Role string or an iterable of roles- Returns
None
- Raises
TypeError – This exception is raised if the value argument is not a str or iterable of str.
- pop(value=- 1)[source]#
Remove and return item at index (default last).
Raises IndexError if list is empty or index is out of range.
- remove(value)[source]#
The method removes a role or an iterable of roles
- Parameters
value (
str
, iterable) – Role string or an iterable of roles- Returns
None
- Raises
TypeError – This exception is raised if the value argument is not a str or iterable of str.
- sort()[source]#
Sort the list in ascending order and return None.
The sort is in-place (i.e. the list itself is modified) and stable (i.e. the order of two equal elements is maintained).
If a key function is given, apply it once to each list item and sort them, ascending or descending, according to their function values.
The reverse flag can be set to sort in descending order.
SAQL Searches#
- class streamsets.sdk.sch_models.SAQLSearch(*args, **kwargs)[source]#
Model for SAQL Searches.
- Parameters
saql_search_json (
dict
) – The SAQL Search JSON.control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub object.
- id#
ID of the SAQL Search.
- Type
str
- orgId#
ID of the Organization.
- Type
str
- type#
Type of SAQL search.
- Type
str
- name#
Name of SAQL search.
- Type
str
- mode#
Mode of SAQL search.
- Type
str
- query#
SAQL search query.
- Type
str
- creator#
Creator of SAQL search.
- Type
str
- createTime#
Time of creation of SAQL search.
- Type
str
- lastModifiedBy#
Last author to modify SAQL search.
- Type
str
- lastModifiedOn#
Last Time of modification of SAQL search.
- Type
str
- favorite#
Represents whether an SAQL Search is marked as a favorite or not.
- Type
boolean
- Returns
An instance of
streamsets.sdk.sch_models.SAQLSearch
.
- class JobType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
- class streamsets.sdk.sch_models.SAQLSearches(control_hub, saql_search_type)[source]#
Collection of
streamsets.sdk.sch_models.SAQLSearch
instances.- Parameters
control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub object.saql_search_type (
streamsets.sdk.sch_models.SAQLSearch.PipelineType
) –- (
streamsets.sdk.sch_models.SAQLSearch.JobType
) Enum of the SAQL search type, limited to:
'PIPELINE'
, ‘FRAGMENT’`,``’JOB_INSTANCE’, ``'JOB_TEMPLATE'
and'JOB_DRAFT_RUN'
.
- (
- class streamsets.sdk.sch_models.SAQLSearchBuilder(saql_search, control_hub, mode='BASIC')[source]#
Class with which to build instances of
streamsets.sdk.sch_models.SAQLSearch
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_saql_search_builder()
.
- Parameters
saql_search (
dict
) – Python object built from expected SAQL JSON structurecontrol_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub object.mode (
str
, optional) – mode of SAQL Search. Default:'BASIC'
- add_filter(property_name='name', property_operator='contains', property_value='', property_condition_combiner='AND')[source]#
Adds a filter to the SAQL Search Query.
- Parameters
property_name (
str
, optional) – Default:'name'
property_operator (
str
, optional) – Default:'contains'
property_value (
str
, optional) – Default:''
property_condition_combiner (
str
, optional) – Default:'AND'
- build(name)[source]#
Builder for SAQL Search.
- Parameters
name (
str
) – Name of SAQL Search.- Returns
An instance of
streamsets.sdk.sch_models.SAQLSearch
.
Scheduler#
- class streamsets.sdk.sch_models.ScheduledTask(*args, **kwargs)[source]#
Model for Scheduled Task.
- Parameters
task (
dict
) – JSON representation of task.control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub instance.
- runs#
Scheduled Task Runs.
- audits (:py:obj:`streamsets.sdk.utils.SeekableList` of
streamsets.sdk.sch_models.ScheduledTaskAudit
): Scheduled Task Audits.
- acl#
ACL of a Scheduled Task.
- job_id#
Job ID for which this task is scheduled.
- Type
str
- property acl#
Get the ACL of a Scheduled Task.
- Returns
An instance of
streamsets.sdk.sch_models.ACL
.
- property audits#
Get Scheduled Task Audits.
- Returns
A
streamsets.sdk.utils.SeekableList
of inherited instances ofstreamsets.sdk.sch_models.ScheduledTaskAudit
.
- property job_id#
Job ID for which this task is scheduled.
- Returns
An instance of
str
.
- property runs#
Get Scheduled Task Runs.
- Returns
A
streamsets.sdk.utils.SeekableList
of inherited instances ofstreamsets.sdk.sch_models.ScheduledTaskRun
.
- class streamsets.sdk.sch_models.ScheduledTaskAudit(*args, **kwargs)[source]#
Scheduled Task Audit.
- Parameters
run (
dict
) – JSON representation of scheduled task audit.
- class streamsets.sdk.sch_models.ScheduledTaskBaseModel(*args, **kwargs)[source]#
Base Model for Scheduled Task related classes.
- class streamsets.sdk.sch_models.ScheduledTaskBuilder(job_selection_types, control_hub)[source]#
Builder for Scheduled Task.
- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_scheduled_task_builder()
.
- Parameters
job_selection_types (
dict
) – JSON representation of job selection types.control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub instance.
- build(task_object, action='START', name=None, description=None, cron_expression='0 0 1/1 * ? *', time_zone='UTC', status='RUNNING', start_time=None, end_time=None, missed_execution_handling='IGNORE')[source]#
Builder for Scheduled Task.
- Parameters
task_object (
streamsets.sdk.sch_models.Job
) – (streamsets.sdk.sch_models.ReportDefinition
): Job or ReportDefinition object.action (
str
, optional) – One of the {‘START’, ‘STOP’, ‘UPGRADE’} actions. Default:START
.name (
str
, optional) – Name of the task. Default:None
.description (
str
, optional) – Description of the task. Default:None
.crontab_mask (
str
, optional) – Schedule in cron syntax. Default:"0 0 1/1 * ? *"
. (Daily at 12).time_zone (
str
, optional) – Time zone. Default:"UTC"
.status (
str
, optional) – One of the {‘RUNNING’, ‘PAUSED’} statuses. Default:RUNNING
.start_time (
str
, optional) – Start time of task. Default:None
.end_time (
str
, optional) – End time of task. Default:None
.missed_trigger_handling (
str
, optional) – One of {‘IGNORE’, ‘RUN ALL’, ‘RUN ONCE’}. Default:IGNORE
.
- Returns
An instance of
streamsets.sdk.sch_models.ScheduledTask
.
- class streamsets.sdk.sch_models.ScheduledTaskRun(*args, **kwargs)[source]#
Scheduled Task Run.
- Parameters
run (
dict
) – JSON representation if scheduled task run.
- class streamsets.sdk.sch_models.ScheduledTasks(control_hub)[source]#
Collection of
streamsets.sdk.sch_models.ScheduledTask
instances.
SeekableList#
- class streamsets.sdk.utils.SeekableList(iterable=(), /)[source]#
- get(**kwargs)[source]#
Retrieve the first instance that matches the supplied arguments.
- Parameters
**kwargs – Optional arguments to be passed to filter the results offline.
- Returns
The first instance from the group that matches the supplied arguments.
- get_all(**kwargs)[source]#
Retrieve all instances that match the supplied arguments.
- Parameters
**kwargs – Optional arguments to be passed to filter the results offline.
- Returns
A
streamsets.sdk.utils.SeekableList
of results that match the supplied arguments.
Snapshot#
- class streamsets.sdk.sch_models.Snapshot(*args, **kwargs)[source]#
Model for Snapshot.
- Parameters
snapshot (
dict
) – Python object representation of the snapshot.draft_run (
streamsets.sdk.sch.DraftRun
) – DraftRun object.
- batch_number#
The number of the batch that the snapshot captured.
- Type
int
- batches#
A list of
streamsets.sdk.sdc_models.Batch
instances.- Type
list
- id#
The snapshot’s ID.
- Type
str
- name#
The snapshot’s name.
- Type
str
- pipeline_id#
The ID of the pipeline that the snapshot belongs to.
- Type
str
- time_stamp#
The creation date of the snapshot as a unix timestamp.
- Type
int
Stage#
- class streamsets.sdk.sch_models.SchSdcStage(stage, pipeline=None, output_streams=None, supported_connection_types=None)[source]#
- add_output(*other_stages, event_lane=False, allow_empty_stages=False, output_lane_index=None)#
Connect output of this stage to another stage.
The __rshift__ operator (>>) has been overloaded to invoke this method.
- Parameters
other_stages (
streamsets.sdk.sdc_models.Stage
) – Stage object.event_lane (
boolean
, optional) – Default:False
.allow_empty_stages (
bool
, optional) – Disable the check for empty stages. Default:False
.output_lane_index (
int
, optional) – The index of the output lane to attach an output to. Default:None
.
- Returns
This stage as an instance of
streamsets.sdk.sdc_models.Stage
).
- connect_inputs(stages=None, event_lane=False, target_stage_output_lane_index=None)#
Connect inputs of this stage.
- Parameters
stages – A list of
streamsets.sdk.sdc_models.Stage
objects to connect as inputs. Default:None
.event_lane (
boolean
, optional) – Default:False
.target_stage_output_lane_index (
int
, optional) – Index of the target stage’s output lane to attach to. Default:None
.
- connect_outputs(stages=None, event_lane=False, output_lane_index=None)#
Connect outputs of this stage.
- Parameters
stages – A list of
streamsets.sdk.sdc_models.Stage
objects to connect as outputs. Default:None
.event_lane (
boolean
, optional) – Default:False
.output_lane_index (
int
, optional) – The index of THIS stage’s output lane to attach an output to. Default:None
.
- copy_inputs(stage, override=False)#
Copies inputs of the given stage.
- Parameters
stage – The
streamsets.sdk.sdc_models.Stage
object whose inputs we’d like to copy.override (
bool
, optional) – Whether to completely wipe and override the current stage’s input_lanes with the stage that has been passed.
- copy_outputs(stage)#
Copies outputs of the given stage.
- Parameters
stage – The
streamsets.sdk.sdc_models.Stage
object whose outputs we’d like to copy.
- property description#
The stage’s description.
- Type
str
- disconnect_input_lanes(stages=None, all_stages=False)#
Disconnect input lanes of this stage from the lanes of the stages that are provided.
- Example:
stage_one.disconnect_inputs(stages=[stage_two, stage_three]) Will disconnect the incoming input lanes from stage_two and stage_three to stage_one.
- Parameters
stages – A list of
streamsets.sdk.sdc_models.Stage
objects to disconnect. Default:None
.all_stages (
bool
, optional) – Disconnect all incoming input lanes to this stage. Default:False
.
- disconnect_output_lanes(stages=None, all_stages=False, output_lane_index=None)#
Disconnect output lanes of this stage from the lanes of the stages that are provided.
- Example:
stage_one.disconnect_outputs(stages=[stage_two, stage_three]) Will disconnect the outgoing output lanes FROM stage_one to stage_two and stage_three.
- Parameters
stages – A list of
streamsets.sdk.sdc_models.Stage
objects to disconnect. Default:None
.all_stages (
bool
, optional) – Disconnect all outgoing output lanes to this stage. Default:False
.output_lane_index (
int
, optional) – Index of this stage’s output lane to detach. Default:None
.
- property event_lanes#
Get the stage’s list of event lanes.
- Returns
A
list
of event lanes.
- property input_lanes#
Get the stage’s list of input lanes.
- Returns
A
list
of input lanes.
- property label#
The stage’s label.
- Type
str
- property library#
Get the stage’s library.
- Returns
The stage library as a
str
.
- property output_lanes#
Get the stage’s list of output lanes.
- Returns
A
list
of output lanes.
- set_attributes(**attributes)#
Set one or more stage attributes.
- Parameters
**attributes – Attributes to set.
- Returns
This stage as an instance of
streamsets.sdk.sdc_models.Stage
.
- property stage_on_record_error#
The stage’s on record error configuration value.
- property stage_record_preconditions#
The stage’s record preconditions configuration value.
- property stage_required_fields#
The stage’s required fields configuration value.
- use_connection(connection)#
Specify a Connection instance to use for this stage.
- Parameters
connection – One
streamsets.sdk.sch_models.Connection
instance. Only one Connection per stage is currently supported.
- class streamsets.sdk.sch_models.SchStStage(stage, pipeline=None, output_streams=None, supported_connection_types=None)[source]#
- add_output(*other_stages, event_lane=False, output_lane_index=None)#
Connect output of this stage to another stage.
The __rshift__ operator (>>) has been overloaded to invoke this method.
- Parameters
other_stages (
streamsets.sdk.st_models.Stage
) – Stage object.event_lane (
boolean
, optional) – Default:False
.output_lane_index (
int
, optional) – The index of the output lane to attach an output to. Default:None
.
- Returns
This stage as an instance of
streamsets.sdk.st_models.Stage
).
- connect_inputs(stages=None, event_lane=False, target_stage_output_lane_index=None)#
Connect inputs of this stage.
- Parameters
stages – A list of
streamsets.sdk.st_models.Stage
objects to connect as inputs. Default:None
.event_lane (
boolean
, optional) – Default:False
.target_stage_output_lane_index (
int
, optional) – Index of the target stage’s output lane to attach to. Default:None
.
- connect_outputs(stages=None, event_lane=False, output_lane_index=None)#
Connect outputs of this stage.
- Parameters
stages – A list of
streamsets.sdk.st_models.Stage
objects to connect as outputs. Default:None
.event_lane (
boolean
, optional) – Default:False
.output_lane_index (
int
, optional) – The index of THIS stage’s output lane to attach an output to. Default:None
.
- copy_inputs(stage, override=False)#
Copies inputs of the given stage.
- Parameters
stage – The
streamsets.sdk.sdc_models.Stage
object whose inputs we’d like to copy.override (
bool
, optional) – Whether to completely wipe and override the current stage’s input_lanes with the stage that has been passed.
- copy_outputs(stage)#
Copies outputs of the given stage.
- Parameters
stage – The
streamsets.sdk.sdc_models.Stage
object whose outputs we’d like to copy.
- disconnect_input_lanes(stages=None, all_stages=False)#
Disconnect input lanes of this stage from the lanes of the stages that are provided.
- Example:
stage_one.disconnect_inputs(stages=[stage_two, stage_three]) Will disconnect the incoming input lanes from stage_two and stage_three to stage_one.
- Parameters
stages – A list of
streamsets.sdk.st_models.Stage
objects to disconnect. Default:None
.all_stages (
bool
, optional) – Disconnect all incoming input lanes to this stage. Default:False
.
- disconnect_output_lanes(stages=None, all_stages=False, output_lane_index=None)#
Disconnect output lanes of this stage from the lanes of the stages that are provided.
- Example:
stage_one.disconnect_outputs(stages=[stage_two, stage_three]) Will disconnect the outgoing output lanes FROM stage_one to stage_two and stage_three. stage_one.disconnect_outputs(output_lane_index=0) Will disconnect the first output lane of this stage.
- Parameters
stages – A list of
streamsets.sdk.st_models.Stage
objects to disconnect. Default:None
.all_stages (
bool
, optional) – Disconnect all outgoing output lanes to this stage. Default:False
.output_lane_index (
int
, optional) – Index of this stage’s output lane to detach. Default:None
.
- property event_lanes#
Get the stage’s list of event lanes.
- Returns
A
list
of event lanes.
- property input_lanes#
Get the stage’s list of input lanes.
- Returns
A
list
of input lanes.
- property label#
The stage’s label.
- Type
str
- property library#
Get the stage’s library.
- Returns
The stage library as a
str
.
- property output_lanes#
Get the stage’s list of output lanes.
- Returns
A
list
of output lanes.
- set_attributes(**attributes)#
Set one or more stage attributes.
- Parameters
**attributes – Attributes to set.
- Returns
This stage as an instance of
streamsets.sdk.st_models.Stage
.
- property stage_on_record_error#
The stage’s on record error configuration value.
- property stage_record_preconditions#
The stage’s record preconditions configuration value.
- property stage_required_fields#
The stage’s required fields configuration value.
- use_connection(connection)#
Specify a Connection instance to use for this stage.
- Parameters
connection – One
streamsets.sdk.sch_models.Connection
instance. Only one Connection per stage is currently supported.
Subscriptions#
- class streamsets.sdk.sch_models.Subscription(*args, **kwargs)[source]#
Subscription.
- Parameters
subscription (
dict
) – JSON representation of Subscription.control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub instance.
- events ( A :py:obj:`streamsets.sdk.utils.SeekableList` of
streamsets.sdk.sch_models.SubscriptionEvent
instances): The Subscription’s events.
- action#
Action of a Subscription.
- acl#
ACL of a Subscription.
- property acl#
Get the ACL of an Event Subscription.
- Returns
An instance of
streamsets.sdk.sch_models.ACL
.
- property action#
Action of the Subscription.
- property events#
Events of the Subscription.
- class streamsets.sdk.sch_models.Subscriptions(control_hub)[source]#
Collection of
streamsets.sdk.sch_models.Subscription
instances.- Parameters
control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub object.
- class streamsets.sdk.sch_models.SubscriptionAction(*args, **kwargs)[source]#
Action to take when the Subscription is triggered.
- Parameters
action (
dict
) – JSON representation of an Action for a Subscription.
- class streamsets.sdk.sch_models.SubscriptionAudit(*args, **kwargs)[source]#
Model for subscription audit.
- Parameters
audit (
dict
) – JSON representation of a subscription audit.
- class streamsets.sdk.sch_models.SubscriptionBuilder(subscription, control_hub)[source]#
Builder for Subscription.
- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_subscription_builder()
.
- Parameters
subscription (
dict
) – JSON representation of event subscription.control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub instance.
- add_event(event_type, filter='')[source]#
Add event to the Subscription.
- Parameters
event_type (
str
) – Type of event in {‘Job Status Change’, ‘Data SLA Triggered’, ‘Pipeline Committed’, ‘Pipeline Status Change’, ‘Report Generated’, ‘Data Collector not Responding’}.filter (
str
, optional) – Filter to be applied on event. Default:""
.
- build(name, description=None)[source]#
Builder for Scheduled Task.
- Parameters
name (
str
) – Name of Subscription.description (
str
, optional) – Description of subscription. Default:None
.
- Returns
An instance of
streamsets.sdk.sch_models.Subscription
.
- import_subscription(subscription)[source]#
Import an existing Subscription into the builder to update it.
- Parameters
subscription (
streamsets.sdk.sch_models.Subscription
) – Subscription instance.
- remove_event(event_type)[source]#
Remove event from the subscription.
- Parameters
event_type (
str
) – Type of event in {‘Job Status Change’, ‘Data SLA Triggered’, ‘Pipeline Committed’, ‘Pipeline Status Change’, ‘Report Generated’, ‘Data Collector not Responding’}.- Returns
An instance of
streamsets.sdk.sch_models.SubscriptionEvent
.
- set_email_action(recipients, subject=None, body=None, external_smtp_url=None, external_smtp_port=None)[source]#
Set the Email action.
- Parameters
recipients (
list
) – List of email addresses.subject (
str
, optional) – Subject of the email. Default:None
.body (
str
, optional) – Body of the email. Default:None
.external_smtp_url (
str
, optional) – SMTP URL to route. Default:None
.external_smtp_port (
str
, optional) – SMTP Port to route. Default:None
.
- set_webhook_action(uri, method='GET', content_type=None, payload=None, auth_type=None, username=None, password=None, timeout=30000, headers=None, bearer_token=None, api_key_key=None, api_key_value=None, api_key_location=None, token_url=None, client_id=None, client_secret=None, scope=None, location=None, resources=None, audiences=None)[source]#
Set the Webhook action.
- Parameters
uri (
str
) – URI for the Webhook.method (
str
, optional) – HTTP method to use. Default:'GET'
.content_type (
str
, optional) – Content Type of the request. Default:None
.payload (
str
, optional) – Payload of the request. Default:None
.auth_type (
str
, optional) –'basic'
orNone
. Default:None
.username (
str
, optional) – Username for the authentication. Default:None
.password (
str
, optional) – Password for the authentication. Default:None
.timeout (
int
, optional) – Timeout for the Webhook action. Default:30000
.headers (
dict
, optional) – Headers to be sent to the Webhook call. Default:None
.bearer_token (
str
, optional) – Bearer token for the authentication. Default:None
.token_url – (
str
, optional): Token URL for the authentication. Default:None
.api_key_key (
str
, optional) – API key name for the authentication. Default:None
.api_key_value (
str
, optional) – API key value for the authentication. Default:None
.api_key_location (
str
, optional) – Where to send the api key values. Default:None
.audiences – (
list
, optional): Audiences for OAuth2 auth method. Default:None
.resources – (
list
, optional): Resources for OAuth2 auth method. Default:None
.location – (
str
, optional): Location of OAuth2 credentials (body or header). Default:None
.scope – (
str
, optional): Scope for OAuth2 auth method. Default:None
.client_secret – (
str
, optional): Client secret for OAuth2 auth method. Default:None
.client_id – (
str
, optional): Client id for OAuth2 auth method. Default:None
.
- class streamsets.sdk.sch_models.SubscriptionEvent(*args, **kwargs)[source]#
An Event of a Subscription.
- Parameters
event (
dict
) – JSON representation of Events of a Subscription.control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub instance.
Topologies#
- class streamsets.sdk.sch_models.DataSla(*args, **kwargs)[source]#
Model for DataSla.
- Parameters
data_sla (
dict
) – JSON representation of SLA.control_hub (
streamsets.sdk.sch.ControlHub
) – An instance of the Control Hub.
- class streamsets.sdk.sch_models.DataSlaBuilder(data_sla, control_hub)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.DataSla
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_data_sla_builder()
.
- Parameters
data_sla (
dict
) – Python object built from our Swagger DataSlaJson definition.control_hub (
streamsets.sdk.sch.ControlHub
) – An instance of the Control Hub.
- build(topology, label, job, alert_text, qos_parameter='THROUGHPUT_RATE', function_type='Max', min_max_value=100, enabled=True)[source]#
Build the Data Sla.
- Parameters
topology (
streamsets.sdk.sch_models.Topology
) – Topology object.label (
str
) – Label for the SLA.job (
list
) – List ofstreamsets.sdk.sch_models.Job
objects.alert_text (
str
) – Alert text.qos_parameter (
str
, optional) – paramter in {‘THROUGHPUT_RATE’, ‘ERROR_RATE’}. Default:'THROUGHPUT_RATE'
.function_type (
str
, optional) – paramter in {‘Max’, ‘Min’}. Default:'Max'
.min_max_value (
str
, optional) – Default:100
.enabled (
boolean
, optional) – Default:True
.
- Returns
An instance of
streamsets.sdk.sch_models.DataSla
.
- class streamsets.sdk.sch_models.Topology(*args, **kwargs)[source]#
Model for Topology.
- Parameters
topology (
dict
) – JSON representation of Topology.
- acl#
Topology ACL.
- commit_id#
Pipeline commit id.
- Type
str
- commit_message#
Commit Message.
- Type
str
- commit_time#
Time at which commit was made.
- Type
int
- committed_by#
User that made the commit.
- Type
str
- data_slas#
Data SLAs currently part of the topology.
- default_topology#
Default Topology.
- Type
bool
- description#
Topology description.
- Type
str
- draft#
Indicates whether this topology is a draft.
- Type
bool
- jobs#
Jobs currently part of the topology.
- last_modified_by#
User that last modified this topology.
- Type
str
- last_modified_on#
Time at which this topology was last modified.
- Type
int
- new_pipeline_version_available#
Whether any job in the topology has a new pipeline version to be updated to.
- Type
bool
- nodes#
Nodes currently part of the topology.
- organization#
Id of the organization.
- Type
str
- parent_version#
Version of the parent topology.
- Type
str
- topology_definition#
Definition of the topology.
- Type
str
- topology_id#
Id of the topology.
- Type
str
- topology_name#
Name of the topology.
- Type
str
- validation_issues#
Any validation issues that exist for this Topology.
- Type
dict
- version#
Version of this topology.
- Type
str
- acknowledge_job_errors()[source]#
Acknowledge all errors for the jobs in a topology.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- property acl#
Get the ACL of a Topology.
- Returns
An instance of
streamsets.sdk.sch_models.ACL
.
- activate_data_sla(*data_slas)[source]#
Activate Data SLAs.
- Parameters
*data_slas – One or more instances of
streamsets.sdk.sch_models.DataSla
.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- add_data_sla(data_sla)[source]#
Add SLA.
- Parameters
data_sla (
streamsets.sdk.sch_models.DataSla
) – Data SLA object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- auto_discover_connections()[source]#
Auto discover connecting systems between nodes in a Topology.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- auto_fix()[source]#
Auto-fix a topology by rectifying invalid or removed jobs, outdated jobs, etc.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- deactivate_data_sla(*data_slas)[source]#
Deactivate Data SLAs.
- Parameters
*data_slas – One or more instances of
streamsets.sdk.sch_models.DataSla
.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_data_sla(*data_slas)[source]#
Delete Data SLAs.
- Parameters
*data_slas – One or more instances of
streamsets.sdk.sch_models.DataSla
.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- property jobs#
Get the jobs that are contained within the Topology.
- Returns
A
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Job
instances.
- property new_pipeline_version_available#
Determine if a new pipeline version is available for any jobs in the Topology.
- Returns
A (
bool
) value.
- property nodes#
Get the job and system nodes that make up the Topology.
- Returns
- start_all_jobs()[source]#
Start all jobs of a topology.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- stop_all_jobs(force=False)[source]#
Stop all jobs of a topology.
- Parameters
force (
bool
, optional) – Force topology jobs to stop. Default:False
.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- update_jobs_to_latest_change()[source]#
Upgrade a topology’s job(s) to the latest corresponding pipeline change.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- property validation_issues#
Get any validation issues that are detected for the Topology.
- Returns
A (
list
) of validation issues in JSON format.
- class streamsets.sdk.sch_models.Topologies(control_hub)[source]#
Collection of
streamsets.sdk.sch_models.Topology
instances. :param control_hub: An instance of the ControlHub. :type control_hub:streamsets.sdk.sch.ControlHub
- class streamsets.sdk.sch_models.TopologyBuilder(topology, control_hub=None)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.Topology
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_topology_builder()
.
- topology_nodes#
(
streamsets.sdk.sch_models.TopologyNode
): Nodes currently part of the topology held by the TopologyBuilder.
- Parameters
topology (
dict
) – Python object built from our Swagger TopologyJson definition.control_hub (
streamsets.sdk.ControlHub
, optional) – Control Hub instance. Default:None
- add_job(job)[source]#
Add a job node to the Topology being built.
- Parameters
job (
streamsets.sdk.sch_models.Job
) – An instance of a job to be added.
- add_system(name)[source]#
Add a system node to the Topology being built.
- Parameters
name (
str
) – The name of the system to add to the topology.
- build(topology_name=None, description=None)[source]#
Build the topology.
- Parameters
topology_name (
str
, optional) – Name of the topology. This parameter is required when building a new topology.description (
str
, optional) – Description of the topology. Default:None
.
- Returns
An instance of
streamsets.sdk.sch_models.Topology
.
- delete_node(topology_node)[source]#
Delete a system or job node from the topology.
- Parameters
topology_node (
streamsets.sdk.sch_models.TopologyNode
) – An instance of a TopologyNode to delete from the topology.
- import_topology(topology)[source]#
Import an existing topology to be used in the builder.
- Parameters
topology (
streamsets.sdk.sch_models.Topology
) – An existing Topology instance to modify.
- property topology_nodes#
Get all of the nodes currently part of the topology held by the TopologyBuilder.
- Returns
- class streamsets.sdk.sch_models.TopologyNode(*args, **kwargs)[source]#
Model for a node within a Topology.
- Parameters
topology_node_json (
dict
) – JSON representation of a Topology Node.
- node_type#
The type of this node, i.e. SYSTEM, JOB, etc.
- Type
str
- instance_name#
The name of this node instance.
- Type
str
- stage_name#
The name of the stage in this node.
- Type
str
- stage_version#
The version of the stage in this node.
- Type
str
- job_id#
The ID of the job in this node.
- Type
str
- pipeline_id#
The pipeline ID associated with the job in this node.
- Type
str
- pipeline_commit_id#
The commit ID of the pipeline.
- Type
str
- pipeline_version#
The version of the pipeline.
- Type
str
- input_lanes#
A list of
str
representing the input lanes for this node.- Type
list
- output_lanes#
A list of
str
representing the output lanes for this node.- Type
list
- name#
Name of the Topology Node.
- Type
str
Users#
- class streamsets.sdk.sch_models.User(*args, **kwargs)[source]#
Model for User.
- Parameters
user (
dict
) – JSON representation of User.roles (
dict
) – A mapping of role IDs to role labels.control_hub (
streamsets.sdk.ControlHub
, optional) – An instance of Control Hub. Default:None
- active#
Whether the user is active or not.
- Type
bool
- created_by#
Creator of this user.
- Type
str
- created_on#
Creation time of this user.
- Type
str
- display_name#
Display name of this user.
- Type
str
- email_address#
Email address of this user.
- Type
str
- id#
Id of this user.
- Type
str
- groups#
Groups this user belongs to.
- Type
streamsets.sdk.util.SeekableList
ofstreamsets.sdk.sch_models.Group
instances
- last_modified_by#
User last modified by.
- Type
str
- last_modified_on#
User last modification time.
- Type
str
- organization#
Organization ID that the user is part of.
- Type
str
- password_expires_on#
User’s password expiration time.
- Type
str
- password_system_generated#
Whether User’s password is system generated or not.
- Type
bool
- roles#
A set of role labels.
- Type
set
- saml_user_name#
SAML username of user.
- Type
str
- status#
The status of the user.
- Type
str
- property groups#
Get the group memberships for the user.
- property roles#
Get the roles for the user.
- property status#
Get the status of the user. Values that can be returned are either ‘ACTIVE’ or ‘DEACTIVATED’.
- class streamsets.sdk.sch_models.Users(control_hub, roles, organization)[source]#
Collection of
streamsets.sdk.sch_models.User
instances.- Parameters
control_hub – An instance of
streamsets.sdk.sch.ControlHub
.roles (
dict
) – A mapping of role IDs to role labels.organization (
str
) – Organization ID.
- class streamsets.sdk.sch_models.UserBuilder(user, roles, control_hub=None)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.User
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_user_builder()
.
- Parameters
user (
dict
) – Python object built from our Swagger UserJson definition.roles (
dict
) – A mapping of role IDs to role labels.control_hub – An instance of
streamsets.sdk.sch.ControlHub
. Default:None
.
- build(email_address)[source]#
Build the user.
- Parameters
email_address (
str
) – User Email Address.- Returns
An instance of
streamsets.sdk.sch_models.User
.
Common#
Models used by StreamSets Data Collector and StreamSets Control Hub:
- class streamsets.sdk.models.Configuration(configuration=None, compatibility_map=None, property_key='name', property_value='value', update_callable=None, update_callable_kwargs=None, id_to_remap=None)[source]#
Abstraction for configurations.
This class enables easy access to and modification of data stored as a list of dictionaries. A Configuration is stored in the form:
[{"name" : "<name_1>","value" : "<value_1>"}, {"name" : "<name_2>", value" : "<value_2>"},...]
However, the passed in configuration parameter can be a list of Configurations such as:
[[{"name" : "<name_1>","value" : "<value_1>"}, {"name" : "<name_2>", value" : "<value_2>"},...], [{"name" : "<name_3>","value" : "<value_3>"}, {"name" : "<name_4>", value" : "<value_4>"},...],...]
- Parameters
compatibility_map (
dict
, optional) – A dictionary mapping values used for backwards compatibility.configuration (
list
) – List of configurations (see above for format).property_key (
str
, optional) – The dictionary entry denoting the property key. Default:name
property_value (
str
, optional) – The dictionary entry denoting the property value. Default:value
update_callable (optional) – A callable to which
self._data
will be passed as part of__setitem__
.update_callable_kwargs (
dict
, optional) – A dictionary of kwargs to pass (along with a body) to the callable.id_to_remap (
dict
, optional) – A dictionary mapping configuration IDs to human-readable container keys. Example: {‘custom_region’:’googleCloudConfig.customRegion’, … }
- get(key, default=None)[source]#
Return the value of key or, if not in the configuration, the default value.
Exceptions#
Common exceptions.
- exception streamsets.sdk.exceptions.BadRequestError(response)[source]#
Bad request error (HTTP 400).
- exception streamsets.sdk.exceptions.ConnectionError(code, message)[source]#
Connection Catalog errors.
- exception streamsets.sdk.exceptions.EnginelessError(message)[source]#
Publishing an engineless designed pipeline error
- exception streamsets.sdk.exceptions.InvalidCredentialsError(message)[source]#
Invalid credentials error.
- exception streamsets.sdk.exceptions.InvalidVersionError(input)[source]#
The Version number could not be parsed.
- exception streamsets.sdk.exceptions.JobInactiveError(message)[source]#
Job status changed into INACTIVE_ERROR.
- exception streamsets.sdk.exceptions.LegacyDeploymentInactiveError(message)[source]#
Legacy deployment status changed into INACTIVE_ERROR.
- exception streamsets.sdk.exceptions.MultipleIssuesError(errors)[source]#
Multiple errors were returned.
- exception streamsets.sdk.exceptions.ServiceDefinitionNotFound(message)[source]#
ServiceDefinition errors.
- exception streamsets.sdk.exceptions.StatusError(response)[source]#
Parent class for pipeline status errors.
- exception streamsets.sdk.exceptions.UnprocessableEntityError(response)[source]#
Unprocessable Entity Error (HTTP 422).