StreamSets Control Hub#
Main interface#
This is the main entry point for interacting with StreamSets Control Hub (SCH) instances.
- class streamsets.sdk.ControlHub(credential_id, token, use_websocket_tunneling=True, **kwargs)[source]#
Class to interact with StreamSets Control Hub.
- Parameters
credential_id (
str
) – ControlHub credential ID.
token (
str
) – ControlHub token.
use_websocket_tunneling (
bool
, optional) – use websocket tunneling. Default:True
.
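A minimal sketch of constructing a client from the parameters above, assuming the credential ID and token are kept in environment variables. The variable names `SCH_CREDENTIAL_ID` and `SCH_TOKEN` and both helper functions are hypothetical and not part of the SDK. The SDK import is deferred so the credential lookup can be exercised on its own.

```python
import os


def read_credentials(env=None):
    """Return (credential_id, token) from a mapping; raises KeyError if missing.

    The key names are illustrative assumptions, not SDK conventions.
    """
    env = os.environ if env is None else env
    return env["SCH_CREDENTIAL_ID"], env["SCH_TOKEN"]


def connect(env=None):
    """Build a ControlHub client from credentials found in the environment."""
    # Imported lazily so read_credentials() works without the SDK installed.
    from streamsets.sdk import ControlHub

    credential_id, token = read_credentials(env)
    return ControlHub(credential_id=credential_id, token=token)
```

Keeping the lookup separate from the constructor call makes it easy to swap in another secret source later.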
- acknowledge_event_subscription_error(subscription)[source]#
Acknowledge an error on given Event Subscription.
- Parameters
subscription (
streamsets.sdk.sch_models.Subscription
) – A Subscription instance.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- acknowledge_job_error(*jobs)[source]#
Acknowledge errors for one or more jobs.
- Parameters
*jobs – One or more instances of
streamsets.sdk.sch_models.Job
.
- acknowledge_legacy_deployment_error(*deployments)[source]#
Acknowledge errors for one or more deployments.
- Parameters
*deployments – One or more instances of
streamsets.sdk.sch_models.LegacyDeployment
.
- property action_audits#
Action Audits.
- Returns
An instance of
streamsets.sdk.sch_models.ActionAudits
.
- activate_api_credential(api_credential)[source]#
Activate an api credential.
- Parameters
api_credential (
streamsets.sdk.sch_models.ApiCredential
) – ApiCredential object.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- activate_datacollector(data_collector)[source]#
Activate data collector.
- Parameters
data_collector (
streamsets.sdk.sch_models.DataCollector
) – Data Collector object.
- activate_environment(*environments, timeout_sec=300)[source]#
Activate environments.
- Parameters
*environments – One or more instances of
streamsets.sdk.sch_models.Environment
.
- Returns
An instance of
streamsets.sdk.sch_api.Command
or None.
- activate_provisioning_agent(provisioning_agent)[source]#
Activate provisioning agent.
- Parameters
provisioning_agent (
streamsets.sdk.sch_models.ProvisioningAgent
) – Provisioning Agent object.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- activate_user(*users)[source]#
Activate users for all given User IDs.
- Parameters
*users – One or more instances of
streamsets.sdk.sch_models.User
.
- Returns
An instance of
streamsets.sdk.aster_api.Command
.
- add_api_credential(api_credential)[source]#
Add an api credential. Some api credential attributes, such as created_by, are updated by ControlHub.
- Parameters
api_credential (
streamsets.sdk.sch_models.ApiCredential
) – ApiCredential object.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- add_classification_rule(classification_rule, commit=False)[source]#
Add a classification rule.
- Parameters
classification_rule (
streamsets.sdk.sch_models.ClassificationRule
) – Classification Rule object.
commit (
bool
, optional) – Whether to commit the rule after adding it. Default:False
.
- add_connection(connection)[source]#
Add a connection.
- Parameters
connection (
streamsets.sdk.sch_models.Connection
) – Connection object.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- add_deployment(deployment)[source]#
Add a deployment.
- Parameters
deployment (
streamsets.sdk.sch_models.Deployment
) – Deployment object.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- add_environment(environment)[source]#
Add an environment.
- Parameters
environment (
streamsets.sdk.sch_models.Environment
) – Environment object.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- add_group(group)[source]#
Add a group.
- Parameters
group (
streamsets.sdk.sch_models.Group
) – Group object.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- add_job(job)[source]#
Add a job.
- Parameters
job (
streamsets.sdk.sch_models.Job
) – Job object.
- Returns
An instance of
streamsets.sdk.sch_models.Job
.
- add_legacy_deployment(deployment)[source]#
Add a legacy deployment.
- Parameters
deployment (
streamsets.sdk.sch_models.LegacyDeployment
) – Deployment object.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- add_organization(organization)[source]#
Add an organization.
- Parameters
organization (
streamsets.sdk.sch_models.Organization
) – Organization object.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- add_protection_policy(protection_policy)[source]#
Add a protection policy.
- Parameters
protection_policy (
streamsets.sdk.sch_models.ProtectionPolicy
) – Protection Policy object.
- add_report_definition(report_definition)[source]#
Add Report Definition to ControlHub.
- Parameters
report_definition (
streamsets.sdk.sch_models.ReportDefinition
) – Report Definition instance.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- add_subscription(subscription)[source]#
Add Subscription to ControlHub.
- Parameters
subscription (
streamsets.sdk.sch_models.Subscription
) – A Subscription instance.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- property alerts#
Alerts.
- Returns
A
streamsets.sdk.utils.SeekableList
of streamsets.sdk.sch_models.Alert
.
- property api_credentials#
Api Credentials.
- Returns
A
streamsets.sdk.utils.SeekableList
of streamsets.sdk.sch_models.ApiCredential
.
- assign_administrator_role(user)[source]#
Designate the specified user as an Organization Administrator.
- Parameters
user (
streamsets.sdk.sch_models.User
) – User object.
- Returns
An instance of
streamsets.sdk.aster_api.Command
.
- balance_data_collectors(*data_collectors)[source]#
Balance all jobs running on given Data Collectors.
- Parameters
*data_collectors – One or more instances of
streamsets.sdk.sch_models.DataCollector
.
- balance_job(*jobs)[source]#
Balance one or more jobs.
- Parameters
*jobs – One or more instances of
streamsets.sdk.sch_models.Job
.
- property connection_audits#
Connection Audits.
- Returns
An instance of
streamsets.sdk.sch_models.ConnectionAudits
.
- property connection_tags#
Connection Tags.
- Returns
An instance of
streamsets.sdk.sch_models.ConnectionTags
.
- property connections#
Connections.
- Returns
An instance of
streamsets.sdk.sch_models.Connections
.
- create_components(component_type, number_of_components=1, active=True)[source]#
Create components.
- Parameters
component_type (
str
) – Component type.
number_of_components (
int
, optional) – Default:1
.
active (
bool
, optional) – Default:True
.
- Returns
An instance of
streamsets.sdk.sch_api.CreateComponentsCommand
.
- property data_collectors#
Data Collectors registered to the ControlHub instance.
- Returns
Returns a
streamsets.sdk.utils.SeekableList
of streamsets.sdk.sch_models.DataCollector
instances.
- property data_protector_enabled#
Whether Data Protector is enabled for the current organization.
- Type
bool
- deactivate_api_credential(api_credential)[source]#
Deactivate an api credential.
- Parameters
api_credential (
streamsets.sdk.sch_models.ApiCredential
) – ApiCredential object.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- deactivate_datacollector(data_collector)[source]#
Deactivate data collector.
- Parameters
data_collector (
streamsets.sdk.sch_models.DataCollector
) – Data Collector object.
- deactivate_environment(*environments)[source]#
Deactivate environments.
- Parameters
*environments – One or more instances of
streamsets.sdk.sch_models.Environment
.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- deactivate_provisioning_agent(provisioning_agent)[source]#
Deactivate provisioning agent.
- Parameters
provisioning_agent (
streamsets.sdk.sch_models.ProvisioningAgent
) – Provisioning Agent object.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- deactivate_user(*users)[source]#
Deactivate Users for all given User IDs.
- Parameters
*users – One or more instances of
streamsets.sdk.sch_models.User
.
- Returns
An instance of
streamsets.sdk.aster_api.Command
.
- delete_and_unregister_data_collector(data_collector)[source]#
Delete and Unregister data collector.
- Parameters
data_collector (
streamsets.sdk.sch_models.DataCollector
) – Data Collector object.
- delete_api_credentials(*api_credentials)[source]#
Delete api_credentials.
- Parameters
*api_credentials – One or more instances of
streamsets.sdk.sch_models.ApiCredential
.
- delete_connection(*connections)[source]#
Delete connections.
- Parameters
*connections – One or more instances of
streamsets.sdk.sch_models.Connection
.
- delete_data_collector(data_collector)[source]#
Delete data collector.
- Parameters
data_collector (
streamsets.sdk.sch_models.DataCollector
) – Data Collector object.
- delete_deployment(*deployments)[source]#
Delete deployments.
- Parameters
*deployments – One or more instances of
streamsets.sdk.sch_models.Deployment
.
- delete_environment(*environments)[source]#
Delete environments.
- Parameters
*environments – One or more instances of
streamsets.sdk.sch_models.Environment
.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_group(*groups)[source]#
Delete groups.
- Parameters
*groups – One or more instances of
streamsets.sdk.sch_models.Group
.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_job(*jobs)[source]#
Delete one or more jobs.
- Parameters
*jobs – One or more instances of
streamsets.sdk.sch_models.Job
.
- delete_legacy_deployment(*deployments)[source]#
Delete deployments.
- Parameters
*deployments – One or more instances of
streamsets.sdk.sch_models.LegacyDeployment
.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_pipeline(pipeline, only_selected_version=False)[source]#
Delete a pipeline.
- Parameters
pipeline (
streamsets.sdk.sch_models.Pipeline
) – Pipeline object.
only_selected_version (
boolean
) – Delete only current commit.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_pipeline_labels(*pipeline_labels)[source]#
Delete pipeline labels.
- Parameters
*pipeline_labels – One or more instances of
streamsets.sdk.sch_models.PipelineLabel
.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_provisioning_agent(provisioning_agent)[source]#
Delete provisioning agent.
- Parameters
provisioning_agent (
streamsets.sdk.sch_models.ProvisioningAgent
) – Provisioning Agent object.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_provisioning_agent_token(provisioning_agent)[source]#
Delete provisioning agent token.
- Parameters
provisioning_agent (
streamsets.sdk.sch_models.ProvisioningAgent
) – Provisioning Agent object.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_report_definition(report_definition)[source]#
Delete an existing Report Definition.
- Parameters
report_definition (
streamsets.sdk.sch_models.ReportDefinition
) – Report Definition instance.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_snowflake_pipeline_defaults()[source]#
Delete the Snowflake pipeline defaults for this user.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_snowflake_user_credentials()[source]#
Delete the Snowflake user credentials.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_subscription(subscription)[source]#
Delete an existing Subscription.
- Parameters
subscription (
streamsets.sdk.sch_models.Subscription
) – A Subscription instance.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_topology(topology, only_selected_version=False)[source]#
Delete a topology.
- Parameters
topology (
streamsets.sdk.sch_models.Topology
) – Topology object.
only_selected_version (
boolean
) – Delete only current commit.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_unregistered_auth_tokens(executor_type)[source]#
Delete auth tokens for engines that have been unregistered.
- Parameters
executor_type (
str
) – The executor type. Acceptable values are ‘DATACOLLECTOR’ or ‘TRANSFORMER’.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_user(*users, deactivate=False)[source]#
Delete users. Deactivate users before deleting if configured.
- Parameters
*users – One or more instances of
streamsets.sdk.sch_models.User
.
deactivate (
bool
, optional) – Default:False
.
- Returns
An instance of
streamsets.sdk.aster_api.Command
.
- property deployments#
Deployments.
- Returns
An instance of
streamsets.sdk.sch_models.Deployments
.
- duplicate_job(job, name=None, description=None, number_of_copies=1)[source]#
Duplicate an existing job.
- Parameters
job (
streamsets.sdk.sch_models.Job
) – Job object.
name (
str
, optional) – Name of the new job(s). Default:None
. If not specified, the name of the job with ' copy'
appended to the end will be used.
description (
str
, optional) – Description for new job(s). Default:None
.
number_of_copies (
int
, optional) – Number of copies. Default:1
.
- Returns
A
streamsets.sdk.utils.SeekableList
of streamsets.sdk.sch_models.Job
.
- duplicate_pipeline(pipeline, name=None, description='New Pipeline', number_of_copies=1)[source]#
Duplicate an existing pipeline.
- Parameters
pipeline (
streamsets.sdk.sch_models.Pipeline
) – Pipeline object.
name (
str
, optional) – Name of the new pipeline(s). Default:None
.
description (
str
, optional) – Description for new pipeline(s). Default:'New Pipeline'
.
number_of_copies (
int
, optional) – Number of copies. Default:1
.
- Returns
A
streamsets.sdk.utils.SeekableList
of streamsets.sdk.sch_models.Pipeline
.
- edit_job(job)[source]#
Edit a job.
- Parameters
job (
streamsets.sdk.sch_models.Job
) – Job object.
- Returns
An instance of
streamsets.sdk.sch_models.Job
.
- property engine_configurations#
Deployment engine configurations.
- Returns
An instance of
streamsets.sdk.sch_models.DeploymentEngineConfiguration
.
- property environments#
Environments.
- Returns
An instance of
streamsets.sdk.sch_models.Environments
.
- export_jobs(jobs)[source]#
Export jobs to a compressed archive.
- Parameters
jobs (
list
) – A list of streamsets.sdk.sch_models.Job
instances.
- Returns
An instance of type
bytes
indicating the content of zip file with job json files.
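Since the export methods return the archive as `bytes`, the result can be inspected or saved with the standard library alone. A minimal sketch follows; the helper names `list_archive_members` and `save_archive` are hypothetical, and the exact file names inside a real export are not specified here.

```python
import io
import zipfile


def list_archive_members(archive_bytes):
    """Return the file names stored in an exported zip archive."""
    # Wrap the raw bytes in a file-like object so zipfile can read them.
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return archive.namelist()


def save_archive(archive_bytes, path):
    """Write the exported archive to disk in binary mode and return the path."""
    with open(path, "wb") as handle:
        handle.write(archive_bytes)
    return path
```

With a client named `sch`, usage would look like `save_archive(sch.export_jobs(jobs), 'jobs.zip')`, where `jobs` is a list of Job instances.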
- export_pipelines(pipelines, fragments=False, include_plain_text_credentials=False)[source]#
Export pipelines.
- Parameters
pipelines (
list
) – A list of streamsets.sdk.sch_models.Pipeline
instances.
fragments (
bool
) – Indicates if exporting fragments is needed.
include_plain_text_credentials (
bool
) – Indicates if plain text credentials should be included.
- Returns
An instance of type
bytes
indicating the content of zip file with pipeline json files.
- export_protection_policies(protection_policies)[source]#
Export protection policies to a compressed archive.
- Parameters
protection_policies (
list
) – A list of streamsets.sdk.sch_models.ProtectionPolicy
instances.
- Returns
An instance of type
bytes
indicating the content of zip file with protection policy json files.
- export_topologies(topologies)[source]#
Export topologies.
- Parameters
topologies (
list
) – A list of streamsets.sdk.sch_models.Topology
instances.
- Returns
An instance of type
bytes
indicating the content of zip file with topology json files.
- get_admin_tool(base_url, username, password)[source]#
Get ControlHub admin tool.
- Returns
An instance of
streamsets.sdk.sch_models.AdminTool
.
- get_api_credential_builder()[source]#
Get api credential Builder.
- Returns
An instance of
streamsets.sdk.sch_models.ApiCredentialBuilder
.
- get_classification_rule_builder()[source]#
Get a classification rule builder instance with which a classification rule can be created.
- Returns
An instance of
streamsets.sdk.sch_models.ClassificationRuleBuilder
.
- get_components(component_type_id, offset=None, len_=None, order_by='LAST_VALIDATED_ON', order='ASC')[source]#
Get components.
- Parameters
component_type_id (
str
) – Component type id.
offset (
str
, optional) – Default:None
.
len_ (
str
, optional) – Default:None
.
order_by (
str
, optional) – Default:'LAST_VALIDATED_ON'
.
order (
str
, optional) – Default:'ASC'
.
- get_connection_builder()[source]#
Get a connection builder instance with which a connection can be created.
- Returns
An instance of
streamsets.sdk.sch_models.ConnectionBuilder
.
- get_current_job_status(job)[source]#
Returns the current job status for given job id.
- Parameters
job (
streamsets.sdk.sch_models.Job
) – Job object.
- get_data_collector_labels(data_collector)[source]#
Returns all labels assigned to data collector.
- Parameters
data_collector (
streamsets.sdk.sch_models.DataCollector
) – Data Collector object.
- Returns
A
list
of data collector assigned labels.
- get_data_sla_builder()[source]#
Get a Data SLA builder instance with which a Data SLA can be created.
- Returns
An instance of
streamsets.sdk.sch_models.DataSlaBuilder
.
- get_deployment_builder(deployment_type='SELF')[source]#
Get a deployment builder instance with which a deployment can be created.
- Parameters
deployment_type (
str
, optional) – Valid values are ‘AWS’, ‘GCP’ and ‘SELF’. Default:'SELF'
.
- Returns
An instance of
streamsets.sdk.sch_models.DeploymentBuilder
.
- get_environment_builder(environment_type='SELF')[source]#
Get an environment builder instance with which an environment can be created.
- Parameters
environment_type (
str
, optional) – Valid values are ‘AWS’, ‘GCP’ and ‘SELF’. Default:'SELF'
.
- Returns
An instance of
streamsets.sdk.sch_models.EnvironmentBuilder
.
- get_group_builder()[source]#
Get a group builder instance with which a group can be created.
- Returns
An instance of
streamsets.sdk.sch_models.GroupBuilder
.
- get_job_builder()[source]#
Get a job builder instance with which a job can be created.
- Returns
An instance of
streamsets.sdk.sch_models.JobBuilder
.
- get_legacy_deployment_builder()[source]#
Get a deployment builder instance with which a legacy deployment can be created.
- Returns
An instance of
streamsets.sdk.sch_models.LegacyDeploymentBuilder
.
- get_organization_builder()[source]#
Get an organization builder instance with which an organization can be created.
- Returns
An instance of
streamsets.sdk.sch_models.OrganizationBuilder
.
- get_pipeline_builder(engine_type, engine_id=None, fragment=False)[source]#
Get a pipeline builder instance with which a pipeline can be created.
- Parameters
engine_type (
str
) – The type of pipeline that will be created. The options are 'data_collector'
, 'snowflake'
or 'transformer'
.
engine_id (
str
) – The ID of the Data Collector or Transformer in which to author the pipeline if not using Transformer for Snowflake.
fragment (
boolean
, optional) – Whether to return a fragment builder. Default:False
.
- Returns
An instance of
streamsets.sdk.sch_models.PipelineBuilder
or streamsets.sdk.sch_models.StPipelineBuilder
.
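The engine type options listed for get_pipeline_builder can be checked before the call is made. This is a minimal sketch: the helper `pipeline_builder_args` is hypothetical, and treating a missing `engine_id` as an error for non-Snowflake engines is an assumption drawn from the parameter description, not confirmed SDK behavior.

```python
ENGINE_TYPES = ("data_collector", "snowflake", "transformer")


def pipeline_builder_args(engine_type, engine_id=None, fragment=False):
    """Validate and return keyword arguments for get_pipeline_builder."""
    if engine_type not in ENGINE_TYPES:
        raise ValueError(
            "engine_type must be one of {}, got {!r}".format(ENGINE_TYPES, engine_type)
        )
    # Assumption: only Transformer for Snowflake pipelines can omit engine_id.
    if engine_type != "snowflake" and engine_id is None:
        raise ValueError("engine_id is required unless engine_type is 'snowflake'")
    return {"engine_type": engine_type, "engine_id": engine_id, "fragment": fragment}
```

With a client named `sch` and an engine ID in `sdc_id`, usage would look like `sch.get_pipeline_builder(**pipeline_builder_args('data_collector', engine_id=sdc_id))`.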
- get_protection_method_builder()[source]#
Get a protection method builder instance with which a protection method can be created.
- Returns
An instance of
streamsets.sdk.sch_models.ProtectionMethodBuilder
.
- get_protection_policy_builder()[source]#
Get a protection policy builder instance with which a protection policy can be created.
- Returns
An instance of
streamsets.sdk.sch_models.ProtectionPolicyBuilder
.
- get_scheduled_task_builder()[source]#
Get a scheduled task builder instance with which a scheduled task can be created.
- Returns
An instance of
streamsets.sdk.sch_models.ScheduledTaskBuilder
.
- get_self_managed_deployment_install_script(deployment, install_mechanism='DEFAULT')[source]#
Get install script for a Self Managed deployment.
- Parameters
deployment (
streamsets.sdk.sch_models.Deployment
) – Deployment object.
install_mechanism (
str
, optional) – Possible values for install are “DEFAULT”, “BACKGROUND” and “FOREGROUND”. Default:DEFAULT
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- get_snowflake_pipeline_defaults()[source]#
Get the Snowflake pipeline defaults for this user (if it exists).
- Returns
A
dict
of Snowflake pipeline defaults.
- get_snowflake_user_credentials()[source]#
Get the Snowflake user credentials (if they exist). They will be redacted.
- Returns
A
dict
of Snowflake user credentials (redacted).
- get_subscription_builder()[source]#
Get Event Subscription Builder.
- Returns
An instance of
streamsets.sdk.sch_models.SubscriptionBuilder
.
- get_topology_builder()[source]#
Get a topology builder instance with which a topology can be created.
- Returns
An instance of
streamsets.sdk.sch_models.TopologyBuilder
.
- get_user_builder()[source]#
Get a user builder instance with which a user can be created.
- Returns
An instance of
streamsets.sdk.sch_models.UserBuilder
.
- property groups#
Groups.
- Returns
An instance of
streamsets.sdk.sch_models.Groups
.
- import_jobs(archive, pipeline=True, number_of_instances=False, labels=False, runtime_parameters=False, **kwargs)[source]#
Import jobs from archived zip directory.
- Parameters
archive (
file
) – file containing the jobs.
pipeline (
boolean
, optional) – Indicate if pipeline should be imported. Default:True
.
number_of_instances (
boolean
, optional) – Indicate if number of instances should be imported. Default:False
.
labels (
boolean
, optional) – Indicate if labels should be imported. Default:False
.
runtime_parameters (
boolean
, optional) – Indicate if runtime parameters should be imported. Default:False
.
- Returns
A
streamsets.sdk.utils.SeekableList
of streamsets.sdk.sch_models.Job
.
- import_pipeline(pipeline, commit_message, name=None, data_collector_instance=None)[source]#
Import pipeline from json file.
- Parameters
pipeline (
dict
) – A python dict representation of ControlHub Pipeline.
commit_message (
str
) – Commit message.
name (
str
, optional) – Name of the pipeline. If left out, pipeline name from JSON object will be used. Default:None
.
data_collector_instance (
streamsets.sdk.sch_models.DataCollector
) – If excluded, system sdc will be used. Default:None
.
- Returns
An instance of
streamsets.sdk.sch_models.Pipeline
.
- import_pipelines_from_archive(archive, commit_message, fragments=False)[source]#
Import pipelines from archived zip directory.
- Parameters
archive (
file
) – file containing the pipelines.
commit_message (
str
) – Commit message.
fragments (
bool
, optional) – Indicates if pipeline contains fragments.
- Returns
A
streamsets.sdk.utils.SeekableList
of streamsets.sdk.sch_models.Pipeline
.
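The bytes returned by export_pipelines can be fed to import_pipelines_from_archive on another client. This is a minimal sketch, assuming both arguments are ControlHub-like objects exposing those two methods and that the import accepts a binary file object positioned at the start. The helper name `copy_pipelines` is hypothetical.

```python
import tempfile


def copy_pipelines(source_sch, target_sch, pipelines, commit_message):
    """Export pipelines from one Control Hub and import them into another."""
    archive_bytes = source_sch.export_pipelines(pipelines)
    # The import method is documented as taking a file, so stage the bytes
    # in a temporary binary file and rewind it before handing it over.
    with tempfile.TemporaryFile() as archive:
        archive.write(archive_bytes)
        archive.seek(0)
        return target_sch.import_pipelines_from_archive(archive, commit_message)
```

The temporary file is removed automatically when the `with` block exits.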
- import_protection_policies(policies_archive)[source]#
Import protection policies from a compressed archive.
- Parameters
policies_archive (
file
) – file containing the protection policies.
- Returns
A
streamsets.sdk.utils.SeekableList
of streamsets.sdk.sch_models.ProtectionPolicy
.
- import_topologies(archive, import_number_of_instances=False, import_labels=False, import_runtime_parameters=False, **kwargs)[source]#
Import topologies from archived zip directory.
- Parameters
archive (
file
) – file containing the topologies.
import_number_of_instances (
boolean
, optional) – Indicate if number of instances should be imported. Default:False
.
import_labels (
boolean
, optional) – Indicate if labels should be imported. Default:False
.
import_runtime_parameters (
boolean
, optional) – Indicate if runtime parameters should be imported. Default:False
.
- Returns
A
streamsets.sdk.utils.SeekableList
of streamsets.sdk.sch_models.Topology
.
- invite_user(user)[source]#
Invite a user to the DataOps Platform.
- Parameters
user (
streamsets.sdk.sch_models.User
) – User object.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- property jobs#
Jobs.
- Returns
An instance of
streamsets.sdk.sch_models.Jobs
.
- property ldap_enabled#
Indication if LDAP is enabled or not.
- Returns
An instance of
boolean
.
- property legacy_deployments#
Deployments.
- Returns
An instance of
streamsets.sdk.sch_models.LegacyDeployments
.
- property login_audits#
Login Audits.
- Returns
An instance of
streamsets.sdk.sch_models.LoginAudits
.
- property metering_and_usage#
Metering and usage for the Organization. By default, this will return a report for the last 30 days of metering data. A report for a custom time window can be retrieved by indexing this object with a slice that contains a datetime object for the start (required) and stop (optional, defaults to datetime.now()).
- ex. metering_and_usage[datetime(2022, 1, 1):datetime(2022, 1, 14)]
metering_and_usage[datetime.now() - timedelta(7):]
- Returns
An instance of
streamsets.sdk.sch_models.MeteringUsage
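The slice-based time window described for metering_and_usage can also be built as a `slice` object ahead of time, since indexing with `obj[start:]` and `obj[slice(start, None)]` are equivalent in Python. A minimal sketch follows; the helper name `usage_window` is hypothetical.

```python
from datetime import datetime, timedelta


def usage_window(days, now=None):
    """Return a slice covering the last `days` days, with an open-ended stop."""
    now = datetime.now() if now is None else now
    # Leaving stop as None mirrors metering_and_usage[start:], where the
    # stop is documented as defaulting to datetime.now().
    return slice(now - timedelta(days=days), None)
```

With a client named `sch`, usage would look like `sch.metering_and_usage[usage_window(7)]`.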
- property organizations#
Organizations.
- Returns
An instance of
streamsets.sdk.sch_models.Organizations
.
- property pipeline_labels#
Pipeline labels.
- Returns
An instance of
streamsets.sdk.sch_models.PipelineLabels
.
- property pipelines#
Pipelines.
- Returns
An instance of
streamsets.sdk.sch_models.Pipelines
.
- preview_classification_rule(classification_rule, parameter_data, data_collector=None)[source]#
Dynamic preview of a classification rule.
- Parameters
classification_rule (
streamsets.sdk.sch_models.ClassificationRule
) – Classification Rule object.
parameter_data (
dict
) – A python dict representation of raw JSON parameters required for preview.
data_collector (
streamsets.sdk.sch_models.DataCollector
, optional) – The Data Collector in which to preview the pipeline. If omitted, ControlHub’s first executor SDC will be used. Default:None
.
- Returns
An instance of
streamsets.sdk.sdc_api.PreviewCommand
.
- property protection_policies#
Protection policies.
- Returns
An instance of
streamsets.sdk.sch_models.ProtectionPolicies
.
- property provisioning_agents#
Provisioning Agents registered to the ControlHub instance.
- Returns
An instance of
streamsets.sdk.sch_models.ProvisioningAgents
.
- publish_pipeline(pipeline, commit_message='New pipeline', draft=False)[source]#
Publish a pipeline.
- Parameters
pipeline (
streamsets.sdk.sch_models.Pipeline
) – Pipeline object.
commit_message (
str
, optional) – Default:'New pipeline'
.
draft (
boolean
, optional) – Default:False
.
- publish_scheduled_task(task)[source]#
Send the scheduled task to ControlHub.
- Parameters
task (
streamsets.sdk.sch_models.ScheduledTask
) – Scheduled task object.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- publish_topology(topology, commit_message=None)[source]#
Publish a topology.
- Parameters
topology (
streamsets.sdk.sch_models.Topology
) – Topology object to publish.
commit_message (
str
, optional) – Commit message to supply with the Topology. Default:None
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- regenerate_api_credential_auth_token(api_credential)[source]#
Regenerate the auth token for an api credential.
- Parameters
api_credential (
streamsets.sdk.sch_models.ApiCredential
) – ApiCredential object.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- remove_administrator_role(user)[source]#
Remove the Organization Administrator status from the specified user.
- Parameters
user (
streamsets.sdk.sch_models.User
) – User object.
- Returns
An instance of
streamsets.sdk.aster_api.Command
.
- rename_api_credential(api_credential)[source]#
Rename an api credential.
- Parameters
api_credential (
streamsets.sdk.sch_models.ApiCredential
) – ApiCredential object.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- property report_definitions#
Report Definitions.
- Returns
An instance of
streamsets.sdk.sch_models.ReportDefinitions
.
- reset_origin(*jobs)[source]#
Reset all pipeline offsets for given jobs.
- Parameters
*jobs – One or more instances of
streamsets.sdk.sch_models.Job
.
- Returns
A
streamsets.sdk.utils.SeekableList
of streamsets.sdk.sch_models.Job
.
- restart_engines(*engines)[source]#
Restart the given engine(s).
- Parameters
*engines – One or more instances of
streamsets.sdk.sch_models.DataCollector
or streamsets.sdk.sch_models.Transformer
.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- run_pipeline_preview(pipeline, batches=1, batch_size=10, skip_targets=True, skip_lifecycle_events=True, timeout=120000, test_origin=False, read_policy=None, write_policy=None, executor=None, **kwargs)[source]#
Run pipeline preview.
- Parameters
pipeline (
streamsets.sdk.sch_models.Pipeline
) – Pipeline object.
batches (
int
, optional) – Number of batches. Default:1
.
batch_size (
int
, optional) – Batch size. Default:10
.
skip_targets (
bool
, optional) – Skip targets. Default:True
.
skip_lifecycle_events (
bool
, optional) – Skip life cycle events. Default:True
.
timeout (
int
, optional) – Timeout. Default:120000
.
test_origin (
bool
, optional) – Test origin. Default:False
.
read_policy (
streamsets.sdk.sch_models.ProtectionPolicy
) – Read Policy for preview. If not provided, uses the default read policy if one is available. Default:None
.
write_policy (
streamsets.sdk.sch_models.ProtectionPolicy
) – Write Policy for preview. If not provided, uses the default write policy if one is available. Default:None
.
executor (
streamsets.sdk.sch_models.DataCollector
, optional) – The Data Collector in which to preview the pipeline. If omitted, ControlHub’s first executor SDC will be used. Default:None
.
- Returns
An instance of
streamsets.sdk.sdc_api.PreviewCommand
.
- run_snowflake_pipeline_preview(pipeline_id, rev=0, batches=1, batch_size=10, skip_targets=True, end_stage=None, timeout=2000, test_origin=False, stage_outputs_to_override_json=None, **kwargs)[source]#
Run Snowflake pipeline preview.
- Parameters
pipeline_id (
str
) – The pipeline instance’s ID.
rev (
int
, optional) – Pipeline revision. Default:0
.
batches (
int
, optional) – Number of batches. Default:1
.
batch_size (
int
, optional) – Batch size. Default:10
.
skip_targets (
bool
, optional) – Skip targets. Default:True
.
end_stage (
str
, optional) – End stage. Default:None
.
timeout (
int
, optional) – Timeout. Default:2000
.
test_origin (
bool
, optional) – Test origin. Default:False
.
stage_outputs_to_override_json (
str
, optional) – Stage outputs to override. Default:None
.
wait (
bool
, optional) – Wait for pipeline preview to finish. Default:True
.
- Returns
An instance of
streamsets.sdk.sdc_api.PreviewCommand
.
- scale_legacy_deployment(deployment, num_instances)[source]#
Scale up/down active deployment.
- Parameters
deployment (
streamsets.sdk.sch_models.LegacyDeployment
) – Deployment object.
num_instances (
int
) – Number of sdc instances.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- property scheduled_tasks#
Scheduled Tasks.
- Returns
An instance of
streamsets.sdk.sch_models.ScheduledTasks
.
- set_default_read_protection_policy(protection_policy)[source]#
Set a default read protection policy.
- Parameters
protection_policy (
streamsets.sdk.sch_models.ProtectionPolicy
) – Protection Policy object to be set as the default read policy.
- Returns
An updated instance of
streamsets.sdk.sch_models.ProtectionPolicy
.
- set_default_write_protection_policy(protection_policy)[source]#
Set a default write protection policy.
- Parameters
protection_policy (
streamsets.sdk.sch_models.ProtectionPolicy
) – Protection Policy object to be set as the default write policy.
- Returns
An updated instance of
streamsets.sdk.sch_models.ProtectionPolicy
.
- shutdown_engines(*engines)[source]#
Call shutdown on the given engine(s).
- Parameters
*engines – One or more instances of
streamsets.sdk.sch_models.DataCollector
or streamsets.sdk.sch_models.Transformer
.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- start_deployment(*deployments)[source]#
Start Deployments.
- Parameters
*deployments – One or more instances of
streamsets.sdk.sch_models.Deployment
.
- start_job(*jobs, wait=True, **kwargs)[source]#
Start one or more jobs.
- Parameters
*jobs – One or more instances of
streamsets.sdk.sch_models.Job
.
wait (
bool
, optional) – Wait for pipelines to reach RUNNING status before returning. Default:True
.
**kwargs – Arbitrary keyword arguments.
- Returns
An instance of
streamsets.sdk.sch_api.StartJobsCommand
.
- start_job_template(job_template, name=None, description=None, attach_to_template=True, delete_after_completion=False, instance_name_suffix='COUNTER', number_of_instances=1, parameter_name=None, raw_job_tags=None, runtime_parameters=None, wait_for_data_collectors=False)[source]#
Start Job instances from a Job Template.
- Parameters
job_template (
streamsets.sdk.sch_models.Job
) – A Job instance with the property job_template set to True
.
name (
str
, optional) – Name of the new job(s). Default:None
. If not specified, the name of the job template with ' copy'
appended to the end will be used.
description (
str
, optional) – Description for new job(s). Default:None
.
attach_to_template (
bool
, optional) – Default:True
.
delete_after_completion (
bool
, optional) – Default:False
.
instance_name_suffix (
str
, optional) – Suffix to be used for Job names in {‘COUNTER’, ‘TIME_STAMP’, ‘PARAM_VALUE’}. Default:COUNTER
.
number_of_instances (
int
, optional) – Number of instances to be started using given parameters. Default:1
.
parameter_name (
str
, optional) – Specified when instance_name_suffix is ‘PARAM_VALUE’. Default:None
.
raw_job_tags (
list
, optional) – Default:None
.
runtime_parameters (
dict
) or (list
) – Runtime Parameters to be used in the jobs. If a dict is specified, number_of_instances
jobs will be started. If a list is specified, number_of_instances
is ignored and job instances will be started using the elements of the list as Runtime Parameters for each job. If left out, Runtime Parameters from Job Template will be used. Default:None
.
wait_for_data_collectors (
bool
, optional) – Default:False
.
- Returns
A
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Job
instances.
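The list form of runtime_parameters described above can be sketched roughly as follows. The helper name and the REGION parameter are hypothetical; sch is assumed to be an already-authenticated ControlHub object, and job_template a Job whose job_template property is True.

```python
def start_instances_per_region(sch, job_template, regions):
    """Start one job instance per region from a Job Template.

    Hypothetical helper: assumes the template's pipeline defines a
    runtime parameter named REGION.
    """
    # A list of dicts starts one job instance per element, so
    # number_of_instances is ignored (see the parameter notes above).
    runtime_parameters = [{'REGION': region} for region in regions]
    return sch.start_job_template(
        job_template,
        instance_name_suffix='PARAM_VALUE',
        parameter_name='REGION',  # specified because the suffix is PARAM_VALUE
        runtime_parameters=runtime_parameters,
    )
```

Passing a single dict instead of a list would start number_of_instances jobs that share the same parameters.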
- start_legacy_deployment(deployment, **kwargs)[source]#
Start Deployment.
- Parameters
deployment (
streamsets.sdk.sch_models.LegacyDeployment
) – Deployment instance.wait (
bool
, optional) – Wait for deployment to start. Default:True
.wait_for_statuses (
list
, optional) – Deployment statuses to wait on. Default:['ACTIVE']
.
- Returns
An instance of
streamsets.sdk.sch_api.DeploymentStartStopCommand
.
- stop_deployment(*deployments)[source]#
Stop Deployments.
- Parameters
*deployments – One or more instances of
streamsets.sdk.sch_models.Deployment
.
- stop_job(*jobs, force=False, timeout_sec=300)[source]#
Stop one or more jobs.
- Parameters
*jobs – One or more instances of
streamsets.sdk.sch_models.Job
.force (
bool
, optional) – Force job to stop. Default:False
.timeout_sec (
int
, optional) – Timeout in secs. Default:300
.
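A minimal sketch of combining stop_job and start_job into a restart, assuming sch is a connected ControlHub object. The helper name restart_job is illustrative and not part of the SDK.

```python
def restart_job(sch, job, stop_timeout_sec=300):
    """Stop a job, then start it again (illustrative helper)."""
    # force=False requests a normal stop; timeout_sec bounds the wait.
    sch.stop_job(job, force=False, timeout_sec=stop_timeout_sec)
    # wait=True is the documented default: block until the pipelines
    # reach RUNNING status before returning.
    return sch.start_job(job, wait=True)
```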
- stop_legacy_deployment(deployment, wait_for_statuses=['INACTIVE'])[source]#
Stop Deployment.
- Parameters
deployment (
streamsets.sdk.sch_models.LegacyDeployment
) – Deployment instance.wait_for_statuses (
list
, optional) – List of statuses to wait for. Default:['INACTIVE']
.
- Returns
An instance of
streamsets.sdk.sch_api.DeploymentStartStopCommand
.
- stop_test_pipeline_run(start_pipeline_command)[source]#
Stop the test run of pipeline.
- Parameters
start_pipeline_command (
streamsets.sdk.sdc_api.StartPipelineCommand
) –- Returns
An instance of
streamsets.sdk.sdc_api.StopPipelineCommand
.
- property subscription_audits#
Subscription audits.
- Returns
A
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.SubscriptionAudit
.
- property subscriptions#
Event Subscriptions.
- Returns
An instance of
streamsets.sdk.sch_models.Subscriptions
.
- sync_job(*jobs)[source]#
Sync one or more jobs.
- Parameters
*jobs – One or more instances of
streamsets.sdk.sch_models.Job
.
- test_pipeline_run(pipeline, reset_origin=False, parameters=None)[source]#
Test run a pipeline.
- Parameters
pipeline (
streamsets.sdk.sch_models.Pipeline
) – Pipeline object.reset_origin (
boolean
, optional) – Default:False
.parameters (
dict
, optional) – Pipeline parameters. Default:None
.
- Returns
An instance of
streamsets.sdk.sdc_api.StartPipelineCommand
.
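The command returned by test_pipeline_run is the argument that stop_test_pipeline_run expects. A small sketch of that pairing follows; the helper name is hypothetical and sch is assumed to be a connected ControlHub object.

```python
def run_and_stop_test(sch, pipeline, parameters=None):
    """Start a test run from a reset origin, then stop it (illustrative)."""
    start_command = sch.test_pipeline_run(
        pipeline, reset_origin=True, parameters=parameters
    )
    # ... inspect the running test pipeline here if needed ...
    # The StartPipelineCommand from above identifies the run to stop.
    return sch.stop_test_pipeline_run(start_command)
```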
- property topologies#
Topologies.
- Returns
An instance of
streamsets.sdk.sch_models.Topologies
.
- property transformers#
Transformers registered to the ControlHub instance.
- Returns
Returns a
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Transformer
instances.
- update_connection(connection)[source]#
Update a connection.
- Parameters
connection (
streamsets.sdk.sch_models.Connection
) – Connection object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- update_data_collector_labels(data_collector)[source]#
Update data collector labels.
- Parameters
data_collector (
streamsets.sdk.sch_models.DataCollector
) – Data Collector object.- Returns
An instance of
streamsets.sdk.sch_models.DataCollector
.
- update_data_collector_resource_thresholds(data_collector, max_cpu_load=None, max_memory_used=None, max_pipelines_running=None)[source]#
Updates data collector resource thresholds.
- Parameters
data_collector (
streamsets.sdk.sch_models.DataCollector
) – Data Collector object.max_cpu_load (
float
, optional) – Max CPU load in percentage. Default:None
.max_memory_used (
int
, optional) – Max memory used in MB. Default:None
.max_pipelines_running (
int
, optional) – Max pipelines running. Default:None
.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
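As an illustration of the units involved (CPU load as a percentage, memory in MB), the sketch below checks the values locally before calling the method. The range checks are an assumption added for this example, not documented SDK behaviour, and the helper name is hypothetical.

```python
def set_resource_thresholds(sch, data_collector, max_cpu_load=None,
                            max_memory_used=None, max_pipelines_running=None):
    """Sanity-check thresholds locally, then update them (illustrative)."""
    # Assumption: a CPU load percentage should fall within 0-100.
    if max_cpu_load is not None and not 0 <= max_cpu_load <= 100:
        raise ValueError('max_cpu_load is a percentage and must be within 0-100')
    # Assumption: negative MB or pipeline counts are never meaningful.
    for name, value in (('max_memory_used', max_memory_used),
                        ('max_pipelines_running', max_pipelines_running)):
        if value is not None and value < 0:
            raise ValueError(f'{name} must not be negative')
    return sch.update_data_collector_resource_thresholds(
        data_collector,
        max_cpu_load=max_cpu_load,
        max_memory_used=max_memory_used,
        max_pipelines_running=max_pipelines_running,
    )
```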
- update_deployment(deployment)[source]#
Update a deployment.
- Parameters
deployment (
streamsets.sdk.sch_models.Deployment
) – deployment object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- update_environment(environment, timeout_sec=300)[source]#
Update an environment.
- Parameters
environment (
streamsets.sdk.sch_models.Environment
) – environment object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- update_group(group)[source]#
Update a group.
- Parameters
group (
streamsets.sdk.sch_models.Group
) – Group object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- update_job(job)[source]#
Update a job.
- Parameters
job (
streamsets.sdk.sch_models.Job
) – Job object.- Returns
An instance of
streamsets.sdk.sch_models.Job
.
- update_legacy_deployment(deployment)[source]#
Update a deployment.
- Parameters
deployment (
streamsets.sdk.sch_models.LegacyDeployment
) – Deployment object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- update_organization(organization)[source]#
Update an organization.
- Parameters
organization (
streamsets.sdk.sch_models.Organization
) – Organization instance.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- update_pipelines_with_different_fragment_version(pipelines, from_fragment_version, to_fragment_version)[source]#
Update pipelines with latest pipeline fragment commit version.
- Parameters
pipelines (
list
) – List ofstreamsets.sdk.sch_models.Pipeline
instances.from_fragment_version (
streamsets.sdk.sch_models.PipelineCommit
) – commit of fragment from which the pipeline needs to be updated.to_fragment_version (
streamsets.sdk.sch_models.PipelineCommit
) – commit of fragment to which the pipeline needs to be updated.
- Returns
An instance of
streamsets.sdk.sch_api.Command
- update_report_definition(report_definition)[source]#
Update an existing Report Definition.
- Parameters
report_definition (
streamsets.sdk.sch_models.ReportDefinition
) – Report Definition instance.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- update_snowflake_pipeline_defaults(account_url=None, database=None, warehouse=None, schema=None, role=None)[source]#
Create or update the Snowflake pipeline defaults for this user.
- Parameters
account_url (
str
, optional) – Snowflake account url. Default:None
database (
str
, optional) – Snowflake database to query against. Default:None
warehouse (
str
, optional) – Snowflake warehouse. Default:None
schema (
str
, optional) – Schema used. Default:None
role (
str
, optional) – Role used. Default:None
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- update_snowflake_user_credentials(username, snowflake_login_type, password=None, private_key=None, role=None)[source]#
Create or update the Snowflake user credentials.
- Parameters
username (
str
) – Snowflake account username.snowflake_login_type (
str
) – Snowflake login type to use. Options arepassword
andprivateKey
.password (
str
, optional) – Snowflake account password. Default:None
private_key (
str
, optional) – Snowflake account private key. Default:None
role (
str
, optional) – Snowflake role of the account. Default:None
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- update_subscription(subscription)[source]#
Update an existing Subscription.
- Parameters
subscription (
streamsets.sdk.sch_models.Subscription
) – A Subscription instance.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- update_user(user)[source]#
Update a user. Some user attributes, such as
last_modified_by and last_modified_on, are updated by ControlHub.
- Parameters
user (
streamsets.sdk.sch_models.User
) – User object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- upgrade_job(*jobs)[source]#
Upgrade job(s) to latest pipeline version.
- Parameters
*jobs – One or more instances of
streamsets.sdk.sch_models.Job
.- Returns
A
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Job
.
- upload_offset(job, offset_file=None, offset_json=None)[source]#
Upload offset for given job.
- Parameters
job (
streamsets.sdk.sch_models.Job
) – Job object.offset_file (
file
, optional) – File containing the offsets. Default:None
. Exactly one of offset_file
, offset_json
should be specified.offset_json (
dict
, optional) – Contents of offset. Default:None
. Exactly one of offset_file
, offset_json
should be specified.
- Returns
An instance of
streamsets.sdk.sch_models.Job
.
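The "exactly one of offset_file, offset_json" rule can be enforced before calling the SDK, as in this sketch. The helper name and its offset_path argument are hypothetical, and opening the file in text mode is an assumption about what upload_offset accepts.

```python
def upload_offset_checked(sch, job, offset_path=None, offset_json=None):
    """Upload an offset from a file path or a dict, but never both."""
    # Both None or both set violates the exactly-one rule.
    if (offset_path is None) == (offset_json is None):
        raise ValueError('Specify exactly one of offset_path or offset_json')
    if offset_json is not None:
        return sch.upload_offset(job, offset_json=offset_json)
    # Assumption: a file object opened in text mode is acceptable.
    with open(offset_path) as offset_file:
        return sch.upload_offset(job, offset_file=offset_file)
```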
- property users#
Users.
- Returns
An instance of
streamsets.sdk.sch_models.Users
.
- validate_pipeline(pipeline)[source]#
Validate pipeline. Only Transformer for Snowflake pipelines are supported at this time.
- Parameters
pipeline (
streamsets.sdk.sch_models.Pipeline
) – Pipeline instance.- Raises
streamsets.sdk.exceptions.ValidationError – If validation fails.
NotImplementedError – If used for a pipeline that isn’t a Transformer for Snowflake pipeline.
- verify_connection(connection)[source]#
Verify connection.
- Parameters
connection (
streamsets.sdk.sch_models.Connection
) – Connection object.- Returns
An instance of
streamsets.sdk.sch_models.ConnectionVerificationResult
.
- wait_for_job_metric(job, metric, value, timeout_sec=200)[source]#
Block until a job’s realtime summary reaches the desired value for the desired metric.
- Parameters
job (
streamsets.sdk.sch_models.Job
) – The job instance.metric (
str
) – The desired metric (e.g.'output_record_count'
or'input_record_count'
).value (
int
) – The desired value to wait for.timeout_sec (
int
, optional) – Timeout to wait for metric
to reach value
, in seconds. Default:streamsets.sdk.sch.DEFAULT_WAIT_FOR_METRIC_TIMEOUT
.
- Raises
TimeoutError – If
timeout_sec
passes without metric
reaching value
.
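One possible way to turn the blocking call into a yes/no check is sketched below. It assumes the TimeoutError listed under Raises is Python's built-in exception; the helper name is hypothetical.

```python
def reached_output_count(sch, job, expected, timeout_sec=200):
    """Return True once the job has output `expected` records, else False."""
    try:
        # 'output_record_count' is one of the metric names given above.
        sch.wait_for_job_metric(job, 'output_record_count', expected,
                                timeout_sec=timeout_sec)
    except TimeoutError:  # assumption: the built-in TimeoutError
        return False
    return True
```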
- wait_for_job_metrics_record_count(job, count, timeout_sec=200)[source]#
Block until a job’s metrics reach the desired record count.
- Parameters
job (
streamsets.sdk.sch_models.Job
) – The job instance.count (
int
) – The desired value to wait for.timeout_sec (
int
, optional) – Timeout to wait for the record count
to reach count
, in seconds. Default:streamsets.sdk.sch.DEFAULT_WAIT_FOR_METRIC_TIMEOUT
.
- Raises
TimeoutError – If
timeout_sec
passes without the record count
reaching count
.
- wait_for_job_status(job, status, timeout_sec=300)[source]#
Block until a job reaches the desired status.
- Parameters
job (
streamsets.sdk.sch_models.Job
) – The job instance.status (
str
) – The desired status to wait for.timeout_sec (
int
, optional) – Timeout to wait forjob
to reachstatus
, in seconds. Default:streamsets.sdk.sch.DEFAULT_WAIT_FOR_STATUS_TIMEOUT
.
- Raises
TimeoutError – If
timeout_sec
passes withoutjob
reachingstatus
.
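start_job with wait=False and wait_for_job_status can be combined to start several jobs at once and then wait on each. The sketch below assumes the built-in TimeoutError is what gets raised; the helper name is illustrative, and the status string is left to the caller because valid values are not listed here.

```python
def start_jobs_and_wait(sch, jobs, status, timeout_sec=300):
    """Start all jobs without blocking, then wait for each to reach `status`.

    Returns the jobs that did not reach the status in time.
    """
    sch.start_job(*jobs, wait=False)
    timed_out = []
    for job in jobs:
        try:
            sch.wait_for_job_status(job, status, timeout_sec=timeout_sec)
        except TimeoutError:  # assumption: the built-in TimeoutError
            timed_out.append(job)
    return timed_out
```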
Models#
These models wrap and provide useful functionality for interacting with common SCH abstractions.
ACLs#
- class streamsets.sdk.sch_models.ACL(acl, control_hub)[source]#
Represents an ACL.
- Parameters
acl (
dict
) – JSON representation of an ACL.control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub object.
- permissions#
A Collection of Permissions.
- Type
streamsets.sdk.sch_models.Permissions
- add_permission(permission)[source]#
Add new permission to the ACL.
- Parameters
permission (
streamsets.sdk.sch_models.Permission
) – A permission object.- Returns
An instance of
streamsets.sdk.sch_api.Command
- property permission_builder#
Get a permission builder instance with which a permission can be created.
- Returns
An instance of
streamsets.sdk.sch_models.ACLPermissionBuilder
.
- remove_permission(permission)[source]#
Remove a permission from ACL.
- Parameters
permission (
streamsets.sdk.sch_models.Permission
) – A permission object.- Returns
An instance of
streamsets.sdk.sch_api.Command
- class streamsets.sdk.sch_models.ACLPermissionBuilder(permission, acl)[source]#
Class to help build the ACL permission.
- Parameters
permission (
streamsets.sdk.sch_models.Permission
) – A permission object.acl (
streamsets.sdk.sch_models.ACL
) – An ACL object.
- build(subject_id, subject_type, actions)[source]#
Method to help build the ACL permission.
- Parameters
subject_id (
str
) – Id of the subject e.g. ‘test@test’.subject_type (
str
) – Type of the subject e.g. ‘USER’.actions (
list
) – A list of actions of typestr
e.g. [‘READ’, ‘WRITE’, ‘EXECUTE’].
- Returns
An instance of
streamsets.sdk.sch_models.Permission
.
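A rough sketch of how permission_builder, build and add_permission fit together. The acl argument is assumed to be an ACL instance, for example the acl property of a Connection or Deployment; the helper name is hypothetical.

```python
def grant_read_write(acl, subject_id, subject_type='USER'):
    """Build a READ/WRITE permission for a subject and add it to the ACL."""
    builder = acl.permission_builder
    permission = builder.build(
        subject_id=subject_id,      # e.g. 'test@test'
        subject_type=subject_type,  # e.g. 'USER'
        actions=['READ', 'WRITE'],
    )
    return acl.add_permission(permission)
```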
- class streamsets.sdk.sch_models.Permission(permission, resource_type, api_client)[source]#
A container for a permission.
- Parameters
permission (
dict
) – A Python object representation of a permission.resource_type (
str
) – String representing the type of resource e.g. ‘JOB’, ‘PIPELINE’.api_client (
streamsets.sdk.sch_api.ApiClient
) – An instance of ApiClient.
- resource_id#
Id of the resource e.g. Pipeline or Job.
- Type
str
- subject_id#
Id of the subject e.g. user id
'admin@admin'
.- Type
str
- subject_type#
Type of the subject e.g.
'USER'
.- Type
str
- last_modified_by#
User who last modified this permission e.g.
'admin@admin'
.- Type
str
- last_modified_on#
Timestamp at which this permission was last modified e.g.
1550785079811
.- Type
int
Alerts#
- class streamsets.sdk.sch_models.Alert(alert, control_hub)[source]#
Model for Alerts.
- message#
The Alert’s message text.
- Type
str
- alert_status#
The status of the Alert.
- Type
str
- Parameters
alert (
dict
) – JSON representation of an Alert.control_hub (
streamsets.sdk.ControlHub
) – Control Hub instance.
Classifiers#
Classification Rules#
- class streamsets.sdk.sch_models.ClassificationRule(classification_rule, classifiers)[source]#
Classification Rule Model.
- Parameters
classification_rule (
dict
) – A Python dict representation of classification rule.classifiers (
list
) – A list ofstreamsets.sdk.sch_models.Classifier
instances.
- class streamsets.sdk.sch_models.ClassificationRuleBuilder(classification_rule, classifier)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.ClassificationRule
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_classification_rule_builder()
.
- Parameters
classification_rule (
dict
) – Python object defining a classification rule.classifier (
dict
) – Python object defining a classifier.
- add_classifier(patterns=None, match_with=None, regular_expression_type='RE2/J', case_sensitive=False)[source]#
Add classifier to the classification rule.
- Parameters
patterns (
list
, optional) – List of strings of patterns. Default:None
.match_with (
str
, optional) – Default:None
.regular_expression_type (
str
, optional) – Default:'RE2/J'
.case_sensitive (
bool
, optional) – Default:False
.
- Returns
An instance of
streamsets.sdk.sch_models.Classifier
.
- build(name, category, score)[source]#
Build the classification rule.
- Parameters
name (
str
) – Classification Rule name.category (
str
) – Classification Rule category.score (
float
) – Classification Rule score.
- Returns
An instance of
streamsets.sdk.sch_models.ClassificationRule
.
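The builder is used in two steps: add one or more classifiers, then build the rule. In this sketch the rule name, category, score and pattern are made-up values, and match_with is left at its default because its accepted values are not listed here. The builder is assumed to come from ControlHub.get_classification_rule_builder().

```python
def build_email_rule(builder, score=0.9):
    """Build a classification rule with a single pattern (illustrative)."""
    # Hypothetical pattern; regular_expression_type stays at 'RE2/J'.
    builder.add_classifier(
        patterns=[r'[\w.+-]+@[\w-]+\.[\w.]+'],
        case_sensitive=False,
    )
    return builder.build(name='email', category='PII', score=score)
```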
Connections#
- class streamsets.sdk.sch_models.Connection(connection, control_hub)[source]#
Model for connection.
- Parameters
connection (
dict
) – A Python object representation of Connection.control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub object.
- property acl#
Get Connection ACL.
- Returns
An instance of
streamsets.sdk.sch_models.ACL
.
- property pipeline_commits#
Get the pipeline commits using this connection.
- Returns
A
streamsets.sdk.utils.SeekableList
of streamsets.sdk.sch_models.PipelineCommit
instances.
- property tags#
Get the connection tags.
- Returns
A
streamsets.sdk.utils.SeekableList
of instances ofstreamsets.sdk.sch_models.Tag
.
- class streamsets.sdk.sch_models.Connections(control_hub, organization)[source]#
Collection of
streamsets.sdk.sch_models.Connection
instances.- Parameters
control_hub – An instance of
streamsets.sdk.sch.ControlHub
.organization (
str
) – Organization Id.
- class streamsets.sdk.sch_models.ConnectionBuilder(connection, control_hub)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.Connection
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_connection_builder()
.
- Parameters
connection (
dict
) – Python object built from our Swagger ConnectionJson definition.control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub object.
- build(title, connection_type, authoring_data_collector, tags=None)[source]#
Define the connection.
- Parameters
title (
str
) – Connection title.connection_type (
str
) – Type of connection.authoring_data_collector (
streamsets.sdk.sch.DataCollector
) – Authoring Data Collector.tags (
list
, optional) – List of tags (strings). Default:None
.
- Returns
An instance of
streamsets.sdk.sch_models.Connection
.
- class streamsets.sdk.sch_models.ConnectionVerificationResult(connection_preview_json)[source]#
Model for connection verification result.
- Parameters
connection_preview_json (
dict
) – dynamic preview API response JSON
- property issue_count#
The count of the number of issues for the connection verification result.
- Returns
A
int
that represents the number of issues.
- property issue_message#
The message provided for the connection verification result.
- Returns
A
str
message detailing the response for the connection verification result.
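The two properties above can be read from the result of ControlHub.verify_connection, as in this sketch. Treating an issue_count of zero as success is an assumption, and the helper name is hypothetical.

```python
def connection_problem(sch, connection):
    """Return None if verification reports no issues, else the message."""
    result = sch.verify_connection(connection)
    # Assumption: issue_count == 0 means the connection verified cleanly.
    if result.issue_count:
        return result.issue_message
    return None
```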
DataCollectors#
- class streamsets.sdk.sch_models.DataCollector(data_collector, control_hub)[source]#
Model for Data Collector.
- execution_mode#
True
for Edge and False
for SDC.
- Type
bool
- id#
Data Collector id.
- Type
str
- labels#
Labels for Data Collector.
- Type
list
- last_validated_on#
Last validated time for Data Collector.
- Type
str
- reported_labels#
Reported labels for Data Collector.
- Type
list
- url#
Data Collector’s url.
- Type
str
- version#
Data Collector’s version.
- Type
str
- property accessible#
Returns a
bool
for whether the Data Collector instance is accessible.
- property acl#
Get DataCollector ACL.
- Returns
An instance of
streamsets.sdk.sch_models.ACL
.
- property attributes#
Returns a
dict
of Data Collector attributes.
- property pipelines_committed#
ControlHub Job IDs that are about to be started but have no corresponding pipeline status yet.
- Returns
A
list
of Job IDs (str
objects).
- property resource_thresholds#
Return DataCollector resource thresholds.
- Returns
A
dict
of DataCollector thresholds named as “max_memory_used”, “max_cpu_load” and “max_pipelines_running”.
- property responding#
Returns a
bool
for whether the Data Collector instance is responding.
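The accessible, responding and labels attributes lend themselves to simple filtering. The sketch below works on any iterable of DataCollector instances; the helper name is hypothetical.

```python
def usable_data_collectors(data_collectors, label):
    """Keep Data Collectors that are reachable and carry the given label."""
    return [
        dc for dc in data_collectors
        if dc.accessible and dc.responding and label in dc.labels
    ]
```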
Deployments#
- class streamsets.sdk.sch_models.AzureVMDeployment(deployment, control_hub=None, **kwargs)[source]#
Model for Azure VM Deployment.
- attach_public_ip#
- Type
bool
- azure_tags#
Azure tags to apply to all provisioned resources in this Azure deployment.
- Type
dict
- key_pair_name#
SSH key pair.
- Type
str
- managed_identity#
- Type
str
- public_ssh_key#
- Type
str
- resource_group#
- Type
str
- ssh_key_source#
SSH key source.
- Type
str
- vm_size#
- Type
str
- zones#
- Type
list
- class streamsets.sdk.sch_models.Deployment(deployment, control_hub=None, **kwargs)[source]#
Model for Deployment. This is an abstract class.
- created_by#
User that created this deployment.
- Type
str
- created_on#
Time at which this deployment was created.
- Type
int
- deployment_id#
Id of the deployment.
- Type
str
- deployment_name#
Name of the deployment.
- Type
str
- deployment_tags#
Raw deployment tags of the deployment.
- Type
list
- deployment_type#
Type of the deployment.
- Type
str
- desired_instances#
- Type
int
- engine_configuration#
- engine_type#
- Type
str
- engine_version#
- Type
str
- environment_id#
Enabled environment where engine will be deployed.
- Type
list
- last_modified_by#
User that last modified this deployment.
- Type
str
- last_modified_on#
Time at which this deployment was last modified.
- Type
int
- network_tags#
- Type
list
- scala_binary_version#
Scala binary version.
- Type
str
- organization#
- Type
str
- json_state#
State of the deployment - this is json field called state.
- Type
str
- state#
State of the deployment - displayed as state on UI.
- Type
str
- status#
Status of the deployment.
- Type
str
- status_detail#
Status detail of the deployment.
- Type
str
- property acl#
Get deployment ACL.
- Returns
An instance of
streamsets.sdk.sch_models.ACL
.
- property engine_configuration#
The engine configuration of the deployment engine.
- Returns
An instance of
streamsets.sdk.sch_models.DeploymentEngineConfiguration
.
- property tags#
Get the tags.
- Returns
A
streamsets.sdk.utils.SeekableList
ofstr
instances.
- class streamsets.sdk.sch_models.DeploymentBuilder(deployment, control_hub)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.Deployment
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_deployment_builder()
.
- Parameters
deployment (
dict
) – Python object built from our Swagger CspDeploymentJson definition.control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub object.
- build(deployment_name, engine_type, engine_version, environment, **kwargs)[source]#
Define the deployment.
- Parameters
deployment_name (
str
) – deployment name.engine_type (
str
) – Type of engine to deploy.engine_version (
str
) – Version of engine to deploy.environment (
str
) – ID of the enabled environment where the engine will be deployed.deployment_tags (
list
, optional) – List of tags (strings). Default:None
.engine_build (
str
, optional) – Build of engine to deploy. Default:None
.scala_binary_version (
str
, optional) – Scala binary version, required when engine_type='TF'. Default:None
.
- Returns
An instance of subclass of
streamsets.sdk.sch_models.Deployment
.
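The optional arguments listed above are passed to build as keyword arguments. A sketch for a Transformer deployment, where scala_binary_version is required, might look like this. The helper name and the tag value are hypothetical, and the builder is assumed to come from ControlHub.get_deployment_builder().

```python
def build_transformer_deployment(builder, name, engine_version,
                                 environment, scala_binary_version):
    """Define a Transformer ('TF') deployment (illustrative helper)."""
    return builder.build(
        deployment_name=name,
        engine_type='TF',
        engine_version=engine_version,
        environment=environment,
        # Optional arguments travel through **kwargs.
        scala_binary_version=scala_binary_version,
        deployment_tags=['sdk-example'],  # hypothetical tag
    )
```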
- class streamsets.sdk.sch_models.DeploymentEngineAdvancedConfiguration(deployment_engine_advanced_config, deployment=None)[source]#
Model for advanced configuration in Deployment engine configuration.
- class streamsets.sdk.sch_models.DeploymentEngineConfiguration(deployment_engine_config, deployment=None, control_hub=None)[source]#
Model for Deployment engine configuration.
- advanced_configuration#
- Type
str
- aws_image_id#
- Type
str
- azure_image_id#
- Type
str
- created_by#
User that created this deployment.
- Type
str
- created_on#
Time at which this deployment was created.
- Type
int
- download_url#
- Type
list
- engine_type#
engine type.
- Type
list
- engine_version#
Engine version to deploy.
- Type
list
- id#
Id of the deployment engine configuration.
- Type
str
- gcp_image_id#
- Type
str
- last_modified_by#
User that last modified this deployment.
- Type
str
- last_modified_on#
Time at which this deployment was last modified.
- Type
int
- scala_binary_version#
Scala binary version.
- Type
str
- class streamsets.sdk.sch_models.DeploymentEngineConfigurations(control_hub)[source]#
Collection of
streamsets.sdk.sch_models.DeploymentEngineConfiguration
instances.- Parameters
control_hub – An instance of
streamsets.sdk.sch.ControlHub
.
- class streamsets.sdk.sch_models.DeploymentEngineJavaConfiguration(deployment_engine_java_configuration, deployment=None)[source]#
Model for Deployment engine Java configuration.
- id#
- Type
str
- java_options#
- Type
str
- java_memory_strategy#
Either ABSOLUTE or PERCENTAGE.
- maximum_java_heap_size_in_mb#
- Type
long
- minimum_java_heap_size_in_mb#
- Type
long
- maximum_java_heap_size_in_percent#
- Type
int
- minimum_java_heap_size_in_percent#
- Type
int
- class streamsets.sdk.sch_models.DeploymentStageLibraries(values, deployment_config)[source]#
Wrapper class for the list of stage libraries in a deployment.
- Parameters
values (
list
) – A list of stage library names.deployment_config (
streamsets.sdk.sch_models.DeploymentEngineConfiguration
) – The deployment engine configuration object that these stage libraries pertain to.
- class streamsets.sdk.sch_models.EC2Deployment(deployment, control_hub=None, **kwargs)[source]#
Model for EC2 Deployment.
- ec2_instance_type#
Type of EC2 engine instance.
- Type
str
- engine_instances#
No. of EC2 engine instances.
- Type
int
- instance_profile#
ARN of instance profile created as an environment prerequisite.
- Type
str
- key_pair_name#
SSH key pair.
- Type
str
- aws_tags#
AWS tags to apply to all provisioned resources in this AWS deployment.
- Type
dict
- ssh_key_source#
SSH key source.
- Type
str
- tracking_url#
- Type
str
- class streamsets.sdk.sch_models.GCEDeployment(deployment, control_hub=None, **kwargs)[source]#
Model for GCE Deployment.
- block_project_ssh_keys#
Blocks the use of project-wide public SSH keys to access the provisioned VM instances.
- Type
bool
- engine_instances#
Number of engine instances to deploy.
- Type
int
- gcp_labels#
GCP labels to apply to all provisioned resources in your GCP project.
- Type
dict
- instance_service_account#
Instance service account.
- Type
str
- machine_type#
machine type to use for the provisioned VM instances.
- Type
str
- public_ssh_key#
full contents of public SSH key associated with each VM instance.
- Type
str
- region#
Region to provision VM instances in.
- Type
str
- tracking_url#
Tracking URL.
- Type
str
- zone#
GCP zone
- Type
list
Environments#
- class streamsets.sdk.sch_models.AWSEnvironment(environment, control_hub=None, **kwargs)[source]#
Model for AWS Environment.
- access_key_id#
AWS access key id to access AWS account. Required when credential_type is AWS_STATIC_KEYS.
- Type
str
- aws_tags#
AWS tags to apply to all provisioned resources in this AWS deployment.
- Type
dict
- credentials#
Credentials of the environment.
- Type
dict
- default_instance_profile#
ARN of default instance profile created as an environment prerequisite.
- Type
dict
- region#
AWS region where VPC is located in case of AWS environment.
- Type
str
- role_arn#
Role ARN of the cross-account role created as an environment prerequisite in user’s AWS account. Required when credential_type is AWS_CROSS_ACCOUNT_ROLE.
- Type
str
- secret_access_key_id#
AWS secret access key to access AWS account. Required when credential_type is AWS_STATIC_KEYS.
- Type
str
- subnet_ids#
Range of Subnet IDs in the VPC.
- Type
str
- security_group_id#
Security group ID.
- Type
str
- vpc_id#
Id of the Amazon VPC created as an environment prerequisite in user’s AWS account.
- Type
str
- class streamsets.sdk.sch_models.AzureEnvironment(environment, control_hub=None, **kwargs)[source]#
Model for Azure Environment.
- azure_tags#
Azure tags for this environment.
- Type
dict
- credential_type#
Credential type of the environment.
- Type
str
- credentials#
Credentials of the environment.
- Type
dict
- default_managed_identity#
default managed identity.
- Type
str
- default_resource_group#
default resource group.
- Type
str
- region#
Azure region where vnet is located.
- Type
str
- subnet_id#
Subnet ID in the vnet.
- Type
str
- security_group_id#
Security group ID.
- Type
str
- vnet_id#
Id of the vnet.
- Type
str
- class streamsets.sdk.sch_models.Environment(environment, control_hub=None, **kwargs)[source]#
Model for Environment. This is an abstract class.
- allow_nightly_engine_builds#
- Type
bool
- created_by#
User that created this environment.
- Type
str
- created_on#
Time at which this environment was created.
- Type
int
- environment_id#
Id of the environment.
- Type
str
- environment_name#
Name of the environment.
- Type
str
- environment_tags#
Raw environment tags of the environment.
- Type
list
- environment_type#
Type of the environment.
- Type
str
- last_modified_by#
User that last modified this environment.
- Type
str
- last_modified_on#
Time at which this environment was last modified.
- Type
int
- json_state#
State of the environment - which is stored in json field called state.
- Type
str
- state#
State of the environment stateDisplayLabel - shown on the UI as state.
- Type
str
- status#
Status of the environment.
- Type
str
- status_detail#
Status detail of the environment.
- Type
str
- property acl#
Get environment ACL.
- Returns
An instance of
streamsets.sdk.sch_models.ACL
.
- property tags#
Get the tags.
- Returns
A
streamsets.sdk.utils.SeekableList
ofstr
instances.
- class streamsets.sdk.sch_models.EnvironmentBuilder(environment, control_hub)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.Environment
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_environment_builder()
.
- Parameters
environment (
dict
) – Python object built from our Swagger CspEnvironmentJson definition.control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub object.
- build(environment_name, **kwargs)[source]#
Define the environment.
- Parameters
environment_name (
str
) – environment name.allow_nightly_engine_builds (
bool
, optional) – Default:False
.environment_tags (
list
, optional) – List of tags (strings). Default:None
.
- Returns
An instance of subclass of
streamsets.sdk.sch_models.Environment
.
- class streamsets.sdk.sch_models.GCPEnvironment(environment, control_hub=None, **kwargs)[source]#
Model for GCP Environment.
- credentials#
Credentials of the environment.
- Type
dict
- account_key_json#
JSON file contents for the Service Account Key. Required when credential_type is GCP_SERVICE_ACCOUNT_KEY.
- Type
str
- project#
Project where VPC is located.
- Type
str
- gcp_labels#
GCP labels for this environment.
- Type
dict
- gcp_tags#
GCP tags to apply to all resources provisioned in GCP account.
- Type
str
- service_account_email#
Service account ID. Required when credential_type is GCP_SERVICE_ACCOUNT_IMPERSONATION.
- Type
str
- vpc_id#
Id of the GCP VPC created as an environment prerequisite in user’s GCP account.
- Type
str
- fetch_available_machine_types(zone_id)[source]#
Returns the available machine types for the given Environment and GCP Project and GCP Zone.
- fetch_available_regions()[source]#
Returns the available regions for the given Environment and GCP Project.
- fetch_available_service_accounts()[source]#
Returns the available service accounts for the given Environment and GCP Project.
- fetch_available_vpcs()[source]#
Returns the available networks for the given Environment and GCP Project. They are called VPCs here because the UI shows them as VPC.
- fetch_available_zones(region_id)[source]#
Returns the available zones for the given environment and GCP project and GCP region.
Transformers#
- class streamsets.sdk.sch_models.Transformer(transformer, control_hub)[source]#
Model for Transformer.
- execution_mode#
- Type
str
- id#
Transformer id.
- Type
str
- labels#
Labels for Transformer.
- Type
list
- last_validated_on#
Last validated time for Transformer.
- Type
str
- reported_labels#
Reported labels for Transformer.
- Type
list
- url#
Transformer’s url.
- Type
str
- version#
Transformer’s version.
- Type
str
- property accessible#
Returns a
bool
for whether the Transformer instance is accessible.
- property acl#
Get Transformer ACL.
- Returns
An instance of
streamsets.sdk.sch_models.ACL
.
- property attributes#
Returns a
dict
of Transformer attributes.
Group#
- class streamsets.sdk.sch_models.Group(group, roles)[source]#
Model for Group.
- Parameters
group (
dict
) – A Python object representation of Group.roles (
dict
) – A mapping of role IDs to role labels.
- class streamsets.sdk.sch_models.Groups(control_hub, roles, organization)[source]#
Collection of
streamsets.sdk.sch_models.Group
instances.- Parameters
control_hub – An instance of
streamsets.sdk.sch.ControlHub
.roles (
dict
) – A mapping of role IDs to role labels.organization (
str
) – Organization ID.
- class streamsets.sdk.sch_models.GroupBuilder(group, roles, control_hub=None)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.Group
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_group_builder()
.
- Parameters
group (
dict
) – Python object built from our Swagger GroupJson definition.roles (
dict
) – A mapping of role IDs to role labels.control_hub (
streamsets.sdk.sch.ControlHub
, optional) – ControlHub object. Default:None
- build(display_name, group_id=None, ldap_groups=None)[source]#
Build the group.
- Parameters
display_name (
str
) – Group display name.group_id (
str
, optional) – Group ID. Can only include letters, numbers, underscores and periods. Default:None
.ldap_groups (
list
, optional) – List of LDAP groups (strings).
- Returns
An instance of
streamsets.sdk.sch_models.Group
.
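The `group_id` constraint above (letters, numbers, underscores and periods only) can be checked locally before calling `build()`. The following is a minimal sketch: the helper name `is_valid_group_id` and the regular expression are illustrative assumptions, not part of the SDK, and Control Hub may apply further rules that this check does not cover.

```python
import re

# Hypothetical helper, not part of the StreamSets SDK.
# Assumption: "letters" means ASCII letters; Control Hub may enforce more rules.
_GROUP_ID_PATTERN = re.compile(r"[A-Za-z0-9_.]+")


def is_valid_group_id(group_id):
    """Return True if group_id uses only letters, digits, underscores and periods."""
    if not isinstance(group_id, str):
        return False
    return _GROUP_ID_PATTERN.fullmatch(group_id) is not None
```

A caller could run this check first and only then pass the value as `group_id` to `GroupBuilder.build()`.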
Jobs#
- class streamsets.sdk.sch_models.Job(job, control_hub=None)[source]#
Model for Job.
- archived#
Flag that indicates if this job is archived.
- Type
bool
- commit_id#
Pipeline commit id.
- Type
str
- commit_label#
Pipeline commit label.
- Type
str
- created_by#
User that created this job.
- Type
str
- created_on#
Time at which this job was created.
- Type
int
- data_collector_labels#
Labels of the data collectors.
- Type
list
- delete_after_completion#
Flag that indicates if this job should be deleted after completion.
- Type
bool
- description#
Job description.
- Type
str
- destroyer#
Job destroyer.
- Type
str
- enable_failover#
Flag that indicates if failover is enabled.
- Type
bool
- enable_time_series_analysis#
Flag that indicates if time series is enabled.
- Type
bool
- execution_mode#
True for Edge and False for SDC.
- Type
bool
- job_deleted#
Flag that indicates if this job is deleted.
- Type
bool
- job_id#
Id of the job.
- Type
str
- job_name#
Name of the job.
- Type
str
- last_modified_by#
User that last modified this job.
- Type
str
- last_modified_on#
Time at which this job was last modified.
- Type
int
- number_of_instances#
Number of instances.
- Type
int
- pipeline_force_stop_timeout#
Timeout for Pipeline force stop.
- Type
int
- pipeline_id#
Id of the pipeline that is running the job.
- Type
str
- pipeline_name#
Name of the pipeline that is running the job.
- Type
str
- pipeline_rule_id#
Rule Id of the pipeline that is running the job.
- Type
str
- read_policy#
Read Policy of the job.
- runtime_parameters#
Run-time parameters of the job.
- Type
str
- static_parameters#
List of parameters that cannot be overridden.
- Type
list
- statistics_refresh_interval_in_millisecs#
Refresh interval for statistics in milliseconds.
- Type
int
- status#
Status of the job.
- Type
str
- template_run_history_list#
List of job template run history.
- Type
list
- write_policy#
Write Policy of the job.
- property acl#
Get job ACL.
- Returns
An instance of
streamsets.sdk.sch_models.ACL
.
- property commit#
Get pipeline commit of the job.
- Returns
An instance of
streamsets.sdk.sch_models.PipelineCommit
.
- property committed_offsets#
Get the committed offsets for a given job id.
- Returns
An instance of
streamsets.sdk.sch_models.JobCommittedOffset
.
- property metrics#
The metrics from all runs of a Job.
- Returns
A
streamsets.sdk.utils.SeekableList
of streamsets.sdk.sch_models.JobMetrics
instances.
- property pipeline#
Get the pipeline object corresponding to this job.
- property system_job#
Get the system Job for this job, if it exists.
- Returns
An instance of
streamsets.sdk.sch_models.Job
.
- property tag#
Get pipeline tag of the job.
- Returns
An instance of
streamsets.sdk.sch_models.PipelineTag
.
- property tags#
Get the job tags.
- Returns
A
streamsets.sdk.utils.SeekableList
of instances of streamsets.sdk.sch_models.Tag
.
- time_series_metrics(metric_type, time_filter_condition='LAST_5M', **kwargs)[source]#
Get historic time series metrics for the job.
- Parameters
metric_type (
str
) – metric type in {‘Record Count Time Series’, ‘Record Throughput Time Series’, ‘Batch Throughput Time Series’, ‘Stage Batch Processing Timer seconds’}.time_filter_condition (
str
, optional) – Default:'LAST_5M'
.
- Returns
An instance of
streamsets.sdk.sch_models.JobTimeSeriesMetrics
.
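As a rough illustration of `time_series_metrics()`, the sketch below validates the metric type against the names listed in the parameter description and then delegates to the method. The wrapper `fetch_time_series` and the `METRIC_TYPES` tuple are hypothetical names introduced here; only the method name, its arguments and the metric type strings come from the text above.

```python
# Metric type names copied from the parameter description above.
METRIC_TYPES = (
    "Record Count Time Series",
    "Record Throughput Time Series",
    "Batch Throughput Time Series",
    "Stage Batch Processing Timer seconds",
)


def fetch_time_series(job, metric_type, time_filter_condition="LAST_5M"):
    """Check metric_type locally, then call Job.time_series_metrics().

    `job` is expected to be a streamsets.sdk.sch_models.Job instance.
    This wrapper is a hypothetical convenience, not an SDK function.
    """
    if metric_type not in METRIC_TYPES:
        raise ValueError("Unsupported metric type: {!r}".format(metric_type))
    return job.time_series_metrics(
        metric_type, time_filter_condition=time_filter_condition
    )
```

Failing early on an unknown metric type avoids a round trip to Control Hub for a request that the documented value set already rules out.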
- class streamsets.sdk.sch_models.Jobs(control_hub)[source]#
Collection of
streamsets.sdk.sch_models.Job
instances.
- Parameters
control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub object.
- class streamsets.sdk.sch_models.JobBuilder(job, control_hub)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.Job
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_job_builder()
.
- Parameters
job (
dict
) – Python object built from our Swagger JobJson definition.
- build(job_name, pipeline, job_template=False, runtime_parameters=None, pipeline_commit=None, pipeline_tag=None, pipeline_commit_or_tag=None, tags=None)[source]#
Build the job.
- Parameters
job_name (
str
) – Name of the job.pipeline (
streamsets.sdk.sch_models.Pipeline
) – Pipeline object.job_template (
boolean
, optional) – Indicate if it is a Job Template. Default:False
.runtime_parameters (
dict
, optional) – Runtime Parameters for the Job or Job Template. Default:None
.pipeline_commit (
streamsets.sdk.sch_models.PipelineCommit
) – Default: None, which resolves to the latest pipeline commit.pipeline_tag (
streamsets.sdk.sch_models.PipelineTag
) – Default: None, which resolves to the latest pipeline tag.pipeline_commit_or_tag (
str
, optional) – Default:None
, which resolves to the latest pipeline commit.tags (
list
, optional) – Job tags. Default:None
.
- Returns
An instance of
streamsets.sdk.sch_models.Job
.
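A minimal sketch of the builder pattern described above, assuming `sch` is a `ControlHub` instance and `pipeline` is a `Pipeline` obtained elsewhere. The function name `build_tagged_job` is hypothetical; the calls `get_job_builder()` and `build()` and their keyword arguments are the ones documented in this section.

```python
def build_tagged_job(sch, pipeline, job_name, runtime_parameters=None, tags=None):
    """Build a Job object for the given pipeline via ControlHub.get_job_builder().

    `sch` is expected to be a streamsets.sdk.ControlHub instance and
    `pipeline` a streamsets.sdk.sch_models.Pipeline instance.
    This helper is illustrative and not part of the SDK.
    """
    job_builder = sch.get_job_builder()
    return job_builder.build(
        job_name=job_name,
        pipeline=pipeline,
        runtime_parameters=runtime_parameters,
        tags=tags,
    )
```

The sketch only builds the Job object; publishing it to Control Hub is a separate step that this section does not cover.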
- class streamsets.sdk.sch_models.JobMetrics(metrics)[source]#
Model for job metrics.
- error_count#
The number of error records generated by this run of the Job.
- Type
int
- error_records_per_sec#
The number of error records per second generated by this run of the Job.
- Type
float
- input_count#
The number of records ingested by this run of the Job.
- Type
int
- input_records_per_sec#
The number of records per second ingested by this run of the Job.
- Type
float
- output_count#
The number of records output by this run of the Job.
- Type
int
- output_records_per_sec#
The number of records output per second by the run of the Job.
- Type
float
- pipeline_version#
The version of the pipeline that was used in this Job run.
- Type
str
- run_count#
The count corresponding to this Job run.
- Type
int
- sdc_id#
The ID of the SDC instance on which this Job run was executed.
- Type
str
- stage_errors_count#
The number of stage error records generated by this run of the Job.
- Type
int
- stage_error_records_per_sec#
The number of stage error records generated per second by this run of the job.
- Type
float
- total_error_count#
The total number of both error records and stage errors generated by this run of the job.
- Type
int
- Parameters
metrics (
dict
) – Metrics counts in JSON format.
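The counters above lend themselves to simple derived figures. The sketch below computes an error ratio per run from `total_error_count` and `input_count` and picks the run with the highest ratio. Both helper names are hypothetical, and the code assumes each metrics object exposes those two attributes as numbers.

```python
def error_ratio(metrics):
    """Return total_error_count / input_count, or 0.0 when nothing was ingested.

    `metrics` is expected to look like a JobMetrics instance.
    Hypothetical helper, not part of the SDK.
    """
    if not metrics.input_count:
        return 0.0
    return metrics.total_error_count / metrics.input_count


def worst_run(job_metrics):
    """Return the metrics object with the highest error ratio, or None if empty."""
    return max(job_metrics, key=error_ratio, default=None)
```

Passing the list returned by the `metrics` property of a Job to `worst_run()` would then give the run to inspect first.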
- class streamsets.sdk.sch_models.JobOffset(offset)[source]#
Model for offset.
- Parameters
offset (
dict
) – Offset in JSON format.
- class streamsets.sdk.sch_models.JobRunEvent(event)[source]#
Model for an event in a Job Run.
- Parameters
event (
dict
) – Job Run Event in JSON format.
- class streamsets.sdk.sch_models.JobStatus(status, control_hub, **kwargs)[source]#
Model for Job Status.
- run_history#
(
streamsets.sdk.utils.JobRunHistoryEvent
): History of a particular job run.
- Type
streamsets.sdk.utils.SeekableList
- offsets#
(
streamsets.sdk.utils.JobPipelineOffset
): Offsets after the job run.
- Type
streamsets.sdk.utils.SeekableList
- Parameters
status (
dict
) – Job status in JSON format.control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub object.
- class streamsets.sdk.sch_models.JobTimeSeriesMetric(metric, metric_type)[source]#
Model for job metrics.
- name#
Name of measurement.
- Type
str
- values#
Timeseries data.
- Type
list
- time_series#
Timeseries data with timestamp as key and metric value as value.
- Type
dict
- Parameters
metric (
dict
) – Metrics in JSON format.
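Since `time_series` maps timestamps to metric values, finding the peak sample is a one-liner over its items. The helper below is a hypothetical sketch that assumes the values are mutually comparable numbers; it takes a plain `dict` so it works on the attribute's value directly.

```python
def peak_sample(time_series):
    """Return the (timestamp, value) pair with the largest value, or None if empty.

    `time_series` is a dict shaped like JobTimeSeriesMetric.time_series.
    Hypothetical helper, not part of the SDK.
    """
    if not time_series:
        return None
    return max(time_series.items(), key=lambda item: item[1])
```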
- class streamsets.sdk.sch_models.JobTimeSeriesMetrics(metrics, metric_type)[source]#
Model for job metrics.
- input_records#
Appears when queried for ‘Record Count Time Series’ or ‘Record Throughput Time Series’.
- output_records#
Appears when queried for ‘Record Count Time Series’ or ‘Record Throughput Time Series’.
- error_records#
Appears when queried for ‘Record Count Time Series’ or ‘Record Throughput Time Series’.
- batch_counter#
Appears when queried for ‘Batch Throughput Time Series’.
- batch_processing_timer#
Appears when queried for ‘Batch Processing Timer seconds’.
- Parameters
metrics (
dict
) – Metrics in JSON format.
- class streamsets.sdk.sch_models.RuntimeParameters(runtime_parameters, job)[source]#
Wrapper for ControlHub job runtime parameters.
- Parameters
runtime_parameters (
str
) – Runtime parameter.job (
streamsets.sdk.sch_models.Job
) – Job object.
Organizations#
- class streamsets.sdk.sch_models.Organization(organization, organization_admin_user=None, api_client=None)[source]#
Model for Organization.
- Parameters
organization (
str
) – Organization Id.organization_admin_user (
str
, optional) – Default:None
.api_client (
streamsets.sdk.sch_api.ApiClient
, optional) – Default:None
.
- class streamsets.sdk.sch_models.Organizations(control_hub)[source]#
Collection of
streamsets.sdk.sch_models.Organization
instances.
- class streamsets.sdk.sch_models.OrganizationBuilder(organization, organization_admin_user)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.Organization
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_organization_builder()
.
- Parameters
organization (
dict
) – Python object built from our Swagger UserJson definition.
- build(id, name, admin_user_id, admin_user_display_name, admin_user_email_address, admin_user_ldap_user_name=None)[source]#
Build the organization.
- Parameters
id (
str
) – Organization ID.name (
str
) – Organization name.admin_user_id (
str
) – User Id of the admin of this organization.admin_user_display_name (
str
) – User display name of admin of this organization.admin_user_email_address (
str
) – User email address of admin of this organization.admin_user_ldap_user_name (
str
, optional) – LDAP username. Default:None
.
- Returns
An instance of
streamsets.sdk.sch_models.Organization
.
Pipelines#
- class streamsets.sdk.sch_models.Pipeline(pipeline, builder, pipeline_definition, rules_definition, control_hub=None, library_definitions=None)[source]#
Model for Pipeline.
- Parameters
pipeline (
dict
) – Pipeline in JSON format.builder (
streamsets.sdk.sch_models.PipelineBuilder
) – Pipeline Builder object.pipeline_definition (
dict
) – Pipeline Definition in JSON format.rules_definition (
dict
) – Rules Definition in JSON format.control_hub (
streamsets.sdk.sch.ControlHub
, optional) – ControlHub object. Default:None
.library_definitions (
dict
, optional) – Library Definition in JSON format. Default:None
.
- property Labels#
Get the pipeline labels. This attribute will be deprecated in a future release. Please use labels instead.
- Returns
A
list
of dict
- property acl#
Get pipeline ACL.
- Returns
An instance of
streamsets.sdk.sch_models.ACL
.
- property commits#
Get commits for this pipeline.
- Returns
A
streamsets.sdk.utils.SeekableList
of instances of streamsets.sdk.sch_models.PipelineCommit
.
- property configuration#
Get pipeline’s configuration.
- Returns
An instance of
streamsets.sdk.sch_models.Configuration
.
- property labels#
Get the pipeline labels.
- Returns
A
streamsets.sdk.utils.SeekableList
of instances of streamsets.sdk.sch_models.PipelineLabel
.
- property parameters#
Get the pipeline parameters.
- Returns
A dict-like
streamsets.sdk.sch_models.PipelineParameters
object of parameter key-value pairs.
- property tags#
Get tags for this pipeline.
- Returns
A
streamsets.sdk.utils.SeekableList
of instances of streamsets.sdk.sch_models.PipelineTag
.
- class streamsets.sdk.sch_models.Pipelines(control_hub, organization)[source]#
Collection of
streamsets.sdk.sch_models.Pipeline
instances.
- Parameters
control_hub – An instance of
streamsets.sdk.sch.ControlHub
.organization (
str
) – Organization Id.
- class streamsets.sdk.sch_models.PipelineBuilder(pipeline, data_collector_pipeline_builder, control_hub=None, fragment=False)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.Pipeline
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_pipeline_builder()
.
- Parameters
pipeline (
dict
) – Python object built from our Swagger PipelineJson definition.data_collector_pipeline_builder (
streamsets.sdk.sdc_models.PipelineBuilder
) – Data Collector Pipeline Builder object.control_hub (
streamsets.sdk.sch.ControlHub
) – Default:None
.fragment (
boolean
, optional) – Specify if a fragment builder. Default:False
.
- add_stage(label=None, name=None, type=None, library=None)[source]#
Add a stage to the pipeline.
When specifying a stage, either label or name must be used. type and library may also be used to select a particular stage if ambiguities exist. If type and/or library are omitted, the first stage definition matching the given label or name will be used.
- Parameters
label (
str
, optional) – Stage label to use when selecting stage from definitions. Default:None
.name (
str
, optional) – Stage name to use when selecting stage from definitions. Default:None
.type (
str
, optional) – Stage type to use when selecting stage from definitions (e.g. origin, destination, processor, executor). Default:None
.library (
str
, optional) – Stage library to use when selecting stage from definitions. Default:None
.
- Returns
An instance of
streamsets.sdk.sch_models.SchSdcStage
.
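The selection rules for `add_stage()` can be sketched as a small wrapper that always passes `label` and adds `type` or `library` only when the caller wants to disambiguate. The wrapper name `add_stage_by_label` is hypothetical; the keyword names are those of the documented signature.

```python
def add_stage_by_label(pipeline_builder, label, stage_type=None, library=None):
    """Add a stage selected by label, narrowing by type/library only when given.

    `pipeline_builder` is expected to be a PipelineBuilder obtained from
    ControlHub.get_pipeline_builder(). Hypothetical helper, not part of the SDK.
    """
    selectors = {"label": label}
    if stage_type is not None:
        selectors["type"] = stage_type
    if library is not None:
        selectors["library"] = library
    return pipeline_builder.add_stage(**selectors)
```

When `stage_type` and `library` are left out, the builder falls back to the first stage definition matching the label, as described above.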
- class streamsets.sdk.sch_models.PipelineCommit(pipeline_commit, control_hub=None)[source]#
Model for pipeline commit.
- Parameters
pipeline_commit (
dict
) – Pipeline commit in JSON format.control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub object.
- class streamsets.sdk.sch_models.PipelineLabel(pipeline_label)[source]#
Model for pipeline label.
- Parameters
pipeline_label (
dict
) – Pipeline label in JSON format.
- class streamsets.sdk.sch_models.PipelineLabels(control_hub, organization)[source]#
Collection of
streamsets.sdk.sch_models.PipelineLabel
instances.
- Parameters
control_hub – An instance of
streamsets.sdk.sch.ControlHub
.organization (
str
) – Organization Id.
- class streamsets.sdk.sch_models.PipelineParameters(pipeline)[source]#
Parameters for pipelines.
- Parameters
pipeline (
streamsets.sdk.sch_models.Pipeline
) – Pipeline Instance.
- class streamsets.sdk.sch_models.StPipelineBuilder(pipeline, transformer_pipeline_builder, control_hub=None, fragment=False)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.Pipeline
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_pipeline_builder()
.
- Parameters
pipeline (
dict
) – Python object built from our Swagger PipelineJson definition.transformer_pipeline_builder (
streamsets.sdk.sdc_models.PipelineBuilder
) – Transformer Pipeline Builder object.control_hub (
streamsets.sdk.sch.ControlHub
) – Default:None
.fragment (
boolean
, optional) – Specify if a fragment builder. Default:False
.
- add_stage(label=None, name=None, type=None, library=None)[source]#
Add a stage to the pipeline.
When specifying a stage, either label or name must be used. type and library may also be used to select a particular stage if ambiguities exist. If type and/or library are omitted, the first stage definition matching the given label or name will be used.
- Parameters
label (
str
, optional) – Transformer stage label to use when selecting stage from definitions. Default:None
.name (
str
, optional) – Transformer stage name to use when selecting stage from definitions. Default:None
.type (
str
, optional) – Transformer stage type to use when selecting stage from definitions (e.g. origin, destination, processor, executor). Default:None
.library (
str
, optional) – Transformer stage library to use when selecting stage from definitions. Default:None
.
- Returns
An instance of
streamsets.sdk.sch_models.SchStStage
.
Protection Methods#
- class streamsets.sdk.sch_models.ProtectionMethod(stage)[source]#
Protection Method Model.
- Parameters
stage (
dict
) – JSON representation of a stage.
- class streamsets.sdk.sch_models.ProtectionMethodBuilder(pipeline_builder)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.ProtectionMethod
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_protection_method_builder()
.
- Parameters
pipeline_builder (
streamsets.sdk.sch_models.PipelineBuilder
) – Pipeline Builder object.
Protection Policies#
- class streamsets.sdk.sch_models.ProtectionPolicy(protection_policy, procedures=None)[source]#
Model for Protection Policy.
- Parameters
protection_policy (
dict
) – JSON representation of Protection Policy.procedures (
list
) – A list of streamsets.sdk.sch_models.PolicyProcedure
instances. Default:None
.
- class streamsets.sdk.sch_models.ProtectionPolicies(control_hub)[source]#
Collection of
streamsets.sdk.sch_models.ProtectionPolicy
instances.
- class streamsets.sdk.sch_models.ProtectionPolicyBuilder(control_hub, protection_policy, policy_procedure)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.ProtectionPolicy
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_protection_policy_builder()
.
- Parameters
protection_policy (
dict
) – Python object defining a protection policy.policy_procedure (
dict
) – Python object defining a policy procedure.
ProvisioningAgents#
- class streamsets.sdk.sch_models.Deployment(deployment, control_hub=None, **kwargs)[source]#
Model for Deployment. This is an abstract class.
- created_by#
User that created this deployment.
- Type
str
- created_on#
Time at which this deployment was created.
- Type
int
- deployment_id#
Id of the deployment.
- Type
str
- deployment_name#
Name of the deployment.
- Type
str
- deployment_tags#
Raw deployment tags of the deployment.
- Type
list
- deployment_type#
Type of the deployment.
- Type
str
- desired_instances#
- Type
int
- engine_configuration#
- engine_type#
- Type
str
- engine_version#
- Type
str
- environment_id#
Enabled environment where engine will be deployed.
- Type
list
- last_modified_by#
User that last modified this deployment.
- Type
str
- last_modified_on#
Time at which this deployment was last modified.
- Type
int
- network_tags#
- Type
list
- scala_binary_version#
Scala binary version.
- Type
str
- organization#
- Type
str
- json_state#
State of the deployment, as stored in the JSON field called state.
- Type
str
- state#
State of the deployment, as displayed under state on the UI.
- Type
str
- status#
Status of the deployment.
- Type
str
- status_detail#
Status detail of the deployment.
- Type
str
- property acl#
Get deployment ACL.
- Returns
An instance of
streamsets.sdk.sch_models.ACL
.
- property engine_configuration#
The engine configuration of the deployment engine.
- Returns
An instance of
streamsets.sdk.sch_models.DeploymentEngineConfiguration
.
- property tags#
Get the tags.
- Returns
A
streamsets.sdk.utils.SeekableList
of str
instances.
- class streamsets.sdk.sch_models.Deployments(control_hub, organization)[source]#
Collection of
streamsets.sdk.sch_models.Deployment
instances.
- Parameters
control_hub – An instance of
streamsets.sdk.sch.ControlHub
.organization (
str
) – Organization Id.
- class streamsets.sdk.sch_models.DeploymentBuilder(deployment, control_hub)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.Deployment
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_deployment_builder()
.
- Parameters
deployment (
dict
) – Python object built from our Swagger CspDeploymentJson definition.control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub object.
- build(deployment_name, engine_type, engine_version, environment, **kwargs)[source]#
Define the deployment.
- Parameters
deployment_name (
str
) – deployment name.engine_type (
str
) – Type of engine to deploy.engine_version (
str
) – Version of engine to deploy.environment (
str
) – ID of the enabled environment where the engine will be deployed.deployment_tags (
list
, optional) – List of tags (strings). Default:None
.engine_build (
str
, optional) – Build of engine to deploy. Default:None
.scala_binary_version (
str
, optional) – Scala binary version, required when engine_type=’TF’. Default:None
.
- Returns
An instance of subclass of
streamsets.sdk.sch_models.Deployment
.
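The following sketch wraps `DeploymentBuilder.build()` and enforces the rule stated above that `scala_binary_version` is required when `engine_type` is `'TF'`. The function name `build_deployment` is hypothetical, `environment` is passed through exactly as the caller supplies it, and optional keyword arguments are forwarded only when set.

```python
def build_deployment(sch, deployment_name, engine_type, engine_version, environment,
                     deployment_tags=None, scala_binary_version=None):
    """Build a deployment via ControlHub.get_deployment_builder().

    `sch` is expected to be a streamsets.sdk.ControlHub instance.
    Hypothetical helper, not part of the SDK.
    """
    # Documented rule: a Scala binary version is needed for engine_type 'TF'.
    if engine_type == "TF" and scala_binary_version is None:
        raise ValueError("scala_binary_version is required when engine_type is 'TF'")

    # Forward optional keyword arguments only when the caller provided them.
    optional = {}
    if deployment_tags is not None:
        optional["deployment_tags"] = deployment_tags
    if scala_binary_version is not None:
        optional["scala_binary_version"] = scala_binary_version

    builder = sch.get_deployment_builder()
    return builder.build(
        deployment_name=deployment_name,
        engine_type=engine_type,
        engine_version=engine_version,
        environment=environment,
        **optional
    )
```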
- class streamsets.sdk.sch_models.ProvisioningAgent(provisioning_agent, control_hub)[source]#
Model for Provisioning Agent.
- Parameters
provisioning_agent (
dict
) – A Python object representation of Provisioning Agent.control_hub – An instance of
streamsets.sdk.sch.ControlHub
.
- property acl#
Get the ACL of a Provisioning Agent.
- Returns
An instance of
streamsets.sdk.sch_models.ACL
.
- property deployments#
Get the deployments associated with the Provisioning Agent.
- Returns
A
list
of streamsets.sdk.sch_models.LegacyDeployment
instances.
- class streamsets.sdk.sch_models.ProvisioningAgents(control_hub, organization)[source]#
Collection of
streamsets.sdk.sch_models.ProvisioningAgent
instances.
- Parameters
control_hub – An instance of
streamsets.sdk.sch.ControlHub
.organization (
str
) – Organization Id.
Reports#
- class streamsets.sdk.sch_models.GenerateReportCommand(control_hub, report_defintion, response)[source]#
Command to interact with the response from generate_report.
- Parameters
control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub instance.report_definition (
dict
) – JSON representation of Report Definition.response (
dict
) – Api response from generating the report.
- class streamsets.sdk.sch_models.Report(report, control_hub, report_definition_id)[source]#
Model for Report.
- Parameters
report (
dict
) – JSON representation of Report.
- class streamsets.sdk.sch_models.Reports(control_hub, report_definition_id)[source]#
Collection of
streamsets.sdk.sch_models.Report
instances.
- Parameters
control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub object.report_definition_id (
str
) – Report Definition Id.
- class streamsets.sdk.sch_models.ReportDefinition(report_definition, control_hub)[source]#
Model for Report Definition.
- Parameters
report_definition (
dict
) – JSON representation of Report Definition.control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub instance.
- property acl#
Get Report Definition ACL.
- Returns
An instance of
streamsets.sdk.sch_models.ACL
.
- generate_report()[source]#
Generate a Report for Report Definition.
- Returns
An instance of
streamsets.sdk.sch_models.Report
.
- property report_resources#
Get Report Resources of the Report Definition.
- Returns
An instance of
streamsets.sdk.sch_models.ReportResources
.
- property reports#
Get Reports of the Report Definition.
- Returns
An instance of
streamsets.sdk.sch_models.Reports
.
- class streamsets.sdk.sch_models.ReportDefinitions(control_hub)[source]#
Collection of
streamsets.sdk.sch_models.ReportDefinition
instances.
- class streamsets.sdk.sch_models.ReportDefinitionBuilder(report_definition, control_hub)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.ReportDefinition
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_report_definition_builder()
.
- Parameters
report_definition (
dict
) – JSON representation of Report Definition.control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub instance.
- add_report_resource(resource)[source]#
Add a given resource to Report Definition resources.
- Parameters
resource (
streamsets.sdk.sch_models.Job
) or (streamsets.sdk.sch_models.Topology
) –
- build(name, description=None)[source]#
Build the report definition.
- Parameters
name (
str
) – Name of the Report Definition.description (
str
, optional) – Description of the Report Definition. Default:None
.
- Returns
An instance of
streamsets.sdk.sch_models.ReportDefinition
.
- import_report_definition(report_definition)[source]#
Import an existing Report Definition to update it.
- Parameters
report_definition (
streamsets.sdk.sch_models.ReportDefinition
) – Report Definition object.
- remove_report_resource(resource)[source]#
Remove a resource from Report Definition Resources.
- Parameters
resource (
streamsets.sdk.sch_models.Job
) or (streamsets.sdk.sch_models.Topology
) –
- Returns
A resource of type
dict
that is removed from Report Definition Resources.
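The `ReportDefinitionBuilder` methods above compose naturally: add each resource, then build. A minimal sketch, assuming `sch` is a `ControlHub` instance and `resources` is an iterable of `Job` or `Topology` objects. The function name `build_report_definition` is hypothetical.

```python
def build_report_definition(sch, name, resources, description=None):
    """Build a Report Definition covering the given Jobs and/or Topologies.

    `sch` is expected to be a streamsets.sdk.ControlHub instance.
    Hypothetical helper, not part of the SDK.
    """
    builder = sch.get_report_definition_builder()
    for resource in resources:
        # Each resource is a Job or Topology, per add_report_resource() above.
        builder.add_report_resource(resource)
    return builder.build(name=name, description=description)
```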
- class streamsets.sdk.sch_models.ReportResource(report_resource)[source]#
Model for Report Resource.
- Parameters
report_resource (
dict
) – JSON representation of Report Resource.
- class streamsets.sdk.sch_models.ReportResources(report_resources, report_definition)[source]#
Model for the collection of Report Resources.
- Parameters
report_resources (
list
) – List of Report Resources.report_definition (
streamsets.sdk.sch_models.ReportDefinition
) – Report Definition object.
Roles#
- class streamsets.sdk.sch_models.Roles(values, entity, role_label_to_id)[source]#
Wrapper class over the list of Roles.
- Parameters
values (
list
) – List of roles.entity (
streamsets.sdk.sch_models.Group
) – (streamsets.sdk.sch_models.User
): Group or User object.role_label_to_id (
dict
) – Role label to Role ID mapping.
Scheduler#
- class streamsets.sdk.sch_models.ScheduledTask(task, control_hub=None)[source]#
Model for Scheduled Task.
- Parameters
task (
dict
) – JSON representation of task.control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub instance.
- property acl#
Get the ACL of a Scheduled Task.
- Returns
An instance of
streamsets.sdk.sch_models.ACL
.
- property audits#
Get Scheduled Task Audits.
- Returns
A
streamsets.sdk.utils.SeekableList
of inherited instances of streamsets.sdk.sch_models.ScheduledTaskAudit
.
- property runs#
Get Scheduled Task Runs.
- Returns
A
streamsets.sdk.utils.SeekableList
of inherited instances of streamsets.sdk.sch_models.ScheduledTaskRun
.
- class streamsets.sdk.sch_models.ScheduledTaskAudit(audit)[source]#
Scheduled Task Audit.
- Parameters
audit (
dict
) – JSON representation of scheduled task audit.
- class streamsets.sdk.sch_models.ScheduledTaskBaseModel(data, attributes_to_ignore=None, attributes_to_remap=None, repr_metadata=None)[source]#
Base Model for Scheduled Task related classes.
- class streamsets.sdk.sch_models.ScheduledTaskBuilder(job_selection_types, control_hub)[source]#
Builder for Scheduled Task.
- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_scheduled_task_builder()
.
- Parameters
job_selection_types (
dict
) – JSON representation of job selection types.control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub instance.
- build(task_object, action='START', name=None, description=None, cron_expression='0 0 1/1 * ? *', time_zone='UTC', status='RUNNING', start_time=None, end_time=None, missed_execution_handling='IGNORE')[source]#
Builder for Scheduled Task.
- Parameters
task_object (
streamsets.sdk.sch_models.Job
) – (streamsets.sdk.sch_models.ReportDefinition
): Job or ReportDefinition object.action (
str
, optional) – One of the {‘START’, ‘STOP’, ‘UPGRADE’} actions. Default:START
.name (
str
, optional) – Name of the task. Default:None
.description (
str
, optional) – Description of the task. Default:None
.cron_expression (
str
, optional) – Schedule in cron syntax. Default:"0 0 1/1 * ? *"
. (Daily at 12).time_zone (
str
, optional) – Time zone. Default:"UTC"
.status (
str
, optional) – One of the {‘RUNNING’, ‘PAUSED’} statuses. Default:RUNNING
.start_time (
str
, optional) – Start time of task. Default:None
.end_time (
str
, optional) – End time of task. Default:None
.missed_execution_handling (
str
, optional) – One of {‘IGNORE’, ‘RUN ALL’, ‘RUN ONCE’}. Default:IGNORE
.
- Returns
An instance of
streamsets.sdk.sch_models.ScheduledTask
.
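A sketch of building a scheduled task with the documented value sets checked up front. The function name `build_scheduled_task` and the two constant sets are hypothetical; the keyword names follow the `build()` signature shown above, and `task_object` is whatever Job or ReportDefinition the caller already holds.

```python
# Allowed values copied from the parameter descriptions above.
ACTIONS = {"START", "STOP", "UPGRADE"}
STATUSES = {"RUNNING", "PAUSED"}


def build_scheduled_task(sch, task_object, name, cron_expression,
                         action="START", status="RUNNING", time_zone="UTC"):
    """Build a ScheduledTask via ControlHub.get_scheduled_task_builder().

    `sch` is expected to be a streamsets.sdk.ControlHub instance.
    Hypothetical helper, not part of the SDK.
    """
    if action not in ACTIONS:
        raise ValueError("action must be one of {}".format(sorted(ACTIONS)))
    if status not in STATUSES:
        raise ValueError("status must be one of {}".format(sorted(STATUSES)))
    builder = sch.get_scheduled_task_builder()
    return builder.build(
        task_object=task_object,
        action=action,
        name=name,
        cron_expression=cron_expression,
        time_zone=time_zone,
        status=status,
    )
```

For instance, `build_scheduled_task(sch, job, "nightly start", "0 0 1/1 * ? *")` would reuse the default cron expression from the signature.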
- class streamsets.sdk.sch_models.ScheduledTaskRun(run)[source]#
Scheduled Task Run.
- Parameters
run (
dict
) – JSON representation of scheduled task run.
- class streamsets.sdk.sch_models.ScheduledTasks(control_hub)[source]#
Collection of
streamsets.sdk.sch_models.ScheduledTask
instances.
Subscriptions#
- class streamsets.sdk.sch_models.Subscription(subscription, control_hub)[source]#
Subscription.
- Parameters
subscription (
dict
) – JSON representation of Subscription.control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub instance.
- property acl#
Get the ACL of an Event Subscription.
- Returns
An instance of
streamsets.sdk.sch_models.ACL
.
- property action#
Action of the Subscription.
- property events#
Events of the Subscription.
- class streamsets.sdk.sch_models.Subscriptions(control_hub)[source]#
Collection of
streamsets.sdk.sch_models.Subscription
instances.
- Parameters
control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub object.
- class streamsets.sdk.sch_models.SubscriptionAction(action)[source]#
Action to take when the Subscription is triggered.
- Parameters
action (
dict
) – JSON representation of an Action for a Subscription.
- class streamsets.sdk.sch_models.SubscriptionAudit(audit)[source]#
Model for subscription audit.
- Parameters
audit (
dict
) – JSON representation of a subscription audit.
- class streamsets.sdk.sch_models.SubscriptionBuilder(subscription, control_hub)[source]#
Builder for Subscription.
- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_subscription_builder()
.
- Parameters
subscription (
dict
) – JSON representation of event subscription.control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub instance.
- add_event(event_type, filter=None)[source]#
Add event to the Subscription.
- Parameters
event_type (
str
) – Type of event in {‘Job Status Change’, ‘Data SLA Triggered’, ‘Pipeline Committed’, ‘Pipeline Status Change’, ‘Report Generated’, ‘Data Collector not Responding’}.filter (
str
, optional) – Filter to be applied on event. Default:None
.
- build(name, description=None)[source]#
Build the Subscription.
- Parameters
name (
str
) – Name of Subscription.description (
str
, optional) – Description of subscription. Default:None
.
- Returns
An instance of
streamsets.sdk.sch_models.Subscription
.
- import_subscription(subscription)[source]#
Import an existing Subscription into the builder to update it.
- Parameters
subscription (
streamsets.sdk.sch_models.Subscription
) – Subscription instance.
- remove_event(event_type)[source]#
Remove event from the subscription.
- Parameters
event_type (
str
) – Type of event in {‘Job Status Change’, ‘Data SLA Triggered’, ‘Pipeline Committed’, ‘Pipeline Status Change’, ‘Report Generated’, ‘Data Collector not Responding’}.
- Returns
An instance of
streamsets.sdk.sch_models.SubscriptionEvent
.
- set_email_action(recipients, subject=None, body=None)[source]#
Set the Email action.
- Parameters
recipients (
list
) – List of email addresses.subject (
str
, optional) – Subject of the email. Default:None
.body (
str
, optional) – Body of the email. Default:None
.
- set_webhook_action(uri, method='GET', content_type=None, payload=None, auth_type=None, username=None, password=None, timeout=30000, headers=None)[source]#
Set the Webhook action.
- Parameters
uri (
str
) – URI for the Webhook.method (
str
, optional) – HTTP method to use. Default:'GET'
.content_type (
str
, optional) – Content Type of the request. Default:None
.payload (
str
, optional) – Payload of the request. Default:None
.auth_type (
str
, optional) –'basic'
orNone
. Default:None
.username (
str
, optional) – username for the authentication. Default:None
.password (
str
, optional) – password for the authentication. Default:None
.timeout (
int
, optional) – timeout for the Webhook action. Default:30000
.headers (
dict
, optional) – Headers to be sent to the Webhook call. Default:None
.
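The builder methods above can be combined roughly as follows. This is a minimal sketch, not the SDK's own example: `build_job_status_webhook_subscription` is a hypothetical helper, the URI is supplied by the caller, and `builder` is assumed to be a subscription builder obtained from a Control Hub instance that exposes the `add_event`, `set_webhook_action`, and `build` methods documented above.

```python
def build_job_status_webhook_subscription(builder, name, uri, payload=None):
    """Configure `builder` for job status changes and return the built Subscription.

    `builder` is assumed to expose add_event, set_webhook_action, and build
    with the signatures documented above.
    """
    # The event type must be one of the documented strings.
    builder.add_event('Job Status Change')
    # Send the payload with POST; timeout is left at the documented default.
    builder.set_webhook_action(uri=uri,
                               method='POST',
                               content_type='application/json',
                               payload=payload,
                               timeout=30000)
    return builder.build(name=name,
                         description='Notify on job status changes')
```

The helper only builds the Subscription object; publishing it to Control Hub is a separate step that is not covered in this section.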
- class streamsets.sdk.sch_models.SubscriptionEvent(event, control_hub)[source]#
An Event of a Subscription.
- Parameters
event (
dict
) – JSON representation of Events of a Subscription.control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub instance.
Topologies#
- class streamsets.sdk.sch_models.DataSla(data_sla, control_hub)[source]#
Model for DataSla.
- Parameters
data_sla (
dict
) – JSON representation of SLA.control_hub (
streamsets.sdk.sch.ControlHub
) – An instance of the Control Hub.
- class streamsets.sdk.sch_models.DataSlaBuilder(data_sla, control_hub)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.DataSla
. Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_data_sla_builder()
.
- Parameters
data_sla (
dict
) – Python object built from our Swagger DataSlaJson definition.control_hub (
streamsets.sdk.sch.ControlHub
) – An instance of the Control Hub.
- build(topology, label, job, alert_text, qos_parameter='THROUGHPUT_RATE', function_type='Max', min_max_value=100, enabled=True)[source]#
Build the Data Sla.
- Parameters
topology (
streamsets.sdk.sch_models.Topology
) – Topology object.label (
str
) – Label for the SLA.job (
list
) – List of streamsets.sdk.sch_models.Job
objects.alert_text (
str
) – Alert text.qos_parameter (
str
, optional) – QoS parameter in {‘THROUGHPUT_RATE’, ‘ERROR_RATE’}. Default:'THROUGHPUT_RATE'
.function_type (
str
, optional) – Function type in {‘Max’, ‘Min’}. Default:'Max'
.min_max_value (
str
, optional) – Default:100
.enabled (
boolean
, optional) – Default:True
.
- Returns
An instance of
streamsets.sdk.sch_models.DataSla
.
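As an illustration of the `build` parameters, the sketch below builds an error-rate SLA and attaches it to a topology. It is a hedged example: `add_error_rate_sla`, the label, and the alert text are hypothetical, `data_sla_builder` is assumed to come from `ControlHub.get_data_sla_builder()`, and the add-then-activate order (using `Topology.add_data_sla` and `Topology.activate_data_sla`) is an assumption rather than a documented requirement.

```python
def add_error_rate_sla(data_sla_builder, topology, jobs, threshold=100):
    """Build a Data SLA on ERROR_RATE, then add and activate it on `topology`."""
    data_sla = data_sla_builder.build(topology=topology,
                                      label='error-rate-sla',       # hypothetical label
                                      job=list(jobs),               # list of Job objects
                                      alert_text='Error rate exceeded threshold',
                                      qos_parameter='ERROR_RATE',
                                      function_type='Max',
                                      min_max_value=threshold)
    # Assumed order: register the SLA with the topology, then activate it.
    topology.add_data_sla(data_sla)
    topology.activate_data_sla(data_sla)
    return data_sla
```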
- class streamsets.sdk.sch_models.Topology(topology, control_hub=None)[source]#
Model for Topology.
- Parameters
topology (
dict
) – JSON representation of Topology.
- commit_id#
Pipeline commit id.
- Type
str
- commit_message#
Commit Message.
- Type
str
- commit_time#
Time at which commit was made.
- Type
int
- committed_by#
User that made the commit.
- Type
str
- default_topology#
Default Topology.
- Type
bool
- description#
Topology description.
- Type
str
- draft#
Indicates whether this topology is a draft.
- Type
bool
- last_modified_by#
User that last modified this topology.
- Type
str
- last_modified_on#
Time at which this topology was last modified.
- Type
int
- new_pipeline_version_available#
Whether any job in the topology has a new pipeline version to be updated to.
- Type
bool
- organization#
Id of the organization.
- Type
str
- parent_version#
Version of the parent topology.
- Type
str
- topology_definition#
Definition of the topology.
- Type
str
- topology_id#
Id of the topology.
- Type
str
- topology_name#
Name of the topology.
- Type
str
- validation_issues#
Any validation issues that exist for this Topology.
- Type
dict
- version#
Version of this topology.
- Type
str
- acknowledge_job_errors()[source]#
Acknowledge all errors for the jobs in a topology.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- property acl#
Get the ACL of a Topology.
- Returns
An instance of
streamsets.sdk.sch_models.ACL
.
- activate_data_sla(*data_slas)[source]#
Activate Data SLAs.
- Parameters
*data_slas – One or more instances of
streamsets.sdk.sch_models.DataSla
.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- add_data_sla(data_sla)[source]#
Add SLA.
- Parameters
data_sla (
streamsets.sdk.sch_models.DataSla
) – Data SLA object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- auto_discover_connections()[source]#
Auto discover connecting systems between nodes in a Topology.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- auto_fix()[source]#
Auto-fix a topology by rectifying invalid or removed jobs, outdated jobs, etc.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- deactivate_data_sla(*data_slas)[source]#
Deactivate Data SLAs.
- Parameters
*data_slas – One or more instances of
streamsets.sdk.sch_models.DataSla
.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_data_sla(*data_slas)[source]#
Delete Data SLAs.
- Parameters
*data_slas – One or more instances of
streamsets.sdk.sch_models.DataSla
.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- property jobs#
Get the jobs that are contained within the Topology.
- Returns
A
streamsets.sdk.utils.SeekableList
of streamsets.sdk.sch_models.Job
instances.
- property new_pipeline_version_available#
Determine if a new pipeline version is available for any jobs in the Topology.
- Returns
A (
bool
) value.
- property nodes#
Get the job and system nodes that make up the Topology.
- Returns
A
streamsets.sdk.utils.SeekableList
of streamsets.sdk.sch_models.TopologyNode
instances.
- start_all_jobs()[source]#
Start all jobs of a topology.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- stop_all_jobs(force=False)[source]#
Stop all jobs of a topology.
- Parameters
force (
bool
, optional) – Force topology jobs to stop. Default:False
.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- update_jobs_to_latest_change()[source]#
Upgrade a topology’s job(s) to the latest corresponding pipeline change.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- property validation_issues#
Get any validation issues that are detected for the Topology.
- Returns
A (
list
) of validation issues in JSON format.
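One way the maintenance methods on a Topology might be combined is sketched below. `refresh_topology` is a hypothetical helper and the decision logic is only an assumption about a reasonable workflow; it uses nothing beyond the `validation_issues`, `new_pipeline_version_available`, `auto_fix`, and `update_jobs_to_latest_change` members documented above.

```python
def refresh_topology(topology):
    """Apply auto-fix and job updates where needed; return the actions taken."""
    actions = []
    # Any reported validation issue triggers an auto-fix attempt.
    if topology.validation_issues:
        topology.auto_fix()
        actions.append('auto_fix')
    # Move jobs to the latest pipeline change when one is available.
    if topology.new_pipeline_version_available:
        topology.update_jobs_to_latest_change()
        actions.append('update_jobs_to_latest_change')
    return actions
```

Returning the list of actions keeps the helper easy to log or assert on without inspecting the Command objects the SDK methods return.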
- class streamsets.sdk.sch_models.Topologies(control_hub)[source]#
Collection of
streamsets.sdk.sch_models.Topology
instances.- Parameters
control_hub (
streamsets.sdk.sch.ControlHub
) – An instance of the ControlHub.
- class streamsets.sdk.sch_models.TopologyBuilder(topology, control_hub=None)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.Topology
. Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_topology_builder()
.
- Parameters
topology (
dict
) – Python object built from our Swagger TopologyJson definition.control_hub (
streamsets.sdk.ControlHub
, optional) – Control Hub instance. Default:None
- add_job(job)[source]#
Add a job node to the Topology being built.
- Parameters
job (
streamsets.sdk.sch_models.Job
) – An instance of a job to be added.
- add_system(name)[source]#
Add a system node to the Topology being built.
- Parameters
name (
str
) – The name of the system to add to the topology.
- build(topology_name=None, description=None)[source]#
Build the topology.
- Parameters
topology_name (
str
, optional) – Name of the topology. This parameter is required when building a new topology.description (
str
, optional) – Description of the topology. Default:None
.
- Returns
An instance of
streamsets.sdk.sch_models.Topology
.
- delete_node(topology_node)[source]#
Delete a system or job node from the topology.
- Parameters
topology_node (
streamsets.sdk.sch_models.TopologyNode
) – An instance of a TopologyNode to delete from the topology.
- import_topology(topology)[source]#
Import an existing topology to be used in the builder.
- Parameters
topology (
streamsets.sdk.sch_models.Topology
) – An existing Topology instance to modify.
- property topology_nodes#
Get all of the nodes currently part of the topology held by the TopologyBuilder.
- Returns
A
streamsets.sdk.utils.SeekableList
of streamsets.sdk.sch_models.TopologyNode
instances.
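The TopologyBuilder methods can be chained along these lines. This is a sketch under stated assumptions: `build_topology` is a hypothetical helper, `topology_builder` is assumed to come from `ControlHub.get_topology_builder()`, and the jobs and system names are supplied by the caller.

```python
def build_topology(topology_builder, jobs, system_names, name, description=None):
    """Add job and system nodes to the builder, then build the Topology."""
    for job in jobs:
        # Each job becomes a job node in the topology.
        topology_builder.add_job(job)
    for system_name in system_names:
        # Each name becomes a system node in the topology.
        topology_builder.add_system(system_name)
    # topology_name is required when building a new topology.
    return topology_builder.build(topology_name=name, description=description)
```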
- class streamsets.sdk.sch_models.TopologyNode(topology_node_json)[source]#
Model for a node within a Topology.
- Parameters
topology_node_json (
dict
) – JSON representation of a Topology Node.
- node_type#
The type of this node, i.e. SYSTEM, JOB, etc.
- Type
str
- instance_name#
The name of this node instance.
- Type
str
- stage_name#
The name of the stage in this node.
- Type
str
- stage_version#
The version of the stage in this node.
- Type
str
- job_id#
The ID of the job in this node.
- Type
str
- pipeline_id#
The pipeline ID associated with the job in this node.
- Type
str
- pipeline_commit_id#
The commit ID of the pipeline.
- Type
str
- pipeline_version#
The version of the pipeline.
- Type
str
- input_lanes#
A list of
str
representing the input lanes for this node.- Type
list
- output_lanes#
A list of
str
representing the output lanes for this node.- Type
list
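The node attributes above lend themselves to simple summaries. The sketch below groups instance names by `node_type`; `instance_names_by_type` is a hypothetical helper, and `nodes` is assumed to be any iterable of objects carrying the documented `node_type` and `instance_name` attributes, such as the list returned by `Topology.nodes`.

```python
from collections import defaultdict


def instance_names_by_type(nodes):
    """Map each node_type (e.g. 'SYSTEM', 'JOB') to its nodes' instance names."""
    grouped = defaultdict(list)
    for node in nodes:
        grouped[node.node_type].append(node.instance_name)
    # Return a plain dict so missing keys raise KeyError as usual.
    return dict(grouped)
```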
Users#
- class streamsets.sdk.sch_models.User(user, roles)[source]#
Model for User.
- Parameters
user (
dict
) – JSON representation of User.roles (
dict
) – A mapping of role IDs to role labels.
- active#
Whether the user is active or not.
- Type
bool
- created_by#
Creator of this user.
- Type
str
- created_on#
Creation time of this user.
- Type
str
- display_name#
Display name of this user.
- Type
str
- email_address#
Email address of this user.
- Type
str
- id#
Id of this user.
- Type
str
- groups#
Groups this user belongs to.
- Type
list
- last_modified_by#
User that last modified this user.
- Type
str
- last_modified_on#
Time at which this user was last modified.
- Type
str
- password_expires_on#
User’s password expiration time.
- Type
str
- password_system_generated#
Whether User’s password is system generated or not.
- Type
bool
- roles#
A set of role labels.
- Type
set
- saml_user_name#
SAML username of user.
- Type
str
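As a small illustration of the User attributes, the sketch below selects the email addresses of active users holding a given role. `active_users_with_role` is a hypothetical helper, and `users` is assumed to be any iterable of objects with the documented `active`, `roles`, and `email_address` attributes.

```python
def active_users_with_role(users, role_label):
    """Return email addresses of active users whose roles include `role_label`."""
    return [user.email_address
            for user in users
            # roles is documented as a set of role labels.
            if user.active and role_label in user.roles]
```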
- class streamsets.sdk.sch_models.Users(control_hub, roles, organization)[source]#
Collection of
streamsets.sdk.sch_models.User
instances.- Parameters
control_hub – An instance of
streamsets.sdk.sch.ControlHub
.roles (
dict
) – A mapping of role IDs to role labels.organization (
str
) – Organization ID.
- class streamsets.sdk.sch_models.UserBuilder(user, roles, control_hub=None)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.User
. Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_user_builder()
.
- Parameters
user (
dict
) – Python object built from our Swagger UserJson definition.roles (
dict
) – A mapping of role IDs to role labels.control_hub – An instance of
streamsets.sdk.sch.ControlHub
. Default:None
.
- build(email_address)[source]#
Build the user.
- Parameters
email_address (
str
) – User Email Address.- Returns
An instance of
streamsets.sdk.sch_models.User
.
Common#
Models used by StreamSets Data Collector and StreamSets Control Hub:
- class streamsets.sdk.models.Configuration(configuration=None, property_key='name', property_value='value', **kwargs)[source]#
Abstraction for stage configurations.
This class enables easy access to and modification of data stored as a list of dictionaries. As an example, SDC’s pipeline configuration is stored in the form
[
    {"name": "executionMode", "value": "STANDALONE"},
    {"name": "deliveryGuarantee", "value": "AT_LEAST_ONCE"},
    ...
]
By implementing simple
__getitem__
and __setitem__
methods, this class allows items in this list to be accessed using
configuration['executionMode'] = 'CLUSTER_BATCH'
instead of the more verbose
for property in configuration:
    if property['name'] == 'executionMode':
        property['value'] = 'CLUSTER_BATCH'
        break
- Parameters
configuration (
list
) – List of dictionaries comprising the configuration.property_key (
str
, optional) – The dictionary entry denoting the property key. Default:name
property_value (
str
, optional) – The dictionary entry denoting the property value. Default:value
- get(key, default=None)[source]#
Return the value of key or, if not in the configuration, the default value.
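The idea behind this class can be sketched in a few lines. The following is a standalone illustration of the technique, not the SDK's implementation: `ListBackedConfiguration` is a hypothetical name, and raising `KeyError` for an unknown key is an assumption about reasonable behavior.

```python
class ListBackedConfiguration:
    """Dictionary-style access to a list of {'name': ..., 'value': ...} entries."""

    def __init__(self, configuration, property_key='name', property_value='value'):
        self._data = configuration          # the underlying list is modified in place
        self.property_key = property_key
        self.property_value = property_value

    def __getitem__(self, key):
        for entry in self._data:
            if entry[self.property_key] == key:
                return entry[self.property_value]
        raise KeyError(key)                 # assumption: unknown keys raise KeyError

    def __setitem__(self, key, value):
        for entry in self._data:
            if entry[self.property_key] == key:
                entry[self.property_value] = value
                return
        raise KeyError(key)

    def get(self, key, default=None):
        """Return the value of key or, if not in the configuration, the default."""
        try:
            return self[key]
        except KeyError:
            return default


raw = [{'name': 'executionMode', 'value': 'STANDALONE'},
       {'name': 'deliveryGuarantee', 'value': 'AT_LEAST_ONCE'}]
config = ListBackedConfiguration(raw)
config['executionMode'] = 'CLUSTER_BATCH'   # updates the matching entry in `raw`
```

Because the wrapper holds a reference to the original list, the assignment changes `raw` itself, which mirrors how a change to a wrapped configuration would reach the underlying pipeline data.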
Exceptions#
Common exceptions.
- exception streamsets.sdk.exceptions.BadRequestError(response)[source]#
Bad request error (HTTP 400).
- exception streamsets.sdk.exceptions.ConnectionError(code, message)[source]#
Connection Catalog errors.
- exception streamsets.sdk.exceptions.InvalidCredentialsError(message)[source]#
Invalid credentials error.
- exception streamsets.sdk.exceptions.JobInactiveError(message)[source]#
Job status changed into INACTIVE_ERROR.
- exception streamsets.sdk.exceptions.LegacyDeploymentInactiveError(message)[source]#
Legacy deployment status changed into INACTIVE_ERROR.
- exception streamsets.sdk.exceptions.MultipleIssuesError(errors)[source]#
Multiple errors were returned.
- exception streamsets.sdk.exceptions.UnprocessableEntityError(response)[source]#
Unprocessable Entity Error (HTTP 422).