StreamSets Control Hub
Section Contents
StreamSets Control Hub#
Main interface#
This is the main entry point used by users when interacting with SCH instances.
- class streamsets.sdk.ControlHub(server_url, username, password)[source]#
Class to interact with StreamSets Control Hub.
- Parameters
server_url (
str
) – SCH server base URL.username (
str
) – SCH username.password (
str
) – SCH password.
- acknowledge_deployment_error(*deployments)[source]#
Acknowledge errors for one or more deployments.
- Parameters
*deployments – One or more instances of
streamsets.sdk.sch_models.Deployment
.
- acknowledge_event_subscription_error(subscription)[source]#
Acknowledge an error on given Event Subscription.
- Parameters
subscription (
streamsets.sdk.sch_models.Subscription
) – A Subscription instance.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- acknowledge_job_error(*jobs)[source]#
Acknowledge errors for one or more jobs.
- Parameters
*jobs – One or more instances of
streamsets.sdk.sch_models.Job
.
- property action_audits#
Action Audits.
- Returns
An instance of
streamsets.sdk.sch_models.ActionAudits
.
- activate_datacollector(data_collector)[source]#
Activate data collector.
- Parameters
data_collector (
streamsets.sdk.sch_models.DataCollector
) – Data Collector object.
- activate_provisioning_agent(provisioning_agent)[source]#
Activate provisioning agent.
- Parameters
provisioning_agent (
streamets.sdk.sch_models.ProvisioningAgent
) –- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- add_classification_rule(classification_rule, commit=False)[source]#
Add a classification rule.
- Parameters
classification_rule (
streamsets.sdk.sch_models.ClassificationRule
) – Classification Rule object.commit (
bool
, optional) – Whether to commit the rule after adding it. Default:False
.
- add_connection(connection)[source]#
Add a connection.
- Parameters
connection (
streamsets.sdk.sch_models.Connection
) – Connection object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- add_deployment(deployment)[source]#
Add a deployment.
- Parameters
deployment (
streamsets.sdk.sch_models.Deployment
) – Deployment object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- add_group(group)[source]#
Add a group.
- Parameters
group (
streamsets.sdk.sch_models.Group
) – Group object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- add_job(job)[source]#
Add a job.
- Parameters
job (
streamsets.sdk.sch_models.Job
) – Job object.- Returns
An instance of
streamsets.sdk.sch_models.Job
.
- add_organization(organization)[source]#
Add an organization.
- Parameters
organization (
streamsets.sdk.sch_models.Organization
) – Organization object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- add_protection_policy(protection_policy)[source]#
Add a protection policy.
- Parameters
protection_policy (
streamsets.sdk.sch_models.ProtectionPolicy
) – Protection Policy object.
- add_report_definition(report_definition)[source]#
Add Report Definition to Control Hub.
- Parameters
report_definition (
streamsets.sdk.sch_models.ReportDefinition
) – Report Definition instance.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- add_scheduled_task(task)[source]#
Add the scheduled task to Control Hub.
- Parameters
task (
streamsets.sdk.sch_models.ScheduledTask
) – Scheduled task object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.- Raises
Exception – Thrown if publishing the task was unsuccessful
- add_subscription(subscription)[source]#
Add Subscription to Control Hub.
- Parameters
subscription (
streamsets.sdk.sch_models.Subscription
) – A Subscription instance.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- add_user(user)[source]#
- Add a user. Some user attributes are updated by SCH such as
created_by, created_on, last_modified_by, last_modified_on, password_expires_on, password_system_generated.
- Parameters
user (
streamsets.sdk.sch_models.User
) – User object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- property alerts#
Alerts.
- Returns
A
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Alert
.
- balance_data_collectors(*data_collectors)[source]#
Balance all jobs running on given Data Collectors.
- Parameters
*sdcs – One or more instances of
streamsets.sdk.sch_models.DataCollector
.
- balance_job(*jobs)[source]#
Balance one or more jobs.
- Parameters
*jobs – One or more instances of
streamsets.sdk.sch_models.Job
.
- property connection_audits#
Connection Audits.
- Returns
An instance of
streamsets.sdk.sch_models.ConnectionAudits
.
- property connection_tags#
Connection Tags.
- Returns
An instance of
streamsets.sdk.sch_models.ConnectionTags
.
- property connections#
Connections.
- Returns
An instance of
streamsets.sdk.sch_models.Connections
.
- create_components(component_type, number_of_components=1, active=True)[source]#
Create components.
- Parameters
component_type (
str
) – Component type.number_of_components (
int
, optional) – Default:1
.active (
bool
, optional) – Default:True
.
- Returns
An instance of
streamsets.sdk.sch_api.CreateComponentsCommand
.
- property data_collectors#
Data Collectors registered to the Control Hub instance.
- Returns
Returns a
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.DataCollector
instances.
- property data_protector_enabled#
Whether Data Protector is enabled for the current organization.
- Type
bool
- property data_protector_version#
Returns the StreamSets Data Protector version string configured in the system data collector. If data protector is not enabled a None value is returned
- Type
str
- deactivate_datacollector(data_collector)[source]#
Deactivate data collector.
- Parameters
data_collector (
streamsets.sdk.sch_models.DataCollector
) – Data Collector object.
- deactivate_provisioning_agent(provisioning_agent)[source]#
Deactivate provisioning agent.
- Parameters
provisioning_agent (
streamets.sdk.sch_models.ProvisioningAgent
) –- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- deactivate_user(*users, organization=None)[source]#
Deactivate Users for all given User IDs.
- Parameters
*users – One or more instances of
streamsets.sdk.sch_models.User
.organization (
str
, optional) – Default:None
. If not specified, current organization will be used.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_and_unregister_data_collector(data_collector)[source]#
Delete and Unregister data collector.
- Parameters
data_collector (
streamsets.sdk.sch_models.DataCollector
) – Data Collector object.
- delete_connection(*connections)[source]#
Delete connections.
- Parameters
*connections – One or more instances of
streamsets.sdk.sch_models.Connection
.
- delete_data_collector(data_collector)[source]#
Delete data collector.
- Parameters
data_collector (
streamsets.sdk.sch_models.DataCollector
) – Data Collector object.
- delete_deployment(*deployments)[source]#
Delete deployments.
- Parameters
*deployments – One or more instances of
streamsets.sdk.sch_models.Deployment
.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_group(*groups)[source]#
Delete groups.
- Parameters
*groups – One or more instances of
streamsets.sdk.sch_models.Group
.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_job(*jobs)[source]#
Delete one or more jobs.
- Parameters
*jobs – One or more instances of
streamsets.sdk.sch_models.Job
.
- delete_pipeline(pipeline, only_selected_version=False)[source]#
Delete a pipeline.
- Parameters
pipeline (
streamsets.sdk.sch_models.Pipeline
) – Pipeline object.only_selected_version (
boolean
) – Delete only current commit.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_pipeline_labels(*pipeline_labels)[source]#
Delete pipeline labels.
- Parameters
*pipeline_labels – One or more instances of
streamsets.sdk.sch_models.PipelineLabel
.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_provisioning_agent(provisioning_agent)[source]#
Delete provisioning agent.
- Parameters
provisioning_agent (
streamets.sdk.sch_models.ProvisioningAgent
) –- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_provisioning_agent_token(provisioning_agent)[source]#
Delete provisioning agent token.
- Parameters
provisioning_agent (
streamets.sdk.sch_models.ProvisioningAgent
) –- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_report_definition(report_definition)[source]#
Delete an existing Report Definition.
- Parameters
report_definition (
streamsets.sdk.sch_models.ReportDefinition
) – Report Definition instance.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_subscription(subscription)[source]#
Delete an exisiting Subscription.
- Parameters
subscription (
streamsets.sdk.sch_models.Subscription
) – A Subscription instance.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_topology(topology, only_selected_version=False)[source]#
Delete a topology.
- Parameters
topology (
streamsets.sdk.sch_models.Topology
) – Topology object.only_selected_version (
boolean
) – Delete only current commit.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_user(*users, deactivate=False)[source]#
Delete users. Deactivate users before deleting if configured.
- Parameters
*users – One or more instances of
streamsets.sdk.sch_models.User
.deactivate (
bool
, optional) – Default:False
.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- property deployments#
Deployments.
- Returns
An instance of
streamsets.sdk.sch_models.Deployments
.
- duplicate_job(job, name=None, description=None, number_of_copies=1)[source]#
Duplicate an existing job.
- Parameters
job (
streamsets.sdk.sch_models.Job
) – Job object.name (
str
, optional) – Name of the new job(s). Default:None
. If not specified, name of the job with' copy'
appended to the end will be used.description (
str
, optional) – Description for new job(s). Default:None
.number_of_copies (
int
, optional) – Number of copies. Default:1
.
- Returns
A
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Job
.
- duplicate_pipeline(pipeline, name=None, description='New Pipeline', number_of_copies=1)[source]#
Duplicate an existing pipeline.
- Parameters
pipeline (
streamsets.sdk.sch_models.Pipeline
) – Pipeline object.name (
str
, optional) – Name of the new pipeline(s). Default:None
.description (
str
, optional) – Description for new pipeline(s). Default:'New Pipeline'
.number_of_copies (
int
, optional) – Number of copies. Default:1
.
- Returns
A
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Pipeline
.
- edit_job(job)[source]#
Edit a job.
- Parameters
job (
streamsets.sdk.sch_models.Job
) – Job object.- Returns
An instance of
streamsets.sdk.sch_models.Job
.
- export_jobs(jobs)[source]#
Export jobs to a compressed archive.
- Parameters
jobs (
list
) – A list ofstreamsets.sdk.sch_models.Job
instances.- Returns
An instance of type
bytes
indicating the content of zip file with job json files.
- export_pipelines(pipelines, fragments=False, include_plain_text_credentials=False)[source]#
Export pipelines.
- Parameters
pipelines (
list
) – A list ofstreamsets.sdk.sch_models.Pipeline
instances.fragments (
bool
) – Indicates if exporting fragments is needed.include_plain_text_credentials (
bool
) – Indicates if plain text credentials should be included.
- Returns
An instance of type
bytes
indicating the content of zip file with pipeline json files.
- export_protection_policies(protection_policies)[source]#
Export protection policies to a compressed archive.
- Parameters
protection_policies (
list
) – A list ofstreamsets.sdk.sch_models.ProtectionPolicy
instances. –
- Returns
An instance of type
bytes
indicating the content of zip file with protection policy json files.
- export_topologies(topologies)[source]#
Export topologies.
- Parameters
topologies (
list
) – A list ofstreamsets.sdk.sch_models.Topology
instances.- Returns
An instance of type
bytes
indicating the content of zip file with pipeline json files.
- get_admin_tool(base_url, username, password)[source]#
Get SCH admin tool.
- Returns
An instance of
streamsets.sdk.sch_models.AdminTool
.
- get_classification_rule_builder()[source]#
Get a classification rule builder instance with which a classification rule can be created.
- Returns
An instance of
streamsets.sdk.sch_models.ClassificationRuleBuilder
.
- get_components(component_type_id, offset=None, len_=None, order_by='LAST_VALIDATED_ON', order='ASC')[source]#
Get components.
- Parameters
component_type_id (
str
) – Component type id.offset (
str
, optional) – Default:None
.len (
str
, optional) – Default:None
.order_by (
str
, optional) – Default:'LAST_VALIDATED_ON'
.order (
str
, optional) – Default:'ASC'
.
- get_connection_builder()[source]#
Get a connection builder instance with which a connection can be created.
- Returns
An instance of
streamsets.sdk.sch_models.ConnectionBuilder
.
- get_current_job_status(job)[source]#
Returns the current job status for given job id.
- Parameters
job (
streamsets.sdk.sch_models.Job
) – Job object.
- get_data_collector_labels(data_collector)[source]#
Returns all labels assigned to data collector.
- Parameters
data_collector (
streamsets.sdk.sch_models.DataCollector
) – Data Collector object.- Returns
A
list
of data collector assigned labels.
- get_data_sla_builder()[source]#
Get a Data SLA builder instance with which a Data SLA can be created.
- Returns
An instance of
streamsets.sdk.sch_models.DataSlaBuilder
.
- get_deployment_builder()[source]#
Get a deployment builder instance with which a deployment can be created.
- Returns
An instance of
streamsets.sdk.sch_models.DeploymentBuilder
.
- get_group_builder()[source]#
Get a group builder instance with which a group can be created.
- Returns
An instance of
streamsets.sdk.sch_models.GroupBuilder
.
- get_job_builder()[source]#
Get a job builder instance with which a job can be created.
- Returns
An instance of
streamsets.sdk.sch_models.JobBuilder
.
- get_organization_builder()[source]#
Get an organization builder instance with which an organization can be created.
- Returns
An instance of
streamsets.sdk.sch_models.OrganizationBuilder
.
- get_pipeline_builder(data_collector=None, transformer=None, fragment=False)[source]#
Get a pipeline builder instance with which a pipeline can be created.
- Parameters
data_collector (
streamsets.sdk.sch_models.DataCollector
, optional) – The Data Collector in which to author the pipeline. If omitted, Control Hub’s system SDC will be used. Default:None
.transformer (
streamsets.sdk.sch_models.Transformer
, optional) – The Transformer in which to author the pipeline.fragment (
boolean
, optional) – Specify if a fragment builder. Default:False
.
- Returns
An instance of
streamsets.sdk.sch_models.PipelineBuilder
orstreamsets.sdk.sch_models.StPipelineBuilder
.
- get_protection_method_builder()[source]#
Get a protection method builder instance with which a protection method can be created.
- Returns
An instance of
streamsets.sdk.sch_models.ProtectionMethodBuilder
.
- get_protection_policy_builder()[source]#
Get a protection policy builder instance with which a protection policy can be created.
- Returns
An instance of
streamsets.sdk.sch_models.ProtectionPolicyBuilder
.
- get_report_definition_builder()[source]#
Get a Report Definition Builder instance with which a Report Definition can be created.
- Returns
An instance of
streamsets.sdk.sch_models.ReportDefinitionBuilder
.
- get_scheduled_task_builder()[source]#
Get a scheduled task builder instance with which a scheduled task can be created.
- Returns
An instance of
streamsets.sdk.sch_models.ScheduledTaskBuilder
.
- get_subscription_builder()[source]#
Get Event Subscription Builder.
- Returns
An instance of
streamsets.sdk.sch_models.SubscriptionBuilder
.
- get_support_bundle(redact_sensitive_fields=False, field_trim_threshold_num_chars=64, max_log_file_size_bytes=268435456)[source]#
Get Support Bundle.
- Parameters
redact_sensitive_fields (
bool
, optional) – Default: False.field_trim_threshold_num_chars (
bool
, optional) – Default: 64.max_log_file_size_bytes (
bool
, optional) – Default: 268435456.
- Returns
Binary data in the form of (
str
).
- get_topology_builder()[source]#
Get a topology builder instance with which a topology can be created.
- Returns
An instance of
streamsets.sdk.sch_models.TopologyBuilder
.
- get_user_builder()[source]#
Get a user builder instance with which a user can be created.
- Returns
An instance of
streamsets.sdk.sch_models.UserBuilder
.
- property groups#
Groups.
- Returns
An instance of
streamsets.sdk.sch_models.Groups
.
- import_jobs(archive, pipeline=True, number_of_instances=False, labels=False, runtime_parameters=False, **kwargs)[source]#
Import jobs from archived zip directory.
- Parameters
archive (
file
) – file containing the jobs.pipeline (
boolean
, optional) – Indicate if pipeline should be imported. Default:True
.number_of_instances (
boolean
, optional) – Indicate if number of instances should be imported. Default:False
.labels (
boolean
, optional) – Indicate if labels should be imported. Default:False
.runtime_parameters (
boolean
, optional) – Indicate if runtime parameters should be imported. Default:False
.
- Returns
A
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Job
.
- import_pipeline(pipeline, commit_message, name=None, data_collector_instance=None)[source]#
Import pipeline from json file.
- Parameters
pipeline (
dict
) – A python dict representation of ControlHub Pipeline.commit_message (
str
) – Commit message.name (
str
, optional) – Name of the pipeline. If left out, pipeline name from JSON object will be used. DefaultNone
.data_collector_instance (
streamsets.sdk.sch_models.DataCollector
) – If excluded, system sdc will be used. DefaultNone
.
- Returns
An instance of
streamsets.sdk.sch_models.Pipeline
.
- import_pipelines_from_archive(archive, commit_message, fragments=False)[source]#
Import pipelines from archived zip directory.
- Parameters
archive (
file
) – file containing the pipelines.commit_message (
str
) – Commit message.fragments (
bool
, optional) – Indicates if pipeline contains fragments.
- Returns
A
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Pipeline
.
- import_protection_policies(policies_archive)[source]#
Import protection policies from a compressed archive.
- Parameters
policies_archive (
file
) – file containing the protection policies.- Returns
A py:class:streamsets.sdk.utils.SeekableList of
streamsets.sdk.sch_models.ProtectionPolicy
.
- import_topologies(archive, import_number_of_instances=False, import_labels=False, import_runtime_parameters=False, **kwargs)[source]#
Import topologies from archived zip directory.
- Parameters
archive (
file
) – file containing the topologies.import_number_of_instances (
boolean
, optional) – Indicate if number of instances should be imported. Default:False
.import_labels (
boolean
, optional) – Indicate if labels should be imported. Default:False
.import_runtime_parameters (
boolean
, optional) – Indicate if runtime parameters should be imported. Default:False
.
- Returns
A
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Topology
.
- property jobs#
Jobs.
- Returns
An instance of
streamsets.sdk.sch_models.Jobs
.
- property ldap_enabled#
Indication if LDAP is enabled or not.
- Returns
An instance of
boolean
.
- property login_audits#
Login Audits.
- Returns
An instance of
streamsets.sdk.sch_models.LoginAudits
.
- property organizations#
Organizations.
- Returns
An instance of
streamsets.sdk.sch_models.Organizations
.
- property pipeline_labels#
Pipeline labels.
- Returns
An instance of
streamsets.sdk.sch_models.PipelineLabels
.
- property pipelines#
Pipelines.
- Returns
An instance of
streamsets.sdk.sch_models.Pipelines
.
- preview_classification_rule(classification_rule, parameter_data, data_collector=None)[source]#
Dynamic preview of a classification rule.
- Parameters
classification_rule (
streamsets.sdk.sch_models.ClassificationRule
) – Classification Rule object.parameter_data (
dict
) – A python dict representation of raw JSON parameters required for preview.data_collector (
streamsets.sdk.sch_models.DataCollector
, optional) – The Data Collector in which to preview the pipeline. If omitted, Control Hub’s first executor SDC will be used. Default:None
.
- Returns
An instance of
streamsets.sdk.sdc_api.PreviewCommand
.
- property protection_policies#
Protection policies.
- Returns
An instance of
streamsets.sdk.sch_models.ProtectionPolicies
.
- property provisioning_agents#
Provisioning Agents registered to the Control Hub instance.
- Returns
An instance of
streamsets.sdk.sch_models.ProvisioningAgents
.
- publish_pipeline(pipeline, commit_message='New pipeline', draft=False)[source]#
Publish a pipeline.
- Parameters
pipeline (
streamsets.sdk.sch_models.Pipeline
) – Pipeline object.commit_message (
str
, optional) – Default:'New pipeline'
.draft (
boolean
, optional) – Default:False
.
- publish_scheduled_task(task)[source]#
Send the scheduled task to Control Hub. DEPRECATED
- Parameters
task (
streamsets.sdk.sch_models.ScheduledTask
) – Scheduled task object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- publish_topology(topology, commit_message=None)[source]#
Publish a topology.
- Parameters
topology (
streamsets.sdk.sch_models.Topology
) – Topology object to publish.commit_message (
str
, optional) – Commit message to supply with the Topology. Default:None
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- property report_definitions#
Report Definitions.
- Returns
An instance of
streamsets.sdk.sch_models.ReportDefinitions
.
- reset_origin(*jobs)[source]#
Reset all pipeline offsets for given jobs.
- Parameters
*jobs – One or more instances of
streamsets.sdk.sch_models.Job
.- Returns
A
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Job
.
- run_pipeline_preview(pipeline, batches=1, batch_size=10, skip_targets=True, skip_lifecycle_events=True, timeout=120000, test_origin=False, read_policy=None, write_policy=None, executor=None, **kwargs)[source]#
Run pipeline preview.
- Parameters
pipeline (
streamsets.sdk.sch_models.Pipeline
) – Pipeline object.batches (
int
, optional) – Number of batches. Default:1
.batch_size (
int
, optional) – Batch size. Default:10
.skip_targets (
bool
, optional) – Skip targets. Default:True
.skip_lifecycle_events (
bool
, optional) – Skip life cycle events. Default:True
.timeout (
int
, optional) – Timeout. Default:120000
.test_origin (
bool
, optional) – Test origin. Default:False
.read_policy (
streamsets.sdk.sch_models.ProtectionPolicy
) – Read Policy for preview. If not provided, uses default read policy if one available. Default:None
.write_policy (
streamsets.sdk.sch_models.ProtectionPolicy
) – Write Policy for preview. If not provided, uses default write policy if one available. Default:None
.executor (
streamsets.sdk.sch_models.DataCollector
, optional) – The Data Collector in which to preview the pipeline. If omitted, Control Hub’s first executor SDC will be used. Default:None
.
- Returns
An instance of
streamsets.sdk.sdc_api.PreviewCommand
.
- scale_deployment(deployment, num_instances)[source]#
Scale up/down active deployment.
- Parameters
deployment (
streamsets.sdk.sch_models.Deployment
) – Deployment object.num_instances (
int
) – Number of sdc instances.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- property scheduled_tasks#
Scheduled Tasks.
- Returns
An instance of
streamsets.sdk.sch_models.ScheduledTasks
.
- set_default_read_protection_policy(protection_policy)[source]#
Set a default read protection policy.
- Parameters
protection_policy –
:param (
streamsets.sdk.sch_models.ProtectionPolicy
): Protection :param Policy object to be set as the default read policy.:- Returns
An updated instance of
streamsets.sdk.sch_models.ProtectionPolicy
.- Raises
UnsupportedMethodError – Only supported on Control Hub version 3.14.0+.
- set_default_write_protection_policy(protection_policy)[source]#
Set a default write protection policy.
- Parameters
protection_policy –
:param (
streamsets.sdk.sch_models.ProtectionPolicy
): Protection :param Policy object to be set as the default write policy.:- Returns
An updated instance of
streamsets.sdk.sch_models.ProtectionPolicy
.- Raises
UnsupportedMethodError – Only supported on Control Hub version 3.14.0+.
- set_user(username, password)[source]#
Set the user by which subsequent actions will be run.
- Parameters
username (
str
) – SCH username.password (
str
) – SCH password.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- start_deployment(deployment, **kwargs)[source]#
Start Deployment.
- Parameters
deployment (
streamsets.sdk.sch_models.Deployment
) – Deployment instance.wait (
bool
, optional) – Wait for deployment to start. Default:True
.wait_for_statuses (
list
, optional) – Deployment statuses to wait on. Default:['ACTIVE']
.
- Returns
An instance of
streamsets.sdk.sch_api.DeploymentStartStopCommand
.
- start_job(*jobs, wait=True, **kwargs)[source]#
Start one or more jobs.
- Parameters
*jobs – One or more instances of
streamsets.sdk.sch_models.Job
.wait (
bool
, optional) – Wait for pipelines to reach RUNNING status before returning. Default:True
.**kwargs – Arbitrary keyword arguments.
- Returns
An instance of
streamsets.sdk.sch_api.StartJobsCommand
.
- start_job_template(job_template, instance_name_suffix='COUNTER', parameter_name=None, runtime_parameters=None, number_of_instances=1, wait_for_data_collectors=False)[source]#
Start Job instances from a Job Template.
- Parameters
job_template (
streamsets.sdk.sch_models.Job
) – A Job instance with the property job_template set toTrue
.instance_name_suffix (
str
, optional) – Suffix to be used for Job names in {‘COUNTER’, ‘TIME_STAMP’, ‘PARAM_VALUE’}. Default:COUNTER
.parameter_name (
str
, optional) – Specified when instance_name_suffix is ‘PARAM_VALUE’. Default:None
.runtime_parameters (
dict
) or (list
) – Runtime Parameters to be used in the jobs. If a dict is specified,number_of_instances
jobs will be started. If a list is specified,number_of_instances
is ignored and job instances will be started using the elements of the list as Runtime Parameters for each job. If left out, Runtime Parameters from Job Template will be used. Default:None
.number_of_instances (
int
, optional) – Number of instances to be started using given parameters. Default:1
.wait_for_data_collectors (
bool
, optional) – Default:False
.
- Returns
A
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Job
instances.
- stop_deployment(deployment, wait_for_statuses=['INACTIVE'])[source]#
Stop Deployment.
- Parameters
deployment (
streamsets.sdk.sch_models.Deployment
) – Deployment instance.wait_for_statuses (
list
, optional) – List of statuses to wait for. Default:['INACTIVE']
.
- Returns
An instance of
streamsets.sdk.sch_api.DeploymentStartStopCommand
.
- stop_job(*jobs, force=False, timeout_sec=300)[source]#
Stop one or more jobs.
- Parameters
*jobs – One or more instances of
streamsets.sdk.sch_models.Job
.force (
bool
, optional) – Force job to stop. Default:False
.timeout_sec (
int
, optional) – Timeout in secs. Default:300
.
- stop_test_pipeline_run(start_pipeline_command)[source]#
Stop the test run of pipeline.
- Parameters
start_pipeline_command (
streamsets.sdk.sdc_api.StartPipelineCommand
) –- Returns
An instance of
streamsets.sdk.sdc_api.StopPipelineCommand
.
- property subscription_audits#
Subscription audits.
- Returns
A
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.SubscriptionAudit
.
- property subscriptions#
Event Subscriptions.
- Returns
An instance of
streamsets.sdk.sch_models.Subscriptions
.
- sync_job(*jobs)[source]#
Sync one or more jobs.
- Parameters
*jobs – One or more instances of
streamsets.sdk.sch_models.Job
.
- test_pipeline_run(pipeline, reset_origin=False, parameters=None)[source]#
Test run a pipeline.
- Parameters
pipeline (
streamsets.sdk.sch_models.Pipeline
) – Pipeline object.reset_origin (
boolean
, optional) – Default:False
.parameters (
dict
, optional) – Pipeline parameters. Default:None
.
- Returns
An instance of
streamsets.sdk.sdc_api.StartPipelineCommand
.
- property topologies#
Topologies.
- Returns
An instance of
streamsets.sdk.sch_models.Topologies
.
- property transformers#
Transformers registered to the Control Hub instance.
- Returns
Returns a
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Transformer
instances.
- update_connection(connection)[source]#
Update a connection.
- Parameters
connection (
streamsets.sdk.sch_models.Connection
) – Connection object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- update_data_collector_labels(data_collector)[source]#
Update data collector labels.
- Parameters
data_collector (
streamsets.sdk.sch_models.DataCollector
) – Data Collector object.- Returns
An instance of
streamsets.sdk.sch_models.DataCollector
.
- update_data_collector_resource_thresholds(data_collector, max_cpu_load=None, max_memory_used=None, max_pipelines_running=None)[source]#
Updates data collector resource thresholds.
- Parameters
data_collector (
streamsets.sdk.sch_models.DataCollector
) – Data Collector object.max_cpu_load (
float
, optional) – Max CPU load in percentage. Default:None
.max_memory_used (
int
, optional) – Max memory used in MB. Default:None
.max_pipelines_running (
int
, optional) – Max pipelines running. Default:None
.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- update_deployment(deployment)[source]#
Update a deployment.
- Parameters
deployment (
streamsets.sdk.sch_models.Deployment
) – Deployment object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- update_group(group)[source]#
Update a group.
- Parameters
group (
streamsets.sdk.sch_models.Group
) – Group object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- update_job(job)[source]#
Update a job.
- Parameters
job (
streamsets.sdk.sch_models.Job
) – Job object.- Returns
An instance of
streamsets.sdk.sch_models.Job
.
- update_organization(organization)[source]#
Update an organization.
- Parameters
organization (
streamsets.sdk.sch_models.Organization
) – Organization instance.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- update_pipelines_with_different_fragment_version(pipelines, from_fragment_version, to_fragment_version)[source]#
Update pipelines with latest pipeline fragment commit version.
- Parameters
pipelines (
list
) – List ofstreamsets.sdk.sch_models.Pipeline
instances.from_fragment_version (
streamsets.sdk.sch_models.PipelineCommit
) – commit of fragment from which the pipeline needs to be updated.to_fragment_version (
streamsets.sdk.sch_models.PipelineCommit
) – commit of fragment to which the pipeline needs to be updated.
- Returns
An instance of
streamsets.sdk.sch_api.Command
- update_report_definition(report_definition)[source]#
Update an existing Report Definition.
- Parameters
report_definition (
streamsets.sdk.sch_models.ReportDefinition
) – Report Definition instance.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- update_scheduled_task(task)[source]#
Update an existing scheduled task in Control Hub.
- Parameters
task (
streamsets.sdk.sch_models.ScheduleTask
) – Scheduled task object.- Returns
An instance of py:class:streamsets.sdk.sch_api.Command.
- update_subscription(subscription)[source]#
Update an existing Subscription.
- Parameters
subscription (
streamsets.sdk.sch_models.Subscription
) – A Subscription instance.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- update_user(user)[source]#
- Update a user. Some user attributes are updated by SCH such as
last_modified_by, last_modified_on.
- Parameters
user (
streamsets.sdk.sch_models.User
) – User object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- upgrade_job(*jobs)[source]#
Upgrade job(s) to latest pipeline version.
- Parameters
*jobs – One or more instances of
streamsets.sdk.sch_models.Job
.- Returns
A
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Job
.
- upload_offset(job, offset_file=None, offset_json=None)[source]#
Upload offset for given job.
- Parameters
job (
streamsets.sdk.sch_models.Job
) – Job object.offset_file (
file
, optional) – File containing the offsets. Default:None
. Exactly one ofoffset_file
,offset_json
should specified.offset_json (
dict
, optional) – Contents of offset. Default:None
. Exactly one ofoffset_file
,offset_json
should specified.
- Returns
An instance of
streamsets.sdk.sch_models.Job
.
- property users#
Users.
- Returns
An instance of
streamsets.sdk.sch_models.Users
.
- validate_pipeline(pipeline)[source]#
Validate a pipeline.
- Parameters
pipeline (
streamsets.sdk.sch_models.pipeline
) – A pipeline instance to be validated.- Raises
streamsets.sdk.exceptions.ValidationError – If validation fails.
- verify_connection(connection)[source]#
Verify connection.
- Parameters
connection (
streamsets.sdk.sch_models.Connection
) – Connection object.- Returns
An instance of
streamsets.sdk.sch_models.ConnectionVerificationResult
.
- wait_for_job_metrics_record_count(job, count, timeout_sec=200)[source]#
Block until a job’s metrics reaches the desired count.
- Parameters
job (
streamsets.sdk.sch_models.Job
) – The job instance.count (
int
) – The desired value to wait for.timeout_sec (
int
, optional) – Timeout to wait formetric
to reachcount
, in seconds. Default:streamsets.sdk.sch.DEFAULT_WAIT_FOR_METRIC_TIMEOUT
.
- Raises
TimeoutError – If
timeout_sec
passes withoutmetric
reachingvalue
.
- wait_for_job_pipeline_metric(job, metric, value, timeout_sec=200)[source]#
Block until a job’s pipeline metric reaches the desired value. Note: Currently this method works only for job with DataCollector executor.
- Parameters
job (
streamsets.sdk.sch_models.Job
) – The job instance.metric (
str
) – The desired metric (e.g.'output_record_count'
or'data_batch_count'
).value (
int
) – The desired value to wait for.timeout_sec (
int
, optional) – Timeout to wait formetric
to reachvalue
, in seconds. Default:streamsets.sdk.sch.DEFAULT_WAIT_FOR_METRIC_TIMEOUT
.
- Raises
TimeoutError – If
timeout_sec
passes withoutmetric
reachingvalue
.
- wait_for_job_status(job, status, check_failures=False, timeout_sec=200)[source]#
Block until a job reaches the desired status.
- Parameters
job (
streamsets.sdk.sch_models.Job
) – The job instance.status (
str
) – The desired status to wait for.check_failures (
bool
, optional) – Flag to check for job exceptions. Default: Falsetimeout_sec (
int
, optional) – Timeout to wait forjob
to reachstatus
, in seconds. Default:streamsets.sdk.sch.DEFAULT_WAIT_FOR_STATUS_TIMEOUT
.
- Raises
TimeoutError – If
timeout_sec
passes withoutjob
reachingstatus
.
Models#
These models wrap and provide useful functionality for interacting with common SCH abstractions.
ACLs#
- class streamsets.sdk.sch_models.ACL(acl, control_hub)[source]#
Represents an ACL.
- Parameters
acl (
dict
) – JSON representation of an ACL.control_hub (
streamsets.sdk.sch.ControlHub
) – Control Hub object.
- permissions#
A Collection of Permissions.
- Type
streamsets.sdk.sch_models.Permissions
- add_permission(permission)[source]#
Add new permission to the ACL.
- Parameters
permission (
streamsets.sdk.sch_models.Permission
) – A permission object.- Returns
An instance of
streamsets.sdk.sch_api.Command
- property permission_builder#
Get a permission builder instance with which a pipeline can be created.
- Returns
An instance of
streamsets.sdk.sch_models.ACLPermissionBuilder
.
- remove_permission(permission)[source]#
Remove a permission from ACL.
- Parameters
permission (
streamsets.sdk.sch_models.Permission
) – A permission object.- Returns
An instance of
streamsets.sdk.sch_api.Command
- class streamsets.sdk.sch_models.ACLPermissionBuilder(permission, acl)[source]#
Class to help build the ACL permission.
- Parameters
permission (
streamsets.sdk.sch_models.Permission
) – A permission object.acl (
streamsets.sdk.sch_models.ACL
) – An ACL object.
- build(subject_id, subject_type, actions)[source]#
Method to help build the ACL permission.
- Parameters
subject_id (
str
) – Id of the subject e.g. ‘test@test’.subject_type (
str
) – Type of the subject e.g. ‘USER’.actions (
list
) – A list of actions of typestr
e.g. [‘READ’, ‘WRITE’, ‘EXECUTE’].
- Returns
An instance of
streamsets.sdk.sch_models.Permission
.
- class streamsets.sdk.sch_models.Permission(permission, resource_type, api_client)[source]#
A container for a permission.
- Parameters
permission (
dict
) – A Python object representation of a permission.resource_type (
str
) – String representing the type of resource e.g. ‘JOB’, ‘PIPELINE’.api_client (
streamsets.sdk.sch_api.ApiClient
) – An instance of ApiClient.
- resource_id#
Id of the resource e.g. Pipeline or Job.
- Type
str
- subject_id#
Id of the subject e.g. user id
'admin@admin'
.- Type
str
- subject_type#
Type of the subject e.g.
'USER'
.- Type
str
- last_modified_by#
User who last modified this permission e.g.
'admin@admin'
.- Type
str
- last_modified_on#
Timestamp at which this permission was last modified e.g.
1550785079811
.- Type
int
Alerts#
- class streamsets.sdk.sch_models.Alert(alert, control_hub)[source]#
Model for Alerts.
- message#
The Alert’s message text.
- Type
str
- alert_status#
The status of the Alert.
- Type
str
- Parameters
alert (
dict
) – JSON representation of an Alert.control_hub (
streamsets.sdk.ControlHub
) – Control Hub instance.
Classifiers#
Classification Rules#
- class streamsets.sdk.sch_models.ClassificationRule(classification_rule, classifiers)[source]#
Classification Rule Model.
- Parameters
classification_rule (
dict
) – A Python dict representation of classification rule.classifiers (
list
) – A list ofstreamsets.sdk.sch_models.Classifier
instances.
- class streamsets.sdk.sch_models.ClassificationRuleBuilder(classification_rule, classifier)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.ClassificationRule
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_classification_rule_builder()
.
- Parameters
classification_rule (
dict
) – Python object defining a classification rule.classifier (
dict
) – Python object defining a classifier.
- add_classifier(patterns=None, match_with=None, regular_expression_type='RE2/J', case_sensitive=False)[source]#
Add classifier to the classification rule.
- Parameters
patterns (
list
, optional) – List of strings of patterns. Default:None
.match_with (
str
, optional) – Default:None
.regular_expression_type (
str
, optional) – Default:'RE2/J'
.case_sensitive (
bool
, optional) – Default:False
.
- Returns
An instance of
streamsets.sdk.sch_models.Classifier
.
- build(name, category, score)[source]#
Build the classification rule.
- Parameters
name (
str
) – Classification Rule name.category (
str
) – Classification Rule category.score (
float
) – Classification Rule score.
- Returns
An instance of
streamsets.sdk.sch_models.ClassificationRule
.
Connections#
- class streamsets.sdk.sch_models.Connection(connection, control_hub)[source]#
Model for connection.
- Parameters
connection (
dict
) – A Python object representation of Connection.control_hub (
streamsets.sdk.sch.ControlHub
) – Control Hub object.
- property acl#
Get Connection ACL.
- Returns
An instance of
streamsets.sdk.sch_models.ACL
.
- property pipeline_commits#
Get the pipeline commits using this connection.
- Returns
- A
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.PipelineCommit
instances.
- A
- property tags#
Get the connection tags.
- Returns
A
streamsets.sdk.utils.SeekableList
of instances ofstreamsets.sdk.sch_models.Tag
.
- class streamsets.sdk.sch_models.Connections(control_hub, organization)[source]#
Collection of
streamsets.sdk.sch_models.Connection
instances.- Parameters
control_hub – An instance of
streamsets.sdk.sch.ControlHub
.organization (
str
) – Organization Id.
- class streamsets.sdk.sch_models.ConnectionBuilder(connection, control_hub)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.Connection
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_connection_builder()
.
- Parameters
connection (
dict
) – Python object built from our Swagger ConnectionJson definition.control_hub (
streamsets.sdk.sch.ControlHub
) – Control Hub object.
- build(title, connection_type, authoring_data_collector, tags=None)[source]#
Define the connection.
- Parameters
title (
str
) – Connection title.connecion_type (
str
) – Type of connection.authoring_data_collector (
streamsets.sdk.ControlHub.DataCollector
) – Authoring Data Collector.tags (
list
, optional) – List of tags (strings). Default:None
.
- Returns
An instance of
streamsets.sdk.sch_models.Connection
.
- class streamsets.sdk.sch_models.ConnectionVerificationResult(connection_preview_json)[source]#
Model for connection verification result.
- Parameters
connection_preview_json (
dict
) – dynamic preview API response JSON
- property issue_count#
The count of the number of issues for the connection verification result.
- Returns
A
int
that represents the number of issues.
- property issue_message#
The message provided for the connection verification result.
- Returns
A
str
message detailing the response for the connection verification result.
DataCollectors#
- class streamsets.sdk.sch_models.DataCollector(data_collector, control_hub)[source]#
Model for Data Collector.
- execution_mode#
True
for Edge andFalse
for SDC.- Type
bool
- id#
Data Collectort id.
- Type
str
- labels#
Labels for Data Collector.
- Type
list
- last_validated_on#
Last validated time for Data Collector.
- Type
str
- reported_labels#
Reported labels for Data Collector.
- Type
list
- url#
Data Collector’s url.
- Type
str
- version#
Data Collector’s version.
- Type
str
- property accessible#
Returns a
bool
for whether the Data Collector instance is accessible.
- property acl#
Get DataCollector ACL.
- Returns
An instance of
streamsets.sdk.sch_models.ACL
.
- property attributes#
Returns a
dict
of Data Collector attributes.
- property pipelines_committed#
Control Hub Job IDs that are about to be started but have no corresponding pipeline status yet.
- Returns
A
list
of Job IDs (str
objects).
- property responding#
Returns a
bool
for whether the Data Collector instance is responding.
Transformers#
- class streamsets.sdk.sch_models.Transformer(transformer, control_hub)[source]#
Model for Transformer.
- execution_mode#
- Type
str
- id#
Transformer id.
- Type
str
- labels#
Labels for Transformer.
- Type
list
- last_validated_on#
Last validated time for Transformer.
- Type
str
- reported_labels#
Reported labels for Transformer.
- Type
list
- url#
Transformer’s url.
- Type
str
- version#
Transformer’s version.
- Type
str
- property accessible#
Returns a
bool
for whether the Transformer instance is accessible.
- property acl#
Get Transformer ACL.
- Returns
An instance of
streamsets.sdk.sch_models.ACL
.
- property attributes#
Returns a
dict
of Transformer attributes.
Group#
- class streamsets.sdk.sch_models.Group(group, roles)[source]#
Model for Group.
- Parameters
group (
dict
) – A Python object representation of Group.roles (
dict
) – A mapping of role IDs to role labels.
- class streamsets.sdk.sch_models.Groups(control_hub, roles, organization)[source]#
Collection of
streamsets.sdk.sch_models.Group
instances.- Parameters
control_hub – An instance of
streamsets.sdk.sch.ControlHub
.roles (
dict
) – A mapping of role IDs to role labels.organization (
str
) – Organization ID.
- class streamsets.sdk.sch_models.GroupBuilder(group, roles)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.Group
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_group_builder()
.
- Parameters
group (
dict
) – Python object built from our Swagger GroupJson definition.roles (
dict
) – A mapping of role IDs to role labels.
- build(id, display_name, ldap_groups=None)[source]#
Build the group.
- Parameters
id (
str
) – Group ID.display_name (
str
) – Group display name.ldap_groups (
list
) – List of LDAP groups (strings).
- Returns
An instance of
streamsets.sdk.sch_models.Group
.
Jobs#
- class streamsets.sdk.sch_models.Job(job, control_hub=None)[source]#
Model for Job.
- commit_id#
Pipeline commit id.
- Type
str
- commit_label#
Pipeline commit label.
- Type
str
- created_by#
User that created this job.
- Type
str
- created_on#
Time at which this job was created.
- Type
int
- data_collector_labels#
Labels of the data collectors.
- Type
list
- description#
Job description.
- Type
str
- destroyer#
Job destroyer.
- Type
str
- enable_failover#
Flag that indicates if failover is enabled.
- Type
bool
- enable_time_series_analysis#
Flag that indicates if time series is enabled.
- Type
bool
- execution_mode#
True for Edge and False for SDC.
- Type
bool
- job_deleted#
Flag that indicates if this job is deleted.
- Type
bool
- job_id#
Id of the job.
- Type
str
- job_name#
Name of the job.
- Type
str
- last_modified_by#
User that last modified this job.
- Type
str
- last_modified_on#
Time at which this job was last modified.
- Type
int
- number_of_instances#
Number of instances.
- Type
int
- pipeline_force_stop_timeout#
Timeout for Pipeline force stop.
- Type
int
- pipeline_id#
Id of the pipeline that is running the job.
- Type
str
- pipeline_name#
Name of the pipeline that is running the job.
- Type
str
- pipeline_rule_id#
Rule Id of the pipeline that is running the job.
- Type
str
- read_policy#
Read Policy of the job.
- require_job_error_acknowledgement#
acknowledgement.
- Type
bool`
- runtime_parameters#
Run-time parameters of the job.
- Type
str
- statistics_refresh_interval_in_millisecs#
Refresh interval for statistics in milliseconds.
- Type
int
- status#
Status of the job.
- Type
string
- write_policy#
Write Policy of the job.
- property acl#
Get job ACL.
- Returns
An instance of
streamsets.sdk.sch_models.ACL
.
- property commit#
Get pipeline commit of the job.
- Returns
An instance of
streamsets.sdk.sch_models.PipelineCommit
.
- property committed_offsets#
Get the committed offsets for a given job id.
- Returns
An instance of
streamsets.sdk.sch_models.JobCommittedOffset
.
- property latest_committed_offsets#
Get the latest committed offsets for a given job id.
- Returns
A (
dict
) object.
- property metrics#
The metrics from all runs of a Job.
- Returns
A
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.JobMetrics
instances.
- property pipeline#
Get the pipeline object corresponding to this job.
- property system_job#
Get the sytem Job for this job if exists.
- Returns
An instance of
streamsets.sdk.sch_models.Job
.
- property tag#
Get pipeline tag of the job.
- Returns
An instance of
streamsets.sdk.sch_models.PipelineTag
.
- property tags#
Get the job tags.
- Returns
A
streamsets.sdk.utils.SeekableList
of instances ofstreamsets.sdk.sch_models.Tag
.
- time_series_metrics(metric_type, time_filter_condition='LAST_5M', **kwargs)[source]#
Get historic time series metrics for the job.
- Parameters
metric_type (
str
) – metric type in {‘Record Count Time Series’, ‘Record Throughput Time Series’, ‘Batch Throughput Time Series’, ‘Stage Batch Processing Timer seconds’}.time_filter_condition (
str
, optional) – Default:'LAST_5M'
.
- Returns
An instance of
streamsets.sdk.sch_models.JobTimeSeriesMetrics
.
- class streamsets.sdk.sch_models.Jobs(control_hub)[source]#
Collection of
streamsets.sdk.sch_models.Job
instances.- Parameters
control_hub (
streamsets.sdk.sch.ControlHub
) – Control Hub object.
- class streamsets.sdk.sch_models.JobBuilder(job, control_hub)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.Job
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_job_builder()
.
- Parameters
job (
dict
) – Python object built from our Swagger JobJson definition.
- build(job_name, pipeline, job_template=False, runtime_parameters=None, pipeline_commit=None, pipeline_tag=None, pipeline_commit_or_tag=None, tags=None)[source]#
Build the job.
- Parameters
job_name (
str
) – Name of the job.pipeline (
streamsets.sdk.sch_models.Pipeline
) – Pipeline object.job_template (
boolean
, optional) – Indicate if it is a Job Template. Default:False
.runtime_parameters (
dict
, optional) – Runtime Parameters for the Job or Job Template. Default:None
.pipeline_commit (
streamsets.sdk.sch_models.PipelineCommit
) – Default: ``None`, which resolves to the latest pipeline commit.pipeline_tag (
streamsets.sdk.sch_models.PipelineTag
) – Default: ``None`, which resolves to the latest pipeline tag.pipeline_commit_or_tag (
str
, optional) – Default:None
, which resolves to the latest pipeline commit.tags (
list
, optional) – Job tags. Default:None
.
- Returns
An instance of
streamsets.sdk.sch_models.Job
.
- class streamsets.sdk.sch_models.JobMetrics(metrics)[source]#
Model for job metrics.
- error_count#
The number of error records generated by this run of the Job.
- Type
int
- error_records_per_sec#
The number of error records per second generated by this run of the Job.
- Type
float
- input_count#
The number of records ingested by this run of the Job.
- Type
int
- input_records_per_sec#
The number of records per second ingested by this run of the Job.
- Type
float
- output_count#
The number of records output by this run of the Job.
- Type
int
- output_records_per_sec#
The number of records output per second by the run of the Job.
- Type
float
- pipeline_version#
The version of the pipeline that was used in this Job run.
- Type
str
- run_count#
The count corresponding to this Job run.
- Type
int
- sdc_id#
The ID of the SDC instance on which this Job run was executed.
- Type
str
- stage_errors_count#
The number of stage error records generated by this run of the Job.
- Type
int
- stage_error_records_per_sec#
The number of stage error records generated per second by this run of the job.
- Type
float
- total_error_count#
The total number of both error records and stage errors generated by this run of the job.
- Type
int
- Parameters
metrics (
dict
) – Metrics counts in JSON format.
- class streamsets.sdk.sch_models.JobOffset(offset)[source]#
Model for offset.
- Parameters
offset (
dict
) – Offset in JSON format.
- class streamsets.sdk.sch_models.JobRunEvent(event)[source]#
Model for an event in a Job Run.
- Parameters
event (
dict
) – Job Run Event in JSON format.
- class streamsets.sdk.sch_models.JobStatus(status, control_hub, **kwargs)[source]#
Model for Job Status.
- run_history#
(
streamsets.sdk.utils.JobRunHistoryEvent
): History of a particular job run.- Type
streamsets.sdk.utils.SeekableList
- offsets#
(
streamsets.sdk.utils.JobPipelineOffset
): Offsets after the job run.- Type
streamsets.sdk.utils.SeekableList
- Parameters
status (
dict
) – Job status in JSON format.control_hub (
streamsets.sdk.sch.ControlHub
) – Control Hub object.
- class streamsets.sdk.sch_models.JobTimeSeriesMetric(metric, metric_type)[source]#
Model for job metrics.
- name#
Name of measurement.
- Type
str
- values#
Timeseries data.
- Type
list
- time_series#
Timeseries data with timestamp as key and metric value as value.
- Type
dict
- Parameters
metric (
dict
) – Metrics in JSON format.
- class streamsets.sdk.sch_models.JobTimeSeriesMetrics(metrics, metric_type)[source]#
Model for job metrics.
- input_records#
Appears when queried for ‘Record Count Time Series’ or ‘Record Throughput Time Series’.
- output_records#
Appears when queried for ‘Record Count Time Series’ or ‘Record Throughput Time Series’.
- error_records#
Appears when queried for ‘Record Count Time Series’ or ‘Record Throughput Time Series’.
- batch_counter#
Appears when queried for ‘Batch Throughput Time Series’.
- batch_processing_timer#
Appears when queried for ‘Batch Processing Timer seconds’.
- Parameters
metrics (
dict
) – Metrics in JSON format.
- class streamsets.sdk.sch_models.RuntimeParameters(runtime_parameters, job)[source]#
Wrapper for Control Hub job runtime parameters.
- Parameters
runtime_parameters (
str
) – Runtime parameter.job (
streamsets.sdk.sch_models.Job
) – Job object.
Organizations#
- class streamsets.sdk.sch_models.Organization(organization, organization_admin_user=None, api_client=None)[source]#
Model for Organization.
- Parameters
organization (
str
) – Organization Id.organization_admin_user (
str
, optional) – Default:None
.api_client (
streamsets.sdk.sch_api.ApiClient
, optional) – Default:None
.
- class streamsets.sdk.sch_models.Organizations(control_hub)[source]#
Collection of
streamsets.sdk.sch_models.Organization
instances.
- class streamsets.sdk.sch_models.OrganizationBuilder(organization, organization_admin_user)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.Organization
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_organization_builder()
.
- Parameters
organization (
dict
) – Python object built from our Swagger UserJson definition.
- build(id, name, admin_user_id, admin_user_display_name, admin_user_email_address, admin_user_ldap_user_name=None)[source]#
Build the organization.
- Parameters
id (
str
) – Organization ID.name (
str
) – Organization name.admin_user_id (
str
) – User Id of the admin of this organization.admin_user_display_name (
str
) – User display name of admin of this organization.admin_user_email_address (
str
) – User email address of admin of this organization.admin_user_ldap_user_name (
str
, optional) – LDAP username. Default:None
.
- Returns
An instance of
streamsets.sdk.sch_models.Organization
.
Pipelines#
- class streamsets.sdk.sch_models.Pipeline(pipeline, builder, pipeline_definition, rules_definition, control_hub=None, library_definitions=None)[source]#
Model for Pipeline.
- Parameters
pipeline (
dict
) – Pipeline in JSON format.builder (
streamsets.sdk.sch_models.PipelineBuilder
) – Pipeline Builder object.pipeline_definition (
dict
) – Pipeline Definition in JSON format.rules_definition (
dict
) – Rules Definition in JSON format.control_hub (
streamsets.sdk.sch.ControlHub
, optional) – ControlHub object. Default:None
.library_definitions (
dict
, optional) – Library Definition in JSON format. Default:None
.
- property Labels#
Get the pipeline labels. This attribute will be deprecated in a future release. Please use labels instead.
- Returns
A
list
ofdict
- property acl#
Get pipeline ACL.
- Returns
An instance of
streamsets.sdk.sch_models.ACL
.
- property commits#
Get commits for this pipeline.
- Returns
A
streamsets.sdk.utils.SeekableList
of instances ofstreamsets.sdk.sch_models.PipelineCommit
.
- property configuration#
Get pipeline’s configuration.
- Returns
An instance of
streamsets.sdk.sch_models.Configuration
.
- get_jobs_using_pipeline()[source]#
Get the jobs that are running on this pipeline.
- Returns
An instance of
streamsets.sdk.utils.SeekableList
containing instances ofstreamsets.sdk.sch_models.Job
that run on this pipeline.
- property labels#
Get the pipeline labels.
- Returns
A
streamsets.sdk.utils.SeekableList
of instances ofstreamsets.sdk.sch_models.PipelineLabel
.
- property parameters#
Get the pipeline parameters.
- Returns
A dict like,
streamsets.sdk.sch_models.PipelineParameters
object of parameter key-value pairs.
- property tags#
Get tags for this pipeline.
- Returns
A
streamsets.sdk.utils.SeekableList
of instances ofstreamsets.sdk.sch_models.PipelineTag
.
- class streamsets.sdk.sch_models.Pipelines(control_hub, organization)[source]#
Collection of
streamsets.sdk.sch_models.Pipeline
instances.- Parameters
control_hub – An instance of
streamsets.sdk.sch.ControlHub
.organization (
str
) – Organization Id.
- class streamsets.sdk.sch_models.PipelineBuilder(pipeline, data_collector_pipeline_builder, control_hub=None, fragment=False)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.Pipeline
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_pipeline_builder()
.
- Parameters
pipeline (
dict
) – Python object built from our Swagger PipelineJson definition.data_collector_pipeline_builder (
streamsets.sdk.sdc_models.PipelineBuilder
) – Data Collector Pipeline Builder object.control_hub (
streamsets.sdk.sch.ControlHub
) – Default:None
.fragment (
boolean
, optional) – Specify if a fragment builder. Default:False
.
- add_stage(label=None, name=None, type=None, library=None)[source]#
Add a stage to the pipeline.
When specifying a stage, either
label
orname
must be used.type
andlibrary
may also be used to select a particular stage if ambiguities exist. Iftype
and/orlibrary
are omitted, the first stage definition matching the givenlabel
orname
will be used.- Parameters
label (
str
, optional) – Stage label to use when selecting stage from definitions. Default:None
.name (
str
, optional) – Stage name to use when selecting stage from definitions. Default:None
.type (
str
, optional) – Stage type to use when selecting stage from definitions (e.g. origin, destination, processor, executor). Default:None
.library (
str
, optional) – Stage library to use when selecting stage from definitions. Default:None
.
- Returns
An instance of
streamsets.sdk.sch_models.SchSdcStage
.
- class streamsets.sdk.sch_models.PipelineCommit(pipeline_commit, control_hub=None)[source]#
Model for pipeline commit.
- Parameters
pipeline_commit (
dict
) – Pipeline commit in JSON format.control_hub (
streamsets.sdk.sch.ControlHub
) – ControlHub object.
- class streamsets.sdk.sch_models.PipelineLabel(pipeline_label)[source]#
Model for pipeline label.
- Parameters
pipeline_label (
dict
) – Pipeline label in JSON format.
- class streamsets.sdk.sch_models.PipelineLabels(control_hub, organization)[source]#
Collection of
streamsets.sdk.sch_models.PipelineLabel
instances.- Parameters
control_hub – An instance of
streamsets.sdk.sch.ControlHub
.organization (
str
) – Organization Id.
- class streamsets.sdk.sch_models.PipelineParameters(pipeline)[source]#
Parameters for pipelines.
- Parameters
pipeline (
streamsets.sdk.sch_models.Pipeline
) – Pipeline Instance.
- class streamsets.sdk.sch_models.StPipelineBuilder(pipeline, transformer_pipeline_builder, control_hub=None, fragment=False)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.Pipeline
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_pipeline_builder()
.
- Parameters
pipeline (
dict
) – Python object built from our Swagger PipelineJson definition.transformer_pipeline_builder (
streamsets.sdk.sdc_models.PipelineBuilder
) – Transformer Pipeline Builder object.control_hub (
streamsets.sdk.sch.ControlHub
) – Default:None
.fragment (
boolean
, optional) – Specify if a fragment builder. Default:False
.
- add_stage(label=None, name=None, type=None, library=None)[source]#
Add a stage to the pipeline.
When specifying a stage, either
label
orname
must be used.type
andlibrary
may also be used to select a particular stage if ambiguities exist. Iftype
and/orlibrary
are omitted, the first stage definition matching the givenlabel
orname
will be used.- Parameters
label (
str
, optional) – Transformer stage label to use when selecting stage from definitions. Default:None
.name (
str
, optional) – Transformer stage name to use when selecting stage from definitions. Default:None
.type (
str
, optional) – Transformer stage type to use when selecting stage from definitions (e.g. origin, destination, processor, executor). Default:None
.library (
str
, optional) – Transformer stage library to use when selecting stage from definitions. Default:None
.
- Returns
An instance of
streamsets.sdk.sch_models.SchStStage
.
Protection Methods#
- class streamsets.sdk.sch_models.ProtectionMethod(stage)[source]#
Protection Method Model.
- Parameters
stage (
dict
) – JSON representation of a stage.
- class streamsets.sdk.sch_models.ProtectionMethodBuilder(pipeline_builder)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.ProtectionMethod
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_protection_method_builder()
.
- Parameters
pipeline_builder (
streamsets.sdk.sch_models.PipelineBuilder
) – Pipeline Builder object.
Protection Policies#
- class streamsets.sdk.sch_models.ProtectionPolicy(protection_policy, procedures=None)[source]#
Model for Protection Policy.
- Parameters
protection_policy (
dict
) – JSON representation of Protection Policy.procedures (
list
) – A list ofstreamsets.sdk.sch_models.PolicyProcedure
instances, Default:None
.
- class streamsets.sdk.sch_models.ProtectionPolicies(control_hub)[source]#
Collection of
streamsets.sdk.sch_models.ProtectionPolicy
instances.
- class streamsets.sdk.sch_models.ProtectionPolicyBuilder(control_hub, protection_policy, policy_procedure)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.ProtectionPolicy
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_protection_policy_builder()
.
- Parameters
protection_policy (
dict
) – Python object defining a protection policy.policy_procedure (
dict
) – Python object defining a policy procedure.
ProvisioningAgents#
- class streamsets.sdk.sch_models.Deployment(deployment, control_hub=None)[source]#
Model for Deployment.
- Parameters
deployment (
dict
) – A Python object representation of Deployment.control_hub – An instance of
streamsets.sdk.sch.ControlHub
.
- property acl#
Get the ACL of a Deployment.
- Returns
An instance of
streamsets.sdk.sch_models.ACL
.
- class streamsets.sdk.sch_models.Deployments(control_hub, organization)[source]#
Collection of
streamsets.sdk.sch_models.Deployment
instances.- Parameters
control_hub – An instance of
streamsets.sdk.sch.ControlHub
.organization (
str
) – Organization Id.
- class streamsets.sdk.sch_models.DeploymentBuilder(deployment)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.Deployment
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_deployment_builder()
.
- Parameters
deployment (
dict
) – Python object that represents Deployment JSON.
- build(name, provisioning_agent, number_of_data_collector_instances, spec=None, description=None, data_collector_labels=None)[source]#
Build the deployment.
- Parameters
name (
str
) – Deployment Name.provisioning_agent (
streamsets.sdk.sch_models.ProvisioningAgent
) – Agent to use.(obj (number_of_data_collector_instances) – int): Number of sdc instances.
spec (
dict
, optional) – Deployment yaml in dictionary format. Will use default yaml used by ui if left out.description (
str
, optional) – Default:None
.data_collector_labels (
list
, optional) – Default:['all']
.
- Returns
An instance of
streamsets.sdk.sch_models.Deployment
.
- class streamsets.sdk.sch_models.ProvisioningAgent(provisioning_agent, control_hub)[source]#
Model for Provisioning Agent.
- Parameters
provisioning_agent (
dict
) – A Python object representation of Provisioning Agent.control_hub – An instance of
streamsets.sdk.sch.ControlHub
.
- property acl#
Get the ACL of a Provisioning Agent.
- Returns
An instance of
streamsets.sdk.sch_models.ACL
.
- property deployments#
Get the deployments associated with the Provisioning Agent.
- Returns
A
list
ofstreamsets.sdk.sch_models.Deployment
instances.
- class streamsets.sdk.sch_models.ProvisioningAgents(control_hub, organization)[source]#
Collection of
streamsets.sdk.sch_models.ProvisioningAgent
instances.- Parameters
control_hub – An instance of
streamsets.sdk.sch.ControlHub
.organization (
str
) – Organization Id.
Reports#
- class streamsets.sdk.sch_models.GenerateReportCommand(control_hub, report_defintion, response)[source]#
Command to interact with the response from generate_report.
- Parameters
control_hub (
streamsets.sdk.ControlHub
) – Control Hub instance.report_definition (
dict
) – JSON representation of Report Definition.response (
dict
) – Api response from generating the report.
- class streamsets.sdk.sch_models.Report(report, control_hub, report_definition_id)[source]#
Model for Report.
- Parameters
report (
dict
) – JSON representation of Report.
- class streamsets.sdk.sch_models.Reports(control_hub, report_definition_id)[source]#
Collection of
streamsets.sdk.sch_models.Report
instances.- Parameters
control_hub (
streamsets.sdk.sch.ControlHub
) – Control Hub object.report_definition_id (
str
) – Report Definition Id.
- class streamsets.sdk.sch_models.ReportDefinition(report_definition, control_hub)[source]#
Model for Report Definition.
- Parameters
report_definition (
dict
) – JSON representation of Report Definition.control_hub (
streamsets.sdk.ControlHub
) – Control Hub instance.
- property acl#
Get Report Definition ACL.
- Returns
An instance of
streamsets.sdk.sch_models.ACL
.
- generate_report()[source]#
Generate a Report for Report Definition.
- Returns
An instance of
streamsets.sdk.sch_models.Report
.
- property report_resources#
Get Report Resources of the Report Definition.
- Returns
An instance of
streamsets.sdk.sch_models.ReportResources
.
- property reports#
Get Reports of the Report Definition.
- Returns
An instance of
streamsets.sdk.sch_models.Reports
.
- class streamsets.sdk.sch_models.ReportDefinitions(control_hub)[source]#
Collection of
streamsets.sdk.sch_models.ReportDefinition
instances.
- class streamsets.sdk.sch_models.ReportDefinitionBuilder(report_definition, control_hub)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.ReportDefinition
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_report_definition_builder()
.
- Parameters
report_definition (
dict
) – JSON representation of Report Definition.control_hub (
streamsets.sdk.ControlHub
) – Control Hub instance.
- add_report_resource(resource)[source]#
Add a given resource to Report Definition resources.
- Parameters
resource (
streamsets.sdk.sch_models.Job
) or (streamsets.sdk.sch_models.Topology
) –
- build(name, description=None)[source]#
Build the report definition.
- Parameters
name (
str
) – Name of the Report Definition.description (
str
, optional) – Description of the Report Definition. Default:None
.
- Returns
An instance of
streamsets.sdk.sch_models.ReportDefinition
.
- import_report_definition(report_definition)[source]#
Import an existing Report Definition to update it.
- Parameters
report_definition (
streamsets.sdk.sch_models.ReportDefinition
) – Report Definition object.
- remove_report_resource(resource)[source]#
Remove a resource from Report Definition Resources.
- Parameters
resource (
streamsets.sdk.sch_models.Job
) or (streamsets.sdk.sch_models.Topology
) –- Returns
A resource of type
dict
that is removed from Report Definition Resources.
- class streamsets.sdk.sch_models.ReportResource(report_resource)[source]#
Model for Report Resource.
- Parameters
report_resource (
dict
) – JSON representation of Report Resource.
- class streamsets.sdk.sch_models.ReportResources(report_resources, report_definition)[source]#
Model for the collection of Report Resources.
- Parameters
report_resources (
list
) – List of Report Resources.report_definition (
streamsets.sdk.sch_models.ReportDefinition
) – Report Definition object.
Scheduler#
- class streamsets.sdk.sch_models.ScheduledTask(task, control_hub=None)[source]#
Model for Scheduled Task.
- Parameters
task (
dict
) – JSON representation of task.control_hub (
streamsets.sdk.ControlHub
) – Control Hub instance.
- property acl#
Get the ACL of a Scheduled Task.
- Returns
An instance of
streamsets.sdk.sch_models.ACL
.
- property audits#
Get Scheduled Task Audits.
- Returns
A
streamsets.sdk.utils.SeekableList
of inherited instances ofstreamsets.sdk.sch_models.ScheduledTaskAudit
.
- property runs#
Get Scheduled Task Runs.
- Returns
A
streamsets.sdk.utils.SeekableList
of inherited instances ofstreamsets.sdk.sch_models.ScheduledTaskRun
.
- class streamsets.sdk.sch_models.ScheduledTaskAudit(audit)[source]#
Scheduled Task Audit.
- Parameters
run (
dict
) – JSON representation of scheduled task audit.
- class streamsets.sdk.sch_models.ScheduledTaskBaseModel(data, attributes_to_ignore=None, attributes_to_remap=None, repr_metadata=None)[source]#
Base Model for Scheduled Task related classes.
- class streamsets.sdk.sch_models.ScheduledTaskBuilder(job_selection_types, control_hub)[source]#
Builder for Scheduled Task.
- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_scheduled_task_builder()
.
- Parameters
job_selection_types (
dict
) – JSON representation of job selection types.control_hub (
streamsets.sdk.ControlHub
) – Control Hub instance.
- build(task_object, action='START', name=None, description=None, cron_expression='0 0 1/1 * ? *', time_zone='UTC', status='RUNNING', start_time=None, end_time=None, missed_execution_handling='IGNORE')[source]#
Builder for Scheduled Task.
- Parameters
task_object (
streamsets.sdk.sch_models.Job
) – (streamsets.sdk.sch_models.ReportDefinition
): Job or ReportDefinition object.action (
str
, optional) – One of the {‘START’, ‘STOP’, ‘UPGRADE’} actions. Default:START
.name (
str
, optional) – Name of the task. Default:None
.description (
str
, optional) – Description of the task. Default:None
.crontab_mask (
str
, optional) – Schedule in cron syntax. Default:"0 0 1/1 * ? *"
. (Daily at 12).time_zone (
str
, optional) – Time zone. Default:"UTC"
.status (
str
, optional) – One of the {‘RUNNING’, ‘PAUSED’} statuses. Default:RUNNING
.start_time (
str
, optional) – Start time of task. Default:None
.end_time (
str
, optional) – End time of task. Default:None
.missed_trigger_handling (
str
, optional) – One of {‘IGNORE’, ‘RUN ALL’, ‘RUN ONCE’}. Default:IGNORE
.
- Returns
An instance of
streamsets.sdk.sch_models.ScheduledTask
.
- class streamsets.sdk.sch_models.ScheduledTaskRun(run)[source]#
Scheduled Task Run.
- Parameters
run (
dict
) – JSON representation if scheduled task run.
- class streamsets.sdk.sch_models.ScheduledTasks(control_hub)[source]#
Collection of
streamsets.sdk.sch_models.ScheduledTask
instances.
Subscriptions#
- class streamsets.sdk.sch_models.Subscription(subscription, control_hub)[source]#
Subscription.
- Parameters
subscription (
dict
) – JSON representation of Subscription.control_hub (
streamsets.sdk.ControlHub
) – Control Hub instance.
- property acl#
Get the ACL of an Event Subscription.
- Returns
An instance of
streamsets.sdk.sch_models.ACL
.
- property action#
Action of the Subscription.
- property events#
Events of the Subscription.
- class streamsets.sdk.sch_models.Subscriptions(control_hub)[source]#
Collection of
streamsets.sdk.sch_models.Subscription
instances.- Parameters
control_hub (
streamsets.sdk.sch.ControlHub
) – Control Hub object.
- class streamsets.sdk.sch_models.SubscriptionAction(action)[source]#
Action to take when the Subscription is triggered.
- Parameters
action (
dict
) – JSON representation of an Action for a Subscription.
- class streamsets.sdk.sch_models.SubscriptionAudit(audit)[source]#
Model for subscription audit.
- Parameters
audit (
dict
) – JSON representation of a subscription audit.
- class streamsets.sdk.sch_models.SubscriptionBuilder(subscription, control_hub)[source]#
Builder for Subscription.
- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_subscription_builder()
.
- Parameters
subscription (
dict
) – JSON representation of event subscription.control_hub (
streamsets.sdk.ControlHub
) – Control Hub instance.
- add_event(event_type, filter='')[source]#
Add event to the Subscription.
- Parameters
event_type (
str
) – Type of event in {‘Job Status Change’, ‘Data SLA Triggered’, ‘Pipeline Committed’, ‘Pipeline Status Change’, ‘Report Generated’, ‘Data Collector not Responding’}.filter (
str
, optional) – Filter to be applied on event. Default:""
.
- build(name, description=None)[source]#
Builder for Scheduled Task.
- Parameters
name (
str
) – Name of Subscription.description (
str
, optional) – Description of subscription. Default:None
.
- Returns
An instance of
streamsets.sdk.sch_models.Subscription
.
- import_subscription(subscription)[source]#
Import an existing Subscription into the builder to update it.
- Parameters
subscription (
streamsets.sdk.sch_models.Subscription
) – Subscription instance.
- remove_event(event_type)[source]#
Remove event from the subscription.
- Parameters
event_type (
str
) – Type of event in {‘Job Status Change’, ‘Data SLA Triggered’, ‘Pipeline Committed’, ‘Pipeline Status Change’, ‘Report Generated’, ‘Data Collector not Responding’}.- Returns
An instance of
streamsets.sdk.sch_models.SubscriptionEvent
.
- set_email_action(recipients, subject=None, body=None)[source]#
Set the Email action.
- Parameters
recipients (
list
) – List of email addresses.subject (
str
, optional) – Subject of the email. Default:None
.body (
str
, optional) – Body of the email. Default:None
.
- set_webhook_action(uri, method='GET', content_type=None, payload=None, auth_type=None, username=None, password=None, timeout=30000, headers=None)[source]#
Set the Webhook action.
- Parameters
uri (
str
) – URI for the Webhook.method (
str
, optional) – HTTP method to use. Default:'GET'
.content_type (
str
, optional) – Content Type of the request. Default:None
.payload (
str
, optional) – Payload of the request. Default:None
.auth_type (
str
, optional) –'basic'
orNone
. Default:None
.username (
str
, optional) – username for the authentication. Default:None
.password (
str
, optional) – password for the authentication. Default:None
.timeout (
int
, optional) – timeout for the Webhook action. Default:30000
.headers (
dict
, optional) – Headers to be sent to the Webhook call. Default:None
.
- class streamsets.sdk.sch_models.SubscriptionEvent(event, control_hub)[source]#
An Event of a Subscription.
- Parameters
event (
dict
) – JSON representation of Events of a Subscription.control_hub (
streamsets.sdk.ControlHub
) – Control Hub instance.
Topologies#
- class streamsets.sdk.sch_models.DataSla(data_sla, control_hub)[source]#
Model for DataSla.
- Parameters
data_sla (
dict
) – JSON representation of SLA.control_hub (
streamsets.sdk.sch.ControlHub
) – An instance of the Control Hub.
- class streamsets.sdk.sch_models.DataSlaBuilder(data_sla, control_hub)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.DataSla
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_data_sla_builder()
.
- Parameters
data_sla (
dict
) – Python object built from our Swagger DataSlaJson definition.control_hub (
streamsets.sdk.sch.ControlHub
) – An instance of the Control Hub.
- build(topology, label, job, alert_text, qos_parameter='THROUGHPUT_RATE', function_type='Max', min_max_value=100, enabled=True)[source]#
Build the Data Sla.
- Parameters
topology (
streamsets.sdk.sch_models.Topology
) – Topology object.label (
str
) – Label for the SLA.job (
list
) – List ofstreamsets.sdk.sch_models.Job
objects.alert_text (
str
) – Alert text.qos_parameter (
str
, optional) – paramter in {‘THROUGHPUT_RATE’, ‘ERROR_RATE’}. Default:'THROUGHPUT_RATE'
.function_type (
str
, optional) – paramter in {‘Max’, ‘Min’}. Default:'Max'
.min_max_value (
str
, optional) – Default:100
.enabled (
boolean
, optional) – Default:True
.
- Returns
An instance of
streamsets.sdk.sch_models.DataSla
.
- class streamsets.sdk.sch_models.Topology(topology, control_hub=None)[source]#
Model for Topology.
- Parameters
topology (
dict
) – JSON representation of Topology.
- commit_id#
Pipeline commit id.
- Type
str
- commit_message#
Commit Message.
- Type
str
- commit_time#
Time at which commit was made.
- Type
int
- committed_by#
User that made the commit.
- Type
str
- default_topology#
Default Topology.
- Type
bool
- description#
Topology description.
- Type
str
- draft#
Indicates whether this topology is a draft.
- Type
bool
- last_modified_by#
User that last modified this topology.
- Type
str
- last_modified_on#
Time at which this topology was last modified.
- Type
int
- new_pipeline_version_available#
Whether any job in the topology has a new pipeline version to be updated to.
- Type
bool
- organization#
Id of the organization.
- Type
str
- parent_version#
Version of the parent topology.
- Type
str
- topology_definition#
Definition of the topology.
- Type
str
- topology_id#
Id of the topology.
- Type
str
- topology_name#
Name of the topology.
- Type
str
- validation_issues#
Any validation issues that exist for this Topology.
- Type
dict
- version#
Version of this topology.
- Type
str
- acknowledge_job_errors()[source]#
Acknowledge all errors for the jobs in a topology.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- property acl#
Get the ACL of a Topology.
- Returns
An instance of
streamsets.sdk.sch_models.ACL
.
- activate_data_sla(*data_slas)[source]#
Activate Data SLAs.
- Parameters
*data_slas – One or more instances of
streamsets.sdk.sch_models.DataSla
.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- add_data_sla(data_sla)[source]#
Add SLA.
- Parameters
data_sla (
streamsets.sdk.sch_models.DataSla
) – Data SLA object.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- auto_discover_connections()[source]#
Auto discover connecting systems between nodes in a Topology.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- auto_fix()[source]#
Auto-fix a topology by rectifying invalid or removed jobs, outdated jobs, etc.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- deactivate_data_sla(*data_slas)[source]#
Deactivate Data SLAs.
- Parameters
*data_slas – One or more instances of
streamsets.sdk.sch_models.DataSla
.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- delete_data_sla(*data_slas)[source]#
Delete Data SLAs.
- Parameters
*data_slas – One or more instances of
streamsets.sdk.sch_models.DataSla
.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- property jobs#
Get the jobs that are contained within the Topology.
- Returns
A
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Job
instances.
- property new_pipeline_version_available#
Determine if a new pipeline version is available for any jobs in the Topology.
- Returns
A (
bool
) value.
- property nodes#
Get the job and system nodes that make up the Topology.
- Returns
- A
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.TopologyNode
instances.
- A
- start_all_jobs()[source]#
Start all jobs of a topology.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- stop_all_jobs(force=False)[source]#
Stop all jobs of a topology.
- Parameters
force (
bool
, optional) – Force topology jobs to stop. Default:False
.- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- update_jobs_to_latest_change()[source]#
Upgrade a topology’s job(s) to the latest corresponding pipeline change.
- Returns
An instance of
streamsets.sdk.sch_api.Command
.
- property validation_issues#
Get any validation issues that are detected for the Topology.
- Returns
A (
list
) of validation issues in JSON format.
- class streamsets.sdk.sch_models.Topologies(control_hub)[source]#
Collection of
streamsets.sdk.sch_models.Topology
instances.- Parameters
control_hub (
streamsets.sdk.sch.ControlHub
) – An instance of the Control Hub.
- class streamsets.sdk.sch_models.TopologyBuilder(topology, control_hub=None)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.Topology
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_topology_builder()
.
- Parameters
topology (
dict
) – Python object built from our Swagger TopologyJson definition.control_hub (
streamsets.sdk.ControlHub
, optional) – Control Hub instance. Default:None
- add_job(job)[source]#
Add a job node to the Topology being built.
- Parameters
job (
streamsets.sdk.sch_models.Job
) – An instance of a job to be added.
- add_system(name)[source]#
Add a system node to the Topology being built.
- Parameters
name (
str
) – The name of the system to add to the topology.
- build(topology_name=None, description=None)[source]#
Build the topology.
- Parameters
topology_name (
str
, optional) – Name of the topology. This parameter is required when building a new topology.description (
str
, optional) – Description of the topology. Default:None
.
- Returns
An instance of
streamsets.sdk.sch_models.Topology
.
- delete_node(topology_node)[source]#
Delete a system or job node from the topology.
- Parameters
topology_node (
streamsets.sdk.sch_models.TopologyNode
) – An instance of a TopologyNode to delete from the topology.
- import_topology(topology)[source]#
Import an existing topology to be used in the builder.
- Parameters
topology (
streamsets.sdk.sch_models.Topology
) – An existing Topology instance to modify.
- property topology_nodes#
Get all of the nodes currently part of the topology held by the TopologyBuilder.
- Returns
- A
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.TopologyNode
instances.
- A
- class streamsets.sdk.sch_models.TopologyNode(topology_node_json)[source]#
Model for a node within a Topology.
- Parameters
topology_node_json (
dict
) – JSON representation of a Topology Node.
- node_type#
The type of this node, i.e. SYSTEM, JOB, etc.
- Type
str
- instance_name#
The name of this node instance.
- Type
str
- stage_name#
The name of the stage in this node.
- Type
str
- stage_version#
The version of the stage in this node.
- Type
str
- job_id#
The ID of the job in this node.
- Type
str
- pipeline_id#
The pipeline ID associated with the job in this node.
- Type
str
- pipeline_commit_id#
The commit ID of the pipeline.
- Type
str
- pipeline_version#
The version of the pipeline.
- Type
str
- input_lanes#
A list of
str
representing the input lanes for this node.- Type
list
- output_lanes#
A list of
str
representing the output lanes for this node.- Type
list
Users#
- class streamsets.sdk.sch_models.User(user, roles)[source]#
Model for User.
- Parameters
user (
dict
) – JSON representation of User.roles (
dict
) – A mapping of role IDs to role labels.
- active#
Whether the user is active or not.
- Type
bool
- created_by#
Creator of this user.
- Type
str
- created_on#
Creation time of this user.
- Type
str
- display_name#
Display name of this user.
- Type
str
- email_address#
Email address of this user.
- Type
str
- id#
Id of this user.
- Type
str
- groups#
Groups this user belongs to.
- Type
list
- last_modified_by#
User last modified by.
- Type
str
- last_modified_on#
User last modification time.
- Type
str
- password_expires_on#
User’s password expiration time.
- Type
str
- password_system_generated#
Whether User’s password is system generated or not.
- Type
bool
- roles#
A set of role labels.
- Type
set
- saml_user_name#
SAML username of user.
- Type
str
- class streamsets.sdk.sch_models.Users(control_hub, roles, organization)[source]#
Collection of
streamsets.sdk.sch_models.User
instances.- Parameters
control_hub – An instance of
streamsets.sdk.sch.ControlHub
.roles (
dict
) – A mapping of role IDs to role labels.organization (
str
) – Organization ID.
- class streamsets.sdk.sch_models.UserBuilder(user, roles, control_hub=None)[source]#
Class with which to build instances of
streamsets.sdk.sch_models.User
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_user_builder()
.
- Parameters
user (
dict
) – Python object built from our Swagger UserJson definition.roles (
dict
) – A mapping of role IDs to role labels.control_hub – An instance of
streamsets.sdk.sch.ControlHub
. Default:None
.
- build(id, display_name, email_address, saml_user_name=None, ldap_user_name=None)[source]#
Build the user.
- Parameters
id (
str
) – User Id.display_name (
str
) – User display name.email_address (
str
) – User Email Address.saml_user_name (
str
, optional) – Default:None
.ldap_user_name (
str
, optional) – Default:None
.
- Returns
An instance of
streamsets.sdk.sch_models.User
.
Common#
Models used by StreamSets Data Collector and StreamSets Control Hub:
- class streamsets.sdk.models.Configuration(configuration=None, compatibility_map=None, property_key='name', property_value='value', **kwargs)[source]#
Abstraction for stage configurations.
This class enables easy access to and modification of data stored as a list of dictionaries. As an example, SDC’s pipeline configuration is stored in the form
[ { "name" : "executionMode", "value" : "STANDALONE" }, { "name" : "deliveryGuarantee", "value" : "AT_LEAST_ONCE" }, ... ]
By implementing simple
__getitem__
and__setitem__
methods, this class allows items in this list to be accessed usingconfiguration['executionMode'] = 'CLUSTER_BATCH'
Instead of the more verbose
for property in configuration: if property['name'] == 'executionMode': property['value'] = 'CLUSTER_BATCH' break
- Parameters
configuration (
str
) – List of dictionaries comprising the configuration.compatibility_map (
dict
, optional) – A dictionary mapping values used for backwards compatibility.property_key (
str
, optional) – The dictionary entry denoting the property key. Default:name
property_value (
str
, optional) – The dictionary entry denoting the property value. Default:value
- get(key, default=None)[source]#
Return the value of key or, if not in the configuration, the default value.
Exceptions#
Common exceptions.
- exception streamsets.sdk.exceptions.BadRequestError(response)[source]#
Bad request error (HTTP 400).
- exception streamsets.sdk.exceptions.ConnectionError(code, message)[source]#
Connection Catalog errors.
- exception streamsets.sdk.exceptions.DeploymentInactiveError(message)[source]#
Deployment status changed into INACTIVE_ERROR.
- exception streamsets.sdk.exceptions.InvalidCredentialsError(message)[source]#
Invalid credentials error.
- exception streamsets.sdk.exceptions.JobInactiveError(message)[source]#
Job status changed into INACTIVE_ERROR.
- exception streamsets.sdk.exceptions.MultipleIssuesError(errors)[source]#
Multiple errors were returned.
- exception streamsets.sdk.exceptions.StatusError(response)[source]#
Parent class for pipeline status errors.