StreamSets Transformer#

Main interface#

This is the main entry point used by users when interacting with Transformer instances.

class streamsets.sdk.Transformer(server_url, username=None, password=None, authentication_method='form', accounts_authentication_token=None, accounts_server_url=None, control_hub=None, dump_log_on_error=False, **kwargs)[source]#

Class to interact with StreamSets Transformer.

Parameters
  • server_url (str) – URL of an existing ST deployment with which to interact.

  • username (str, optional) – ST username. Default: streamsets.sdk.st.DEFAULT_ST_USERNAME.

  • password (str, optional) – ST password. Default: streamsets.sdk.st.DEFAULT_ST_PASSWORD.

  • authentication_method (str, optional) – StreamSets Transformer authentication method. Default: streamsets.sdk.constants.ENGINE_AUTHENTICATION_METHOD_FORM.

  • accounts_authentication_token (str, optional) – StreamSets Accounts authentication token. Default: None

  • accounts_server_url (str, optional) – StreamSets Accounts server base URL. Default: None

  • control_hub (streamsets.sdk.ControlHub, optional) – A StreamSets Control Hub instance to use for SCH-registered Transformers. Default: None.

  • dump_log_on_error (bool) – Whether to output Transformer logs when exceptions are raised by certain methods. Default: False

add_pipeline(*pipelines)[source]#

Add one or more pipelines to the Transformer instance.

Parameters

*pipelines – One or more instances of streamsets.sdk.st_models.Pipeline.

change_password(old_password, new_password)[source]#

Change password for the current user.

Parameters
  • old_password (str) – old password.

  • new_password (str) – new password.

Returns

An instance of streamsets.sdk.st_api.Command.

property current_user#

Get currently logged-in user and its groups and roles.

Returns

An instance of streamsets.sdk.st_models.User.

property definitions#

Get an ST instance’s definitions.

Will return a cached instance of the definitions if called more than once.

Returns

An instance of json.

get_alerts()[source]#

Get pipeline alerts.

Returns

An instance of streamsets.sdk.st_models.Alerts.

get_bundle(generators=None)[source]#

Generate new support bundle.

Returns

An instance of zipfile.ZipFile.

get_bundle_generators()[source]#

Get available support bundle generators.

Returns

An instance of streamsets.sdk.st_models.BundleGenerators.

get_jmx_metrics()[source]#

Get Transformer JMX metrics

Returns

An instance of streamsets.sdk.st_models.JmxMetrics.

get_logs(ending_offset=- 1, extra_message=None, pipeline=None, severity=None)[source]#

Get logs.

Parameters
  • ending_offset (int) – ending_offset, Default: -1.

  • extra_message (str) – extra_message, Default: None.

  • pipeline (streamsets.sdk.st_models.Pipeline) – The pipeline instance, Default: None.

  • severity (str) – severity, Default: None.

Returns

An instance of streamsets.sdk.st_models.Log.

get_pipeline(pipeline_id)[source]#

Get a pipeline.

Parameters

pipeline_id (str) – Id of pipeline.

Returns

An instance of streamsets.sdk.st_models.Pipeline.

get_pipeline_acl(pipeline)[source]#

Get pipeline ACL.

Parameters

pipeline (streamsets.sdk.st_models.Pipeline) – The pipeline instance.

Returns

An instance of streamsets.sdk.st_models.PipelineAcl.

get_pipeline_builder(**kwargs)[source]#

Get a pipeline builder instance with which a pipeline can be created.

Returns

An instance of streamsets.sdk.st_models.PipelineBuilder.

get_pipeline_history(pipeline)[source]#

Get a pipeline’s history.

Parameters

pipeline (streamsets.sdk.st_models.Pipeline) – The pipeline instance.

Returns

An instance of streamsets.sdk.st_models.History.

get_pipeline_metrics(pipeline)[source]#

Get a pipeline’s metrics.

Parameters

pipeline (streamsets.sdk.st_models.Pipeline) – The pipeline instance.

Returns

An instance of streamsets.sdk.st_models.Metrics.

get_pipeline_permissions(pipeline)[source]#

Return pipeline permissions for a given pipeline.

Parameters

pipeline (streamsets.sdk.st_models.Pipeline) – The pipeline instance.

Returns

An instance of streamsets.sdk.st_models.PipelinePermissions.

get_pipeline_status(pipeline)[source]#

Get status of a pipeline.

Parameters

pipeline (streamsets.sdk.st_models.Pipeline) – The pipeline instance.

property id#

Return id for StreamSets Transformer.

Returns

A str Transformer ID.

property pipelines#

Get all pipelines in the pipeline store.

Returns

A streamsets.sdk.utils.SeekableList of

streamsets.sdk.st_models.Pipeline instances.

remove_pipeline(*pipelines)[source]#

Remove one or more pipelines from the Transformer instance.

Parameters

*pipelines – One or more instances of streamsets.sdk.st_models.Pipeline.

reset_origin(pipeline)[source]#

Reset origin offset.

Parameters

pipeline (streamsets.sdk.st_models.Pipeline) – Pipeline object.

Returns

An instance of streamsets.sdk.st_api.Command.

run_pipeline_preview(pipeline, rev=0, batches=1, batch_size=10, skip_targets=True, end_stage=None, timeout=120000, stage_outputs_to_override_json=None, **kwargs)[source]#

Run pipeline preview.

Parameters
  • pipeline (streamsets.sdk.st_models.Pipeline) – The pipeline instance.

  • rev (int, optional) – Pipeline revision. Default: 0.

  • batches (int, optional) – Number of batches. Default: 1.

  • batch_size (int, optional) – Batch size. Default: 10.

  • skip_targets (bool, optional) – Skip targets. Default: True.

  • end_stage (str, optional) – End stage. Default: None.

  • timeout (int, optional) – Server side preview Timeout in milliseconds. Default: 120000.

  • stage_outputs_to_override_json (str, optional) – Stage outputs to override. Default: None.

  • remote (bool, optional) – Remote preview (i.e. run on the cluster). Default: False.

  • timeout_sec (int, optional) – Client side preview timeout, in seconds. Default: streamsets.sdk.st.DEFAULT_PREVIEW_CLIENT_TIMEOUT_SEC.

  • wait (bool, optional) – Wait for pipeline preview to finish. Default: True.

  • time_between_checks (int, optional) – Time to sleep between preview status checks. Applicable when `wait` is enabled, in seconds. Default: streamsets.sdk.st.DEFAULT_PREVIEW_TIME_BETWEEN_CHECKS.

Returns

An instance of streamsets.sdk.st_api.PreviewCommand.

set_pipeline_acl(pipeline, pipeline_acl)[source]#

Update pipeline ACL.

Parameters
Returns

An instance of streamsets.sdk.st_api.Command.

set_user(username, password=None)[source]#

Set the user with which to interact with ST.

Parameters
  • username (str) – Username of user.

  • password (str, optional) – Password for user. Default: same as username.

start_pipeline(pipeline, runtime_parameters=None, **kwargs)[source]#

Start a pipeline.

Parameters
  • pipeline (streamsets.sdk.st_models.Pipeline) – The pipeline instance.

  • runtime_parameters (dict, optional) – Collection of runtime parameters. Default: None.

  • wait (bool, optional) – Wait for pipeline to start. Default: True.

  • wait_for_statuses (list, optional) – Pipeline statuses to wait on. Default: ['RUNNING', 'FINISHED'].

  • timeout_sec (int) – Timeout to wait for pipeline statuses, in seconds. Default: streamsets.sdk.st.DEFAULT_START_TIMEOUT.

Returns

An instance of streamsets.sdk.st_api.PipelineCommand.

stop_pipeline(pipeline, **kwargs)[source]#

Stop a pipeline.

Parameters
  • pipeline (streamsets.sdk.st_models.Pipeline) – The pipeline instance.

  • force (bool, optional) – Force pipeline to stop. Default: False.

  • wait (bool, optional) – Wait for pipeline to stop. Default: True.

  • timeout_sec (int) – Timeout to wait for pipeline stop, in seconds. Default: streamsets.sdk.st.DEFAULT_STOP_TIMEOUT.

Returns

An instance of streamsets.sdk.st_api.StopPipelineCommand.

property transformer_configuration#

Return all configurations for StreamSets Transformer. :returns: A dict with property names as keys and property values as values.

validate_pipeline(pipeline, **kwargs)[source]#

Validate a pipeline.

Parameters
  • pipeline (streamsets.sdk.st_models.Pipeline) – The pipeline instance.

  • timeout (int, optional) – Server side validate Timeout in seconds. Default: streamsets.sdk.st.DEFAULT_VALIDATE_SERVER_TIMEOUT_SEC.

  • timeout_sec (int, optional) – Client side validate timeout, in seconds. Default: streamsets.sdk.st.DEFAULT_VALIDATE_CLIENT_TIMEOUT_SEC.

  • time_between_checks (int, optional) – Time to sleep between validation checks. Default: streamsets.sdk.st.DEFAULT_VALIDATE_TIME_BETWEEN_CHECKS.

  • using_configured_cluster_manager (bool, optional) – Validate pipeline using configured cluster manager. Default: True.

property version#

Return the version of the Transformer.

Returns

The version string.

Return type

str

wait_for_pipeline_metric(pipeline, metric, value, timeout_sec=30)[source]#

Block until a pipeline metric reaches the desired value.

Parameters
  • pipeline (streamsets.sdk.st_models.Pipeline) – The pipeline instance.

  • metric (str) – The desired metric (e.g. 'output_record_count' or 'data_batch_count').

  • value – The desired value to wait for.

  • timeout_sec (int, optional) – Timeout to wait for metric to reach value, in seconds. Default: streamsets.sdk.st.DEFAULT_WAIT_FOR_METRIC_TIMEOUT.

Raises

TimeoutError – If timeout_sec passes without metric reaching value.

Models#

These models wrap and provide useful functionality for interacting with common SCH abstractions.

Alerts#

class streamsets.sdk.st_models.Alert(alert)[source]#

Pipeline alert.

Parameters

alert (dict) – Python object representation of a pipeline alert.

property alert_texts#

Get alert’s alert texts.

Returns

The alert’s alert texts as a str.

property label#

Get alert’s label.

Returns

The alert’s label as a str.

property pipeline_id#

Get alert’s pipeline ID.

Returns

The pipeline ID as a str.

class streamsets.sdk.st_models.Alerts(alerts)[source]#

Container for list of alerts with filtering capabilities.

Parameters

alerts (dict) – Python object representation of alerts.

alerts#

A list of streamsets.sdk.st_models.Alert instances.

Type

list

for_pipeline(pipeline)[source]#

Get alerts for the specified pipeline.

Parameters

pipeline (str) – The pipeline for which to get alerts.

Returns

An instance of streamsets.sdk.st_models.Alerts.

Data Rules#

class streamsets.sdk.st_models.DataDriftRule(stream, label, condition=None, sampling_percentage=5, sampling_records_to_retain=10, enable_meter=True, enable_alert=True, alert_text='${alert:info()}', send_email=False, active=False)[source]#

Pipeline data drift rule.

Parameters
  • stream (str) – Stream to use for data rule. An entry from a Stage instance’s output_lanes list is typically used here.

  • label (str) – Rule label.

  • condition (str, optional) – Data rule condition. Default: None.

  • sampling_percentage (int, optional) – Default: 5.

  • sampling_records_to_retain (int, optional) – Default: 10.

  • enable_meter (bool, optional) – Default: True.

  • enable_alert (bool, optional) – Default: True.

  • alert_text (str, optional) – Default: '${alert:info()}'.

  • send_email (bool, optional) – Default: False.

  • active (bool, optional) – Enable the data rule. Default: False.

property active#

The rule is active.

Returns

A bool.

class streamsets.sdk.st_models.DataRule(stream, label, condition=None, sampling_percentage=5, sampling_records_to_retain=10, enable_meter=True, enable_alert=True, alert_text=None, threshold_type='count', threshold_value=100, min_volume=1000, send_email=False, active=False)[source]#

Pipeline data rule.

Parameters
  • stream (str) – Stream to use for data rule. An entry from a Stage instance’s output_lanes list is typically used here.

  • label (str) – Rule label.

  • condition (str, optional) – Data rule condition. Default: None.

  • sampling_percentage (int, optional) – Default: 5.

  • sampling_records_to_retain (int, optional) – Default: 10.

  • enable_meter (bool, optional) – Default: True.

  • enable_alert (bool, optional) – Default: True.

  • alert_text (str, optional) – Default: None.

  • threshold_type (str, optional) – One of count or percentage. Default: 'count'.

  • threshold_value (int, optional) – Default: 100.

  • min_volume (int, optional) – Only set if threshold_type is percentage. Default: 1000.

  • send_email (bool, optional) – Default: False.

  • active (bool, optional) – Enable the data rule. Default: False.

property active#

Returns if the rule is active or not.

Returns

A bool.

History#

class streamsets.sdk.st_models.History(history)[source]#

Pipeline history.

Parameters

history (dict) – Python object representation of the pipeline history.

entries#

A list of streamsets.sdk.st_models.HistoryEntry instances.

Type

list

property latest#

Get pipeline history’s latest entry.

Returns

The most recent pipeline history entry as an instance of streamsets.sdk.st_models.HistoryEntry.

class streamsets.sdk.st_models.HistoryEntry(entry)[source]#

Pipeline history entry.

Parameters

entry (dict) – Python object representation of the history entry.

property metrics#

Get pipeline history entry’s metrics.

Returns

The pipeline history entry’s metrics as an instance of streamsets.sdk.st_models.Metrics.

Issues#

class streamsets.sdk.st_models.Issue(issue)[source]#

Issue encountered for a pipeline or a stage.

Parameters

issue (dict) – Python object representation of the issue.

class streamsets.sdk.st_models.Issues(issues)[source]#

Issues encountered for pipelines as well as stages.

Parameters

issues (dict) – Python object representation of the issues.

issues_count#

The number of issues.

Type

int

pipeline_issues#

A list of streamsets.sdk.st_models.Issue instances.

Type

list

stage_issues#

A dictionary mapping stage names to instances of streamsets.sdk.st_models.Issue.

Type

dict

Logs#

class streamsets.sdk.st_models.Log(log)[source]#

Model for ST logs.

Parameters

log (list) – A list of dictionaries (JSON representation) of the log.

after_time(timestamp)[source]#

Returns log happened after the time specified.

Parameters

timestamp (str) – Timestamp in the form ‘2017-04-10 17:53:55,244’.

Returns

The formatted log as a str.

before_time(timestamp)[source]#

Returns log happened before the time specified.

Parameters

timestamp (str) – Timestamp in the form ‘2017-04-10 17:53:55,244’.

Returns

The formatted log as a str.

Metrics#

class streamsets.sdk.st_models.MetricCounter(counter)[source]#

Metric counter.

Parameters

counter (dict) – Python object representation of a metric counter.

property count#

Get the metric counter’s count.

Returns

The metric counter’s count as an int.

class streamsets.sdk.st_models.MetricGauge(gauge)[source]#

Metric gauge.

Parameters

gauge (dict) – Python object representation of a metric gauge.

property value#

Get the metric gauge’s value.

Returns

The metric gauge’s value as a str.

class streamsets.sdk.st_models.MetricHistogram(histogram)[source]#

Metric histogram.

Parameters

histogram (dict) – Python object representation of a metric histogram.

class streamsets.sdk.st_models.MetricTimer(timer)[source]#

Metric timer.

Parameters

timer (dict) – Python object representation of a metric timer.

property count#

Get the metric timer’s count.

Returns

The metric timer’s count as an int.

class streamsets.sdk.st_models.Metrics(metrics)[source]#

Metrics.

Parameters

metrics (dict) – Python object representation of metrics.

counter(name)[source]#

Get the metric counter from metrics.

Parameters

name (str) – Counter name.

Returns

The metric counter as an instance of streamsets.sdk.st_models.MetricCounter.

gauge(name)[source]#

Get the metric gauge from metrics.

Parameters

name (str) – Gauge name.

Returns

The metric gauge as an instance of streamsets.sdk.st_models.MetricGauge.

histogram(name)[source]#

Get the metric histogram from metrics.

Parameters

name (str) – Histogram name.

Returns

The metric histogram as an instance of streamsets.sdk.st_models.MetricHistogram.

meter(name)[source]#

Get the metric meter from metrics.

Parameters

name (str) – Meter name.

Returns

The metric meter as an instance of streamsets.sdk.st_models.MetricMeter.

property pipeline#

Get pipeline-level metrics.

Returns

An instance of streamsets.sdk.st_models.PipelineMetrics.

timer(name)[source]#

Get the metric timer from metrics.

Parameters

name (str) – Timer namer.

Returns

The metric timer as an instance of streamsets.sdk.st_models.MetricTimer.

Pipelines#

class streamsets.sdk.st_models.PipelineBuilder(pipeline, definitions)[source]#

Class with which to build ST pipelines.

This class allows a user to programmatically generate an ST pipeline. Instead of instantiating this class directly, most users should use streamsets.sdk.Transformer.get_pipeline_builder().

Parameters
  • pipeline (dict) – Python object representing an empty pipeline. If created manually, this would come from creating a new pipeline in ST and then exporting it before doing any configuration.

  • definitions (dict) – The output of ST’s definitions endpoint.

add_data_drift_rule(*data_drift_rules)[source]#

Add one or more data drift rules to the pipeline.

Parameters

*data_drift_rules – One or more instances of streamsets.sdk.st_models.DataDriftRule.

add_data_rule(*data_rules)[source]#

Add one or more data rules to the pipeline.

Parameters

*data_rules – One or more instances of streamsets.sdk.st_models.DataRule.

add_error_stage(label=None, name=None, library=None)[source]#

Add an error stage to the pipeline.

When specifying a stage, either label or name must be used. If library is omitted, the first stage definition matching the given label or name will be used.

Parameters
  • label (str, optional) – ST stage label to use when selecting stage from definitions. Default: None.

  • name (str, optional) – ST stage name to use when selecting stage from definitions. Default: None.

  • library (str, optional) – ST stage library to use when selecting stage from definitions. Default: None.

Returns

An instance of streamsets.sdk.st_models.Stage.

add_metric_rule(*metric_rules)[source]#

Add one or more metric rules to the pipeline.

Parameters

*data_rules – One or more instances of streamsets.sdk.st_models.MetricRule.

add_stage(label=None, name=None, type=None, library=None)[source]#

Add a stage to the pipeline.

When specifying a stage, either label or name must be used. type and library may also be used to select a particular stage if ambiguities exist. If type and/or library are omitted, the first stage definition matching the given label or name will be used.

Parameters
  • label (str, optional) – ST stage label to use when selecting stage from definitions. Default: None.

  • name (str, optional) – ST stage name to use when selecting stage from definitions. Default: None.

  • type (str, optional) – ST stage type to use when selecting stage from definitions (e.g. origin, destination, processor, executor). Default: None.

  • library (str, optional) – ST stage library to use when selecting stage from definitions. Default: None.

Returns

An instance of streamsets.sdk.st_models.Stage.

add_start_event_stage(label=None, name=None, library=None)[source]#

Add start event stage to the pipeline.

When specifying a stage, either label or name must be used. If library is omitted, the first stage definition matching the given label or name will be used.

Parameters
  • label (str, optional) – ST stage label to use when selecting stage from definitions. Default: None.

  • name (str, optional) – ST stage name to use when selecting stage from definitions. Default: None.

  • library (str, optional) – ST stage library to use when selecting stage from definitions. Default: None.

Returns

An instance of streamsets.sdk.st_models.Stage.

add_stats_aggregator_stage(label=None, name=None, library=None)[source]#

Add a stats aggregator stage to the pipeline.

When specifying a stage, either label or name must be used. If library is omitted, the first stage definition matching the given label or name will be used.

Parameters
  • label (str, optional) – ST stage label to use when selecting stage from definitions. Default: None.

  • name (str, optional) – ST stage name to use when selecting stage from definitions. Default: None.

  • library (str, optional) – ST stage library to use when selecting stage from definitions. Default: None.

Returns

An instance of streamsets.sdk.st_models.Stage.

add_stop_event_stage(label=None, name=None, library=None)[source]#

Add stop event stage to the pipeline.

When specifying a stage, either label or name must be used. If library is omitted, the first stage definition matching the given label or name will be used.

Parameters
  • label (str, optional) – ST stage label to use when selecting stage from definitions. Default: None.

  • name (str, optional) – ST stage name to use when selecting stage from definitions. Default: None.

  • library (str, optional) – ST stage library to use when selecting stage from definitions. Default: None.

Returns

An instance of streamsets.sdk.st_models.Stage.

build(title='Pipeline')[source]#

Build the pipeline.

Parameters

title (str, optional) – Pipeline title to use. Default: 'Pipeline'.

Returns

An instance of streamsets.sdk.st_models.Pipeline.

import_pipeline(pipeline, **kwargs)[source]#

Import a pipeline into the PipelineBuilder.

Parameters

pipeline (dict) – Exported pipeline.

Returns

An instance of streamsets.sdk.st_models.PipelineBuilder.

class streamsets.sdk.st_models.Pipeline(pipeline, all_stages=None)[source]#

ST pipeline.

This class provides abstractions to make it easier to interact with a pipeline before it’s imported into ST.

Parameters
  • pipeline (dict) – A Python object representing the serialized pipeline.

  • all_stages (dict, optional) – A dictionary mapping stage names to streamsets.sdk.st_models.Stage instances. Default: None.

add_parameters(**parameters)[source]#

Add pipeline parameters.

Parameters

**parameters – Keyword arguments to add.

property configuration#

Get pipeline’s configuration.

Returns

An instance of streamsets.sdk.models.Configuration.

property delivery_guarantee#

Get the delivery guarantee.

Returns

The delivery guarantee as a str.

property id#

Get the pipeline id.

Returns

The pipeline id as a str.

property metadata#

Get the pipeline metadata.

Returns

Pipeline metadata as a Python object.

property origin_stage#

Get the pipeline’s origin stage.

Returns

An instance of streamsets.sdk.st_models.Stage.

property parameters#

Get the pipeline parameters.

Returns

A dict of parameter key-value pairs.

pprint()[source]#

Pretty-print the pipeline’s JSON representation.

property rate_limit#

Get the rate limit (records/sec).

Returns

The rate limit as a str.

property title#

Get the pipeline title.

Returns

The pipeline title as a str.

class streamsets.sdk.st_models.Stage(stage, label=None)[source]#

Pipeline stage.

Parameters
  • stage – JSON representation of the pipeline stage.

  • label (str, optional) – Human-readable stage label. Default: None.

configuration#

The stage configuration.

Type

streamsets.sdk.models.Configuration

services#

If supported by the stage, a dictionary mapping a service name to an instance of streamsets.sdk.models.Configuration.

Type

dict

add_output(*other_stages, event_lane=False)[source]#

Connect output of this stage to another stage.

The __rshift__ operator (>>) has been overloaded to invoke this method.

Parameters

other_stage (streamsets.sdk.st_models.Stage) – Stage object.

Returns

This stage as an instance of streamsets.sdk.st_models.Stage).

property event_lanes#

Get the stage’s list of event lanes.

Returns

A list of event lanes.

property label#

The stage’s label.

Type

str

property library#

Get the stage’s library.

Returns

The stage library as a str.

property output_lanes#

Get the stage’s list of output lanes.

Returns

A list of output lanes.

set_attributes(**attributes)[source]#

Set one or more stage attributes.

Parameters

**attributes – Attributes to set.

Returns

This stage as an instance of streamsets.sdk.st_models.Stage.

property stage_on_record_error#

The stage’s on record error configuration value.

property stage_record_preconditions#

The stage’s record preconditions configuration value.

property stage_required_fields#

The stage’s required fields configuration value.

Pipeline ACLs#

class streamsets.sdk.st_models.PipelineAcl(pipeline_acl)[source]#

Represents a pipeline ACL.

Parameters

pipeline_acl (dict) – JSON representation of a pipeline ACL.

permissions#

Pipeline Permissions object.

Type

streamsets.sdk.st_models.PipelinePermissions

Pipeline Permissions#

class streamsets.sdk.st_models.PipelinePermission(pipeline_permission)[source]#

A container for a pipeline permission.

Parameters

pipeline_permission (dict) – A Python object representation of a pipeline permission.

class streamsets.sdk.st_models.PipelinePermissions(pipeline_permissions)[source]#

Container for list of permissions for a pipeline.

Parameters

pipeline_permissions (dict) – A Python object representation of pipeline permissions.

permissions#

A list of streamsets.sdk.st_models.PipelinePermission instances.

Type

list

Previews#

class streamsets.sdk.st_models.Preview(pipeline_id, previewer_id, preview)[source]#

Preview.

Parameters
  • pipeline_id (str) – Pipeline ID.

  • previewer_id (str) – Previewer ID.

  • preview (dict) – Python object representation of the preview.

issues#

An instance of streamsets.sdk.st_models.Issues.

Type

dict

preview_batches#

A list of streamsets.sdk.st_models.Batch instances.

Type

list

Snapshots#

class streamsets.sdk.st_models.Batch(batch)[source]#

Snapshot batch.

Parameters

batch – Python object representation of the snapshot batch.

class streamsets.sdk.st_models.Record(record)[source]#

Record.

Parameters

record (dict) – Python object representation of the record.

header#

An instance of streamsets.sdk.st_models.RecordHeader.

Type

dict

value#

Python object representation of the record value.

Type

dict

value2#

A typed representation of the record value.

get_field_attributes(path)[source]#

Given a field path string (similar to XPath), get streamsets.sdk.st_models.Field attributes. .. rubric:: Example

get_field_attributes(path=’[2]/east/HR/employeeName’).

Parameters

path (str) – field path string.

Returns

streamsets.sdk.st_models.Field attributes.

get_field_data(path)[source]#

Given a field path string (similar to XPath), get streamsets.sdk.st_models.Field. .. rubric:: Example

get_field_data(path=’[2]/east/HR/employeeName’).

Parameters

path (str) – field path string.

Returns

An instance of streamsets.sdk.st_models.Field.

class streamsets.sdk.st_models.RecordHeader(header)[source]#

Record Header.

Parameters

header (dict) – Python object representation of the record header.

class streamsets.sdk.st_models.StageOutput(stage_output)[source]#

Snapshot batch’s stage output.

Parameters

stage_output – Python object representation of the stage output.

property output#

Gets the stage output’s output.

If the stage contains multiple lanes, use streamsets.sdk.st_models.StageOutput.output_lanes.

:raises An instance of Exception if the stage contains multiple lanes.:

Returns

An instance of streamsets.sdk.st_models.Record.

Users#

class streamsets.sdk.st_models.User(user)[source]#

User.

Parameters

user (dict) – Python object representation of the user.

property groups#

Get user’s groups.

Returns

User groups as a str.

property name#

Get user’s name.

Returns

User name as a str.

property roles#

Get user’s roles.

Returns

User roles as a str.