StreamSets Transformer

Main interface

This is the main entry point used by users when interacting with Transformer instances.

class streamsets.sdk.Transformer(server_url, username=None, password=None, authentication_method='form', accounts_authentication_token=None, accounts_server_url=None, control_hub=None, dump_log_on_error=False, **kwargs)[source]

Class to interact with StreamSets Transformer.

Parameters
  • server_url (str) – URL of an existing ST deployment with which to interact.

  • username (str, optional) – ST username. Default: streamsets.sdk.st.DEFAULT_ST_USERNAME.

  • password (str, optional) – ST password. Default: streamsets.sdk.st.DEFAULT_ST_PASSWORD.

  • authentication_method (str, optional) – StreamSets Transformer authentication method. Default: streamsets.sdk.constants.ENGINE_AUTHENTICATION_METHOD_FORM.

  • accounts_authentication_token (str, optional) – StreamSets Accounts authentication token. Default: None

  • accounts_server_url (str, optional) – StreamSets Accounts server base URL. Default: None

  • control_hub (streamsets.sdk.ControlHub, optional) – A StreamSets Control Hub instance to use for SCH-registered Transformers. Default: None.

  • dump_log_on_error (bool) – Whether to output Transformer logs when exceptions are raised by certain methods. Default: False

add_pipeline(*pipelines)[source]

Add one or more pipelines to the Transformer instance.

Parameters

*pipelines – One or more instances of streamsets.sdk.st_models.Pipeline.

change_password(old_password, new_password)[source]

Change password for the current user.

Parameters
  • old_password (str) – old password.

  • new_password (str) – new password.

Returns

An instance of streamsets.sdk.st_api.Command.

property current_user

Get currently logged-in user and its groups and roles.

Returns

An instance of streamsets.sdk.st_models.User.

property definitions

Get an ST instance’s definitions.

Will return a cached instance of the definitions if called more than once.

Returns

An instance of json.

get_alerts()[source]

Get pipeline alerts.

Returns

An instance of streamsets.sdk.st_models.Alerts.

get_bundle(generators=None)[source]

Generate new support bundle.

Returns

An instance of zipfile.ZipFile.

get_bundle_generators()[source]

Get available support bundle generators.

Returns

An instance of streamsets.sdk.st_models.BundleGenerators.

get_jmx_metrics()[source]

Get Transformer JMX metrics

Returns

An instance of streamsets.sdk.st_models.JmxMetrics.

get_logs(ending_offset=- 1, extra_message=None, pipeline=None, severity=None)[source]

Get logs.

Parameters
  • ending_offset (int) – ending_offset, Default: -1.

  • extra_message (str) – extra_message, Default: None.

  • pipeline (streamsets.sdk.st_models.Pipeline) – The pipeline instance, Default: None.

  • severity (str) – severity, Default: None.

Returns

An instance of streamsets.sdk.st_models.Log.

get_pipeline(pipeline_id)[source]

Get a pipeline.

Parameters

pipeline_id (str) – Id of pipeline.

Returns

An instance of streamsets.sdk.st_models.Pipeline.

get_pipeline_acl(pipeline)[source]

Get pipeline ACL.

Parameters

pipeline (streamsets.sdk.st_models.Pipeline) – The pipeline instance.

Returns

An instance of streamsets.sdk.st_models.PipelineAcl.

get_pipeline_builder(**kwargs)[source]

Get a pipeline builder instance with which a pipeline can be created.

Returns

An instance of streamsets.sdk.st_models.PipelineBuilder.

get_pipeline_history(pipeline)[source]

Get a pipeline’s history.

Parameters

pipeline (streamsets.sdk.st_models.Pipeline) – The pipeline instance.

Returns

An instance of streamsets.sdk.st_models.History.

get_pipeline_metrics(pipeline)[source]

Get a pipeline’s metrics.

Parameters

pipeline (streamsets.sdk.st_models.Pipeline) – The pipeline instance.

Returns

An instance of streamsets.sdk.st_models.Metrics.

get_pipeline_permissions(pipeline)[source]

Return pipeline permissions for a given pipeline.

Parameters

pipeline (streamsets.sdk.st_models.Pipeline) – The pipeline instance.

Returns

An instance of streamsets.sdk.st_models.PipelinePermissions.

get_pipeline_status(pipeline)[source]

Get status of a pipeline.

Parameters

pipeline (streamsets.sdk.st_models.Pipeline) – The pipeline instance.

property id

Return id for StreamSets Transformer.

Returns

A str Transformer ID.

property pipelines

Get all pipelines in the pipeline store.

Returns

A streamsets.sdk.utils.SeekableList of

streamsets.sdk.st_models.Pipeline instances.

remove_pipeline(*pipelines)[source]

Remove one or more pipelines from the Transformer instance.

Parameters

*pipelines – One or more instances of streamsets.sdk.st_models.Pipeline.

reset_origin(pipeline)[source]

Reset origin offset.

Parameters

pipeline (streamsets.sdk.st_models.Pipeline) – Pipeline object.

Returns

An instance of streamsets.sdk.st_api.Command.

run_pipeline_preview(pipeline, rev=0, batches=1, batch_size=10, skip_targets=True, end_stage=None, timeout=120000, stage_outputs_to_override_json=None, **kwargs)[source]

Run pipeline preview.

Parameters
  • pipeline (streamsets.sdk.st_models.Pipeline) – The pipeline instance.

  • rev (int, optional) – Pipeline revision. Default: 0.

  • batches (int, optional) – Number of batches. Default: 1.

  • batch_size (int, optional) – Batch size. Default: 10.

  • skip_targets (bool, optional) – Skip targets. Default: True.

  • end_stage (str, optional) – End stage. Default: None.

  • timeout (int, optional) – Server side preview Timeout in milliseconds. Default: 120000.

  • stage_outputs_to_override_json (str, optional) – Stage outputs to override. Default: None.

  • remote (bool, optional) – Remote preview (i.e. run on the cluster). Default: False.

  • timeout_sec (int, optional) – Client side preview timeout, in seconds. Default: streamsets.sdk.st.DEFAULT_PREVIEW_CLIENT_TIMEOUT_SEC.

  • wait (bool, optional) – Wait for pipeline preview to finish. Default: True.

  • time_between_checks (int, optional) – Time to sleep between preview status checks. Applicable when `wait` is enabled, in seconds. Default: streamsets.sdk.st.DEFAULT_PREVIEW_TIME_BETWEEN_CHECKS.

Returns

An instance of streamsets.sdk.st_api.PreviewCommand.

set_pipeline_acl(pipeline, pipeline_acl)[source]

Update pipeline ACL.

Parameters
Returns

An instance of streamsets.sdk.st_api.Command.

set_user(username, password=None)[source]

Set the user with which to interact with ST.

Parameters
  • username (str) – Username of user.

  • password (str, optional) – Password for user. Default: same as username.

start_pipeline(pipeline, runtime_parameters=None, **kwargs)[source]

Start a pipeline.

Parameters
  • pipeline (streamsets.sdk.st_models.Pipeline) – The pipeline instance.

  • runtime_parameters (dict, optional) – Collection of runtime parameters. Default: None.

  • wait (bool, optional) – Wait for pipeline to start. Default: True.

  • wait_for_statuses (list, optional) – Pipeline statuses to wait on. Default: ['RUNNING', 'FINISHED'].

  • timeout_sec (int) – Timeout to wait for pipeline statuses, in seconds. Default: streamsets.sdk.st.DEFAULT_START_TIMEOUT.

Returns

An instance of streamsets.sdk.st_api.PipelineCommand.

stop_pipeline(pipeline, **kwargs)[source]

Stop a pipeline.

Parameters
  • pipeline (streamsets.sdk.st_models.Pipeline) – The pipeline instance.

  • force (bool, optional) – Force pipeline to stop. Default: False.

  • wait (bool, optional) – Wait for pipeline to stop. Default: True.

  • timeout_sec (int) – Timeout to wait for pipeline stop, in seconds. Default: streamsets.sdk.st.DEFAULT_STOP_TIMEOUT.

Returns

An instance of streamsets.sdk.st_api.StopPipelineCommand.

property transformer_configuration

Return all configurations for StreamSets Transformer. :returns: A dict with property names as keys and property values as values.

validate_pipeline(pipeline, **kwargs)[source]

Validate a pipeline.

Parameters
  • pipeline (streamsets.sdk.st_models.Pipeline) – The pipeline instance.

  • timeout (int, optional) – Server side validate Timeout in seconds. Default: streamsets.sdk.st.DEFAULT_VALIDATE_SERVER_TIMEOUT_SEC.

  • timeout_sec (int, optional) – Client side validate timeout, in seconds. Default: streamsets.sdk.st.DEFAULT_VALIDATE_CLIENT_TIMEOUT_SEC.

  • time_between_checks (int, optional) – Time to sleep between validation checks. Default: streamsets.sdk.st.DEFAULT_VALIDATE_TIME_BETWEEN_CHECKS.

  • using_configured_cluster_manager (bool, optional) – Validate pipeline using configured cluster manager. Default: True.

property version

Return the version of the Transformer.

Returns

The version string.

Return type

str

wait_for_pipeline_metric(pipeline, metric, value, timeout_sec=30)[source]

Block until a pipeline metric reaches the desired value.

Parameters
  • pipeline (streamsets.sdk.st_models.Pipeline) – The pipeline instance.

  • metric (str) – The desired metric (e.g. 'output_record_count' or 'data_batch_count').

  • value – The desired value to wait for.

  • timeout_sec (int, optional) – Timeout to wait for metric to reach value, in seconds. Default: streamsets.sdk.st.DEFAULT_WAIT_FOR_METRIC_TIMEOUT.

Raises

TimeoutError – If timeout_sec passes without metric reaching value.

Models

These models wrap and provide useful functionality for interacting with common SCH abstractions.

Alerts

class streamsets.sdk.st_models.Alert(alert)[source]

Pipeline alert.

Parameters

alert (dict) – Python object representation of a pipeline alert.

property alert_texts

Get alert’s alert texts.

Returns

The alert’s alert texts as a str.

property label

Get alert’s label.

Returns

The alert’s label as a str.

property pipeline_id

Get alert’s pipeline ID.

Returns

The pipeline ID as a str.

class streamsets.sdk.st_models.Alerts(alerts)[source]

Container for list of alerts with filtering capabilities.

Parameters

alerts (dict) – Python object representation of alerts.

alerts

A list of streamsets.sdk.st_models.Alert instances.

Type

list

for_pipeline(pipeline)[source]

Get alerts for the specified pipeline.

Parameters

pipeline (str) – The pipeline for which to get alerts.

Returns

An instance of streamsets.sdk.st_models.Alerts.

Data Rules

class streamsets.sdk.st_models.DataDriftRule(stream, label, condition=None, sampling_percentage=5, sampling_records_to_retain=10, enable_meter=True, enable_alert=True, alert_text='${alert:info()}', send_email=False, active=False)[source]

Pipeline data drift rule.

Parameters
  • stream (str) – Stream to use for data rule. An entry from a Stage instance’s output_lanes list is typically used here.

  • label (str) – Rule label.

  • condition (str, optional) – Data rule condition. Default: None.

  • sampling_percentage (int, optional) – Default: 5.

  • sampling_records_to_retain (int, optional) – Default: 10.

  • enable_meter (bool, optional) – Default: True.

  • enable_alert (bool, optional) – Default: True.

  • alert_text (str, optional) – Default: '${alert:info()}'.

  • send_email (bool, optional) – Default: False.

  • active (bool, optional) – Enable the data rule. Default: False.

property active

The rule is active.

Returns

A bool.

class streamsets.sdk.st_models.DataRule(stream, label, condition=None, sampling_percentage=5, sampling_records_to_retain=10, enable_meter=True, enable_alert=True, alert_text=None, threshold_type='count', threshold_value=100, min_volume=1000, send_email=False, active=False)[source]

Pipeline data rule.

Parameters
  • stream (str) – Stream to use for data rule. An entry from a Stage instance’s output_lanes list is typically used here.

  • label (str) – Rule label.

  • condition (str, optional) – Data rule condition. Default: None.

  • sampling_percentage (int, optional) – Default: 5.

  • sampling_records_to_retain (int, optional) – Default: 10.

  • enable_meter (bool, optional) – Default: True.

  • enable_alert (bool, optional) – Default: True.

  • alert_text (str, optional) – Default: None.

  • threshold_type (str, optional) – One of count or percentage. Default: 'count'.

  • threshold_value (int, optional) – Default: 100.

  • min_volume (int, optional) – Only set if threshold_type is percentage. Default: 1000.

  • send_email (bool, optional) – Default: False.

  • active (bool, optional) – Enable the data rule. Default: False.

property active

Returns if the rule is active or not.

Returns

A bool.

History

class streamsets.sdk.st_models.History(history)[source]

Pipeline history.

Parameters

history (dict) – Python object representation of the pipeline history.

entries

A list of streamsets.sdk.st_models.HistoryEntry instances.

Type

list

property latest

Get pipeline history’s latest entry.

Returns

The most recent pipeline history entry as an instance of streamsets.sdk.st_models.HistoryEntry.

class streamsets.sdk.st_models.HistoryEntry(entry)[source]

Pipeline history entry.

Parameters

entry (dict) – Python object representation of the history entry.

property metrics

Get pipeline history entry’s metrics.

Returns

The pipeline history entry’s metrics as an instance of streamsets.sdk.st_models.Metrics.

Issues

class streamsets.sdk.st_models.Issue(issue)[source]

Issue encountered for a pipeline or a stage.

Parameters

issue (dict) – Python object representation of the issue.

class streamsets.sdk.st_models.Issues(issues)[source]

Issues encountered for pipelines as well as stages.

Parameters

issues (dict) – Python object representation of the issues.

issues_count

The number of issues.

Type

int

pipeline_issues

A list of streamsets.sdk.st_models.Issue instances.

Type

list

stage_issues

A dictionary mapping stage names to instances of streamsets.sdk.st_models.Issue.

Type

dict

Logs

class streamsets.sdk.st_models.Log(log)[source]

Model for ST logs.

Parameters

log (list) – A list of dictionaries (JSON representation) of the log.

after_time(timestamp)[source]

Returns log happened after the time specified.

Parameters

timestamp (str) – Timestamp in the form ‘2017-04-10 17:53:55,244’.

Returns

The formatted log as a str.

before_time(timestamp)[source]

Returns log happened before the time specified.

Parameters

timestamp (str) – Timestamp in the form ‘2017-04-10 17:53:55,244’.

Returns

The formatted log as a str.

Metrics

class streamsets.sdk.st_models.MetricCounter(counter)[source]

Metric counter.

Parameters

counter (dict) – Python object representation of a metric counter.

property count

Get the metric counter’s count.

Returns

The metric counter’s count as an int.

class streamsets.sdk.st_models.MetricGauge(gauge)[source]

Metric gauge.

Parameters

gauge (dict) – Python object representation of a metric gauge.

property value

Get the metric gauge’s value.

Returns

The metric gauge’s value as a str.

class streamsets.sdk.st_models.MetricHistogram(histogram)[source]

Metric histogram.

Parameters

histogram (dict) – Python object representation of a metric histogram.

class streamsets.sdk.st_models.MetricTimer(timer)[source]

Metric timer.

Parameters

timer (dict) – Python object representation of a metric timer.

property count

Get the metric timer’s count.

Returns

The metric timer’s count as an int.

class streamsets.sdk.st_models.Metrics(metrics)[source]

Metrics.

Parameters

metrics (dict) – Python object representation of metrics.

counter(name)[source]

Get the metric counter from metrics.

Parameters

name (str) – Counter name.

Returns

The metric counter as an instance of streamsets.sdk.st_models.MetricCounter.

gauge(name)[source]

Get the metric gauge from metrics.

Parameters

name (str) – Gauge name.

Returns

The metric gauge as an instance of streamsets.sdk.st_models.MetricGauge.

histogram(name)[source]

Get the metric histogram from metrics.

Parameters

name (str) – Histogram name.

Returns

The metric histogram as an instance of streamsets.sdk.st_models.MetricHistogram.

meter(name)[source]

Get the metric meter from metrics.

Parameters

name (str) – Meter name.

Returns

The metric meter as an instance of streamsets.sdk.st_models.MetricMeter.

property pipeline

Get pipeline-level metrics.

Returns

An instance of streamsets.sdk.st_models.PipelineMetrics.

timer(name)[source]

Get the metric timer from metrics.

Parameters

name (str) – Timer namer.

Returns

The metric timer as an instance of streamsets.sdk.st_models.MetricTimer.

Pipelines

class streamsets.sdk.st_models.PipelineBuilder(pipeline, definitions)[source]

Class with which to build ST pipelines.

This class allows a user to programmatically generate an ST pipeline. Instead of instantiating this class directly, most users should use streamsets.sdk.Transformer.get_pipeline_builder().

Parameters
  • pipeline (dict) – Python object representing an empty pipeline. If created manually, this would come from creating a new pipeline in ST and then exporting it before doing any configuration.

  • definitions (dict) – The output of ST’s definitions endpoint.

add_data_drift_rule(*data_drift_rules)[source]

Add one or more data drift rules to the pipeline.

Parameters

*data_drift_rules – One or more instances of streamsets.sdk.st_models.DataDriftRule.

add_data_rule(*data_rules)[source]

Add one or more data rules to the pipeline.

Parameters

*data_rules – One or more instances of streamsets.sdk.st_models.DataRule.

add_error_stage(label=None, name=None, library=None)[source]

Add an error stage to the pipeline.

When specifying a stage, either label or name must be used. If library is omitted, the first stage definition matching the given label or name will be used.

Parameters
  • label (str, optional) – ST stage label to use when selecting stage from definitions. Default: None.

  • name (str, optional) – ST stage name to use when selecting stage from definitions. Default: None.

  • library (str, optional) – ST stage library to use when selecting stage from definitions. Default: None.

Returns

An instance of streamsets.sdk.st_models.Stage.

add_metric_rule(*metric_rules)[source]

Add one or more metric rules to the pipeline.

Parameters

*data_rules – One or more instances of streamsets.sdk.st_models.MetricRule.

add_stage(label=None, name=None, type=None, library=None)[source]

Add a stage to the pipeline.

When specifying a stage, either label or name must be used. type and library may also be used to select a particular stage if ambiguities exist. If type and/or library are omitted, the first stage definition matching the given label or name will be used.

Parameters
  • label (str, optional) – ST stage label to use when selecting stage from definitions. Default: None.

  • name (str, optional) – ST stage name to use when selecting stage from definitions. Default: None.

  • type (str, optional) – ST stage type to use when selecting stage from definitions (e.g. origin, destination, processor, executor). Default: None.

  • library (str, optional) – ST stage library to use when selecting stage from definitions. Default: None.

Returns

An instance of streamsets.sdk.st_models.Stage.

add_start_event_stage(label=None, name=None, library=None)[source]

Add start event stage to the pipeline.

When specifying a stage, either label or name must be used. If library is omitted, the first stage definition matching the given label or name will be used.

Parameters
  • label (str, optional) – ST stage label to use when selecting stage from definitions. Default: None.

  • name (str, optional) – ST stage name to use when selecting stage from definitions. Default: None.

  • library (str, optional) – ST stage library to use when selecting stage from definitions. Default: None.

Returns

An instance of streamsets.sdk.st_models.Stage.

add_stats_aggregator_stage(label=None, name=None, library=None)[source]

Add a stats aggregator stage to the pipeline.

When specifying a stage, either label or name must be used. If library is omitted, the first stage definition matching the given label or name will be used.

Parameters
  • label (str, optional) – ST stage label to use when selecting stage from definitions. Default: None.

  • name (str, optional) – ST stage name to use when selecting stage from definitions. Default: None.

  • library (str, optional) – ST stage library to use when selecting stage from definitions. Default: None.

Returns

An instance of streamsets.sdk.st_models.Stage.

add_stop_event_stage(label=None, name=None, library=None)[source]

Add stop event stage to the pipeline.

When specifying a stage, either label or name must be used. If library is omitted, the first stage definition matching the given label or name will be used.

Parameters
  • label (str, optional) – ST stage label to use when selecting stage from definitions. Default: None.

  • name (str, optional) – ST stage name to use when selecting stage from definitions. Default: None.

  • library (str, optional) – ST stage library to use when selecting stage from definitions. Default: None.

Returns

An instance of streamsets.sdk.st_models.Stage.

build(title='Pipeline')[source]

Build the pipeline.

Parameters

title (str, optional) – Pipeline title to use. Default: 'Pipeline'.

Returns

An instance of streamsets.sdk.st_models.Pipeline.

import_pipeline(pipeline, **kwargs)[source]

Import a pipeline into the PipelineBuilder.

Parameters

pipeline (dict) – Exported pipeline.

Returns

An instance of streamsets.sdk.st_models.PipelineBuilder.

class streamsets.sdk.st_models.Pipeline(pipeline, all_stages=None)[source]

ST pipeline.

This class provides abstractions to make it easier to interact with a pipeline before it’s imported into ST.

Parameters
  • pipeline (dict) – A Python object representing the serialized pipeline.

  • all_stages (dict, optional) – A dictionary mapping stage names to streamsets.sdk.st_models.Stage instances. Default: None.

add_parameters(**parameters)[source]

Add pipeline parameters.

Parameters

**parameters – Keyword arguments to add.

property configuration

Get pipeline’s configuration.

Returns

An instance of streamsets.sdk.models.Configuration.

property delivery_guarantee

Get the delivery guarantee.

Returns

The delivery guarantee as a str.

property id

Get the pipeline id.

Returns

The pipeline id as a str.

property metadata

Get the pipeline metadata.

Returns

Pipeline metadata as a Python object.

property origin_stage

Get the pipeline’s origin stage.

Returns

An instance of streamsets.sdk.st_models.Stage.

property parameters

Get the pipeline parameters.

Returns

A dict of parameter key-value pairs.

pprint()[source]

Pretty-print the pipeline’s JSON representation.

property rate_limit

Get the rate limit (records/sec).

Returns

The rate limit as a str.

property title

Get the pipeline title.

Returns

The pipeline title as a str.

class streamsets.sdk.st_models.Stage(stage, label=None)[source]

Pipeline stage.

Parameters
  • stage – JSON representation of the pipeline stage.

  • label (str, optional) – Human-readable stage label. Default: None.

configuration

The stage configuration.

Type

streamsets.sdk.models.Configuration

services

If supported by the stage, a dictionary mapping a service name to an instance of streamsets.sdk.models.Configuration.

Type

dict

add_output(*other_stages, event_lane=False)[source]

Connect output of this stage to another stage.

The __rshift__ operator (>>) has been overloaded to invoke this method.

Parameters

other_stage (streamsets.sdk.st_models.Stage) – Stage object.

Returns

This stage as an instance of streamsets.sdk.st_models.Stage).

property event_lanes

Get the stage’s list of event lanes.

Returns

A list of event lanes.

property label

The stage’s label.

Type

str

property library

Get the stage’s library.

Returns

The stage library as a str.

property output_lanes

Get the stage’s list of output lanes.

Returns

A list of output lanes.

set_attributes(**attributes)[source]

Set one or more stage attributes.

Parameters

**attributes – Attributes to set.

Returns

This stage as an instance of streamsets.sdk.st_models.Stage.

property stage_on_record_error

The stage’s on record error configuration value.

property stage_record_preconditions

The stage’s record preconditions configuration value.

property stage_required_fields

The stage’s required fields configuration value.

Pipeline ACLs

class streamsets.sdk.st_models.PipelineAcl(pipeline_acl)[source]

Represents a pipeline ACL.

Parameters

pipeline_acl (dict) – JSON representation of a pipeline ACL.

permissions

Pipeline Permissions object.

Type

streamsets.sdk.st_models.PipelinePermissions

Pipeline Permissions

class streamsets.sdk.st_models.PipelinePermission(pipeline_permission)[source]

A container for a pipeline permission.

Parameters

pipeline_permission (dict) – A Python object representation of a pipeline permission.

class streamsets.sdk.st_models.PipelinePermissions(pipeline_permissions)[source]

Container for list of permissions for a pipeline.

Parameters

pipeline_permissions (dict) – A Python object representation of pipeline permissions.

permissions

A list of streamsets.sdk.st_models.PipelinePermission instances.

Type

list

Previews

class streamsets.sdk.st_models.Preview(pipeline_id, previewer_id, preview)[source]

Preview.

Parameters
  • pipeline_id (str) – Pipeline ID.

  • previewer_id (str) – Previewer ID.

  • preview (dict) – Python object representation of the preview.

issues

An instance of streamsets.sdk.st_models.Issues.

Type

dict

preview_batches

A list of streamsets.sdk.st_models.Batch instances.

Type

list

Snapshots

class streamsets.sdk.st_models.Batch(batch)[source]

Snapshot batch.

Parameters

batch – Python object representation of the snapshot batch.

class streamsets.sdk.st_models.Record(record)[source]

Record.

Parameters

record (dict) – Python object representation of the record.

header

An instance of streamsets.sdk.st_models.RecordHeader.

Type

dict

value

Python object representation of the record value.

Type

dict

value2

A typed representation of the record value.

get_field_attributes(path)[source]

Given a field path string (similar to XPath), get streamsets.sdk.st_models.Field attributes. .. rubric:: Example

get_field_attributes(path=’[2]/east/HR/employeeName’).

Parameters

path (str) – field path string.

Returns

streamsets.sdk.st_models.Field attributes.

get_field_data(path)[source]

Given a field path string (similar to XPath), get streamsets.sdk.st_models.Field. .. rubric:: Example

get_field_data(path=’[2]/east/HR/employeeName’).

Parameters

path (str) – field path string.

Returns

An instance of streamsets.sdk.st_models.Field.

class streamsets.sdk.st_models.RecordHeader(header)[source]

Record Header.

Parameters

header (dict) – Python object representation of the record header.

class streamsets.sdk.st_models.StageOutput(stage_output)[source]

Snapshot batch’s stage output.

Parameters

stage_output – Python object representation of the stage output.

property output

Gets the stage output’s output.

If the stage contains multiple lanes, use streamsets.sdk.st_models.StageOutput.output_lanes.

:raises An instance of Exception if the stage contains multiple lanes.:

Returns

An instance of streamsets.sdk.st_models.Record.

Users

class streamsets.sdk.st_models.User(user)[source]

User.

Parameters

user (dict) – Python object representation of the user.

property groups

Get user’s groups.

Returns

User groups as a str.

property name

Get user’s name.

Returns

User name as a str.

property roles

Get user’s roles.

Returns

User roles as a str.