Glossary of Terms

batch
A set of records that passes through a pipeline. Data Collector processes data in batches.
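As a rough illustration of batch processing, the sketch below splits a stream of records into fixed-size batches. The helper name batches and the batch size are hypothetical and are not part of Data Collector.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batches(records: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield successive lists of at most batch_size records (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(records)
    while True:
        # Pull up to batch_size records; an empty list means the input is exhausted.
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


records = [{"id": i} for i in range(5)]
result = list(batches(records, 2))
```

With five records and a batch size of two, the final batch holds the single leftover record.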
CDC-enabled origin
An origin that can process changed data and place CRUD operation information in the sdc.operation.type record header attribute.
control character
A non-printing character in a character set, such as the acknowledgement or escape characters.
CRUD-enabled stage
A processor or destination that can use the CRUD operation written in the sdc.operation.type header attribute to write changed data.
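To make the CDC-enabled origin and CRUD-enabled stage entries concrete, here is a minimal sketch of how a CRUD-enabled destination might act on the sdc.operation.type header attribute. The record layout, the apply_record helper, and the in-memory table are hypothetical. The operation codes (1 for insert, 2 for delete, 3 for update, 4 for upsert) are an assumption and should be checked against the Data Collector documentation.

```python
# Assumed operation codes; verify against the Data Collector documentation.
INSERT, DELETE, UPDATE, UPSERT = "1", "2", "3", "4"


def apply_record(table: dict, record: dict) -> None:
    """Apply one changed-data record to an in-memory table keyed by 'id' (hypothetical)."""
    op = record["header"].get("sdc.operation.type")
    row = record["fields"]
    key = row["id"]
    if op == INSERT:
        if key in table:
            raise KeyError(f"row {key} already exists")
        table[key] = dict(row)
    elif op == UPDATE:
        if key not in table:
            raise KeyError(f"row {key} does not exist")
        table[key].update(row)
    elif op == UPSERT:
        # Insert the row if it is missing, otherwise update it in place.
        table.setdefault(key, {}).update(row)
    elif op == DELETE:
        table.pop(key, None)
    else:
        raise ValueError(f"unsupported operation: {op!r}")


def change(op: str, **fields) -> dict:
    """Build a record shaped like the ones a CDC-enabled origin might emit (hypothetical)."""
    return {"header": {"sdc.operation.type": op}, "fields": fields}


table: dict = {}
changes = [
    change(INSERT, id=1, name="a"),
    change(INSERT, id=2, name="b"),
    change(UPDATE, id=2, name="b2"),
    change(DELETE, id=1),
    change(UPSERT, id=3, name="c"),
]
for record in changes:
    apply_record(table, record)
```

After the five changes are applied, the table holds the updated row 2 and the upserted row 3.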
data alerts
Alerts based on rules that gather information about the data that passes between two stages.
data drift alerts
Alerts based on data drift functions that gather information about the structure of data that passes between two stages.
dataflow triggers
Instructions for the pipeline to kick off asynchronous tasks in external systems in response to events that occur in the pipeline. For more information, see Dataflow Triggers Overview.
delivery guarantee
Pipeline property that determines how Data Collector handles data when the pipeline stops unexpectedly.
destination
A stage type used in a pipeline to represent where Data Collector writes processed data.
development stages, dev stages
Stages such as the Dev Data Generator origin and the Dev Random Error processor that enable pipeline development and testing. Not meant for use in production pipelines.
event framework
A framework that enables the pipeline to trigger tasks in external systems based on actions that occur in the pipeline, such as running a MapReduce job after the pipeline writes a file to HDFS. Can also be used to store event information, such as when an origin starts or completes reading a file.
event record
A record created by an event-generating stage when a stage-related event occurs, like when an origin starts reading a new file or a destination closes an output file.
executor
A stage type used to perform tasks in external systems upon receiving an event record.
explicit validation
A semantic validation that checks all configured values for validity and verifies whether the pipeline can run as configured. Occurs when you click the Validate icon, request data preview, or start the pipeline.
field path
The path to a field in a record. Used to reference that field.
implicit validation
A validation that lists missing or incomplete configuration. Occurs by default as changes are saved in the pipeline canvas.
late directories
Origin directories that appear after a pipeline starts.
metric alerts
Alerts based on stage or pipeline metrics.
microservice pipeline
A pipeline that creates a fine-grained service to perform a specific task.
multithreaded pipeline
A pipeline with an origin that generates multiple threads, enabling the processing of high volumes of data in a single pipeline.
orchestration pipeline
A pipeline that can schedule and perform a variety of tasks to complete an integrated workflow across the StreamSets ecosystem.
origin
A stage type used in a pipeline to represent the source of data in a pipeline.
pipeline
A representation of a stream of data processing.
pipeline runner
Used in multithreaded pipelines to run a sourceless instance of a pipeline.
preconditions
Conditions that a record must satisfy to enter the stage for processing. Records that don't meet all preconditions are processed based on stage error handling.
processor
A stage type that performs specific processing on pipeline data.
required fields
Fields that must exist in a record to allow it into the stage for processing. Records that don't have all required fields are processed based on pipeline error handling.
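The preconditions and required fields entries both describe a gate in front of a stage. A minimal sketch follows, assuming records are plain dictionaries and preconditions are Python predicates. The admit function and its parameter names are hypothetical, and checking required fields before preconditions is an assumption of this sketch. Rejected records are simply collected here, whereas Data Collector passes them to the configured error handling.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Record = dict


def admit(
    records: Iterable[Record],
    required_fields: Sequence[str],
    preconditions: Sequence[Callable[[Record], bool]],
) -> Tuple[List[Record], List[Record]]:
    """Split records into those allowed into the stage and those sent to error handling."""
    accepted, errors = [], []
    for record in records:
        has_fields = all(name in record for name in required_fields)
        # Preconditions run only when the required fields exist,
        # so each predicate can safely read those fields.
        if has_fields and all(check(record) for check in preconditions):
            accepted.append(record)
        else:
            errors.append(record)
    return accepted, errors


records = [{"id": 1, "amount": 10}, {"id": 2}, {"id": 3, "amount": -5}]
accepted, errors = admit(
    records,
    required_fields=["id", "amount"],
    preconditions=[lambda r: r["amount"] > 0],
)
```

The second record is rejected for a missing field and the third for failing the precondition, so only the first enters the stage.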
runtime parameters
Parameters that you define for the pipeline and call from within that same pipeline.
runtime properties
Properties that you define in an external location and call from within a pipeline.
runtime resources
Values that you define in an external file and call from within a pipeline.
SDC Record data format
A data format used for Data Collector error records and an optional format to use for output records.
sourceless pipeline instance
An instance of the pipeline that includes all of the processors and destinations in the pipeline and represents all pipeline processing after the origin. Used in multithreaded pipelines.
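The multithreaded pipeline, pipeline runner, and sourceless pipeline instance entries fit together: the origin hands out batches, and each runner pushes a batch through everything after the origin. The sketch below models this with a thread pool, where each worker stands in for a pipeline runner. All names are hypothetical, and this is an illustration of the concept, not a description of how Data Collector is implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock
from typing import List

written: List[dict] = []   # stands in for the destination system
write_lock = Lock()        # guards the shared destination across runners


def sourceless_instance(batch: List[dict]) -> int:
    """Everything after the origin: one processor step, then the destination."""
    processed = [{**record, "value": record["value"] * 2} for record in batch]
    with write_lock:
        written.extend(processed)
    return len(processed)


def run(origin_batches: List[List[dict]], runners: int) -> int:
    """Process batches concurrently with a fixed number of runners; return the record count."""
    with ThreadPoolExecutor(max_workers=runners) as pool:
        return sum(pool.map(sourceless_instance, origin_batches))


origin_batches = [
    [{"value": i} for i in range(start, start + 3)] for start in (0, 3, 6)
]
total = run(origin_batches, runners=2)
```

Because runners work concurrently, the order in which batches reach the destination is not guaranteed in this sketch.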
standalone pipeline, standalone mode pipeline
A pipeline configured to run in the default standalone execution mode.