Glossary of Terms

batch
A set of records that passes through a pipeline. Data Collector processes data in batches.
CDC-enabled origin
An origin that can process changed data and place CRUD operation information in the sdc.operation.type record header attribute.
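To illustrate how a downstream stage might act on the sdc.operation.type header attribute, here is a minimal Python sketch. The record model (a dict with a "headers" map) and the helper function are hypothetical; the codes 1 = INSERT, 2 = DELETE, 3 = UPDATE, 4 = UPSERT follow Data Collector's documented convention, but consult the documentation for the full list:

```python
# Hypothetical sketch: dispatch on the sdc.operation.type header attribute.
# The record model (a dict with a "headers" map) is an illustration,
# not the real Data Collector API.
CRUD_OPS = {1: "INSERT", 2: "DELETE", 3: "UPDATE", 4: "UPSERT"}

def crud_operation(record):
    """Return the CRUD operation name for a record, or None if unset."""
    code = record.get("headers", {}).get("sdc.operation.type")
    return CRUD_OPS.get(int(code)) if code is not None else None

record = {"headers": {"sdc.operation.type": "1"}, "value": {"id": 42}}
```

A CRUD-enabled destination could use the returned operation name to decide whether to insert, update, or delete the corresponding row in the target system.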
cluster execution mode
Pipeline execution mode that allows you to process large volumes of data from Kafka or HDFS.
cluster pipeline, cluster mode pipeline
A pipeline configured to run in cluster execution mode.
control character
A non-printing character in a character set, such as the acknowledgement or escape characters.
CRUD-enabled stage
A processor or destination that can use the CRUD operation written in the sdc.operation.type header attribute to write changed data.
data alerts
Alerts based on rules that gather information about the data that passes between two stages.
Data Collector configuration file
Configuration file containing most Data Collector properties.
Data Collector Edge (SDC Edge)
A lightweight agent without a UI that runs pipelines in edge execution mode on edge devices.
data drift alerts
Alerts based on data drift functions that gather information about the structure of data that passes between two stages.
data preview
Preview of data as it moves through a pipeline. Use to develop and test pipelines.
dataflow triggers
Instructions for the pipeline to kick off asynchronous tasks in external systems in response to events that occur in the pipeline. For more information, see Dataflow Triggers Overview.
delivery guarantee
Pipeline property that determines how Data Collector handles data when the pipeline stops unexpectedly.
destination
A stage type used in a pipeline to represent where Data Collector writes processed data.
development stages, dev stages
Stages such as the Dev Data Generator origin and the Dev Random Error processor that enable pipeline development and testing. Not meant for use in production pipelines.
edge pipeline, edge mode pipeline
A pipeline that runs in edge execution mode on a Data Collector Edge (SDC Edge) installed on an edge device. Use edge pipelines to read data from the edge device or to receive data from another pipeline and then act on that data to control the edge device.
event framework
The event framework enables the pipeline to trigger tasks in external systems based on actions that occur in the pipeline, such as running a MapReduce job after the pipeline writes a file to HDFS. You can also use the event framework to store event information, such as when an origin starts or completes reading a file.
event record
A record created by an event-generating stage when a stage-related event occurs, like when an origin starts reading a new file or a destination closes an output file.
executor
A stage type used to perform tasks in external systems upon receiving an event record.
explicit validation
A semantic validation that checks all configured values for validity and verifies whether the pipeline can run as configured. Occurs when you click the Validate icon, request data preview, or start the pipeline.
field path
The path to a field in a record. Use to reference a field.
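For example, a field path such as /orders[0]/id names a field nested inside a list within a record. As a minimal sketch of the concept, the following Python resolves a simplified field path against a record modeled as nested dicts and lists; the parser and record model are illustrative, not Data Collector internals:

```python
import re

def resolve_field_path(record, path):
    """Resolve a simplified field path like '/orders[0]/id' against a
    record modeled as nested dicts and lists (illustrative only)."""
    current = record
    for part in path.strip("/").split("/"):
        # Each path segment is a field name with an optional list index.
        match = re.fullmatch(r"(\w+)(?:\[(\d+)\])?", part)
        name, index = match.group(1), match.group(2)
        current = current[name]
        if index is not None:
            current = current[int(index)]
    return current

record = {"orders": [{"id": 7}, {"id": 9}]}
```

Real field paths also support quoted names and map semantics; this sketch covers only the common slash-and-index form.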
implicit validation
Validation that lists missing or incomplete configuration. Occurs by default as changes are saved in the pipeline canvas.
late directories
Origin directories that appear after a pipeline starts.
metric alerts
Alerts based on stage or pipeline metrics.
microservice pipeline
A pipeline that creates a fine-grained service to perform a specific task.
multithreaded pipeline
A pipeline with an origin that generates multiple threads, enabling the processing of high volumes of data in a single pipeline.
orchestration pipeline
A pipeline that can schedule and perform a variety of tasks to complete an integrated workflow across the StreamSets ecosystem.
origin
A stage type used in a pipeline to represent the source of data.
pipeline
A representation of a stream of data processing.
pipeline runner
Used in multithreaded pipelines to run a sourceless instance of a pipeline.
preconditions
Conditions that a record must satisfy to enter the stage for processing. Records that don't meet all preconditions are processed based on stage error handling.
processor
A stage type that performs specific processing on pipeline data.
required fields
A field that must exist in a record for the record to enter the stage for processing. Records that don't include all required fields are processed based on pipeline error handling.
runtime parameters
Parameters that you define for the pipeline and call from within that same pipeline.
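Runtime parameters are referenced with ${parameter_name} syntax in pipeline properties. As an illustration of the substitution concept only (not Data Collector's implementation), a small Python sketch:

```python
import re

def substitute_params(text, params):
    """Replace ${name} references with parameter values (illustrative)."""
    return re.sub(r"\$\{(\w+)\}",
                  lambda m: str(params[m.group(1)]),
                  text)

# E.g. a directory property defined as "/data/${env}/input" resolves
# per run depending on the parameter values supplied at start time.
config_value = substitute_params("/data/${env}/input", {"env": "prod"})
```

Defining the environment-specific pieces as parameters lets the same pipeline run against different systems without editing the pipeline itself.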
runtime properties
Properties that you define in a file local to Data Collector and call from within a pipeline.
runtime resources
Values that you define in a restricted file local to Data Collector and call from within a pipeline.
SDC Record data format
A data format used for Data Collector error records and an optional format to use for output records.
SDC RPC pipelines
A set of pipelines that use the SDC RPC destination and SDC RPC origin to pass data from one pipeline to another without writing to an intermediary system.
sourceless pipeline instance
An instance of the pipeline that includes all of the processors and destinations in the pipeline and represents all pipeline processing after the origin. Used in multithreaded pipelines.
snapshot
A set of data captured as a pipeline runs. You can step through a snapshot like data preview, and you can also use it as a source for data preview.
standalone pipeline, standalone mode pipeline
A pipeline configured to run in the default standalone execution mode.