Glossary of Terms
- batch
- A set of records that passes through a pipeline. Data Collector processes data in batches.
- CDC-enabled origin
- An origin that can process changed data and place CRUD operation information in the sdc.operation.type record header attribute.
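- For example, downstream stages can read the operation code with the Data Collector expression language. The documented codes include 1 (INSERT), 2 (DELETE), 3 (UPDATE), and 4 (UPSERT); the exact set depends on the origin:
```
${record:attribute('sdc.operation.type')}
```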
- control character
- A non-printing character in a character set, such as the acknowledgement or escape characters.
- CRUD-enabled stage
- A processor or destination that can use the CRUD operation written in the sdc.operation.type header attribute to write changed data.
- data alerts
- Alerts based on rules that gather information about the data that passes between two stages.
- data drift alerts
- Alerts based on data drift functions that gather information about the structure of data that passes between two stages.
- dataflow triggers
- Instructions for the pipeline to kick off asynchronous tasks in external systems in response to events that occur in the pipeline. For more information, see Dataflow Triggers Overview.
- delivery guarantee
- Pipeline property that determines how Data Collector handles data when the pipeline stops unexpectedly.
- destination
- A stage type used in a pipeline to represent where Data Collector writes processed data.
- development stages, dev stages
- Stages such as the Dev Data Generator origin and the Dev Random Error processor that enable pipeline development and testing. Not meant for use in production pipelines.
- event framework
- Enables the pipeline to trigger tasks in external systems based on actions that occur in the pipeline, such as running a MapReduce job after the pipeline writes a file to HDFS. You can also use the event framework to store event information, such as when an origin starts or completes reading a file.
- event record
- A record created by an event-generating stage when a stage-related event occurs, like when an origin starts reading a new file or a destination closes an output file.
- executor
- A stage type used to perform tasks in external systems upon receiving an event record.
- explicit validation
- A semantic validation that checks all configured values for validity and verifies whether the pipeline can run as configured. Occurs when you click the Validate icon, request data preview, or start the pipeline.
- field path
- The path to a field in a record. Used to reference a field.
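- For example, field paths use slash-delimited field names, with a bracketed index for an element of a list field; the field names below are illustrative only:
```
/customer/address/city
/orders[0]/price
```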
- implicit validation
- Validation that lists missing or incomplete configuration. Occurs by default as changes are saved in the pipeline canvas.
- late directories
- Origin directories that appear after a pipeline starts.
- metric alerts
- Alerts based on stage or pipeline metrics.
- microservice pipeline
- A pipeline that creates a fine-grained service to perform a specific task.
- multithreaded pipeline
- A pipeline with an origin that generates multiple threads, enabling the processing of high volumes of data in a single pipeline.
- orchestration pipeline
- A pipeline that can schedule and perform a variety of tasks to complete an integrated workflow across the StreamSets ecosystem.
- origin
- A stage type used in a pipeline to represent the source of data.
- pipeline
- A representation of a stream of data processing.
- pipeline runner
- Used in multithreaded pipelines to run a sourceless instance of a pipeline.
- preconditions
- Conditions that a record must satisfy to enter the stage for processing. Records that don't meet all preconditions are processed based on stage error handling.
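- For example, a precondition written in the Data Collector expression language might pass only active records; the field name and value here are hypothetical:
```
${record:value('/status') == 'ACTIVE'}
```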
- processors
- A stage type that performs specific processing on pipeline data.
- required fields
- A field that must exist in a record for the record to be passed into the stage for processing. Records that don't include all required fields are processed based on pipeline error handling.
- runtime parameters
- Parameters that you define for the pipeline and call from within that same pipeline.
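- For example, a parameter defined on the pipeline Parameters tab is called by name; the parameter name below is illustrative:
```
${FILE_DIR}
```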
- runtime properties
- Properties that you define in an external location and call from within a pipeline.
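- For example, a runtime property is called with the runtime:conf function; the property name below is illustrative:
```
${runtime:conf('HDFSDirTemplate')}
```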
- runtime resources
- Values that you define in an external file and call from within a pipeline.
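- For example, a resource file is loaded with the runtime:loadResource function; the file name below is illustrative, and the second argument indicates whether the file must be restricted to the Data Collector user:
```
${runtime:loadResource('keystore-password.txt', true)}
```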
- SDC Record data format
- A data format used for Data Collector error records and an optional format to use for output records.
- sourceless pipeline instance
- An instance of the pipeline that includes all of the processors and destinations in the pipeline and represents all pipeline processing after the origin. Used in multithreaded pipelines.
- standalone pipeline, standalone mode pipeline
- A pipeline configured to run in the default standalone execution mode.