Glossary of Terms
- batch
- A set of records that passes through a pipeline. Data Collector processes data in batches.
- CDC-enabled origin
- An origin that can process changed data and place CRUD operation information in the sdc.operation.type record header attribute.
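For reference, the sdc.operation.type attribute stores an integer code for the CRUD operation. The codes below follow the Data Collector documentation; verify them against your Data Collector version:
```
sdc.operation.type = 1   (INSERT)
sdc.operation.type = 2   (DELETE)
sdc.operation.type = 3   (UPDATE)
sdc.operation.type = 4   (UPSERT)
```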
- cluster execution mode
- Pipeline execution mode that allows you to process large volumes of data from Kafka or HDFS.
- cluster pipeline, cluster mode pipeline
- A pipeline configured to run in cluster execution mode.
- control character
- A non-printing character in a character set, such as the acknowledgement or escape characters.
- CRUD-enabled stage
- A processor or destination that can use the CRUD operation written in the sdc.operation.type header attribute to write changed data.
- data alerts
- Alerts based on rules that gather information about the data that passes between two stages.
- Data Collector configuration file (sdc.properties)
- Configuration file with most Data Collector properties. Found in the following location: $SDC_CONF/sdc.properties
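A few representative entries, for illustration only (property names and defaults vary by Data Collector version; check the file shipped with your installation):
```
# Excerpt from $SDC_CONF/sdc.properties
http.port=18630
production.maxBatchSize=1000
```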
- Data Collector Edge (SDC Edge)
- A lightweight agent without a UI that runs pipelines in edge execution mode on edge devices.
- data drift alerts
- Alerts based on data drift functions that gather information about the structure of data that passes between two stages.
- data preview
- Preview of data as it moves through a pipeline. Use to develop and test pipelines.
- dataflow triggers
- Instructions for the pipeline to kick off asynchronous tasks in external systems in response to events that occur in the pipeline. For more information, see Dataflow Triggers Overview.
- delivery guarantee
- Pipeline property that determines how Data Collector handles data when the pipeline stops unexpectedly: at least once or at most once.
- destination
- A stage type used in a pipeline to represent where Data Collector writes processed data.
- development stages, dev stages
- Stages such as the Dev Data Generator origin and the Dev Random Error processor that enable pipeline development and testing. Not meant for use in production pipelines.
- edge pipeline, edge mode pipeline
- A pipeline that runs in edge execution mode on a Data Collector Edge (SDC Edge) installed on an edge device. Use edge pipelines to read data from the edge device or to receive data from another pipeline and then act on that data to control the edge device.
- event framework
- The event framework enables the pipeline to trigger tasks in external systems based on actions that occur in the pipeline, such as running a MapReduce job after the pipeline writes a file to HDFS. You can also use the event framework to store event information, such as when an origin starts or completes reading a file.
- event record
- A record created by an event-generating stage when a stage-related event occurs, such as when an origin starts reading a new file or a destination closes an output file.
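Event records describe the event in record header attributes. A sketch for illustration; the attribute names follow the Data Collector documentation, and the values here are made up:
```
sdc.event.type = new-file
sdc.event.version = 1
sdc.event.creation_timestamp = 1533244722756
```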
- executor
- A stage type used to perform tasks in external systems upon receiving an event record.
- explicit validation
- A semantic validation that checks all configured values for validity and verifies whether the pipeline can run as configured. Occurs when you click the Validate icon, request data preview, or start the pipeline.
- field path
- The path to a field in a record. Use to reference a field.
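For example, assuming a record with a map field named address and a list field named items (hypothetical names):
```
/address/city                        # city field nested in the address map field
/items[0]/sku                        # sku field of the first element in the items list
${record:value('/address/city')}     # referencing the field in an expression
```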
- implicit validation
- Lists missing or incomplete configuration. Occurs by default as changes are saved in the pipeline canvas.
- late directories
- Origin directories that appear after a pipeline starts.
- metric alerts
- Alerts based on stage or pipeline metrics.
- microservice pipeline
- A pipeline that creates a fine-grained service to perform a specific task.
- multithreaded pipeline
- A pipeline with an origin that generates multiple threads, enabling the processing of high volumes of data in a single pipeline.
- orchestration pipeline
- A pipeline that can schedule and perform a variety of tasks to complete an integrated workflow across the StreamSets ecosystem.
- origin
- A stage type used in a pipeline to represent the source of data.
- pipeline
- A representation of a stream of data processing.
- pipeline runner
- Used in multithreaded pipelines to run a sourceless instance of a pipeline.
- preconditions
- Conditions that a record must satisfy to enter the stage for processing. Records that don't meet all preconditions are processed based on stage error handling.
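For example, a precondition is an expression that must evaluate to true for the record to enter the stage. A sketch using the record:value() function with a hypothetical field name:
```
${record:value('/status') == 'OK'}
```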
- processors
- A stage type that performs specific processing on pipeline data.
- required fields
- A field that must exist in a record to allow the record into the stage for processing. Records that don't have all required fields are processed based on pipeline error handling.
- runtime parameters
- Parameters that you define for the pipeline and call from within that same pipeline.
- runtime properties
- Properties that you define in a file local to Data Collector and call from within a pipeline.
- runtime resources
- Values that you define in a restricted file local to Data Collector and call from within a pipeline.
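The three runtime mechanisms above are referenced with different expression-language forms. A sketch with placeholder names:
```
${JDBC_URL}                                   # runtime parameter defined in the pipeline
${runtime:conf('regionName')}                 # runtime property defined in a local file
${runtime:loadResource('creds.txt', true)}    # runtime resource loaded from a restricted file
```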
- SDC Record data format
- A data format used for Data Collector error records and an optional format to use for output records.
- SDC RPC pipelines
- A set of pipelines that use the SDC RPC destination and SDC RPC origin to pass data from one pipeline to another without writing to an intermediary system.
- sourceless pipeline instance
- An instance of the pipeline that includes all of the processors and destinations in the pipeline and represents all pipeline processing after the origin. Used in multithreaded pipelines.
- snapshot
- A set of data captured as a pipeline runs. You can step through a snapshot as you do with data preview, and you can also use a snapshot as a source for data preview.
- standalone pipeline, standalone mode pipeline
- A pipeline configured to run in the default standalone execution mode.