Design Edge Pipelines
Edge pipelines run in edge execution mode. You design edge pipelines in Data Collector.
After designing edge pipelines, you deploy the edge pipelines to SDC Edge installed on an edge device. You then run the edge pipelines on SDC Edge.
You can design the following types of edge pipelines:
- Edge sending pipelines
- An edge sending pipeline uses an origin specific to the edge device to read local data residing on the device. The pipeline can perform minimal processing on the data before sending the data to a Data Collector receiving pipeline.
- Edge receiving pipelines
- An edge receiving pipeline listens for data sent by another pipeline running on Data Collector or on SDC Edge and then acts on that data to control the edge device.
Edge pipelines support a limited number of origins, processors, destinations, error record handling options, and data formats. Edge pipelines do not currently support any executors.
Origins
Edge pipelines support a limited number of origins.
Origins function the same way in edge pipelines as they do in other pipelines. However, some origins have limitations in edge pipelines, as noted below. In addition, stages in edge pipelines support a limited number of data formats, and origins in edge pipelines can process uncompressed or compressed files but not archive or compressed archive files.
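Because edge origins accept compressed files but reject archives, it can help to check a file before placing it where an edge origin reads it. The following Python sketch is a hypothetical pre-check, not part of SDC Edge. The classify() and demo() helpers and the labels they return are illustrative names, and the sketch recognizes only zip, tar, and gzip. Which compression codecs a given origin actually accepts depends on its configuration; the sketch only shows the compressed-versus-archive distinction.

```python
import gzip
import tarfile
import tempfile
import zipfile
from pathlib import Path


def classify(path: Path) -> str:
    """Return 'archive', 'compressed', or 'uncompressed' for a local file.

    An archive bundles several files (zip, tar, or a compressed tar such as
    .tar.gz). A compressed file is a single compressed stream, such as gzip.
    """
    # Both checks inspect file contents rather than the extension;
    # tarfile.is_tarfile also recognizes gzip-, bz2-, and xz-compressed tars.
    if zipfile.is_zipfile(path) or tarfile.is_tarfile(path):
        return "archive"
    with open(path, "rb") as f:
        magic = f.read(2)
    # Gzip streams start with the magic bytes 0x1f 0x8b.
    if magic == b"\x1f\x8b":
        return "compressed"
    return "uncompressed"


def demo() -> dict:
    """Create one file of each kind in a temporary directory and classify them."""
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        plain = d / "readings.txt"
        plain.write_text("sensor,value\ntemp,21.5\n")

        gz = d / "readings.txt.gz"
        with gzip.open(gz, "wt") as f:
            f.write("sensor,value\ntemp,21.5\n")

        tgz = d / "readings.tar.gz"
        with tarfile.open(tgz, "w:gz") as tar:
            tar.add(plain, arcname="readings.txt")

        zp = d / "readings.zip"
        with zipfile.ZipFile(zp, "w") as zf:
            zf.write(plain, arcname="readings.txt")

        # Classify while the temporary files still exist.
        return {p.name: classify(p) for p in (plain, gz, tgz, zp)}


if __name__ == "__main__":
    print(demo())
```

Under this sketch, only the files classified as uncompressed or compressed would be suitable input for an edge origin.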
Edge pipelines support the following origins:
Supported Origin | Limitations |
---|---|
Dev Data Generator | None |
Dev Random Record Source | None |
Dev Raw Data Source | None |
Directory | Edge pipelines do not support multithreaded processing. In an edge pipeline, the Directory origin always creates a single thread to read the files even if you configure it to use multiple threads. |
File Tail | In edge pipelines, the File Tail origin can use the following naming conventions for archived files: |
gRPC Client | None |
HTTP Client | In edge pipelines, the HTTP Client origin does not support batch processing mode, pagination, or OAuth 2 authorization. |
HTTP Server | Edge pipelines do not support multithreaded processing. In an edge pipeline, the HTTP Server origin always uses a single thread to process incoming requests, even if you configure it to use multiple threads. |
MQTT Subscriber | Edge pipelines that use MQTT stages require using an intermediary MQTT broker. For example, an edge sending pipeline uses an MQTT Publisher destination to write to an MQTT broker. The MQTT broker temporarily stores the data until the MQTT Subscriber origin in the Data Collector receiving pipeline reads the data. |
Sensor Reader | None |
System Metrics | None |
WebSocket Client | None |
Windows Event Log | None |
Processors
Edge pipelines support a limited number of processors. Processors function the same way in edge pipelines as they do in other pipelines. However, some processors have limitations in edge pipelines as noted below.
Edge pipelines support the following processors:
Supported Processor | Limitations |
---|---|
Delay | None |
Dev Identity | None |
Dev Random Error | None |
Expression Evaluator | None |
Field Remover | None |
JavaScript Evaluator | In edge pipelines, the JavaScript Evaluator processor does not support the sdcFunctions scripting object. |
Stream Selector | None |
TensorFlow Evaluator | In edge pipelines, the TensorFlow Evaluator processor can evaluate each record individually, but it cannot evaluate the entire batch. |
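The following Python sketch illustrates only the conceptual difference between evaluating records one at a time, which edge pipelines support, and evaluating an entire batch in a single call. It does not use TensorFlow, and the CountingModel class and both evaluate_* functions are hypothetical; in particular, it does not reproduce how the TensorFlow Evaluator writes its results in either mode.

```python
from typing import Callable, Dict, List, Sequence

Features = Sequence[float]
# Hypothetical model interface: scores a list of feature vectors, one score each.
Model = Callable[[List[Features]], List[float]]


class CountingModel:
    """Toy stand-in for a trained model that also counts how often it is invoked."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, inputs: List[Features]) -> List[float]:
        self.calls += 1
        # Placeholder scoring: the sum of each feature vector.
        return [sum(x) for x in inputs]


def evaluate_each_record(records: List[Dict], model: Model) -> List[Dict]:
    """Record-by-record evaluation: one model call per record."""
    return [{**r, "score": model([r["features"]])[0]} for r in records]


def evaluate_entire_batch(records: List[Dict], model: Model) -> List[Dict]:
    """Whole-batch evaluation: a single model call covering every record."""
    scores = model([r["features"] for r in records])
    return [{**r, "score": s} for r, s in zip(records, scores)]


batch = [{"id": i, "features": [float(i), 1.0]} for i in range(3)]

per_record_model = CountingModel()
per_record = evaluate_each_record(batch, per_record_model)

whole_batch_model = CountingModel()
whole_batch = evaluate_entire_batch(batch, whole_batch_model)

print(per_record_model.calls, whole_batch_model.calls)
```

With a stateless model like this one, both approaches produce the same scores; what differs is the number of model invocations, and any logic that needs to see all records of a batch at once is only possible in the batch variant.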
Destinations
Edge pipelines support a limited number of destinations.
Destinations function the same way in edge pipelines as they do in other pipelines. However, some destinations have limitations in edge pipelines as noted below. In addition, stages in edge pipelines support a limited number of data formats.
Edge pipelines support the following destinations:
Supported Destination | Limitations |
---|---|
Amazon S3 | In edge pipelines, the Amazon S3 destination does not generate event records after streaming a whole file. |
CoAP Client | None |
HTTP Client | In edge pipelines, the HTTP Client destination does not support OAuth 2 authentication. |
InfluxDB | In edge pipelines, the InfluxDB destination does not support automatically creating the database in InfluxDB. |
Kafka Producer | In edge pipelines, the Kafka Producer destination does not support Kerberos authentication to connect to Kafka. |
Kinesis Firehose | None |
Kinesis Producer | None |
MQTT Publisher | Edge pipelines that use MQTT stages require using an intermediary MQTT broker. For example, an edge sending pipeline uses an MQTT Publisher destination to write to an MQTT broker. The MQTT broker temporarily stores the data until the MQTT Subscriber origin in the Data Collector receiving pipeline reads the data. |
To Error | None |
To Event | None |
Trash | None |
WebSocket Client | None |
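The broker requirement for MQTT stages can be pictured with a toy model. In the Python sketch below, ToyBroker is a hypothetical in-memory stand-in for a real MQTT broker, the topic name is made up, and the two functions only play the roles of the MQTT Publisher destination and the MQTT Subscriber origin. The sketch ignores MQTT specifics such as QoS levels, retained messages, and persistent sessions, which determine whether a real broker keeps messages for a subscriber that is not connected.

```python
import json
from collections import defaultdict, deque
from typing import Deque, Dict, List, Optional

TOPIC = "edge/sensors"  # hypothetical topic name


class ToyBroker:
    """In-memory stand-in for an MQTT broker that holds messages per topic."""

    def __init__(self) -> None:
        self._topics: Dict[str, Deque[bytes]] = defaultdict(deque)

    def publish(self, topic: str, payload: bytes) -> None:
        self._topics[topic].append(payload)

    def poll(self, topic: str) -> Optional[bytes]:
        """Remove and return the oldest stored message, or None if there is none."""
        queue = self._topics[topic]
        return queue.popleft() if queue else None


def edge_sending_pipeline(broker: ToyBroker, readings: List[dict]) -> None:
    """Plays the MQTT Publisher destination: writes each record as JSON."""
    for reading in readings:
        broker.publish(TOPIC, json.dumps(reading).encode("utf-8"))


def receiving_pipeline(broker: ToyBroker) -> List[dict]:
    """Plays the MQTT Subscriber origin: drains the topic and parses the JSON."""
    records = []
    while (payload := broker.poll(TOPIC)) is not None:
        records.append(json.loads(payload))
    return records


broker = ToyBroker()
# The sending side publishes before anything reads; the broker holds the messages.
edge_sending_pipeline(broker, [{"sensor": "temp", "value": 21.5},
                               {"sensor": "temp", "value": 21.7}])
received = receiving_pipeline(broker)
print(received)
```

Because the two pipelines only ever talk to the broker, neither needs the other's address, and the broker decouples when data is produced from when it is consumed.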
Error Record Handling
Edge pipelines support the following error record handling options:
- Discard
- The pipeline discards the record.
- Write to File
- The pipeline writes error records and related details to a local directory on the edge device. Create another edge pipeline with a Directory origin to process the error records written to the file.
- Write to MQTT
- The pipeline publishes error records and related details to a topic on an MQTT broker. Create another edge or standalone Data Collector pipeline with an MQTT Subscriber origin to process the error records published to the broker.
Data Formats
Stages included in edge pipelines can process a limited number of data formats. Origins in edge pipelines can process the following data formats:
- Binary
- Delimited
- JSON
- SDC Record
- Text
- Whole File
Origins in edge pipelines can process uncompressed or compressed files, but not archive or compressed archive files.
Destinations in edge pipelines can process the following data formats:
- Binary
- JSON
- SDC Record
- Text
- Whole File
Configure corresponding stages to use the same data format. For example, if the MQTT Publisher destination in an edge sending pipeline uses the JSON data format, then configure the MQTT Subscriber origin in the Data Collector receiving pipeline to also use the JSON data format.
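The following Python sketch shows why corresponding stages must agree on the data format. It uses the standard json and csv modules as stand-ins for the stages' parsers (they are not the actual Data Collector parsers), and the record and its field names are hypothetical. The same payload round-trips when both sides use JSON, but it is split into meaningless pieces when the receiving side reads it as delimited text.

```python
import csv
import io
import json

record = {"device": "edge-01", "temperature": 21.5}

# Sending side: the destination writes the record in the JSON data format.
payload = json.dumps(record)

# Receiving side, matching format: parsing as JSON recovers the original record.
matched = json.loads(payload)

# Receiving side, mismatched format: parsing the same payload as comma-delimited
# text splits it at every comma, so the field names and types are lost.
mismatched = next(csv.reader(io.StringIO(payload)))

print(matched)
print(mismatched)
```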
Edge Pipeline Limitations
Edge pipelines run on SDC Edge, which is a lightweight agent without a UI. As a result, some features available for standalone pipelines are not currently available for edge pipelines. We will provide support for some of these features in edge pipelines in a future release.
- Origins in edge pipelines can process the Binary, Delimited, JSON, SDC Record, Text, and Whole File data formats only.
- Origins in edge pipelines can process uncompressed or compressed files, but not archive or compressed archive files.
- Destinations in edge pipelines can process the Binary, JSON, SDC Record, Text, and Whole File data formats only.
- When stages in edge pipelines are enabled for SSL/TLS, keystore and truststore files must use the PEM format.
- Edge pipelines cannot send email and webhook notifications.
- Rules and alerts are not used in edge pipelines.
- You cannot configure edge pipelines to retry upon error.
- You cannot configure pipeline memory or rate limits for edge pipelines.
- Edge pipelines support only the following functions within the StreamSets expression language:
  - All job functions.
  - All pipeline functions.
  - The sdc:hostname() function, which returns the host name of the edge device.
  - A limited number of record, math, and string functions.
- Edge pipelines do not support using executor stages to perform tasks when receiving dataflow triggers.
- Edge pipelines do not support multithreaded processing.
- You cannot capture snapshots for edge pipelines.
- Edge pipelines can only write statistics to Control Hub directly. As a result, Control Hub cannot display aggregated statistics for a job run on multiple instances of SDC Edge. When you monitor the job, you can view the statistics for each remote pipeline instance separately.
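Because SSL/TLS-enabled stages in edge pipelines need keystore and truststore files in PEM format, a quick check of each file before deployment can catch a binary keystore early. The following Python sketch is a hypothetical helper, not a StreamSets tool: pem_labels() and demo() are illustrative names, and the check only looks for the text framing of PEM blocks (BEGIN and END lines with a label, as described in RFC 7468). It does not validate the certificates or keys themselves, and it does not know which blocks a given stage expects.

```python
import re
import tempfile
from pathlib import Path
from typing import List

# Matches a block such as -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE-----
# and captures its label. DOTALL lets ".*?" span the base64 lines in between.
PEM_BLOCK = re.compile(r"-----BEGIN ([A-Z0-9 ]+)-----.*?-----END \1-----", re.DOTALL)


def pem_labels(path: Path) -> List[str]:
    """Return the labels of the PEM blocks in a file, in order.

    Binary keystores such as JKS or PKCS#12 files are not ASCII text,
    so they yield an empty list.
    """
    data = path.read_bytes()
    try:
        text = data.decode("ascii")
    except UnicodeDecodeError:
        return []
    return [m.group(1) for m in PEM_BLOCK.finditer(text)]


def demo() -> dict:
    """Write one PEM-framed file and one JKS-like binary file, then inspect both."""
    with tempfile.TemporaryDirectory() as tmp:
        pem = Path(tmp) / "truststore.pem"
        # Placeholder body; a real file would hold full base64 certificate data.
        pem.write_text(
            "-----BEGIN CERTIFICATE-----\nMIIBplaceholder\n-----END CERTIFICATE-----\n"
        )
        jks = Path(tmp) / "truststore.jks"
        # JKS files start with the magic bytes FE ED FE ED.
        jks.write_bytes(b"\xfe\xed\xfe\xed\x00\x00\x00\x02")
        return {"pem": pem_labels(pem), "jks": pem_labels(jks)}


if __name__ == "__main__":
    print(demo())
```

A Java keystore can typically be converted by exporting it to PKCS#12 with the JDK keytool -importkeystore command and then extracting PEM output with openssl pkcs12.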