Design Edge Pipelines

Edge pipelines run in edge execution mode. You design edge pipelines in Data Collector.

After designing edge pipelines, you deploy them to SDC Edge installed on an edge device, and then run them on SDC Edge.

You can design the following types of edge pipelines:

Edge sending pipelines
An edge sending pipeline uses an origin specific to the edge device to read local data residing on the device. The pipeline can perform minimal processing on the data before sending the data to a Data Collector receiving pipeline.
Optionally, you can also design an edge sending pipeline to monitor the data being processed and then send data to an edge receiving pipeline running on the same SDC Edge. The edge receiving pipeline acts on the data to control the edge device.
Edge receiving pipelines
An edge receiving pipeline listens for data sent by another pipeline running on Data Collector or on SDC Edge and then acts on that data to control the edge device.
An edge receiving pipeline includes the corresponding origin to read from the destination in the pipeline that sends the data. For example, if the sending pipeline writes to an HTTP Client destination, then the edge receiving pipeline uses an HTTP Server origin to read the data.
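The send/receive pattern above can be illustrated with a minimal Python sketch (not the SDC implementation): the sending side plays the role of an HTTP Client destination that POSTs a JSON record, and the listening side plays the role of an HTTP Server origin that reads it. The host, port, and field names are hypothetical.

```python
# Illustrative sketch of the sending/receiving pattern (not SDC internals):
# the sender POSTs JSON records; the listener reads and stores them.
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

received = []  # records the listening side has read


class OriginHandler(BaseHTTPRequestHandler):
    """Plays the role of the HTTP Server origin in the receiving pipeline."""

    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        received.append(json.loads(body))
        self.send_response(200)
        self.end_headers()

    def log_message(self, *args):  # keep the demo quiet
        pass


# Receiving side: listen on an ephemeral local port.
server = HTTPServer(("127.0.0.1", 0), OriginHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Sending side: write one record to the receiving endpoint, as an
# HTTP Client destination would.
record = {"device": "sensor-1", "temperature": 21.5}
req = Request(
    f"http://127.0.0.1:{server.server_port}/",
    data=json.dumps(record).encode(),
    headers={"Content-Type": "application/json"},
)
urlopen(req).close()
server.shutdown()
```

After the send completes, `received` holds the posted record on the listening side, which mirrors how the receiving pipeline acts on data sent by the sending pipeline.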

Edge pipelines support a limited number of origins, processors, destinations, error record handling options, and data formats. Edge pipelines do not currently support any executors.

Origins

Edge pipelines support a limited number of origins.

Origins function the same way in edge pipelines as they do in other pipelines. However, some origins have limitations in edge pipelines, as noted below. In addition, stages in edge pipelines support a limited number of data formats. Also, origins in edge pipelines can process uncompressed or compressed files, but not archive or compressed archive files.

Edge pipelines support the following origins:

Dev Data Generator
No limitations.
Dev Random Record Source
No limitations.
Dev Raw Data Source
No limitations.
Directory
Edge pipelines do not support multithreaded processing. In an edge pipeline, the Directory origin always creates a single thread to read files, even if you configure it to use multiple threads.
File Tail
In edge pipelines, the File Tail origin can use the following naming conventions for archived files:
  • Active File with Reverse Counter
  • Files Matching a Pattern
If you configure the origin to use another active file naming convention, the origin uses the Active File with Reverse Counter convention instead.
gRPC Client
No limitations.
HTTP Client
In edge pipelines, the HTTP Client origin does not support batch processing mode, pagination, or OAuth2 authorization.
HTTP Server
Edge pipelines do not support multithreaded processing. In an edge pipeline, the HTTP Server origin always creates a single thread to process requests, even if you configure it to use multiple threads.
MQTT Subscriber
Edge pipelines that use MQTT stages require an intermediary MQTT broker. For example, an edge sending pipeline uses an MQTT Publisher destination to write to an MQTT broker. The broker temporarily stores the data until the MQTT Subscriber origin in the Data Collector receiving pipeline reads it.
Sensor Reader
No limitations.
System Metrics
No limitations.
WebSocket Client
No limitations.
Windows Event Log
No limitations.
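The intermediary-broker pattern noted for MQTT stages can be sketched with a minimal in-process stand-in, where a queue plays the role of the broker. The topic name and payload are hypothetical; a real deployment uses an actual MQTT broker between the two pipelines.

```python
# Minimal stand-in for the MQTT pattern: the "broker" is a queue that holds
# published messages until the subscribing pipeline reads them.
import json
import queue

broker = {"sdc/edge/readings": queue.Queue()}  # topic -> stored messages

# Edge sending pipeline: MQTT Publisher destination writes to the broker.
broker["sdc/edge/readings"].put(json.dumps({"device": "sensor-1", "temp": 20.1}))

# The message sits on the broker until the MQTT Subscriber origin in the
# Data Collector receiving pipeline reads it.
message = json.loads(broker["sdc/edge/readings"].get())
```

The broker decouples the two pipelines: the sender does not need the receiver to be reachable at write time, which is why a direct edge-to-Data-Collector MQTT connection is not supported.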

Processors

Edge pipelines support a limited number of processors. Processors function the same way in edge pipelines as they do in other pipelines. However, some processors have limitations in edge pipelines as noted below.

Edge pipelines support the following processors:

Delay
No limitations.
Dev Identity
No limitations.
Dev Random Error
No limitations.
Expression Evaluator
No limitations.
Field Remover
No limitations.
JavaScript Evaluator
In edge pipelines, the JavaScript Evaluator processor does not support the sdcFunctions scripting object.
Stream Selector
No limitations.
TensorFlow Evaluator
In edge pipelines, the TensorFlow Evaluator processor evaluates each record individually. It cannot evaluate an entire batch.

Destinations

Edge pipelines support a limited number of destinations.

Destinations function the same way in edge pipelines as they do in other pipelines. However, some destinations have limitations in edge pipelines as noted below. In addition, stages in edge pipelines support a limited number of data formats.

Edge pipelines support the following destinations:

Amazon S3
In edge pipelines, the Amazon S3 destination does not generate event records after streaming a whole file.
CoAP Client
No limitations.
HTTP Client
In edge pipelines, the HTTP Client destination does not support OAuth 2 authentication.
InfluxDB
In edge pipelines, the InfluxDB destination cannot automatically create the database in InfluxDB.
Kafka Producer
In edge pipelines, the Kafka Producer destination does not support Kerberos authentication when connecting to Kafka.
Kinesis Firehose
No limitations.
Kinesis Producer
No limitations.
MQTT Publisher
Edge pipelines that use MQTT stages require an intermediary MQTT broker. For example, an edge sending pipeline uses an MQTT Publisher destination to write to an MQTT broker. The broker temporarily stores the data until the MQTT Subscriber origin in the Data Collector receiving pipeline reads it.
To Error
No limitations.
To Event
No limitations.
Trash
No limitations.
WebSocket Client
No limitations.

Error Record Handling

You can configure the following error record handling options for an edge pipeline:
Discard
The pipeline discards the record.
Write to File
The pipeline writes error records and related details to a local directory on the edge device. Create another edge pipeline with a Directory origin to process the error records written to the file.
Write to MQTT
The pipeline publishes error records and related details to a topic on an MQTT broker. Create another edge or standalone Data Collector pipeline with an MQTT Subscriber origin to process the error records published to the broker.
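The Write to File option can be sketched in Python: one pipeline appends error records as JSON lines to a file in a local directory, and a second pipeline with a Directory-style origin later reads every file in that directory back. The file name, field names, and error reasons here are hypothetical.

```python
# Illustrative sketch of the "Write to File" error handling pattern.
import json
from pathlib import Path
from tempfile import TemporaryDirectory

with TemporaryDirectory() as error_dir:
    # Failing pipeline: append each error record plus details to a file.
    error_file = Path(error_dir) / "errors-000001.json"
    with error_file.open("w") as f:
        for record in [{"id": 1, "reason": "FIELD_MISSING"},
                       {"id": 2, "reason": "PARSE_ERROR"}]:
            f.write(json.dumps(record) + "\n")

    # Reprocessing pipeline: a Directory-origin-style reader picks up
    # every file in the directory and parses one record per line.
    reprocessed = []
    for path in sorted(Path(error_dir).glob("*.json")):
        with path.open() as f:
            reprocessed.extend(json.loads(line) for line in f)
```

After the read, `reprocessed` holds both error records, ready for the second pipeline to act on.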

Data Formats

Stages included in edge pipelines can process a limited number of data formats.

Origins included in edge pipelines can process the following data formats:
  • Binary
  • Delimited
  • JSON
  • SDC Record
  • Text
  • Whole File

Origins in edge pipelines can only process uncompressed or compressed files, not archive or compressed archive files.

Destinations included in edge pipelines can process the following data formats:
  • Binary
  • JSON
  • SDC Record
  • Text
  • Whole File

Configure corresponding stages to use the same data format. For example, if the MQTT Publisher destination in an edge sending pipeline uses the JSON data format, then configure the MQTT Subscriber origin in the Data Collector receiving pipeline to also use the JSON data format.
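The reason for matching formats can be shown with a small Python sketch: a record a destination serializes is only recoverable if the origin parses it the same way. The field names are hypothetical.

```python
# Sketch of why corresponding stages must agree on data format.
import csv
import io
import json

record = {"device": "sensor-1", "reading": 42}

# Sending side: a destination configured for the JSON data format.
payload = json.dumps(record)

# Receiving side also configured for JSON: the record round-trips cleanly.
assert json.loads(payload) == record

# Receiving side misconfigured for Delimited: the JSON text is split on
# commas into meaningless columns instead of the original fields.
row = next(csv.reader(io.StringIO(payload)))
print(row)  # not the original record
```

A format mismatch does not necessarily raise an error at read time; it can silently produce malformed records, which is why both stages must be configured explicitly.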

Edge Pipeline Limitations

Edge pipelines run on SDC Edge, which is a lightweight agent without a UI. As a result, some features available for standalone pipelines are not yet available for edge pipelines. Support for some of these features will be added in a future release.

Please note the following limitations for edge pipelines:
  • Origins in edge pipelines can process the Binary, Delimited, JSON, SDC Record, Text, and Whole File data formats only.
  • Origins in edge pipelines can process uncompressed or compressed files, but not archive or compressed archive files.
  • Destinations in edge pipelines can process the Binary, JSON, SDC Record, Text, and Whole File data formats only.
  • When stages in edge pipelines are enabled for SSL/TLS, keystore and truststore files must use the PEM format.
  • Edge pipelines cannot send email and webhook notifications.
  • Rules and alerts are not used in edge pipelines.
  • You cannot configure edge pipelines to retry upon error.
  • You cannot configure pipeline memory or rate limits for edge pipelines.
  • Edge pipelines support only the following functions within the StreamSets expression language:
    • All job functions.
    • All pipeline functions.
    • The sdc:hostname() function to return the host name of the edge device.
    • A limited number of record, math, and string functions.
  • Edge pipelines do not support using executor stages to perform tasks when receiving dataflow triggers.
  • Edge pipelines do not support multithreaded processing.
  • You cannot capture snapshots for edge pipelines.
  • Edge pipelines can write statistics only directly to Control Hub. As a result, Control Hub cannot display aggregated statistics for a job run on multiple instances of SDC Edge. When you monitor the job, you can view the statistics for each remote pipeline instance separately.