Design Edge Pipelines

Edge pipelines run in edge execution mode. You design edge pipelines in Data Collector.

After designing edge pipelines, you deploy them to SDC Edge installed on an edge device, and then run them on SDC Edge.

You can design the following types of edge pipelines:

Edge sending pipelines
An edge sending pipeline uses an origin specific to the edge device to read local data residing on the device. The pipeline can perform minimal processing on the data before sending the data to a Data Collector receiving pipeline.
Optionally, you can also design an edge sending pipeline to monitor the data being processed and then send data to an edge receiving pipeline running on the same SDC Edge. The edge receiving pipeline acts on the data to control the edge device.
Edge receiving pipelines
An edge receiving pipeline listens for data sent by another pipeline running on Data Collector or on SDC Edge and then acts on that data to control the edge device.
An edge receiving pipeline includes the corresponding origin to read from the destination in the pipeline that sends the data. For example, if the sending pipeline writes to an HTTP Client destination, then the edge receiving pipeline uses an HTTP Server origin to read the data.
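The send/receive pattern above can be illustrated with a minimal Python sketch (not the SDC implementation): the sending side plays the role of an HTTP Client destination that POSTs a JSON record, and the listening side plays the role of an HTTP Server origin that reads it. The host, port, and field names are hypothetical.

```python
# Illustrative sketch of the sending/receiving pattern (not SDC internals):
# the sender POSTs JSON records; the listener reads and stores them.
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

received = []  # records the listening side has read


class OriginHandler(BaseHTTPRequestHandler):
    """Plays the role of the HTTP Server origin in the receiving pipeline."""

    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        received.append(json.loads(body))
        self.send_response(200)
        self.end_headers()

    def log_message(self, *args):  # keep the demo quiet
        pass


# Receiving side: listen on an ephemeral local port.
server = HTTPServer(("127.0.0.1", 0), OriginHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Sending side: write one record to the receiving endpoint, as an
# HTTP Client destination would.
record = {"device": "sensor-1", "temperature": 21.5}
req = Request(
    f"http://127.0.0.1:{server.server_port}/",
    data=json.dumps(record).encode(),
    headers={"Content-Type": "application/json"},
)
urlopen(req).close()
server.shutdown()
```

After the send completes, `received` holds the posted record on the listening side, which mirrors how the receiving pipeline acts on data sent by the sending pipeline.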

Edge pipelines support a limited number of origins, processors, destinations, error record handling options, and data formats. Edge pipelines do not currently support any executors.

Origins

Edge pipelines support a limited number of origins.

Origins function the same way in edge pipelines as they do in other pipelines. However, some origins have limitations in edge pipelines, as noted below. In addition, stages in edge pipelines support a limited number of data formats. Also, origins in edge pipelines can process uncompressed or compressed files, but not archive or compressed archive files.

Edge pipelines support the following origins:

Dev Data Generator
No limitations.
Dev Random Record Source
No limitations.
Dev Raw Data Source
No limitations.
Directory
Edge pipelines do not support multithreaded processing. In an edge pipeline, the Directory origin always creates a single thread to read files, even if you configure it to use multiple threads.
File Tail
In edge pipelines, the File Tail origin can use the following naming conventions for archived files:
  • Active File with Reverse Counter
  • Files Matching a Pattern
If you configure the origin to use another active file naming convention, the origin uses the Active File with Reverse Counter convention instead.
gRPC Client
No limitations.
HTTP Client
In edge pipelines, the HTTP Client origin does not support batch processing mode, pagination, or OAuth2 authorization.
HTTP Server
Edge pipelines do not support multithreaded processing. In an edge pipeline, the HTTP Server origin always creates a single thread to process requests, even if you configure it to use multiple threads.
MQTT Subscriber
Edge pipelines that use MQTT stages require an intermediary MQTT broker. For example, an edge sending pipeline uses an MQTT Publisher destination to write to an MQTT broker. The broker temporarily stores the data until the MQTT Subscriber origin in the Data Collector receiving pipeline reads it.
Sensor Reader
No limitations.
System Metrics
No limitations.
WebSocket Client
No limitations.
Windows Event Log
No limitations.
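The intermediary-broker pattern noted for MQTT stages can be sketched with a minimal in-process stand-in, where a queue plays the role of the broker. The topic name and payload are hypothetical; a real deployment uses an actual MQTT broker between the two pipelines.

```python
# Minimal stand-in for the MQTT pattern: the "broker" is a queue that holds
# published messages until the subscribing pipeline reads them.
import json
import queue

broker = {"sdc/edge/readings": queue.Queue()}  # topic -> stored messages

# Edge sending pipeline: MQTT Publisher destination writes to the broker.
broker["sdc/edge/readings"].put(json.dumps({"device": "sensor-1", "temp": 20.1}))

# The message sits on the broker until the MQTT Subscriber origin in the
# Data Collector receiving pipeline reads it.
message = json.loads(broker["sdc/edge/readings"].get())
```

The broker decouples the two pipelines: the sender does not need the receiver to be reachable at write time, which is why a direct edge-to-Data-Collector MQTT connection is not supported.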

Processors

Edge pipelines support a limited number of processors. Processors function the same way in edge pipelines as they do in other pipelines. However, some processors have limitations in edge pipelines as noted below.

Edge pipelines support the following processors:

Delay
No limitations.
Dev Identity
No limitations.
Dev Random Error
No limitations.
Expression Evaluator
No limitations.
Field Remover
No limitations.
JavaScript Evaluator
In edge pipelines, the JavaScript Evaluator processor does not support the sdcFunctions scripting object.
Stream Selector
No limitations.
TensorFlow Evaluator
In edge pipelines, the TensorFlow Evaluator processor evaluates each record individually. It cannot evaluate an entire batch.

Destinations

Edge pipelines support a limited number of destinations.

Destinations function the same way in edge pipelines as they do in other pipelines. However, some destinations have limitations in edge pipelines as noted below. In addition, stages in edge pipelines support a limited number of data formats.

Edge pipelines support the following destinations:

Amazon S3
In edge pipelines, the Amazon S3 destination does not generate event records after streaming a whole file.
CoAP Client
No limitations.
HTTP Client
In edge pipelines, the HTTP Client destination does not support OAuth 2 authentication.
InfluxDB
In edge pipelines, the InfluxDB destination cannot automatically create the database in InfluxDB.
Kafka Producer
In edge pipelines, the Kafka Producer destination does not support Kerberos authentication when connecting to Kafka.
Kinesis Firehose
No limitations.
Kinesis Producer
No limitations.
MQTT Publisher
Edge pipelines that use MQTT stages require an intermediary MQTT broker. For example, an edge sending pipeline uses an MQTT Publisher destination to write to an MQTT broker. The broker temporarily stores the data until the MQTT Subscriber origin in the Data Collector receiving pipeline reads it.
To Error
No limitations.
To Event
No limitations.
Trash
No limitations.
WebSocket Client
No limitations.

Error Record Handling

You can configure the following error record handling options for an edge pipeline:
Discard
The pipeline discards the record.
Write to File
The pipeline writes error records and related details to a local directory on the edge device. Create another edge pipeline with a Directory origin to process the error records written to the file.
Write to MQTT
The pipeline publishes error records and related details to a topic on an MQTT broker. Create another edge or standalone Data Collector pipeline with an MQTT Subscriber origin to process the error records published to the broker.
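The Write to File option can be sketched in Python: one pipeline appends error records as JSON lines to a file in a local directory, and a second pipeline with a Directory-style origin later reads every file in that directory back. The file name, field names, and error reasons here are hypothetical.

```python
# Illustrative sketch of the "Write to File" error handling pattern.
import json
from pathlib import Path
from tempfile import TemporaryDirectory

with TemporaryDirectory() as error_dir:
    # Failing pipeline: append each error record plus details to a file.
    error_file = Path(error_dir) / "errors-000001.json"
    with error_file.open("w") as f:
        for record in [{"id": 1, "reason": "FIELD_MISSING"},
                       {"id": 2, "reason": "PARSE_ERROR"}]:
            f.write(json.dumps(record) + "\n")

    # Reprocessing pipeline: a Directory-origin-style reader picks up
    # every file in the directory and parses one record per line.
    reprocessed = []
    for path in sorted(Path(error_dir).glob("*.json")):
        with path.open() as f:
            reprocessed.extend(json.loads(line) for line in f)
```

After the read, `reprocessed` holds both error records, ready for the second pipeline to act on.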

Data Formats

Stages included in edge pipelines can process a limited number of data formats.

Origins included in edge pipelines can process the following data formats:
  • Binary
  • Delimited
  • JSON
  • SDC Record
  • Text
  • Whole File

Origins in edge pipelines can only process uncompressed or compressed files, not archive or compressed archive files.

Destinations included in edge pipelines can process the following data formats:
  • Binary
  • JSON
  • SDC Record
  • Text
  • Whole File

Configure corresponding stages to use the same data format. For example, if the MQTT Publisher destination in an edge sending pipeline uses the JSON data format, then configure the MQTT Subscriber origin in the Data Collector receiving pipeline to also use the JSON data format.
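The reason for matching formats can be shown with a small Python sketch: a record a destination serializes is only recoverable if the origin parses it the same way. The field names are hypothetical.

```python
# Sketch of why corresponding stages must agree on data format.
import csv
import io
import json

record = {"device": "sensor-1", "reading": 42}

# Sending side: a destination configured for the JSON data format.
payload = json.dumps(record)

# Receiving side also configured for JSON: the record round-trips cleanly.
assert json.loads(payload) == record

# Receiving side misconfigured for Delimited: the JSON text is split on
# commas into meaningless columns instead of the original fields.
row = next(csv.reader(io.StringIO(payload)))
print(row)  # not the original record
```

A format mismatch does not necessarily raise an error at read time; it can silently produce malformed records, which is why both stages must be configured explicitly.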

Edge Pipeline Limitations

Edge pipelines run on SDC Edge, which is a lightweight agent without a UI. As a result, some features available for standalone pipelines are not yet available for edge pipelines. Support for some of these features will be added in a future release.

Please note the following limitations for edge pipelines:
  • Origins in edge pipelines can process the Binary, Delimited, JSON, SDC Record, Text, and Whole File data formats only.
  • Origins in edge pipelines can process uncompressed or compressed files, but not archive or compressed archive files.
  • Destinations in edge pipelines can process the Binary, JSON, SDC Record, Text, and Whole File data formats only.
  • When stages in edge pipelines are enabled for SSL/TLS, keystore and truststore files must use the PEM format.
  • Edge pipelines cannot send email and webhook notifications.
  • Rules and alerts are not used in edge pipelines.
  • You cannot configure edge pipelines to retry upon error.
  • You cannot configure pipeline memory or rate limits for edge pipelines.
  • Edge pipelines support only the following functions within the StreamSets expression language:
    • All job functions.
    • All pipeline functions.
    • The sdc:hostname() function to return the host name of the edge device.
    • A limited number of record, math, and string functions.
  • Edge pipelines do not support using executor stages to perform tasks when receiving dataflow triggers.
  • Edge pipelines do not support multithreaded processing.
  • You cannot capture snapshots for edge pipelines.
  • Edge pipelines can write statistics only directly to Control Hub. As a result, Control Hub cannot display aggregated statistics for a job run on multiple instances of SDC Edge. When you monitor the job, you can view the statistics for each remote pipeline instance separately.