Error Record Handling

You can configure error record handling at a stage level and at a pipeline level. You can also specify the version of the record to use as the basis for the error record.

When an error occurs as a stage processes a record, Data Collector handles the record based on the stage configuration. One of the stage options is to pass the record to the pipeline for error handling. For this option, Data Collector processes the record based on the pipeline error record handling configuration.

When you configure a pipeline, be aware that stage error handling takes precedence over pipeline error handling. For example, a pipeline might be configured to write error records to a file, but if a stage is configured to discard error records, those records are discarded. You can use this behavior to limit the types of error records that are saved for review and reprocessing.

Note that records missing required fields do not enter the stage. They are passed directly to the pipeline for error handling.
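The required-field behavior can be pictured with a small sketch. This is an illustrative model only, not Data Collector code: the `dispatch` function, the dictionary-based record, and the `pipeline_errors` list are hypothetical names chosen for the example.

```python
def missing_required_fields(record, required_fields):
    """Return the required field names that are absent from the record."""
    return [name for name in required_fields if name not in record]


def dispatch(record, required_fields, stage_fn, pipeline_errors):
    """Run stage_fn on the record only when all required fields are present.

    Hypothetical model: a record without required fields never enters the
    stage and is handed straight to pipeline error handling.
    """
    missing = missing_required_fields(record, required_fields)
    if missing:
        pipeline_errors.append(
            {"record": record, "reason": f"missing required fields: {missing}"}
        )
        return None
    return stage_fn(record)


def mark_seen(record):
    # Stand-in for the stage's normal processing.
    return {**record, "seen": True}


pipeline_errors = []
ok = dispatch({"id": 1, "name": "a"}, ["id", "name"], mark_seen, pipeline_errors)
skipped = dispatch({"id": 2}, ["id", "name"], mark_seen, pipeline_errors)
```

In this model, the second record is never passed to `mark_seen`, so the stage's own error handling setting is not consulted for it.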

Pipeline Error Record Handling

Pipeline error record handling determines how Data Collector processes error records that stages send to the pipeline for error handling. It also handles records that are deliberately dropped from the pipeline, such as records without required fields.

The pipeline handles error records based on the Error Records property on the Error Records tab. When Data Collector encounters an unexpected error, it stops the pipeline and logs the error.

Pipelines provide the following error record handling options:
Discard
The pipeline discards error records. Data Collector includes the records in error record counts and metrics.
Send Response to Origin
The pipeline passes error records back to the microservice origin to be included in a response to the originating REST API client. Data Collector includes the records in error record counts and metrics. Use in microservice pipelines only.
Write to Amazon S3
The pipeline writes error records and related details to Amazon S3. Data Collector includes the records in error record counts and metrics.
You define the Amazon S3 configuration properties.
Write to Azure Event Hub
The pipeline writes error records and related details to Microsoft Azure Event Hub. Data Collector includes the records in error record counts and metrics.
You define the configuration properties for the Azure Event Hub to use.
Write to Elasticsearch
The pipeline writes error records and related details to Elasticsearch. Data Collector includes the records in error record counts and metrics.
You define the configuration properties for the Elasticsearch cluster to use.
Write to File
The pipeline writes error records and related details to a local directory. Data Collector includes the records in error record counts and metrics.
You define the directory to use and the maximum file size. Error files are named based on the File Prefix pipeline property.
Write to File is not supported for cluster pipelines at this time.
Write to Google Cloud Storage
The pipeline writes error records and related details to Google Cloud Storage. Data Collector includes the records in error record counts and metrics.
You define the Google Cloud Storage configuration properties.
Write to Google Pub/Sub
The pipeline writes error records and related details to Google Pub/Sub. Data Collector includes the records in error record counts and metrics.
You define the Google Pub/Sub configuration properties.
Write to Kafka
The pipeline writes error records and related details to Kafka. Data Collector includes the records in error record counts and metrics.
You define the configuration properties for the Kafka cluster to use.
Write to Kinesis
The pipeline writes error records and related details to Amazon Kinesis Streams. Data Collector includes the records in error record counts and metrics.
You define the configuration properties for the Kinesis stream to use.
Write to MapR Streams
The pipeline writes error records and related details to MapR Streams. Data Collector includes the records in error record counts and metrics.
You define the configuration properties for the MapR Streams cluster to use.
Write to MQTT
The pipeline writes error records and related details to an MQTT broker. Data Collector includes the records in error record counts and metrics.
You define the configuration properties for the MQTT broker to use.

Stage Error Record Handling

Most stages include error record handling options. When an error occurs while a stage processes a record, Data Collector handles the record based on the On Record Error property on the General tab of the stage.

The On Record Error property provides the following error handling options for stages:
Discard
The stage silently discards the record. Data Collector does not log information about the error or note the specific record that encountered an error. The discarded record is not included in error record counts or metrics.
Send to Error
The stage sends the record to the pipeline for error handling. The pipeline processes the record based on the pipeline error handling configuration.
Stop Pipeline
Data Collector stops the pipeline and logs information about the error. The error that stopped the pipeline displays as an error in the pipeline history.
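As a rough illustration of the three On Record Error options, the following sketch models how each one affects error counts. The `StageRunner` class, `PipelineStopped` exception, and counter names are hypothetical and exist only for this example; they are not part of Data Collector.

```python
from enum import Enum


class OnRecordError(Enum):
    DISCARD = "Discard"
    SEND_TO_ERROR = "Send to Error"
    STOP_PIPELINE = "Stop Pipeline"


class PipelineStopped(Exception):
    """Raised in this model when a stage is set to Stop Pipeline."""


class StageRunner:
    def __init__(self, on_record_error):
        self.on_record_error = on_record_error
        self.error_count = 0        # records that reach pipeline error handling
        self.sent_to_pipeline = []  # (record, message) pairs handed to the pipeline

    def handle_error(self, record, message):
        if self.on_record_error is OnRecordError.DISCARD:
            # Silently dropped: nothing is logged or counted.
            return
        if self.on_record_error is OnRecordError.SEND_TO_ERROR:
            # The pipeline takes over and counts the record as an error record.
            self.error_count += 1
            self.sent_to_pipeline.append((record, message))
            return
        # Stop Pipeline: surface the error and halt.
        raise PipelineStopped(message)


discard_runner = StageRunner(OnRecordError.DISCARD)
discard_runner.handle_error({"id": 1}, "bad value")

send_runner = StageRunner(OnRecordError.SEND_TO_ERROR)
send_runner.handle_error({"id": 2}, "bad value")
```

The point of the model is the asymmetry: a record discarded by the stage leaves no trace in the counters, while a record sent to the pipeline is counted regardless of what the pipeline then does with it.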

Example

An origin reads JSON data with a maximum object length of 4096 characters and the origin encounters an object with 5000 characters. Based on the stage configuration, Data Collector either discards the record, stops the pipeline, or passes the record to the pipeline for error record handling.

When the stage is configured to send the record to the pipeline, one of the following occurs based on how you configure the pipeline error handling:
  • When the pipeline discards error records, Data Collector discards the record without writing the record or the cause of the error to a destination. It still includes the record in error record counts and metrics.
  • When the pipeline writes error records to a destination, Data Collector writes the error record and additional error information to the destination. It also includes the error records in counts and metrics.
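The example above can be traced with a short sketch that combines the stage setting and the pipeline setting. The `outcome` function and its return strings are hypothetical and only model the precedence described in this section; the 4096-character limit comes from the example.

```python
MAX_OBJECT_LENGTH = 4096  # maximum object length from the example


def outcome(json_text, stage_option, pipeline_option):
    """Describe what happens to one JSON object in this simplified model."""
    if len(json_text) <= MAX_OBJECT_LENGTH:
        return "processed"
    # The object is too long, so the stage setting is consulted first.
    if stage_option == "Discard":
        return "discarded by stage"
    if stage_option == "Stop Pipeline":
        return "pipeline stopped"
    if stage_option != "Send to Error":
        raise ValueError(f"unknown stage option: {stage_option}")
    # Only now does the pipeline setting matter.
    if pipeline_option == "Discard":
        return "discarded by pipeline"
    return f"handled by pipeline: {pipeline_option}"


# Build a JSON object that is exactly 5000 characters long.
oversized = '{"data": "' + "x" * 4988 + '"}'

stage_wins = outcome(oversized, "Discard", "Write to File")
written = outcome(oversized, "Send to Error", "Write to File")
dropped = outcome(oversized, "Send to Error", "Discard")
stopped = outcome(oversized, "Stop Pipeline", "Write to Kafka")
```

Note that `stage_wins` never reaches the Write to File setting, which reflects the rule that stage error handling takes precedence over pipeline error handling.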

Error Records and Version

When Data Collector creates an error record, it preserves the data and attributes from the record that triggered the error, and then adds error-related information as record header attributes. For a list of the error header attributes and other internal header attributes associated with a record, see Internal Attributes.

When you configure a pipeline, you can specify the version of the record that you want to use:
  • The original record - The record as originally generated by the origin. Use this record when you want the original record without any additional pipeline processing.
  • The current record - The record in the stage that generated the error. Depending on the type of error that occurred, this record can be unprocessed or partially processed by the error-generating stage.

    Use this record when you want to preserve any processing that the pipeline completed before the record caused an error.
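The difference between the two record versions can be sketched as follows. This is a simplified model, not Data Collector internals: `run_pipeline`, the stage functions, and the header keys `errorStage` and `errorMessage` are illustrative names chosen for the example. See Internal Attributes for the actual error header attributes.

```python
import copy


def run_pipeline(origin_record, stages, use_original):
    """Run stages in order and build an error record on the first failure.

    use_original=True bases the error record on the record as read by the
    origin; use_original=False bases it on the record as it looked in the
    stage that raised the error.
    """
    original = copy.deepcopy(origin_record)  # snapshot taken at the origin
    current = copy.deepcopy(origin_record)   # version that stages modify
    for name, stage_fn in stages:
        try:
            stage_fn(current)
        except Exception as exc:
            basis = original if use_original else current
            return {
                "data": copy.deepcopy(basis),
                "header": {"errorStage": name, "errorMessage": str(exc)},
            }
    return {"data": current, "header": {}}


def uppercase_name(rec):
    rec["name"] = rec["name"].upper()


def parse_amount(rec):
    rec["amount"] = int(rec["amount"])  # raises ValueError for non-numeric text


stages = [("uppercase_name", uppercase_name), ("parse_amount", parse_amount)]
source = {"name": "ada", "amount": "n/a"}

from_original = run_pipeline(source, stages, use_original=True)
from_current = run_pipeline(source, stages, use_original=False)
```

Here `from_original` carries the untouched name `ada`, while `from_current` keeps the uppercase name produced by the first stage before the second stage failed.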