MapR DB JSON

Supported pipeline types:
  • Data Collector

The MapR DB JSON origin reads JSON documents from MapR DB JSON tables. The origin converts each document into a record.

MapR is now HPE Ezmeral Data Fabric. At times, this documentation uses "MapR" to refer to both MapR and HPE Ezmeral Data Fabric. For information about supported versions, see Supported Systems and Versions.

MapR DB JSON tables are tables in which every row is a JSON document. Each JSON document has a unique identifier stored in the _id field, which serves as the row key for that row in the table.

When you configure the origin, you define the JSON table to read from. The origin uses the _id field in each JSON document as the offset field. You can optionally define the initial offset value to start reading from.

When the pipeline stops, the MapR DB JSON origin records where it stopped reading. By default, when the pipeline starts again, the origin continues processing from where it stopped. You can reset the origin to process all available data.
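The stop, resume, and reset behavior described above can be pictured with a small sketch. This is an illustration only, not the origin's implementation: the table is modeled as an in-memory list of dictionaries, `read_from_offset` is a hypothetical helper, and the sketch assumes that documents are read in ascending `_id` order and that a resumed run starts after the saved `_id` value.

```python
from typing import Iterable, Iterator, Optional


def read_from_offset(
    documents: Iterable[dict], last_offset: Optional[str] = None
) -> Iterator[dict]:
    """Yield documents in _id order, skipping any at or before last_offset.

    Hypothetical helper: the real origin reads from a MapR DB JSON table,
    not from a Python list.
    """
    for doc in sorted(documents, key=lambda d: d["_id"]):
        # Assumption: resuming is exclusive of the saved offset.
        if last_offset is None or doc["_id"] > last_offset:
            yield doc


table = [{"_id": "003", "v": 3}, {"_id": "001", "v": 1}, {"_id": "002", "v": 2}]

# First run: process two documents, then "stop" and remember the offset.
reader = read_from_offset(table)
first_run = [next(reader), next(reader)]
saved_offset = first_run[-1]["_id"]

# Second run: continue from where the first run stopped.
second_run = list(read_from_offset(table, saved_offset))

# Reset: discard the saved offset and process all available data again.
after_reset = list(read_from_offset(table))
```

In this sketch, the second run returns only the document with `_id` "003", while the run after a reset returns all three documents.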

Tip: Data Collector provides several MapR origins to address different needs. For a quick comparison chart to help you choose the right one, see Comparing MapR Origins.

Before you use any MapR stage in a pipeline, you must perform additional steps to enable Data Collector to process MapR data. For more information, see MapR Prerequisites.

Handling the _id Field

When the origin converts a JSON document into a record, it includes the _id field of the JSON document in the record. If needed, you can use the Field Remover processor in the pipeline to remove the _id field.

The _id field in a JSON document can contain string or binary data. The MapR DB JSON origin can read from a JSON table only when every _id field in the table uses the same valid type. For example, the origin can read from a JSON table when all documents in the table have a string _id field or when all documents have a binary _id field. The origin cannot read from a table that mixes string and binary _id fields.

When a JSON document contains a string _id field, the origin creates the _id field in the record as a String.

When a JSON document contains a binary _id field, the origin converts the data to String and then includes the field in the record.

Note: A binary _id field in a JSON document must contain numeric data for the origin to process the data correctly. In addition, binary _id fields must have the same width in every row (JSON document) of the table.
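The constraints above (one _id type per table, and a uniform width for binary _id fields) can be expressed as a simple pre-flight check. The following is a minimal sketch under stated assumptions: `check_id_fields` is a hypothetical helper, string _id values are modeled as Python `str`, and binary _id values are modeled as Python `bytes`. It does not model how the origin converts binary data to String, and it does not verify that binary values hold numeric data.

```python
from typing import Iterable, Union

IdValue = Union[str, bytes]


def check_id_fields(ids: Iterable[IdValue]) -> str:
    """Return 'string' or 'binary' if the _id values are consistent.

    Raises ValueError when the values mix types, use an unsupported type,
    or are binary values of differing widths.
    """
    ids = list(ids)
    if not ids:
        raise ValueError("no _id values to check")
    kinds = {type(i) for i in ids}
    if kinds == {str}:
        return "string"
    if kinds == {bytes}:
        widths = {len(i) for i in ids}
        if len(widths) != 1:
            raise ValueError(f"binary _id widths differ: {sorted(widths)}")
        return "binary"
    raise ValueError("table mixes _id types or uses an unsupported type")


print(check_id_fields(["a", "b"]))  # prints "string"
print(check_id_fields([(1).to_bytes(4, "big"), (2).to_bytes(4, "big")]))  # prints "binary"
```

A table whose sample fails this check, for example one containing both `"a"` and `b"\x00\x01"`, falls into the case that the origin cannot read.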

Configuring a MapR DB JSON Origin

Configure a MapR DB JSON origin to read JSON documents from MapR DB JSON tables.

  1. In the Properties panel, on the General tab, configure the following properties:
    General Property    Description
    Name                Stage name.
    Description         Optional description.
    Stage Library       Library version that you want to use.
    On Record Error     Error record handling for the stage:
                        • Discard - Discards the record.
                        • Send to Error - Sends the record to the pipeline for error handling.
                        • Stop Pipeline - Stops the pipeline. Not valid for cluster pipelines.
  2. On the MapR DB JSON tab, configure the following properties:
    MapR DB JSON Property    Description
    Table Name               Name of the MapR DB JSON table to read from.

                             If you do not include a path to the table, the stage assumes that the table exists in the user's home directory. For example, /user/<user name>/<table name>.

                             You can include a path relative to the user's home directory or an absolute path when you enter the table name. For tables in a default cluster, specify the absolute path as /<table path>. For tables in a specific cluster, specify the absolute path as /mapr/<cluster name>/<table path>.

    Initial Offset           Value of the _id field in the JSON document, or the row key of the table, where you want the origin to start reading.

                             By default, the origin reads all rows in the JSON table. You can optionally define an initial offset value to determine where the origin starts reading data within the JSON table.
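The Table Name rules described above can be sketched as a small function. This is an illustration only: `resolve_table_path` is a hypothetical helper, the actual resolution is performed by the MapR client rather than by code like this, and the home directory layout /user/<user name> is taken from the example in the property description. The user name and cluster name in the usage lines are made up.

```python
import posixpath


def resolve_table_path(table_name: str, user: str) -> str:
    """Hypothetical helper mirroring the documented Table Name rules."""
    if table_name.startswith("/"):
        # Absolute path: /<table path> for the default cluster, or
        # /mapr/<cluster name>/<table path> for a specific cluster.
        return table_name
    # No leading slash: a bare name or a path relative to the
    # user's home directory.
    return posixpath.join("/user", user, table_name)


print(resolve_table_path("orders", "alice"))
# prints "/user/alice/orders"
print(resolve_table_path("sales/orders", "alice"))
# prints "/user/alice/sales/orders"
print(resolve_table_path("/mapr/my.cluster.com/apps/orders", "alice"))
# prints "/mapr/my.cluster.com/apps/orders"
```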