MapR DB JSON

The MapR DB JSON destination writes data as JSON documents to MapR DB JSON tables. The destination converts each record into a JSON document and writes the document to the JSON table that you specify. To write text, binary data, or JSON strings to MapR DB binary tables, use the MapR DB destination.

MapR is now HPE Ezmeral Data Fabric. At times, this documentation uses "MapR" to refer to both MapR and HPE Ezmeral Data Fabric. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.

MapR DB JSON tables are tables in which every row is a JSON document. The JSON documents in a table do not need to have the same structure. For example, a single JSON table can include any number of JSON documents that share only some common fields.

The MapR DB JSON destination can use CRUD operations defined in the sdc.operation.type record header attribute to write data. When a record does not specify a CRUD operation, the destination treats it as an Insert record. For information about Data Collector change data processing and a list of CDC-enabled origins, see Processing Changed Data.

When you configure the MapR DB JSON destination, you specify the table name, whether the destination should create the table if it doesn't exist, and the row key for the table. You also configure the Insert API and Set API properties, which determine how the destination writes records when a row with the same row key already exists in the table.

Before you use any MapR stage in a pipeline, you must perform additional steps to enable Data Collector to process MapR data. For more information, see MapR Prerequisites in the Data Collector documentation.

Row Key

MapR DB uses a row key to uniquely identify each row in a JSON table. The row key is defined by the _id field of the JSON document stored in the row.

When you configure the MapR DB JSON destination, you define a field in the record to use as the row key. The field must contain a unique value. The destination writes the value of the specified field to the _id field in the JSON document. The destination retains the original field in the JSON document.

For example, let's say you define the customer_ID field in the record as the row key. When the destination converts a record with a customer_ID of 034667 to a JSON document, the JSON document includes both an _id field and a customer_ID field with the value of 034667. MapR DB uses the _id field with the value of 034667 in the JSON document as the row key in the JSON table.

If the field defined as the row key doesn't exist in the record, the record is sent to the stage for error handling.
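
The row key behavior described above can be illustrated with a short Python sketch. It is not the destination's implementation: the to_json_document helper, the MissingRowKeyError exception, and the name field are hypothetical names introduced only for this example, and the sketch assumes the default string handling of the row key.

```python
import json


class MissingRowKeyError(Exception):
    """Hypothetical error: the configured row key field is not in the record."""


def to_json_document(record, row_key_field):
    # Hypothetical helper that mimics the documented behavior.
    if row_key_field not in record:
        # The destination would send such a record to stage error handling.
        raise MissingRowKeyError(row_key_field)
    document = dict(record)                       # the original field is retained
    document["_id"] = str(record[row_key_field])  # row key value copied to _id
    return document


record = {"customer_ID": "034667", "name": "Ana"}
doc = to_json_document(record, "customer_ID")
print(json.dumps(doc, sort_keys=True))
```

The resulting document carries the value 034667 in both the _id field and the customer_ID field, which matches the example above.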

Row Key Data Types

You configure the MapR DB JSON destination to process row keys as string or binary data. If necessary, the MapR DB JSON destination converts the data type of the row key field and then writes the converted value to the _id field in the JSON document.

Note: The destination cannot convert the List, Map, or List-Map data types. As a result, you cannot define a field with these data types as the row key.

The destination processes the field defined as the row key as one of the following data types:

String
When you configure the destination to process the row key as string data, you can assign fields with any data type as a row key, except for fields with a List, Map, or List-Map data type. The destination processes the row key data as String, converting data types as necessary. A Byte Array field is an exception to this rule. The destination processes a Byte Array field defined as a row key as binary data, even if the destination is configured to process the row key as string data.
By default, the MapR DB JSON destination processes the row key as string data.
Binary
When you configure the destination to process the row key as binary data, you can assign fields with the following data types as a row key:
  • Byte Array
  • Date
  • Datetime
  • Integer
  • Long
  • Short
  • String
  • Time
If the field defined as the row key is any other data type, the record is sent to the stage for error handling.
The destination processes the row key data as Byte Array, converting data types as necessary. Date, Datetime, and Time fields are first converted to an epoch time in milliseconds, and then converted to a Byte Array.
To configure the destination to process the row key as binary data, select the Process Row Key as Binary property.
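
As a rough illustration of the two modes, the following Python sketch converts a row key value to String or to a Byte Array. The function names are hypothetical, Python types stand in for Data Collector field types, and the byte layout (8-byte big-endian integers, UTF-8 strings) is an assumption made for the example, since the actual encoding used by the destination is not described here. Only the type rules and the epoch-milliseconds step come from the text above.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def row_key_as_string(value):
    """String mode: any type except List, Map, and List-Map."""
    if isinstance(value, (list, dict)):
        raise TypeError("List and Map fields cannot be used as the row key")
    if isinstance(value, (bytes, bytearray)):
        return bytes(value)  # Byte Array stays binary even in string mode
    return str(value)


def row_key_as_binary(value):
    """Binary mode: only the data types listed above are accepted."""
    if isinstance(value, (bytes, bytearray)):
        return bytes(value)
    if isinstance(value, datetime):
        if value.tzinfo is None:
            value = value.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
        millis = (value - EPOCH) // timedelta(milliseconds=1)  # epoch milliseconds
        return millis.to_bytes(8, "big", signed=True)  # assumed byte layout
    if isinstance(value, int) and not isinstance(value, bool):
        return value.to_bytes(8, "big", signed=True)   # assumed byte layout
    if isinstance(value, str):
        return value.encode("utf-8")                   # assumed encoding
    # Any other type would be sent to the stage for error handling.
    raise TypeError(f"unsupported row key type: {type(value).__name__}")


print(row_key_as_string(34667))
print(row_key_as_binary(datetime(2021, 1, 1, tzinfo=timezone.utc)).hex())
```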

Writing to MapR DB JSON

When the MapR DB JSON destination writes to MapR DB JSON tables, it uses CRUD operations in record header attributes when available. When records do not include CRUD operations, the destination treats them as Insert records.

You can also configure the Insert API and the Set API properties that define how records are treated when they already exist in the destination.

Define the CRUD Operation

You can use CRUD operations to write to MapR DB JSON. To use CRUD operations, define the CRUD operation record header attribute for each record earlier in the pipeline. Records without the attribute defined are treated as Insert.

To use CRUD operations to write records, set the following CRUD operation record header attribute:
sdc.operation.type
When defined, the MapR DB JSON destination uses the CRUD operation in the sdc.operation.type record header attribute when writing to MapR DB JSON tables. The destination supports the following values for the sdc.operation.type attribute:
  • 1 for INSERT
  • 2 for DELETE
  • 3 for UPDATE
If your pipeline has a CDC-enabled origin that processes changed data, the destination reads the operation type from the sdc.operation.type header attribute that the origin generates. If your pipeline has a non-CDC origin, you can use the Expression Evaluator processor or a scripting processor to define the record header attribute. For more information about Data Collector changed data processing and a list of CDC-enabled origins, see Processing Changed Data.
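
The mapping from the sdc.operation.type attribute to an operation can be sketched in Python as follows. The resolve_operation function is a hypothetical name, and the header attributes are modeled as a plain dictionary of string values. Raising an error for an unsupported code is an assumption of this sketch. The text above does not state how the destination handles such values.

```python
OPERATIONS = {1: "INSERT", 2: "DELETE", 3: "UPDATE"}


def resolve_operation(header_attributes):
    """Return the operation name for a record's header attributes."""
    raw = header_attributes.get("sdc.operation.type")
    if raw is None or raw == "":
        return "INSERT"  # records without the attribute are treated as Insert
    code = int(raw)
    if code not in OPERATIONS:
        # Assumption: unsupported codes are rejected in this sketch.
        raise ValueError(f"unsupported sdc.operation.type: {raw}")
    return OPERATIONS[code]


print(resolve_operation({}))
print(resolve_operation({"sdc.operation.type": "3"}))
```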

Insert and Set API Properties

The Insert API and Set API properties determine how records are treated when records with the same row key exist in the MapR DB JSON table.
Insert API
Used for Insert records. This includes records where the CRUD operation header attribute is set to Insert, and those where the attribute is not set at all. You can use one of the following MapR APIs:
  • MapR Insert API - The MapR DB JSON destination inserts the record into the MapR DB JSON table when no matching row key exists in the table. When the table has a matching row key, the destination sends the record to error, using the error handling configured for the stage.

  • MapR InsertOrReplace API - The MapR DB JSON destination inserts the record into the MapR DB JSON table when no matching row key exists in the table. When the table has a matching row key, the destination replaces the existing row. This is the default API.

Set API
Used only for Update records. This includes only records where the CRUD operation header attribute is set to Update. You can use one of the following MapR APIs:
  • MapR Set API - The MapR DB JSON destination performs updates only when the data types of the fields in the record match the corresponding fields in the existing row. When the data types do not match, the destination sends the record to error, using the error handling configured for the stage.
  • MapR SetOrReplace API - The MapR DB JSON destination updates the existing row regardless of whether the data types in the record match those in the existing row. This is the default API.
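
The difference between the four options can be modeled with an in-memory dictionary standing in for the JSON table. This is a simplified sketch, not the MapR API: the insert and update functions, the WriteError exception, and the qty field are hypothetical. How an update behaves when no matching row exists is not covered above, so the sketch assumes it starts from an empty row.

```python
class WriteError(Exception):
    """Hypothetical error: the record would be sent to stage error handling."""


def insert(table, doc, api="InsertOrReplace"):
    key = doc["_id"]
    if api == "Insert" and key in table:
        raise WriteError(f"row key {key!r} already exists")
    table[key] = dict(doc)  # InsertOrReplace replaces any existing row


def update(table, doc, api="SetOrReplace"):
    key = doc["_id"]
    row = table.get(key, {})  # assumption: a missing row is treated as empty
    if api == "Set":
        for field, value in doc.items():
            if field in row and type(row[field]) is not type(value):
                raise WriteError(f"data type mismatch for field {field!r}")
    table[key] = {**row, **doc}  # apply the new field values to the row


table = {}
insert(table, {"_id": "034667", "qty": 5})
insert(table, {"_id": "034667", "qty": 6})         # replaces the existing row
update(table, {"_id": "034667", "qty": "seven"})   # allowed by SetOrReplace
print(table)
```

With api="Insert", the second insert would raise WriteError, and with api="Set", the update would raise WriteError because qty changes from an integer to a string.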

Configuring a MapR DB JSON Destination

Configure a MapR DB JSON destination to write data as JSON documents to MapR DB JSON tables.

  1. In the Properties panel, on the General tab, configure the following properties:
    General Property Description
    Name Stage name.
    Description Optional description.
    Stage Library Library version that you want to use.
    Required Fields Fields that must include data for the record to be passed into the stage.
    Tip: You might include fields that the stage uses.

    Records that do not include all required fields are processed based on the error handling configured for the pipeline.

    Preconditions Conditions that must evaluate to TRUE to allow a record to enter the stage for processing. Click Add to create additional preconditions.

    Records that do not meet all preconditions are processed based on the error handling configured for the stage.

    On Record Error Error record handling for the stage:
    • Discard - Discards the record.
    • Send to Error - Sends the record to the pipeline for error handling.
    • Stop Pipeline - Stops the pipeline.
  2. On the MapR DB JSON tab, configure the following properties:
    MapR DB JSON Property Description
    Table Name Name of the MapR DB JSON table to write to. Enter one of the following:
    • Name of a table.
    • Expression that evaluates to the name of a table. For example, if the table name is stored in the "tableName" record attribute, enter the following expression:
      ${record:attribute('tableName')}

    If you do not include a path to the table, the stage assumes that the table exists in the user's home directory. For example, /user/<user name>/<table name>.

    You can include a path relative to the user's home directory or an absolute path when you enter the table name. For tables in a default cluster, specify the absolute path as /<table path>. For tables in a specific cluster, specify the absolute path as /mapr/<cluster name>/<table path>.

    Create Table Determines whether the destination creates the table if it doesn't exist.

    When selected, the destination creates the table if it does not exist. When cleared, the destination produces an error when it attempts to write to a table that does not exist.

    Row Key Row key for the table.

    Define which field in the record to use as the row key.

    Process Row Key as Binary Determines whether the destination processes the row key as string or binary data.

    When cleared, the destination converts the row key field to String. When selected, the destination converts the row key field to Byte Array.

    Insert API Determines how the destination inserts data to the MapR DB JSON table:
    • Use MapR InsertOrReplace API - Inserts the record into the table if it has a unique row key. If the destination finds a matching row key in the table, it replaces the row.
    • Use MapR Insert API - Inserts the record into the table if it has a unique row key. If the destination finds a matching row key in the table, it sends the record to error.

    Default is Use MapR InsertOrReplace API.

    Set API Determines how the destination updates data in the MapR DB JSON table:
    • Use MapR SetOrReplace API - Performs an update for all records marked as update, regardless of whether the field data types match.
    • Use MapR Set API - Performs an update only when the data types in the record and the corresponding row match. When they do not match, the destination sends the record to error.

    Default is Use MapR SetOrReplace API.
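
The path rules described for the Table Name property can be summarized with a small Python sketch. The resolve_table_path function and the user, table, and cluster names are hypothetical, and the sketch only restates the rules in this topic. It does not show how the stage resolves paths internally.

```python
import posixpath


def resolve_table_path(table_name, user):
    """Return the full table path for a configured table name."""
    if table_name.startswith("/"):
        # Absolute path: /<table path> or /mapr/<cluster name>/<table path>
        return table_name
    # No path or a relative path: resolved against the user's home directory
    return posixpath.join("/user", user, table_name)


print(resolve_table_path("orders", "sdc"))
print(resolve_table_path("tables/orders", "sdc"))
print(resolve_table_path("/mapr/my.cluster/apps/orders", "sdc"))
```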