JSON Schema Format

To use JSON to define a custom schema, specify the field names and data types within a root field that uses the Struct data type.

Tip: Data types must be in lowercase letters. Also, the nullable attribute is required for most fields. However, due to an unresolved Spark issue, null values are allowed regardless of how the attribute is set.

Here's an example of the basic structure:

{
  "type": "struct",
  "fields": [
    {
      "name": "<first field name>",
      "type": "<data type>",
      "nullable": <true|false>
    },
    {
      "name": "<second field name>",
      "type": "<data type>",
      "nullable": <true|false>
    }
  ]
}
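As an illustration, the same basic structure can be assembled programmatically. The following is a minimal Python sketch that uses only the standard json module. The helper names make_field and make_schema are hypothetical and are not part of any product API. The sketch also rejects data types that are not lowercase, as the tip above requires.

```python
import json

def make_field(name, data_type, nullable=True):
    """Return one field entry in the shape shown above (hypothetical helper)."""
    # Data types must be in lowercase letters, so reject anything else early.
    if data_type != data_type.lower():
        raise ValueError(f"data type must be lowercase: {data_type!r}")
    return {"name": name, "type": data_type, "nullable": nullable}

def make_schema(fields):
    """Wrap the field entries in a root field that uses the struct type."""
    return {"type": "struct", "fields": list(fields)}

schema = make_schema([
    make_field("TransactionID", "string", nullable=False),
    make_field("Verified", "boolean", nullable=False),
])

# Serialize to the JSON text that would be used as the custom schema.
schema_json = json.dumps(schema, indent=2)
print(schema_json)
```

Building the schema from dictionaries and serializing it with json.dumps avoids hand-written syntax errors such as a missing brace or comma.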

To define a List field, use the Array data type and specify the data type of the subfields as follows:

{
  "name": "<list field name>",
  "type": {
    "type": "array",
    "elementType": "<subfield data type>",
    "containsNull": <true|false>
  },
  "nullable": <true|false>
}

To define a Map field, use the Struct data type, then define the subfields as follows:

{
  "name": "<map field name>",
  "type": {
    "type": "struct",
    "fields": [
      {
        "name": "<first subfield name>",
        "type": "<data type>",
        "nullable": <true|false>
      },
      {
        "name": "<second subfield name>",
        "type": "<data type>",
        "nullable": <true|false>
      }
    ]
  },
  "nullable": <true|false>
}
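The List and Map definitions above can be sketched as two small Python helpers that return dictionaries in those shapes. The names list_field and map_field are hypothetical. The outer nullable attribute on the List field follows the Items field in the example schema in this section.

```python
import json

def list_field(name, element_type, contains_null=True, nullable=True):
    """Return a List field entry: an array type wrapping the subfield data type."""
    return {
        "name": name,
        "type": {
            "type": "array",
            "elementType": element_type,
            "containsNull": contains_null,
        },
        "nullable": nullable,
    }

def map_field(name, subfields, nullable=True):
    """Return a Map field entry: a struct type wrapping the subfield entries."""
    return {
        "name": name,
        "type": {"type": "struct", "fields": list(subfields)},
        "nullable": nullable,
    }

user = map_field("User", [
    {"name": "ID", "type": "long", "nullable": True},
    {"name": "Name", "type": "string", "nullable": True},
])
items = list_field("Items", "string")

print(json.dumps([user, items], indent=2))
```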

Example

The following JSON custom schema includes, in order, a String, Boolean, Map, and List field:

{
  "type": "struct",
  "fields": [
    {
      "name": "TransactionID",
      "type": "string",
      "nullable": false
    },
    {
      "name": "Verified",
      "type": "boolean",
      "nullable": false
    },
    {
      "name": "User",
      "type": {
        "type": "struct",
        "fields": [
          {
            "name": "ID",
            "type": "long",
            "nullable": true
          },
          {
            "name": "Name",
            "type": "string",
            "nullable": true
          }
        ]
      },
      "nullable": true
    },
    {
      "name": "Items",
      "type": {
        "type": "array",
        "elementType": "string",
        "containsNull": true
      },
      "nullable": true
    }
  ]
}
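Before using a schema like this one, it can help to check it against the rules in the tip at the top of this section. The following Python sketch walks a parsed schema and reports data types that are not lowercase and fields that lack the nullable attribute. The function name check_schema is hypothetical, and the check is an assumption of this sketch: it flags a missing nullable attribute on every field, which is stricter than "most fields".

```python
import json

def check_schema(node, path="root"):
    """Return a list of problems found in a schema node (hypothetical helper)."""
    # A plain string is a data type name and must be lowercase.
    if isinstance(node, str):
        if node != node.lower():
            return [f"{path}: data type {node!r} is not lowercase"]
        return []
    if not isinstance(node, dict):
        return [f"{path}: expected a data type name or a type object"]

    problems = []
    kind = node.get("type")
    if kind == "struct":
        for field in node.get("fields", []):
            field_path = f"{path}.{field.get('name', '?')}"
            if "nullable" not in field:
                problems.append(f"{field_path}: missing nullable attribute")
            problems.extend(check_schema(field.get("type"), field_path))
    elif kind == "array":
        problems.extend(check_schema(node.get("elementType"), path + "[]"))
    else:
        problems.append(f"{path}: unexpected type {kind!r}")
    return problems

# A deliberately flawed schema: an uppercase type name and a missing nullable.
schema = json.loads("""
{"type": "struct", "fields": [
  {"name": "TransactionID", "type": "String", "nullable": false},
  {"name": "Items",
   "type": {"type": "array", "elementType": "string", "containsNull": true}}
]}
""")

for problem in check_schema(schema):
    print(problem)
```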

When processing the following JSON data:

{"Verified":true, "Items":["T-35089", "M-00352", "Q-11044"], "TransactionID":"G-23525-3350", "User":{"ID":23005,"Name":"Marnee Gehosephat"}}

The origin generates the following record:

Notice the User Map field with the Integer and String subfields and the Items List field with String subfields. Also note that the order of the fields now matches the order in the custom schema.
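The reordering effect can be illustrated with a short Python sketch that rearranges the parsed JSON data to follow the field order of the custom schema. This is not how the origin is implemented. The function name order_by_schema is hypothetical, and its handling of missing fields (set to None) and of fields absent from the schema (dropped) is an assumption of this sketch.

```python
import json

def order_by_schema(value, schema_type):
    """Recursively reorder a parsed JSON value to follow the schema field order."""
    if isinstance(schema_type, dict):
        kind = schema_type.get("type")
        if kind == "struct" and isinstance(value, dict):
            # Python dicts keep insertion order, so insert in schema order.
            return {
                f["name"]: order_by_schema(value.get(f["name"]), f["type"])
                for f in schema_type["fields"]
            }
        if kind == "array" and isinstance(value, list):
            return [order_by_schema(v, schema_type["elementType"]) for v in value]
    # Simple types (string, boolean, long, ...) are returned unchanged.
    return value

schema = {"type": "struct", "fields": [
    {"name": "TransactionID", "type": "string", "nullable": False},
    {"name": "Verified", "type": "boolean", "nullable": False},
    {"name": "User", "type": {"type": "struct", "fields": [
        {"name": "ID", "type": "long", "nullable": True},
        {"name": "Name", "type": "string", "nullable": True},
    ]}, "nullable": True},
    {"name": "Items", "type": {"type": "array", "elementType": "string",
                               "containsNull": True}, "nullable": True},
]}

line = ('{"Verified":true, "Items":["T-35089", "M-00352", "Q-11044"], '
        '"TransactionID":"G-23525-3350", '
        '"User":{"ID":23005,"Name":"Marnee Gehosephat"}}')

record = order_by_schema(json.loads(line), schema)
print(json.dumps(record, indent=2))
```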