JSON Schema Format
To use JSON to define a custom schema, specify the field names and data types within a root field that uses the Struct data type.
Tip: Data types must be in lowercase letters. Also, the
nullable
attribute is required for most fields, but allows nulls regardless of the configuration due
to an unresolved Spark issue.Here's an example of the basic structure:
{
"type": "struct",
"fields": [
{
"name": "<first field name>",
"type": "<data type>",
"nullable": <true|false>
},
{
"name": "<second field name>",
"type": "<data type>",
"nullable": <true|false>
}
]
}
To define a List field, use the Array data type and specify the data types of the subfields
as
follows:
{
"name": "<list field name>",
"type": {
"type": "array",
"elementType": "<subfield data type>",
"containsNull": <true|false>
}
To define a Map field, use the Struct type, then define the subfields as
follows:
{
"name": "<map field name>",
"type": {
"type": "struct",
"fields": [ {
"name": "<first subfield name>",
"type": "<data type>",
"nullable": <true|false>
}, {
"name": "<second subfield name>",
"type": "<data type>",
"nullable": <true|false>
} ] },
"nullable": <true|false>
}
Example
The following JSON custom schema includes, in order, a String, Boolean, Map, and List
field:
{
"type": "struct",
"fields": [
{
"name": "TransactionID",
"type": "string",
"nullable": false
},
{
"name": "Verified",
"type": "boolean",
"nullable":false
},
{
"name": "User",
"type": {
"type": "struct",
"fields": [ {
"name": "ID",
"type": "long",
"nullable": true
}, {
"name": "Name",
"type": "string",
"nullable": true
} ] },
"nullable": true
},
{
"name": "Items",
"type": {
"type": "array",
"elementType": "string",
"containsNull": true},
"nullable":true
}
]
}
When processing the following JSON
data:
{"Verified":true, "Items":["T-35089", "M-00352", "Q-11044"], "TransactionID":"G-23525-3350", "User":{"ID":23005,"Name":"Marnee Gehosephat"}}
The origin generates the following record:
Notice the User
Map field with the Integer and String subfields and the
Items
List field with String subfields. Also note that the order of the
fields now match the order in the custom schema.