JSON Parser

The JSON Parser processor parses JSON data embedded in a column and passes the extracted data to new columns in the output row. The processor passes the JSON data in the original column unchanged.

When you configure the JSON Parser processor, you specify the column that contains the JSON data. You determine whether the processor extracts all elements in the JSON data, or extracts specific elements. When you configure the processor to extract specific elements, you can optionally flatten arrays in the JSON data before extracting elements. When flattening an array, the processor produces multiple output rows from a single input row, one output row for each child element within the array.

For information about how Snowflake handles JSON data, see the Snowflake documentation.

Extracted Column Names

The JSON Parser processor uses the original element names for the extracted column names. You can optionally add a prefix to all extracted column names to avoid duplicate names.

When the processor extracts child elements from objects or arrays in the JSON data, the processor uses the following naming conventions to determine the name of each extracted column:

Column names from objects

When extracting child elements from an object in the JSON data, the processor uses the following column naming convention:

<object name>_<column name>

For example, if extracting elements from the following user object, the processor names the extracted columns user_ID and user_name:

{
  "user":
     {
       "ID":23005,
       "name":"Marnee Gehosephat"
     }
}

Column names from arrays

When extracting child elements from an array in the JSON data, the processor uses the following column naming convention:

<array name>_<position>

For example, if extracting elements from the following items array, the processor names the extracted columns items_0, items_1, and items_2:

{
  "items":
     [
       "T-35089", 
       "M-00352", 
       "Q-11044"
     ]
}

Extract All Elements

By default, the JSON Parser processor extracts all elements from the JSON data, recursively extracting all child elements inside objects or arrays and extracting all columns as Variant columns.

For example, you configure the processor to extract all elements from a transaction column and you leave the defaults to recursively extract elements and to extract all columns as Variant. Let's say that the transaction column contains the following JSON data:

{
  "items":
     [
       "T-35089", 
       "M-00352", 
       "Q-11044"
     ], 
  "transactionID":235253350, 
  "user":
     {
       "ID":23005,
       "name":"Marnee Gehosephat"
     }
}

The processor passes the following columns to the output row:


transaction	items_0	items_1	items_2	transactionID	user_ID	user_name
`{"items":["T-35089", "M-00352", "Q-11044"], "transactionID":235253350, "user":{"ID":23005,"name":"Marnee Gehosephat"}}`	`"T-35089"`	`"M-00352"`	`"Q-11044"`	`235253350`	`23005`	`"Marnee Gehosephat"`

Extract Objects and Arrays with JSON Data

When extracting all elements and you want to extract objects and arrays as single columns containing JSON data, clear the Recursive property when you configure the processor.

For example, if you configure the processor to extract the same data from the transaction column, but clear the Recursive property, the processor passes the following columns to the output row, extracting the original items and user elements as single columns containing JSON data:


transaction	items	transactionID	user
`{"items":["T-35089", "M-00352", "Q-11044"], "transactionID":235253350, "user":{"ID":23005,"name":"Marnee Gehosephat"}}`	`["T-35089", "M-00352", "Q-11044"]`	`235253350`	`{"ID":23005,"name":"Marnee Gehosephat"}`

Infer the Data Type

When extracting all elements, you can configure the processor to infer the data type based on the column value, rather than extracting all columns as Variant columns.

For example, if you configure the processor to extract the same data from the transaction column, but select the Infer Data Type property, the processor infers that the transactionID column contains a number value, and extracts the column as a Number.

Extract Specific Elements

You can configure the JSON Parser processor to extract specific elements from the JSON data.

To extract specific elements, clear the Extract All property and then add each element that you want to extract. Define the path to the element depending on the element type:

Name/value pair

To extract a name/value pair from JSON data, enter the column name.

For example, to extract only the first name from the following JSON data, enter firstName as the path to the element:

{"firstName":"John", "empid":45611}

Object

To extract a child element from an object, enter the path using the following format:

<object name>.<column name>

For example, to extract only the user ID from the following JSON data, enter user.ID as the path to the element:

{
   "user":
     {
       "ID":23005,
       "name":"Marnee Gehosephat"
     }
}

Array

To extract a child element from an array, enter the path using the following format:

<array name>[<position>]

For example, to extract only the second item in the array from the following JSON data, enter items[1] as the path to the element:

{
  "items":
     [
       "T-35089", 
       "M-00352", 
       "Q-11044"
     ]
}

To extract all elements in an array, use the asterisk (*) wildcard for the position. For example, to extract all items in the array from the above JSON data, enter items[*] as the path.

When an array includes child elements with a list of name/value pairs, you can flatten the array before extracting elements.

For each element that you add, you can optionally select the data type to convert the column to. If you do not select a data type, the processor extracts all columns as Variant columns.

Flatten Arrays Before Extracting

By default when extracting specific elements and when an array includes child elements with a list of name/value pairs, the JSON Parser processor extracts each child element as a column containing a list of name/value pairs.

To extract each pair as a new column, configure the processor to flatten the array into a relational representation before extracting elements. When flattening an array, the processor produces multiple output rows from a single input row, one output row for each child element within the array. Each row includes all extracted columns along with one of the flattened child elements of the array.

To flatten an array, configure the processor as follows:

On the JSON tab, select the Flatten by Expression property.
Specify the array to flatten in the Flatten By property.
In the Elements to Extract property, add at least one element from the flattened array and select Extract from Value Column when Flattening for that element.
As needed, add additional elements included in the JSON data but outside of the flattened array to pass those elements as columns to the output row. For these elements, clear Extract from Value Column when Flattening.

For more information about how Snowflake flattens JSON data, see the Snowflake documentation.

Example: Default Array Processing

The information column contains the following JSON data with a contact.business array that includes child elements with a list of name/value pairs:

{ 
 name:  
   { first: "John", last: "Smith"},
     contact:
       { business:
         [
           { type: "phone", content:"555-1234" },
           { type: "email", content:"j.smith@company.com" } 
         ] 
        } 
}

To extract the array without flattening it, you configure the processor to extract specific elements from the information column, and enter contact.business[*] as the element to extract, as follows:

With this configuration, the processor generates the following output row, where each element in the contact.business array is extracted as a column containing a list of name/value pairs:


information	contact_business_0	contact_business_1
`{ name: { first: "John", last: "Smith"}, contact: { business:[ { type: "phone", content:"555-1234" }, { type: "email", content:"j.smith@company.com" } ] } }`	`{"content":"555-1234","type":"phone"}`	`{"content":"j.smith@company.com","type":"email"}`

Example: Flattening an Array

Instead of using the default processing, you decide to flatten the contact.business array when you extract specific elements from the information column. You also configure the processor to extract the type and content elements from the flattened array, and to extract the first and last name elements that are outside of the array, as follows:

With this configuration, the processor generates the following two output rows, one output row for each child element in the contact.business array:


information	type	content	name_first	name_last
`{ name: { first: "John", last: "Smith"}, contact: { business:[ { type: "phone", content:"555-1234" }, { type: "email", content:"j.smith@company.com" } ] } }`	`"phone"`	`"555-1234"`	`"John"`	`"Smith"`
`{ name: { first: "John", last: "Smith"}, contact: { business:[ { type: "phone", content:"555-1234" }, { type: "email", content:"j.smith@company.com" } ] } }`	`"email"`	`"j.smith@company.com"`	`"John"`	`"Smith"`

Configuring a JSON Parser Processor

Configure a JSON Parser processor to parse JSON data embedded in a column and pass the extracted data to new columns in the output row.

On the General tab, configure the following properties:

General Property Description

Name Stage name.

Description Optional description.

Cache Data Caches processed data.

General Property	Description
Name	Stage name.
Description	Optional description.
Cache Data	Caches processed data.

On the JSON tab, configure the following properties:


JSON Property	Description
Column to Parse	Column that contains the JSON data. You can enter the column name or select a column from preview data.
Prefix	Optional prefix to add to extracted column names. Use to avoid duplicate column names.
Extract All	Extracts all elements from the JSON data.
Recursive	Extracts all child elements from objects or arrays in the JSON data, creating a new column for each child element. When cleared, extracts objects and arrays as single columns containing JSON data. Available when extracting all elements.
Infer Data Types	Infers the data type of each extracted column based on the column value. When cleared, the processor extracts all columns as Variant columns. Available when extracting all elements.
Flatten By Expression	Flattens the data in an array into a relational representation before extracting elements. Flatten an array when you want to extract each element in the array, creating a new column for each child element. Important: You must add at least one element from the flattened array to the Elements to Extract property. Available when extracting specific elements.
Flatten By	Array from the JSON data that you want to flatten. Available when flattening an array.
Elements to Extract	Specific elements to extract from the JSON data. Define the path to the element. Optionally, in the Target Type column, select the data type to convert the column to. If no data type is selected, the processor extracts all columns as Variant columns. When adding an element from a flattened array, select Extract from Value Column when Flattening. When adding an element outside of the flattened array, clear the property. Using simple or bulk edit mode, click Add Another to define another element to extract. Available when extracting specific elements.