Referencing Fields in Spark SQL Expressions

To reference a first-level field in a record in a Spark SQL expression, specify the field name. Transformer does not treat field names as case-sensitive when evaluating them within a pipeline.

For example, to deduplicate data based on an ID field, you configure a Deduplicate processor to deduplicate based on fields. Then, you can specify ID, Id, iD, or id as the field to use.
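The matching behavior in this example can be pictured with a short plain-Python sketch. It does not call Transformer or Spark. The resolve_field helper and the sample record are hypothetical, and the sketch assumes a record behaves like a simple mapping of field names to values:

```python
# Illustrative model only: resolve_field is a hypothetical helper,
# not part of Transformer or Spark.

def resolve_field(record, name):
    """Return the value of the field whose name matches `name`, ignoring case."""
    matches = [key for key in record if key.lower() == name.lower()]
    if not matches:
        raise KeyError(name)
    if len(matches) > 1:
        # Two fields that differ only by case cannot be told apart.
        raise ValueError(f"ambiguous field name: {name}")
    return record[matches[0]]

# Hypothetical sample record with an ID field.
record = {"ID": 1001, "name": "Ana"}

# Every spelling of the field name resolves to the same value.
results = [resolve_field(record, spelling) for spelling in ("ID", "Id", "iD", "id")]
```

In this model, all four spellings return the value of the ID field.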

To reference a field within a Map field, use dot notation (.) to specify the path to the field, as follows:

<top level>.<next level>.<next level>.<field to use>

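For example, to reference a 2019 field within a transactions Map field that is nested in a customer Map field, enter customer.transactions.2019.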
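As a rough illustration of how a dot-notation path walks down nested Map fields, the following plain-Python sketch resolves customer.transactions.2019 against a nested dictionary. The get_by_path helper and the sample values are assumptions made for this example, not Transformer behavior or API:

```python
# Illustrative model only: get_by_path is a hypothetical helper,
# and the record values are made up for this example.

def get_by_path(record, path):
    """Follow a dot-separated path through nested mappings, one level per segment."""
    value = record
    for part in path.split("."):
        value = value[part]
    return value

# Hypothetical record: a customer Map containing a transactions Map.
record = {"customer": {"transactions": {"2019": 42, "2020": 57}}}

total_2019 = get_by_path(record, "customer.transactions.2019")
```

Each segment of the path selects one level of nesting, so the final segment names the field to use.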

To reference an item in a List field, use bracket notation ([#]) to indicate the position in a list. Use 0 to indicate the first item in the list, 1 to indicate the second, and so on.

For example, to reference the second item in an appt_date List field, enter appt_date[1].
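The zero-based indexing can be sketched in plain Python as follows. The get_list_item helper and the sample dates are hypothetical, and the sketch only handles a single field name followed by one bracketed position:

```python
import re

# Illustrative model only: get_list_item is a hypothetical helper,
# not part of Transformer or Spark.

def get_list_item(record, reference):
    """Resolve a reference such as 'appt_date[1]' against a record."""
    match = re.fullmatch(r"(\w+)\[(\d+)\]", reference)
    if match is None:
        raise ValueError(f"not a list reference: {reference}")
    field, index = match.group(1), int(match.group(2))
    return record[field][index]

# Hypothetical record with an appt_date List field.
record = {"appt_date": ["2019-03-01", "2019-04-15", "2019-06-30"]}

first = get_list_item(record, "appt_date[0]")   # position 0 is the first item
second = get_list_item(record, "appt_date[1]")  # position 1 is the second item
```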

Tip: After running preview for a pipeline, you can also copy a field path from the preview results or when you view the input and output schema for a stage.