Referencing Fields in Spark SQL Expressions
To reference a first-level field in a record in a Spark SQL expression, specify the field name. Transformer does not evaluate field names case-sensitively within a pipeline.
For example, to deduplicate data based on an ID field, you configure a Deduplicate processor to deduplicate based on fields. Then, you can specify ID, Id, iD, or id as the field to use.
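The following Python sketch models the case-insensitive rule outside of Transformer, using plain dictionaries as records. The helpers get_top_level_field and deduplicate are hypothetical and only illustrate the documented behavior. They are not part of Transformer or Spark, and keeping the first record seen for each ID is an assumption of the sketch.

```python
from typing import Any


def get_top_level_field(record: dict[str, Any], name: str) -> Any:
    """Hypothetical helper: look up a first-level field, ignoring case.

    Models the documented rule only; this is not Transformer code.
    """
    wanted = name.lower()
    for key, value in record.items():
        if key.lower() == wanted:
            return value
    raise KeyError(name)


def deduplicate(records: list[dict[str, Any]], field: str) -> list[dict[str, Any]]:
    """Hypothetical helper: keep the first record seen for each field value."""
    seen = set()
    result = []
    for record in records:
        value = get_top_level_field(record, field)
        if value not in seen:
            seen.add(value)
            result.append(record)
    return result


# Sample records for illustration; the field is named "ID".
records = [
    {"ID": 1, "name": "Ana"},
    {"ID": 2, "name": "Ben"},
    {"ID": 1, "name": "Ana"},
]

# Every spelling resolves to the same "ID" field, so each run gives the same result.
for spelling in ("ID", "Id", "iD", "id"):
    print(spelling, deduplicate(records, spelling))
```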
To reference a field within a Map field, use dot notation (.) to specify the path to the field, as follows:
<top level>.<next level>.<next level>.<field to use>
For example, enter customer.transactions.2019 to reference the 2019 field in the transactions Map field within the customer Map field.
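As an illustration only, this Python sketch walks a dot-notation path through nested dictionaries that stand in for Map fields. The resolve_map_path helper and the sample record are hypothetical; in a Spark SQL expression you simply type the path. For simplicity, the sketch matches each name in the path exactly.

```python
from typing import Any


def resolve_map_path(record: dict[str, Any], path: str) -> Any:
    """Hypothetical helper: follow a path such as "customer.transactions.2019"
    through nested Map fields, modeled here as nested dicts.
    """
    value: Any = record
    for part in path.split("."):
        if not isinstance(value, dict):
            raise TypeError(f"cannot look up {part!r}: parent is not a Map field")
        value = value[part]  # raises KeyError if the field does not exist
    return value


# Sample record for illustration: customer is a Map field that contains
# a transactions Map field keyed by year.
record = {
    "customer": {
        "name": "Ana",
        "transactions": {
            "2018": 12,
            "2019": 31,
        },
    },
}

print(resolve_map_path(record, "customer.transactions.2019"))  # prints 31
```

Splitting on . assumes that no field name in the path itself contains a dot.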
To reference an item in a List field, use bracket notation ([#]) to indicate the position in a list. Use 0 to indicate the first item in the list, 1 to indicate the second, and so on.
For example, to reference the second item in an appt_date List field,
enter appt_date[1].
Tip: After running preview for a pipeline, you can also copy a field path from the preview results or from the input and output schema for a stage.
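To make the zero-based bracket notation for List fields concrete, the following Python sketch parses a reference such as appt_date[1] and returns the matching list item. The resolve_list_item helper, its regular expression, and the sample dates are hypothetical; they only model the documented indexing rule and are not how Transformer or Spark parses expressions.

```python
import re
from typing import Any

# Hypothetical pattern for "name[index]", for example "appt_date[1]".
_LIST_REF = re.compile(r"^(?P<name>[^\[\]]+)\[(?P<index>\d+)\]$")


def resolve_list_item(record: dict[str, Any], ref: str) -> Any:
    """Hypothetical helper: return the item named by a bracket reference.

    Positions are zero-based: [0] is the first item, [1] the second.
    """
    match = _LIST_REF.match(ref)
    if match is None:
        raise ValueError(f"not a bracket reference: {ref!r}")
    items = record[match.group("name")]
    return items[int(match.group("index"))]  # IndexError if out of range


# Sample record for illustration.
record = {"appt_date": ["2019-03-04", "2019-06-11", "2019-09-23"]}

print(resolve_list_item(record, "appt_date[0]"))  # first item
print(resolve_list_item(record, "appt_date[1]"))  # second item
```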