Delta Lake Lookup

The Delta Lake Lookup processor performs a lookup on a Delta Lake table. The processor can return the first matching row, all matching rows, a count of matching rows, or a boolean value that indicates whether a match was found.
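The four return modes can be illustrated with a minimal Python sketch. It assumes an equality match over an in-memory list of rows. The lookup function and its mode names ("first", "all", "count", "boolean") are hypothetical. They model the behavior described above and are not part of Transformer or Delta Lake.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def lookup(table: List[Row], column: str, value: Any, mode: str) -> Any:
    """Hypothetical helper: equality lookup over in-memory rows."""
    matches = [row for row in table if row.get(column) == value]
    if mode == "first":
        return matches[0] if matches else None  # first matching row, or None
    if mode == "all":
        return matches                           # every matching row
    if mode == "count":
        return len(matches)                      # number of matching rows
    if mode == "boolean":
        return len(matches) > 0                  # whether any row matched
    raise ValueError(f"unknown mode: {mode}")


customers = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 1, "name": "c"}]
print(lookup(customers, "id", 1, "count"))  # 2
```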

Important: The Delta Lake Lookup processor

When you configure the Delta Lake Lookup processor, you specify the path to the lookup table, and you can enable time travel to query an older version of the table. You specify the record field to use for the lookup, the table column to match it against, and the operator to use for the comparison. You then select the information to return and configure the related properties.
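The matching step can be sketched as follows. The sketch assumes that table versions are held in memory as a dictionary of version number to rows, which stands in for Delta Lake time travel. The operator names, the LookupConfig fields, and the example path are hypothetical, and the processor's actual operator list may differ.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

# Hypothetical operator names; the processor's real operator list may differ.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "EQUALS": operator.eq,
    "NOT_EQUALS": operator.ne,
    "LESS_THAN": operator.lt,
    "GREATER_THAN": operator.gt,
}


@dataclass
class LookupConfig:
    table_path: str                # path to the Delta Lake table (not read here)
    record_field: str              # field in the incoming record
    table_column: str              # column in the lookup table
    op: str = "EQUALS"             # key into OPERATORS
    version: Optional[int] = None  # time travel; None means the latest version


def matching_rows(cfg: LookupConfig, versions: Dict[int, List[dict]], record: dict) -> List[dict]:
    """Return rows of the chosen version where row[column] <op> record[field]."""
    version = max(versions) if cfg.version is None else cfg.version
    compare = OPERATORS[cfg.op]
    key = record[cfg.record_field]
    return [row for row in versions[version] if compare(row[cfg.table_column], key)]


versions = {
    0: [{"sku": "A", "price": 10}],
    1: [{"sku": "A", "price": 12}, {"sku": "B", "price": 5}],
}
latest = LookupConfig("/hypothetical/delta/prices", "item", "sku")
print(matching_rows(latest, versions, {"item": "A"}))  # [{'sku': 'A', 'price': 12}]
```

Leaving version unset reads the newest version; setting it models querying an older snapshot of the table.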

When returning one or more rows, you specify the columns to return and can optionally define a prefix for the resulting field names to avoid adding duplicate fields to the record. You can specify columns to sort by and the sort order. When returning multiple rows, you can specify a maximum number of rows to return.
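A minimal Python sketch of how sorting, a row limit, column selection, and a field-name prefix might combine. The shape_results function and its parameters are hypothetical. The sketch applies one sort order to all sort columns, and the example data is illustrative only.

```python
from typing import Any, Dict, Iterable, List, Optional, Sequence

Row = Dict[str, Any]


def shape_results(
    matches: Iterable[Row],
    columns: Sequence[str],
    prefix: str = "",
    sort_by: Sequence[str] = (),
    descending: bool = False,
    max_rows: Optional[int] = None,
) -> List[Row]:
    """Sort, limit, project, and prefix matching rows (illustrative only)."""
    rows = list(matches)  # copy so the caller's list is not reordered
    if sort_by:
        # One sort order for all sort columns; per-column order is not modeled.
        rows.sort(key=lambda r: tuple(r[c] for c in sort_by), reverse=descending)
    if max_rows is not None:
        rows = rows[:max_rows]
    # Keep only the requested columns and prefix each resulting field name.
    return [{prefix + c: r[c] for c in columns} for r in rows]


record = {"id": 7, "order_total": 40}
matches = [{"id": 7, "tier": "silver", "since": 2019}, {"id": 7, "tier": "gold", "since": 2021}]
shaped = shape_results(matches, ["id", "tier"], prefix="lkp_",
                       sort_by=["since"], descending=True, max_rows=1)
record.update(shaped[0])
print(record)  # {'id': 7, 'order_total': 40, 'lkp_id': 7, 'lkp_tier': 'gold'}
```

Without the "lkp_" prefix, the table's id column would collide with the record's existing id field.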

When returning a count or boolean value, you define a name for the field to contain the results. If the field does not exist, the processor creates it.
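The following Python sketch shows the count and boolean results being written to a named field. The write_result function and the field names are hypothetical. The sketch overwrites an existing field of the same name, which is an assumption. The source states only that a missing field is created.

```python
from typing import Any, Dict, List


def write_result(record: Dict[str, Any], field_name: str,
                 matches: List[dict], mode: str) -> Dict[str, Any]:
    """Return a copy of the record with the count or boolean result in field_name."""
    out = dict(record)  # leave the input record unchanged
    if mode == "count":
        out[field_name] = len(matches)
    elif mode == "boolean":
        out[field_name] = len(matches) > 0
    else:
        raise ValueError(f"unsupported mode: {mode}")
    # Assignment creates field_name if absent; overwriting an existing
    # value is an assumption of this sketch.
    return out


print(write_result({"id": 3}, "match_count", [{"id": 3}, {"id": 3}], "count"))
# {'id': 3, 'match_count': 2}
```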

You configure the storage system for the table. When using a table stored on Azure Data Lake Storage (ADLS) Gen2, you also specify connection-related details. For a table on Amazon S3 or HDFS, Transformer uses connection information stored in a Hadoop configuration file. You can configure security for connections to Amazon S3.

If the lookup table is static, you can configure the processor to load the table only once, enabling the processor to cache and reuse the data for the duration of the pipeline run.
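The load-once behavior amounts to caching the first read of the table and reusing it. The minimal Python sketch below uses a loader callable as a stand-in for reading the Delta Lake table. The LookupTableCache class and the load_once flag are hypothetical names and do not represent Transformer's implementation.

```python
from typing import Callable, List, Optional


class LookupTableCache:
    """Hypothetical wrapper: reuse the first load when load_once is True."""

    def __init__(self, loader: Callable[[], List[dict]], load_once: bool) -> None:
        self._loader = loader
        self._load_once = load_once
        self._cached: Optional[List[dict]] = None
        self.load_count = 0  # how many times the table was actually read

    def rows(self) -> List[dict]:
        if self._load_once:
            if self._cached is None:
                self._cached = self._load()  # first call reads the table
            return self._cached             # later calls reuse it
        return self._load()                 # otherwise read every time

    def _load(self) -> List[dict]:
        self.load_count += 1
        return self._loader()


static = LookupTableCache(lambda: [{"id": 1}], load_once=True)
static.rows()
static.rows()
print(static.load_count)  # 1
```

This only suits a static table: if the table changes during the pipeline run, the cached copy would be stale.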

If you do not configure the processor to load the table only once and the processor passes data to multiple stages, you might enable caching to improve pipeline performance.

To access a table stored on ADLS Gen2, complete the necessary prerequisites before you run the pipeline. Also, before you run a local pipeline that uses a table on ADLS Gen2 or Amazon S3, complete the additional prerequisite tasks for local pipelines.