Performing Lookups

Transformer provides several system-specific lookup processors, such as the Delta Lake Lookup processor and the Snowflake Lookup processor. You can also use the JDBC Lookup processor to perform a lookup on a database table.

To look up data from other systems, such as Amazon S3 or a file directory, you can use an additional origin in the pipeline to read the lookup data. Then, you use a Join processor to join the lookup data with the primary pipeline data.

In most cases, you use either the left outer or right outer join type in the Join processor. The type to use depends on which input stream of the Join processor carries the primary data. If the primary data is the left input stream, use the left outer join type to return all of the primary data, with the lookup data added to matching records. If the primary data is the right input stream, use the right outer join type.
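The following plain-Python sketch illustrates what a left outer join does to primary and lookup records. It is not Transformer code: the function name `left_outer_join`, the dictionary record format, and the sample data are assumptions made for illustration. Primary records with no matching lookup record are kept, with the lookup fields set to None, much as Spark fills unmatched columns with nulls.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def left_outer_join(primary: List[Record], lookup: List[Record], key: str) -> List[Record]:
    """Return every primary record, with lookup fields added where keys match."""
    # Index the lookup records by join key (a simple hash join).
    index: Dict[Any, List[Record]] = {}
    for row in lookup:
        index.setdefault(row[key], []).append(row)

    # Lookup fields to fill with None when a primary record has no match.
    lookup_fields = {field for row in lookup for field in row if field != key}

    joined: List[Record] = []
    for row in primary:
        matches = index.get(row[key], [])
        if matches:
            # Emit one output record per matching lookup record.
            # Lookup values overwrite primary values for duplicate field names.
            joined.extend({**row, **match} for match in matches)
        else:
            # No match: keep the primary record and set the lookup fields to None.
            joined.append({**row, **{field: None for field in lookup_fields}})
    return joined


# Hypothetical sample data: orders are the primary data, customers the lookup data.
orders = [{"id": 1, "cust": "a"}, {"id": 2, "cust": "z"}]
customers = [{"cust": "a", "name": "Alice"}]
print(left_outer_join(orders, customers, "cust"))
# [{'id': 1, 'cust': 'a', 'name': 'Alice'}, {'id': 2, 'cust': 'z', 'name': None}]
```

A right outer join is the same operation with the two inputs swapped, which is why the join type to choose depends on which input stream carries the primary data.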

When necessary, you can join lookup data to multiple streams of data. To do so, use a separate Join processor to join the lookup origin to each stream.
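As a rough illustration, the sketch below reuses one set of lookup data for two streams by joining it to each stream separately, similar to using one Join processor per stream. The helper `enrich`, the stream names, and the sample records are hypothetical. The sketch assumes each lookup key is unique, and, unlike a Spark outer join, it simply omits the lookup fields from unmatched records instead of setting them to null.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def enrich(stream: List[Record], lookup: Dict[Any, Record], key: str) -> List[Record]:
    """Keep every record in the stream; add lookup fields where the key matches."""
    return [{**row, **lookup.get(row[key], {})} for row in stream]


# Lookup data read once, indexed by the join key (hypothetical sample data).
regions = {"us": {"region": "Americas"}, "de": {"region": "EMEA"}}

# Two independent streams, each joined with the same lookup data in its own join.
web_orders = [{"order": 1, "country": "us"}]
store_orders = [{"order": 7, "country": "de"}, {"order": 8, "country": "jp"}]

web_joined = enrich(web_orders, regions, "country")
store_joined = enrich(store_orders, regions, "country")
print(web_joined)    # [{'order': 1, 'country': 'us', 'region': 'Americas'}]
print(store_joined)  # [{'order': 7, 'country': 'de', 'region': 'EMEA'}, {'order': 8, 'country': 'jp'}]
```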

You configure the lookup origin differently depending on the execution mode specified for the pipeline:
Batch pipeline execution
In a batch pipeline, the primary origin reads all of the primary pipeline data in one batch, and the lookup origin reads all of the lookup data in one batch. Then, the Join processor joins the two data sets.
Because a batch pipeline performs the join only once, on a single batch of primary data and a single batch of lookup data, the lookup origin does not require any particular properties to be set.
Streaming pipeline execution
In a streaming pipeline, the primary origin processes multiple batches of primary pipeline data. To join each of these batches with the lookup data, you must enable the Load Lookup Data Only Once property in the lookup origin.
With this property enabled, the lookup origin reads one batch of data and caches it for reuse. Then, each time the primary origin passes a new batch to the Join processor, the processor joins the batch with the cached lookup data.
Important: Some origins provide properties that limit the size of each batch. When configuring a lookup origin, do not limit the batch size. The lookup origin must read all lookup data in a single batch so that the Join processor can use all of it.
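The following plain-Python sketch models the streaming behavior described above under simplifying assumptions: the lookup data is read as one batch and cached, and each primary batch is then joined with that cache. The function `run_streaming` and its `lookup_batch_size` parameter are hypothetical and do not correspond to Transformer properties; the parameter only shows why a limited lookup batch size leaves lookup data out of the cache.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional

Record = Dict[str, Any]


def run_streaming(
    primary_batches: Iterable[List[Record]],
    lookup_rows: List[Record],
    key: str,
    lookup_batch_size: Optional[int] = None,  # hypothetical, not a Transformer property
) -> Iterator[List[Record]]:
    """Join each primary batch with lookup data that is read and cached once."""
    # Model the lookup origin reading a single batch. If its batch size is
    # limited, rows beyond the limit never reach the cache.
    first_batch = (
        lookup_rows if lookup_batch_size is None else lookup_rows[:lookup_batch_size]
    )
    cache: Dict[Any, Record] = {row[key]: row for row in first_batch}

    # Each new primary batch is joined with the same cached lookup data.
    for batch in primary_batches:
        # Keep every primary record; add lookup fields on a match.
        yield [{**row, **cache.get(row[key], {})} for row in batch]


# Hypothetical sample data.
lookup = [{"sku": "A", "price": 10}, {"sku": "B", "price": 5}]
batches = [[{"sku": "A", "qty": 2}], [{"sku": "B", "qty": 1}]]

for joined in run_streaming(batches, lookup, "sku"):
    print(joined)
# [{'sku': 'A', 'qty': 2, 'price': 10}]
# [{'sku': 'B', 'qty': 1, 'price': 5}]
```

With `lookup_batch_size=1`, the cache holds only SKU A, so the record for SKU B passes through without a price. This mirrors the warning above: a lookup origin with a limited batch size does not make all lookup data available to the join.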