Performing Lookups
Transformer provides several system-related lookup processors, such as the Delta Lake Lookup processor and the Snowflake Lookup processor. You can also use the JDBC Lookup processor to perform a lookup on a database table.
To look up data from other systems, such as Amazon S3 or a file directory, you can use an additional origin in the pipeline to read the lookup data. Then, you use a Join processor to join the lookup data with the primary pipeline data.
In most cases, you use either the left outer or right outer join type in the Join processor. The type to use depends on which input stream of the Join processor carries the primary pipeline data. If the primary data is the left input stream, use the left outer join type to return all of the primary data with the additional lookup data added to those records. If the primary data is the right input stream, use the right outer join type.
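To illustrate why an outer join suits a lookup, here is a minimal plain-Python sketch of left outer join semantics. It is not Transformer code. The function name `left_outer_lookup`, the sample records, and the assumption of at most one lookup row per key are all hypothetical choices made for the illustration.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def left_outer_lookup(primary: List[Record], lookup: List[Record], key: str) -> List[Record]:
    """Return every primary record, enriched with lookup fields when a match exists."""
    # Index the lookup data by key (assumption: at most one lookup row per key).
    index = {row[key]: row for row in lookup}
    # Collect the names of the fields that the lookup data can add.
    lookup_fields = sorted({field for row in lookup for field in row if field != key})

    result = []
    for record in primary:
        match = index.get(record[key])
        enriched = dict(record)
        for field in lookup_fields:
            # Unmatched primary records are kept; their lookup fields are null.
            enriched[field] = match.get(field) if match is not None else None
        result.append(enriched)
    return result


# Primary data (left input) and lookup data (right input); sample values only.
orders = [
    {"order_id": 1, "store_id": "A"},
    {"order_id": 2, "store_id": "Z"},  # no matching store in the lookup data
]
stores = [
    {"store_id": "A", "city": "Austin"},
    {"store_id": "B", "city": "Boston"},
]

enriched = left_outer_lookup(orders, stores, key="store_id")
```

Every order survives the join, including the one without a matching store, and the unmatched store `B` is dropped. An inner join would instead discard the second order, which is usually not what a lookup should do.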
When necessary, you can join lookup data to multiple streams of data by using a separate Join processor to join the lookup origin to each stream.
- Batch pipeline execution
  In a batch pipeline, the primary origin reads all of the primary pipeline data in one batch, and the lookup origin reads all of the lookup data in one batch. Then, the Join processor joins the two data sets.
- Streaming pipeline execution
  In a streaming pipeline, the primary origin processes multiple batches of primary pipeline data. To merge these batches of data with the lookup data, you must enable the Load Lookup Data Only Once property in the lookup origin.
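The streaming case can be sketched in plain Python as follows. This is a conceptual model only, not Transformer's implementation. It assumes that loading the lookup data once means the lookup origin reads its data while the first batch is processed and reuses the cached result for later batches. The `LookupSource` class, its `load_once` flag, and the sample data are hypothetical names introduced for the example.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional

Record = Dict[str, Any]


class LookupSource:
    """Hypothetical stand-in for a lookup origin with a load-once option."""

    def __init__(self, rows: List[Record], key: str, load_once: bool) -> None:
        self._rows = rows
        self._key = key
        self._load_once = load_once
        self._cache: Optional[Dict[Any, Record]] = None
        self.read_count = 0  # how many times the underlying data was read

    def read(self) -> Dict[Any, Record]:
        # Reuse the cached lookup data when load-once is enabled.
        if self._load_once and self._cache is not None:
            return self._cache
        self.read_count += 1
        index = {row[self._key]: row for row in self._rows}
        if self._load_once:
            self._cache = index
        return index


def process_stream(
    batches: Iterable[List[Record]], lookup: LookupSource
) -> Iterator[List[Record]]:
    """Join each primary batch with the lookup data (left outer semantics)."""
    for batch in batches:
        index = lookup.read()
        yield [
            {**rec, "city": index.get(rec["store_id"], {}).get("city")}
            for rec in batch
        ]


stores = [{"store_id": "A", "city": "Austin"}]
batches = [
    [{"order_id": 1, "store_id": "A"}],
    [{"order_id": 2, "store_id": "A"}, {"order_id": 3, "store_id": "Z"}],
]

lookup = LookupSource(stores, key="store_id", load_once=True)
results = list(process_stream(batches, lookup))
```

In this model, `lookup.read_count` stays at 1 however many primary batches arrive, so every batch is joined against the same lookup data.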