Slowly Changing Dimension Pipeline
A slowly changing dimension pipeline compares change data against master dimension data, then writes the changes to the master dimension data.
A slowly changing dimension pipeline can process a traditional table dimension, where the dimension data is stored in a database table. It can also process a file dimension, where the dimension data is stored in a set of files in a directory.
The simplest slowly changing dimension pipeline looks like this:
A slowly changing dimension pipeline includes the following components:
- Master origin - Reads the master dimension data. Use one of the following origins:
  - Whole Directory - Use to read a file dimension. The dimension files must reside within a single directory, but can include partitions. No non-dimension files should exist in the directory.
  - JDBC Table origin - Use to read a table dimension.
- Change origin - Reads change data. Change data can be read by any Transformer origin.
- Slowly Changing Dimension processor - Compares change data against master data and flags change records for insert or update.
- Dimension destination - Writes results to the master dimension. Use one of the following destinations:
  - ADLS Gen1 - Use to write to a file dimension on Azure Data Lake Storage Gen1.
  - ADLS Gen2 - Use to write to a file dimension on Azure Data Lake Storage Gen2.
  - Amazon S3 - Use to write to a file dimension on Amazon S3.
  - File - Use to write to a file dimension on HDFS or a local file system.
  - JDBC - Use to write to a database table dimension.
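The comparison step performed by the Slowly Changing Dimension processor can be illustrated with a minimal Python sketch. This is not the processor's actual implementation. The function name `flag_changes`, the `key_fields` parameter, and the sample customer records are all hypothetical, and the sketch assumes that a change record whose key already exists in the master data is flagged for update, while any other change record is flagged for insert.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, object]


def flag_changes(
    master: Iterable[Record],
    changes: Iterable[Record],
    key_fields: Tuple[str, ...],
) -> List[Tuple[str, Record]]:
    """Flag each change record as an insert or an update (hypothetical helper)."""
    # Index the master dimension rows by their key fields for fast lookup.
    master_by_key = {tuple(row[k] for k in key_fields): row for row in master}

    flagged = []
    for change in changes:
        key = tuple(change[k] for k in key_fields)
        # Assumption: a key already present in the master data means update;
        # a key that is absent means insert.
        action = "update" if key in master_by_key else "insert"
        flagged.append((action, change))
    return flagged


# Hypothetical sample data standing in for the master and change origins.
master = [
    {"customer_id": 1, "city": "Austin"},
    {"customer_id": 2, "city": "Boston"},
]
changes = [
    {"customer_id": 2, "city": "Denver"},
    {"customer_id": 3, "city": "Seattle"},
]

flagged = flag_changes(master, changes, key_fields=("customer_id",))
for action, record in flagged:
    print(action, record)
```

In this sketch, the two input lists play the role of the master origin and the change origin, and the flagged output is what a dimension destination would then write to the master dimension.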