Slowly Changing Dimension Pipeline

A slowly changing dimension pipeline compares change data against master dimension data, then writes the changes to the master dimension data.

A slowly changing dimension pipeline can process a traditional table dimension, where the dimension data is stored in a database table. It can also process a file dimension, where the dimension data is stored in a set of files in a directory.

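In the simplest slowly changing dimension pipeline, a master origin and a change origin both feed a Slowly Changing Dimension processor, which passes its results to a dimension destination.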

A slowly changing dimension pipeline includes the following components:
  • Master origin - Reads the master dimension data. Use one of the following origins:
    • Whole Directory origin - Use to read a file dimension. The dimension files must reside within a single directory, but can include partitions. The directory must not contain any non-dimension files.
    • JDBC Table origin - Use to read a table dimension.
  • Change origin - Reads change data. You can use any Transformer origin to read change data.
  • Slowly Changing Dimension processor - Compares change data against master data and flags change records for insert or update.
  • Dimension destination - Writes results to the master dimension. Use one of the following destinations:
    • ADLS Gen1 - Use to write to a file dimension on Azure Data Lake Storage Gen1.
    • ADLS Gen2 - Use to write to a file dimension on Azure Data Lake Storage Gen2.
    • Amazon S3 - Use to write to a file dimension on Amazon S3.
    • File - Use to write to a file dimension on HDFS or a local file system.
    • JDBC - Use to write to a database table dimension.
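
The compare-and-flag step performed by the Slowly Changing Dimension processor can be illustrated with a minimal sketch in plain Python. This is not the Transformer implementation or its API. The function name `flag_changes`, the output field `scd_action`, and the sample field names are hypothetical, and the sketch assumes that a change record whose key already exists in the master data is an update and that any other change record is an insert.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def flag_changes(master: List[Record], changes: List[Record], key: str) -> List[Record]:
    """Flag each change record for insert or update (illustrative only).

    Assumption: a change record is an UPDATE when its key value already
    exists in the master data, and an INSERT otherwise.
    """
    # Collect the key values that are already present in the master dimension.
    master_keys = {row[key] for row in master}

    flagged = []
    for row in changes:
        action = "UPDATE" if row[key] in master_keys else "INSERT"
        # Copy the record and attach the hypothetical flag field.
        flagged.append({**row, "scd_action": action})
    return flagged


# Hypothetical sample data: one existing customer and two change records.
master = [{"customer_id": 1, "city": "Austin"}]
changes = [
    {"customer_id": 1, "city": "Denver"},
    {"customer_id": 2, "city": "Boston"},
]

for record in flag_changes(master, changes, key="customer_id"):
    print(record)
```

In the pipeline, the master origin supplies the equivalent of `master`, the change origin supplies `changes`, and the dimension destination uses the flag on each record to write the result to the master dimension.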