Deduplicate

The Deduplicate processor removes duplicate rows from incoming data.

By default, the processor evaluates entire rows for duplicates, removing a row when all of the column names and values match those of another row. You can configure the processor to assess specific columns instead of the entire row. When evaluating specific columns, you can specify the evaluation behavior.

For example, to remove rows when a customer accidentally submits the same online order twice, you might configure the processor to evaluate the critical details of the order, such as the customer name, shipping address, payment details, and ordered items, while excluding the order ID or timestamp columns.
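The duplicate-order scenario can be sketched in plain Python. This is only an illustration of the idea, not the processor's implementation: the `deduplicate` function, the sample column names, and the choice to keep the first occurrence of each duplicate are all assumptions made for the example.

```python
def deduplicate(rows, columns=None):
    """Return rows with duplicates removed, keeping the first occurrence.

    rows    -- list of dicts mapping column name to a hashable value
    columns -- column names to compare; None compares the entire row
    """
    seen = set()
    unique = []
    for row in rows:
        if columns is None:
            # Entire row: every column name and value must match.
            key = frozenset(row.items())
        else:
            # Specific columns: only the listed columns are compared.
            key = tuple(row.get(name) for name in columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical order data: the first two rows are the same order submitted twice.
orders = [
    {"order_id": 1001, "customer": "Ana Ruiz", "address": "12 Elm St",
     "items": "2x mug", "timestamp": "10:01:07"},
    {"order_id": 1002, "customer": "Ana Ruiz", "address": "12 Elm St",
     "items": "2x mug", "timestamp": "10:01:09"},
    {"order_id": 1003, "customer": "Ben Cho", "address": "9 Oak Ave",
     "items": "1x kettle", "timestamp": "10:02:30"},
]

# Comparing entire rows finds no duplicates, because order_id and timestamp differ.
by_row = deduplicate(orders)

# Comparing only the critical order details removes the repeated submission.
by_details = deduplicate(orders, columns=["customer", "address", "items"])
```

In this sketch, `by_row` still contains all three rows, while `by_details` contains only orders 1001 and 1003.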

The Deduplicate processor is case sensitive but ignores column order, so two rows that contain the same columns and values in a different order are still treated as duplicates.
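The following sketch shows one way to model these two properties in Python. It assumes that case sensitivity applies to both column names and values, which the text above does not state explicitly, and the `row_key` helper is hypothetical.

```python
def row_key(row):
    """Build a comparison key for a row (a dict of column name to value).

    A frozenset of (name, value) pairs ignores column order, and the
    comparison is exact, so differences in letter case are significant.
    """
    return frozenset(row.items())


a = {"name": "Ana", "city": "Lima"}
b = {"city": "Lima", "name": "Ana"}   # same columns, different order
c = {"name": "ana", "city": "Lima"}   # value differs only by case
d = {"Name": "Ana", "city": "Lima"}   # column name differs only by case

same_despite_order = row_key(a) == row_key(b)     # True
differs_by_value_case = row_key(a) != row_key(c)  # True
differs_by_name_case = row_key(a) != row_key(d)   # True (assumed behavior)
```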

When you configure the Deduplicate processor, you specify whether to evaluate the entire row or specific columns. To evaluate specific columns, you list the columns to use and then specify the evaluation behavior.