Overwrite Condition

When the Delta Lake destination uses the Overwrite Data write mode, by default it removes all data in the existing table before writing new data. You can define a condition to overwrite only the data within specified partitions.
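
Outside of the destination, this default behavior is comparable to a plain Delta Lake overwrite in a standalone Spark job. The following Scala sketch is for orientation only; the paths and DataFrame are hypothetical, and the destination's actual implementation may differ:

import org.apache.spark.sql.SparkSession

// Assumes the Delta Lake library is on the classpath.
val spark = SparkSession.builder().appName("full-overwrite-sketch").getOrCreate()

// Hypothetical replacement data.
val df = spark.read.parquet("/data/new_orders")

// mode("overwrite") with no condition replaces the entire table.
df.write.format("delta").mode("overwrite").save("/delta/example")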

Here are some guidelines for the condition:
  • The condition must evaluate to true or false.
  • In the condition, use columns that the existing table is partitioned by.
  • When you define a condition, you typically base it on field values in the record. For information about referencing fields in the condition, see Referencing Fields in Spark SQL Expressions.
  • You can use any Spark SQL syntax that can be used in the WHERE clause of a query, including functions such as isnull or trim and operators such as = or <=.

    You can also use user-defined functions (UDFs), but you must define them in the pipeline. Use a pipeline preprocessing script to define the UDFs, as shown in the sketch after this list.

    For more information about Spark SQL functions, see the Apache Spark SQL Functions documentation.
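
For example, a preprocessing script might register a UDF such as the following. This is a minimal Scala sketch; the UDF name in_month is illustrative, and it assumes the script runs with the active SparkSession in scope as spark:

// Registers a UDF that an overwrite condition can call,
// for example: in_month(order_date, 1)
// Assumes the active SparkSession is available as `spark`.
spark.udf.register("in_month", (d: java.sql.Date, m: Int) =>
  d != null && d.toLocalDate.getMonthValue == m)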

For example, say that you configure the Delta Lake destination to overwrite the existing table located in the /delta/sales directory. The table is partitioned by the order_date column. You want to overwrite only the orders from January 2019, so you define the following overwrite condition:
order_date >= '2019-01-01' AND order_date <= '2019-01-31'
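
For comparison, in a standalone Spark job the same partition-scoped overwrite can be expressed with Delta Lake's replaceWhere option. The following Scala sketch is illustrative only; the staging path is hypothetical, and the destination may implement the overwrite differently:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("replace-where-sketch").getOrCreate()

// Hypothetical DataFrame holding the replacement January 2019 orders.
val updates = spark.read.format("delta").load("/delta/sales_staging")

// replaceWhere limits the overwrite to rows matching the condition;
// all other partitions are left untouched.
updates.write
  .format("delta")
  .mode("overwrite")
  .option("replaceWhere", "order_date >= '2019-01-01' AND order_date <= '2019-01-31'")
  .save("/delta/sales")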