Overwrite Partition Requirement

When writing partitioned data, the Amazon S3 destination can overwrite objects within the affected partitions rather than overwriting the entire data set. For example, if the output includes only data for the 03-2019 partition, the destination overwrites the objects in the 03-2019 partition and leaves all other partitions untouched.

To overwrite partitioned data, Spark must be configured to allow overwriting data within a partition. No configuration is needed when writing unpartitioned data.

To enable overwriting partitions, set the spark.sql.sources.partitionOverwriteMode Spark configuration property to dynamic.
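
The following standalone Spark sketch, which is not a Transformer pipeline and uses a hypothetical bucket, prefix, and column names, illustrates the effect of the property: with dynamic mode, an overwrite of partitioned output replaces only the partitions present in that output.

    // Minimal Spark sketch illustrating dynamic partition overwrite.
    // The bucket, prefix, and column names are hypothetical.
    import org.apache.spark.sql.SparkSession

    object PartitionOverwriteSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("partition-overwrite-sketch")
          // Replace only the partitions present in the output, not the entire data set.
          .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
          .getOrCreate()
        import spark.implicits._

        // Output containing data for a single month partition.
        val output = Seq(("order-1", "03-2019"), ("order-2", "03-2019"))
          .toDF("order_id", "month")

        output.write
          .mode("overwrite")            // overwrites only the 03-2019 partition
          .partitionBy("month")
          .parquet("s3a://example-bucket/orders/")

        spark.stop()
      }
    }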

You can configure the property in Spark or in individual pipelines. Configure the property in Spark when you want to enable overwriting partitions for all Transformer pipelines.
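
For example, on clusters where Spark reads its defaults from a spark-defaults.conf file, a common but not universal setup, adding the following line enables overwriting partitions for every Spark application, including all Transformer pipelines:

    spark.sql.sources.partitionOverwriteMode  dynamic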

To enable overwriting partitions for an individual pipeline, add an extra Spark configuration property on the Cluster tab of the pipeline properties.
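
For example, the extra Spark configuration property might be entered as the following name and value; the exact entry fields depend on your Transformer version:

    Property name:  spark.sql.sources.partitionOverwriteMode
    Value:          dynamic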