Coalesce by Number
When coalescing by number, the processor merges the data into the specified number of partitions. Use this method only to decrease the number of partitions.
Instead of shuffling all of the data, coalescing moves data between partitions only when necessary. Avoiding a full shuffle can result in better pipeline performance.
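To show why coalescing avoids a full shuffle, here is a simplified Python model. It is not the processor's or Spark's actual implementation. The helper names `coalesce_partitions` and `shuffle_partitions` are hypothetical, and the model assumes that coalescing keeps each existing partition intact and combines neighbouring partitions. The full shuffle, by contrast, sends every record to a randomly chosen partition.

```python
import random
from typing import List, TypeVar

T = TypeVar("T")


def coalesce_partitions(partitions: List[List[T]], n: int) -> List[List[T]]:
    """Merge existing partitions into n partitions without a full shuffle.

    Hypothetical helper: each existing partition is kept intact and appended
    to one target partition, so records are never redistributed one by one.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if n >= len(partitions):
        # Coalescing only decreases the partition count; nothing to merge.
        return [list(p) for p in partitions]
    merged: List[List[T]] = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Map neighbouring source partitions to the same target partition.
        merged[i * n // len(partitions)].extend(part)
    return merged


def shuffle_partitions(partitions: List[List[T]], n: int, seed: int = 0) -> List[List[T]]:
    """Full-shuffle model: every record is sent to a randomly chosen partition."""
    rng = random.Random(seed)
    shuffled: List[List[T]] = [[] for _ in range(n)]
    for part in partitions:
        for record in part:
            shuffled[rng.randrange(n)].append(record)
    return shuffled


source = [[1, 2], [3], [4, 5, 6], [7], [8, 9], [10]]
print(coalesce_partitions(source, 3))  # [[1, 2, 3], [4, 5, 6, 7], [8, 9, 10]]
print(shuffle_partitions(source, 3))   # each record lands in a random partition
```

In the coalesced result, records move only as part of whole partitions. The resulting partitions can also be uneven (3, 4, and 3 records here). Apache Spark draws the same distinction between `DataFrame.coalesce` and `DataFrame.repartition`.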
You specify how the partitions are created:
- Number of Partitions - The processor merges the data into the specified number of partitions.
- Max Records per Partition - The processor counts the records to determine how many partitions are needed and creates that number of partitions. It then redistributes the data evenly across the partitions so that no partition exceeds the specified maximum number of records.
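The Max Records per Partition calculation can be sketched in Python as follows. This is a minimal model under stated assumptions, not the processor's implementation:

- The helper names `partitions_needed` and `split_evenly` are hypothetical.
- The partition count is assumed to be the ceiling of the record count divided by the maximum.
- "Evenly" is assumed to mean that partition sizes differ by at most one record.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def partitions_needed(total_records: int, max_records: int) -> int:
    """Smallest partition count that keeps every partition at or below max_records."""
    if max_records <= 0:
        raise ValueError("max_records must be positive")
    # Integer ceiling division; always create at least one partition.
    return max(1, -(-total_records // max_records))


def split_evenly(records: Sequence[T], n: int) -> List[List[T]]:
    """Split records into n contiguous partitions whose sizes differ by at most one."""
    base, extra = divmod(len(records), n)
    parts: List[List[T]] = []
    start = 0
    for i in range(n):
        # The first `extra` partitions each take one additional record.
        size = base + (1 if i < extra else 0)
        parts.append(list(records[start:start + size]))
        start += size
    return parts


records = list(range(10))
n = partitions_needed(len(records), max_records=4)  # ceil(10 / 4) = 3
print(split_evenly(records, n))  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Because n is at least total / max, the largest partition holds at most ceil(total / n) records, which never exceeds the maximum.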