Repartition by Number

When repartitioning by number, the processor creates a set number of partitions and then redistributes the data as evenly as possible across them, shuffling data as necessary.
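To make the redistribution step concrete, here is a minimal plain-Python model. It is not the processor's implementation. The function name `repartition_by_number` and the shuffle-then-deal strategy are assumptions chosen for illustration: records are shuffled and then dealt round-robin, so partition sizes differ by at most one record.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def repartition_by_number(records: Sequence[T], num_partitions: int, seed: int = 0) -> List[List[T]]:
    """Hypothetical model: shuffle records, then deal them round-robin into num_partitions buckets."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    shuffled = list(records)
    # Random placement stands in for the shuffle that moves data between partitions.
    random.Random(seed).shuffle(shuffled)
    partitions: List[List[T]] = [[] for _ in range(num_partitions)]
    for i, record in enumerate(shuffled):
        # Round-robin dealing keeps partition sizes within one record of each other.
        partitions[i % num_partitions].append(record)
    return partitions


# Skewed input: three partitions holding 90, 5, and 5 records.
skewed = [list(range(90)), list(range(90, 95)), list(range(95, 100))]
flat = [r for part in skewed for r in part]
balanced = repartition_by_number(flat, 3)
print([len(p) for p in balanced])  # [34, 33, 33]
```

Keeping the same partition count (three in, three out) still removes the skew, because every record is reassigned.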

You can increase or decrease the number of partitions when you repartition by number. You can also keep the same number of partitions and redistribute the data to reduce skew. When decreasing the number of partitions, the Coalesce method can be more efficient because it merges existing partitions instead of performing a full shuffle.
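The following plain-Python sketch illustrates why coalescing can be cheaper when reducing the partition count. It is an assumption-based model, not the processor's or Spark's actual algorithm: the hypothetical `coalesce` function moves each existing partition as a whole unit into a smaller set of groups, so no individual records are reshuffled. The trade-off is that the result is only as balanced as the partitions it starts from.

```python
from typing import List, TypeVar

T = TypeVar("T")


def coalesce(partitions: List[List[T]], num_partitions: int) -> List[List[T]]:
    """Hypothetical model: merge whole existing partitions into fewer groups without reshuffling records."""
    if not 1 <= num_partitions <= len(partitions):
        raise ValueError("coalesce can only keep or reduce the partition count")
    merged: List[List[T]] = [[] for _ in range(num_partitions)]
    for i, part in enumerate(partitions):
        # Each existing partition is assigned as a unit to one merged group;
        # records inside it are never split apart or redistributed.
        merged[i * num_partitions // len(partitions)].extend(part)
    return merged


existing = [[1, 2], [3], [4, 5, 6], [7]]
print(coalesce(existing, 2))  # [[1, 2, 3], [4, 5, 6, 7]]
```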

You specify how the partitions are created:
  • Number of Partitions - The processor creates the specified number of partitions and then randomly redistributes the data across the partitions.
  • Max Records per Partition - The processor counts the records to determine how many partitions are needed so that no partition exceeds the specified maximum, and creates that many partitions. Then, it redistributes the data across the partitions, honoring the maximum record requirement.
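The Max Records per Partition option reduces to a ceiling division followed by a redistribution. The sketch below is a plain-Python model, not the processor's implementation. The names `partitions_needed` and `repartition_by_max_records`, the round-robin placement, and the choice to keep one partition for empty input are all assumptions for illustration. With 1,050 records and a maximum of 100, it computes 11 partitions. Round-robin placement then yields partitions of 95 or 96 records, all within the limit.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def partitions_needed(record_count: int, max_records: int) -> int:
    """Smallest partition count that keeps every partition at or under max_records."""
    if max_records < 1:
        raise ValueError("max_records must be at least 1")
    # Integer ceiling division; assume at least one partition even for empty input.
    return max(1, -(-record_count // max_records))


def repartition_by_max_records(records: Sequence[T], max_records: int) -> List[List[T]]:
    """Hypothetical model of the Max Records per Partition option."""
    n = partitions_needed(len(records), max_records)  # the record-count step
    partitions: List[List[T]] = [[] for _ in range(n)]
    for i, record in enumerate(records):
        # Round-robin keeps sizes within one record of each other, so none exceeds max_records.
        partitions[i % n].append(record)
    return partitions


parts = repartition_by_max_records(list(range(1050)), max_records=100)
print(len(parts))                  # 11
print(max(len(p) for p in parts))  # 96
```

Spreading records evenly, rather than filling each partition to exactly 100, still honors the maximum while avoiding one small leftover partition.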