Partitioning
When you start a pipeline, StreamSets Transformer launches a Spark application. Spark runs the pipeline just as it runs any other application: it splits the pipeline data into partitions and performs operations on each partition in parallel.
Spark automatically handles the partitioning of pipeline data for you. However, at times you might need to change the size and number of partitions.
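As an illustrative sketch (not Transformer pipeline code), the standalone Spark example below shows how partition counts can be inspected and changed with the standard repartition and coalesce methods. The application name, master setting, and row count are arbitrary placeholders.

```scala
// Minimal Spark sketch illustrating partition inspection and adjustment.
// Assumes a local SparkSession; values are placeholders for illustration.
import org.apache.spark.sql.SparkSession

object PartitioningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partitioning-sketch")
      .master("local[*]")
      .getOrCreate()

    // Spark chooses an initial number of partitions automatically.
    val df = spark.range(0, 1000000)
    println(s"Initial partitions: ${df.rdd.getNumPartitions}")

    // repartition() performs a full shuffle to the requested partition count.
    val widened = df.repartition(8)
    println(s"After repartition(8): ${widened.rdd.getNumPartitions}")

    // coalesce() reduces the partition count without a full shuffle.
    val narrowed = widened.coalesce(2)
    println(s"After coalesce(2): ${narrowed.rdd.getNumPartitions}")

    spark.stop()
  }
}
```

Increasing the partition count can raise parallelism across executors, while reducing it lowers per-task overhead for small datasets; the right balance depends on the data volume and cluster resources.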