Execution Mode

Transformer pipelines can run in batch or streaming mode.

You select the execution mode when you create a pipeline:
Batch
A batch pipeline processes a single batch, and then stops. By default, batch pipelines process all available data in the batch. You can, however, configure a maximum batch size in each origin to limit the amount of data processed in the batch.
When the pipeline stops, Transformer saves the offset for each origin. The offset is the location where an origin stops reading. If you restart the pipeline, origins start reading from the saved offsets.
For a detailed example of a batch pipeline, see Batch Case Study.
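The batch behavior described above can be illustrated with a small simulation. This is a sketch only, not Transformer code: `ToyOrigin`, `read_batch`, and `run_batch_pipeline` are hypothetical names, and the offset is modeled as a simple list index, which is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyOrigin:
    """Hypothetical stand-in for an origin; not a Transformer class."""
    records: List[str]
    max_batch_size: Optional[int] = None  # None: read all available data
    offset: int = 0  # position where the origin last stopped reading

    def read_batch(self) -> List[str]:
        end = len(self.records)
        if self.max_batch_size is not None:
            end = min(end, self.offset + self.max_batch_size)
        batch = self.records[self.offset:end]
        self.offset = end  # the offset that is kept when the run stops
        return batch


def run_batch_pipeline(origin: ToyOrigin) -> List[str]:
    """Process a single batch, then stop."""
    return [record.upper() for record in origin.read_batch()]


origin = ToyOrigin(records=["a", "b", "c", "d", "e"], max_batch_size=2)
first_run = run_batch_pipeline(origin)   # reads "a" and "b"
second_run = run_batch_pipeline(origin)  # resumes at the saved offset
```

In this sketch, the second run picks up at the third record because the first run left the offset at 2. With `max_batch_size` left as `None`, a single run would read all five records.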
Streaming
A streaming pipeline runs continuously until you manually stop it, maintaining connections to origin systems and processing data at regular intervals. Use a streaming pipeline when you expect data to continuously arrive in origin systems.
When you start a streaming pipeline, origins create an initial batch, based on the configured maximum batch size. When creating a batch, an origin notes the offset. An offset is the location where an origin stops reading.
After destinations write the batch to destination systems, origins wait a user-defined interval, then create a new batch, starting from the last saved offsets.
When you manually stop the pipeline, Transformer saves the offset for each origin. If you restart the pipeline, origins start reading from the saved offsets.
After processing existing data, streaming pipelines typically process small batches that contain continuously arriving data. When you want to perform processing such as aggregation, deduplication, or joins on larger batches, you can use a Window processor to create them.
For a detailed example of a streaming pipeline, see Streaming Case Study.
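The streaming cycle described above (create a batch, write it, wait, then continue from the last offset) can be sketched as a loop. This is an illustrative simulation under stated assumptions, not Transformer's implementation: `run_streaming_pipeline`, `should_stop`, `write`, and `fake_sleep` are hypothetical names, and the origin is modeled as an in-memory list.

```python
import time
from typing import Callable, List


def run_streaming_pipeline(
    source: List[str],
    start_offset: int,
    max_batch_size: int,
    interval_seconds: float,
    should_stop: Callable[[], bool],
    write: Callable[[List[str]], None],
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run until should_stop() is true; return the offset to save."""
    offset = start_offset
    while not should_stop():
        end = min(len(source), offset + max_batch_size)
        batch = source[offset:end]
        if batch:
            write(batch)  # destination writes the batch
        offset = end  # next batch starts where this one stopped
        sleep(interval_seconds)  # wait the configured interval
    return offset


source = ["r1", "r2", "r3"]
written: List[List[str]] = []
ticks = {"n": 0}


def fake_sleep(_seconds: float) -> None:
    # Stand-in for the wait: one new record "arrives" per interval.
    ticks["n"] += 1
    source.append(f"r{len(source) + 1}")


saved_offset = run_streaming_pipeline(
    source,
    start_offset=0,
    max_batch_size=2,
    interval_seconds=1.0,
    should_stop=lambda: ticks["n"] >= 3,  # simulates a manual stop
    write=written.append,
    sleep=fake_sleep,
)
```

Here the first two batches drain the existing data, and later batches are small because they contain only what arrived during the wait. Passing `saved_offset` as `start_offset` on the next call models restarting the pipeline from the saved offsets.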