Execution Mode

Transformer pipelines can run in batch or streaming mode. You select the execution mode when you create a pipeline:
Batch
A batch pipeline processes available data, and then stops. By default, batch pipelines process all available data. For some origins, you can configure a maximum batch size to limit the amount of data processed. Most pipelines process one batch. However, pipelines that contain an origin configured to read multiple tables process one batch for each table specified in the origin.
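Transformer pipelines are built in its web-based UI rather than written as code, but because Transformer runs pipelines on Apache Spark, the batch behavior can be sketched in PySpark. This is only an analogy, not Transformer's API; the path, format, and row limit below are illustrative assumptions:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("batch-sketch").getOrCreate()

# Hypothetical origin setting: a maximum batch size caps how much
# data a single batch reads.
MAX_BATCH_SIZE = 10_000

# Read the available data (here, a hypothetical Parquet directory),
# limited to the configured maximum batch size.
batch = spark.read.parquet("/data/orders").limit(MAX_BATCH_SIZE)

# Downstream stages transform the batch and destinations write it;
# after that, a batch pipeline simply stops.
batch.write.mode("append").parquet("/data/orders_out")
spark.stop()
```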
When the pipeline stops, Transformer saves the offset for each origin. The offset is the location where an origin stops reading. If you restart the pipeline, origins start reading from the saved offsets by default.
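Transformer manages offsets internally, but the mechanism can be sketched with a hypothetical incremental `id` column and a local JSON file standing in for the offset store (all names here are illustrative assumptions):

```python
import json
import os

from pyspark.sql import SparkSession

OFFSET_FILE = "offsets.json"  # hypothetical stand-in for the offset store

def load_offset(origin_name, default=0):
    # Return the last saved offset for an origin, or a default.
    if os.path.exists(OFFSET_FILE):
        with open(OFFSET_FILE) as f:
            return json.load(f).get(origin_name, default)
    return default

def save_offset(origin_name, offset):
    # Persist the offset so a restarted pipeline resumes from it.
    offsets = {}
    if os.path.exists(OFFSET_FILE):
        with open(OFFSET_FILE) as f:
            offsets = json.load(f)
    offsets[origin_name] = offset
    with open(OFFSET_FILE, "w") as f:
        json.dump(offsets, f)

spark = SparkSession.builder.appName("offset-sketch").getOrCreate()

# On restart, read only the rows beyond the saved offset, then save
# the new high-water mark for the next run.
last_id = load_offset("orders_origin")
batch = spark.read.parquet("/data/orders").where(f"id > {last_id}")
new_max = batch.agg({"id": "max"}).first()[0]
if new_max is not None:
    save_offset("orders_origin", new_max)
```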
For a detailed example of a batch pipeline, see Batch Case Study.
Streaming
A streaming pipeline runs continuously until manually stopped. While running, the pipeline maintains connections to origin systems and processes data at regular intervals. Use a streaming pipeline when you expect data to continuously arrive in origin systems.
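Because Transformer executes on Spark, a streaming pipeline behaves much like a Spark Structured Streaming query with a processing-time trigger. The sketch below is an analogy, not Transformer's API; the schema, paths, and 30-second interval are assumptions:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

# Continuously pick up newly arriving files in the origin directory.
events = (spark.readStream
          .schema("id LONG, ts TIMESTAMP, amount DOUBLE")  # assumed schema
          .parquet("/data/incoming"))                      # assumed origin path

# Process a batch at a regular, user-defined interval. The checkpoint
# directory plays the role of the saved offsets: on restart, the query
# resumes from where it stopped reading.
query = (events.writeStream
         .format("parquet")
         .option("path", "/data/out")             # assumed destination path
         .option("checkpointLocation", "/chk")    # offsets persisted here
         .trigger(processingTime="30 seconds")    # user-defined interval
         .start())

query.awaitTermination()  # runs until the query is manually stopped
```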
When you start a streaming pipeline, origins read an initial set of data. Origins with a configured maximum batch size read a limited amount of data. Origins without a maximum batch size read all available data. By default, origins start reading from the offsets where they last stopped reading. An origin configured to read multiple tables reads data from only one of those tables in each batch.
After destinations write the batch to destination systems, most pipelines wait a user-defined interval. However, pipelines that contain an origin configured to read multiple tables immediately start processing another batch. The multitable origin reads from the next table, and any other origins read data as configured. The pipeline continues to process batches until the multitable origin has read from all of the specified tables. Only then does the pipeline wait the user-defined interval.
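The cycling behavior of a multitable origin can be sketched in plain Python; the table list, interval, and process_batch placeholder are hypothetical:

```python
import time

TABLES = ["orders", "customers", "shipments"]  # hypothetical table list
WAIT_SECONDS = 60                              # hypothetical user-defined interval

def process_batch(table):
    # Placeholder for reading one batch from the table, transforming it,
    # and writing it to the destinations.
    print(f"processed one batch from {table}")

# One batch per table, back to back; the pipeline waits only after
# every specified table has been read.
while True:
    for table in TABLES:
        process_batch(table)
    time.sleep(WAIT_SECONDS)
```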
After waiting the user-defined interval, the pipeline processes another batch. All origins in the pipeline read data as configured, typically starting from the last saved offsets.
When you manually stop the pipeline, Transformer saves the offset for each origin. If you restart the pipeline, origins start reading from the saved offsets by default.
After processing existing data, streaming pipelines typically process small batches that contain continuously arriving data. To perform processing such as aggregation, deduplication, or joins on larger sets of data, you can use a Window processor to create larger batches, as sketched below.
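Transformer's Window processor is configured in the pipeline rather than in code, but its effect resembles a windowed aggregation in Spark Structured Streaming: small arriving batches are grouped into larger time windows before the aggregate runs. The schema, paths, and durations below are assumptions:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window
from pyspark.sql.functions import sum as sum_  # avoid shadowing builtin sum

spark = SparkSession.builder.appName("window-sketch").getOrCreate()

events = (spark.readStream
          .schema("user STRING, ts TIMESTAMP, amount DOUBLE")  # assumed schema
          .parquet("/data/incoming"))                          # assumed path

# Group five minutes of small streaming batches into one window per
# user before aggregating; the watermark bounds how late data may be.
totals = (events
          .withWatermark("ts", "10 minutes")
          .groupBy(window(col("ts"), "5 minutes"), col("user"))
          .agg(sum_("amount").alias("total_amount")))

query = (totals.writeStream
         .outputMode("append")
         .format("parquet")
         .option("path", "/data/totals")
         .option("checkpointLocation", "/chk_totals")
         .start())

query.awaitTermination()
```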
For a detailed example of a streaming pipeline, see Streaming Case Study.