Delivery Guarantee
Transformer's offset handling ensures that a Transformer pipeline does not lose data when a sudden failure occurs: the pipeline processes data at least once. After a sudden failure, up to one batch of data may be reprocessed. This is an at-least-once delivery guarantee.
When a pipeline comes to a graceful stop, Transformer processes the data exactly once. A graceful stop occurs, for example, when a pipeline runs in batch mode and completes all processing, when a pipeline stops with errors, or when you manually stop the pipeline and allow Transformer to transition it to a stopped status.
The at-least-once delivery guarantee applies to failures that cause the pipeline to stop abruptly. This includes force-stopping a pipeline or shutting down the Transformer machine without first stopping Transformer.
When you restart a pipeline after a sudden failure and the pipeline includes an origin that stores offsets, Transformer resumes processing from the last-saved offset. Transformer commits an offset only after receiving a write confirmation from the destination systems. If the pipeline origins do not store offsets, Transformer reprocesses all data, as expected. How Transformer recovers depends on when the sudden failure occurs:
- While writing a batch of data
  - When a sudden failure occurs as Transformer writes data to destination systems, Transformer reprocesses data. When the pipeline starts again, Transformer reprocesses the batch that was being written because it had not completed the write, received the write confirmation, or stored the offset for the batch. As a result, the last-saved offset indicates that the batch was not processed.
- After the write, before receiving the write confirmation
  - When a sudden failure occurs after Transformer writes data to destination systems but before it receives the write confirmation and commits the offset, Transformer reprocesses data. When the pipeline starts again, Transformer begins processing with the batch that was already written because the offset for that batch was not yet saved.
- All other times
  - If a sudden failure occurs at any other time, such as while reading or processing a batch, Transformer provides exactly-once behavior. In these cases, the batch in flight has not been written to the destination systems. When the pipeline starts again, Transformer reprocesses the batch and writes it to the destination systems exactly once.
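
The three failure cases above can be illustrated with a small simulation. This is only a sketch and not Transformer code: `run_pipeline`, `run_with_restart`, `SuddenFailure`, and the in-memory `state` dictionary are hypothetical names. It also assumes that a failure during a write leaves part of the batch in the destination, which in practice depends on the destination system.

```python
# Hypothetical simulation of commit-after-write-confirmation offset handling.
# None of these names come from Transformer; they exist only for illustration.

class SuddenFailure(Exception):
    """Stands in for an abrupt stop, such as a force-stop or machine shutdown."""


def run_pipeline(batches, destination, state, fail_at=None):
    """Process batches starting from the last-saved offset in state.

    fail_at is an optional (batch_index, phase) pair, where phase is
    "read", "during_write", or "after_write".
    """
    offset = state["offset"]
    while offset < len(batches):
        if fail_at == (offset, "read"):
            raise SuddenFailure("failed while reading or processing")
        staged = list(batches[offset])  # "processing" is a plain copy here

        if fail_at == (offset, "during_write"):
            # Assumption: a partial write, only the first record lands.
            destination.extend(staged[:1])
            raise SuddenFailure("failed while writing")
        destination.extend(staged)

        if fail_at == (offset, "after_write"):
            raise SuddenFailure("failed before the write confirmation")

        # The write is confirmed, so the offset can be committed.
        offset += 1
        state["offset"] = offset


def run_with_restart(batches, fail_at):
    """Run once with an injected failure, then restart from the saved offset."""
    destination = []
    state = {"offset": 0}
    try:
        run_pipeline(batches, destination, state, fail_at)
    except SuddenFailure:
        run_pipeline(batches, destination, state)  # restart, no failure
    return destination


BATCHES = [["a", "b"], ["c", "d"], ["e", "f"]]

print(run_with_restart(BATCHES, (1, "read")))
# ['a', 'b', 'c', 'd', 'e', 'f']
print(run_with_restart(BATCHES, (1, "during_write")))
# ['a', 'b', 'c', 'c', 'd', 'e', 'f']
print(run_with_restart(BATCHES, (1, "after_write")))
# ['a', 'b', 'c', 'd', 'c', 'd', 'e', 'f']
```

In every run, each record reaches the destination at least once, and any duplicates come only from the single batch that was in flight when the failure occurred.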