When you configure a pipeline, you define how you want data to be treated: Do you want
to prevent the loss of data or the duplication of data?
The Delivery Guarantee pipeline property offers the
following choices:
- At least once
  - Ensures that the pipeline processes all data.
  - If a failure causes Data Collector to stop while processing a batch of data, when it
    restarts, it reprocesses the batch. This option ensures that no data is lost.
  - With this option, Data Collector commits the offset after receiving write confirmation
    from destination systems. If a failure occurs after Data Collector passes data to
    destination systems but before it receives confirmation and commits the offset, up to
    one batch of data might be duplicated in destination systems.
- At most once
  - Ensures that data is not processed more than once.
  - If a failure causes Data Collector to stop while processing a batch of data, when it
    restarts, it begins processing with the next batch of data. This option avoids the
    duplication of data in destinations due to reprocessing.
  - With this option, Data Collector commits the offset after a write without waiting for
    confirmation from destination systems. If a failure occurs after Data Collector passes
    data to destinations and commits the offset, up to one batch of data might not get
    written to the destination systems.
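
The difference between the two options comes down to when the offset is committed relative to the write. The following Python sketch is a toy simulation of that ordering, not Data Collector code: the function `run_pipeline`, the guarantee strings, and the `fail_on_batch` parameter are all hypothetical names chosen for illustration. It assumes a single failure followed by a restart from the last committed offset.

```python
def run_pipeline(batches, guarantee, fail_on_batch=None):
    """Toy model of offset handling; all names here are illustrative only.

    Processes `batches` in order, simulates one failure while handling the
    batch at index `fail_on_batch`, then restarts from the committed offset.
    Returns the list of batches that reached the destination.
    """
    destination = []  # what the destination system ends up holding
    offset = 0        # committed offset: index of the next batch to read
    crashed = False   # the simulated failure happens at most once

    while offset < len(batches):
        batch = batches[offset]
        crash_now = (offset == fail_on_batch) and not crashed

        if guarantee == "AT_LEAST_ONCE":
            # The write reaches the destination first.
            destination.append(batch)
            if crash_now:
                # Failure before confirmation arrives and the offset is committed.
                crashed = True
                continue  # restart: offset unchanged, so the batch is reprocessed
            offset += 1   # commit only after the write is confirmed
        elif guarantee == "AT_MOST_ONCE":
            # The offset is committed without waiting for confirmation.
            offset += 1
            if crash_now:
                # Failure after the commit; the unconfirmed write never lands.
                crashed = True
                continue  # restart: processing begins with the next batch
            destination.append(batch)
        else:
            raise ValueError(f"unknown guarantee: {guarantee}")

    return destination


batches = ["b0", "b1", "b2"]
print(run_pipeline(batches, "AT_LEAST_ONCE", fail_on_batch=1))
print(run_pipeline(batches, "AT_MOST_ONCE", fail_on_batch=1))
```

In this model, a failure on the second batch leaves the destination with that batch written twice under at least once, and with that batch missing under at most once. In both cases the impact is limited to a single batch, which matches the "up to one batch" wording above.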