What is a Pipeline?

A pipeline describes the flow of data from the origin system to destination systems and defines how to transform the data along the way.

You can use a single origin stage to represent the origin system, multiple processor stages to transform data, and multiple destination stages to represent destination systems.
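To make that structure concrete, here is a minimal sketch of the flow in plain Python. Everything in it is illustrative: the stage names, records, and functions are invented for this example and are not the Data Collector API.

```python
# Conceptual model of a pipeline: one origin, a chain of processors,
# and one or more destinations. Illustrative only.

def origin():
    """Origin stage: yields records read from the origin system."""
    for line in ["alice,34", "bob,29"]:
        yield {"raw": line}

def parse(record):
    """Processor stage: split the raw line into named fields."""
    name, age = record["raw"].split(",")
    return {"name": name, "age": int(age)}

def mask_name(record):
    """Processor stage: transform a field before it reaches destinations."""
    return {**record, "name": record["name"][0] + "***"}

def run_pipeline():
    processors = [parse, mask_name]      # processors run in order
    destinations = [print]               # stand-in for destination systems
    for record in origin():              # origin -> processors -> destinations
        for processor in processors:
            record = processor(record)
        for write in destinations:       # every destination receives each record
            write(record)

run_pipeline()
```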

When you develop a pipeline, you can use development stages to provide sample data and to generate errors so you can test error handling. You can also use data preview to see how each stage alters the data as it moves through the pipeline.
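Data preview can be pictured as running a small sample batch through the stages and showing the record after each one. The helper below is a hypothetical sketch of that idea, reusing the parse and mask_name stages from the sketch above; Data Collector's own preview runs inside the tool.

```python
def preview(records, processors, batch_size=3):
    """Show how each stage alters a small sample of records, stage by stage."""
    for record in records[:batch_size]:
        print("input:", record)
        for processor in processors:
            record = processor(record)
            print(f"after {processor.__name__}:", record)

preview([{"raw": "alice,34"}], [parse, mask_name])
```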

You can use executor stages to perform event-triggered tasks or to save event information. To process large volumes of data, you can use multithreaded pipelines or cluster mode pipelines.
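As a rough illustration of the multithreaded case, the sketch below fans batches out to worker threads with Python's standard library. The batch partitioning and processing logic are invented for the example; Data Collector's pipeline runners are considerably more involved.

```python
from concurrent.futures import ThreadPoolExecutor

def process_batch(batch):
    """Each runner processes its own batch of records independently."""
    return [{**record, "processed": True} for record in batch]

batches = [[{"id": i} for i in range(start, start + 3)] for start in (0, 3, 6)]

# A multithreaded pipeline runs several copies of the processing logic in
# parallel, one batch per runner, to work through large volumes of data.
with ThreadPoolExecutor(max_workers=3) as pool:
    for result in pool.map(process_batch, batches):
        print(result)
```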

In pipelines that write to Hive, Parquet, or PostgreSQL, you can implement a data drift solution that detects drift in the incoming data and updates the tables in the destination systems.
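In its simplest form, a data drift solution compares the fields of each incoming record against the columns the destination table is known to have and evolves the table when new fields appear. The sketch below is hypothetical (invented table name, TEXT-only typing) and only hints at what the Hive and PostgreSQL drift support actually does.

```python
known_columns = {"name", "age"}  # columns the destination table currently has

def handle_drift(record, table="customers"):
    """Detect new fields in incoming data and emit DDL to evolve the table.
    Illustrative only; real drift handling also infers column types."""
    new_fields = set(record) - known_columns
    for field in sorted(new_fields):
        # Assume TEXT for simplicity; a real solution picks an appropriate type.
        print(f"ALTER TABLE {table} ADD COLUMN {field} TEXT;")
        known_columns.add(field)

handle_drift({"name": "alice", "age": 34, "email": "a@example.com"})
```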

When you start a pipeline, Data Collector runs it until you stop the pipeline or shut down Data Collector. A single Data Collector can run multiple pipelines at the same time.
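Conceptually, each started pipeline is a loop that keeps processing batches until a stop is requested, and one Data Collector can run many such loops at once. A minimal sketch of that model, with invented pipeline names and a thread standing in for each pipeline:

```python
import threading
import time

def run_pipeline(name, stop_event):
    """Conceptual run loop: a started pipeline keeps processing batches
    until it is stopped (or Data Collector shuts down)."""
    while not stop_event.is_set():
        print(f"{name}: processing a batch")
        time.sleep(0.1)

stop = threading.Event()
threads = [threading.Thread(target=run_pipeline, args=(name, stop))
           for name in ("orders", "clickstream")]
for t in threads:
    t.start()
time.sleep(0.3)
stop.set()          # stopping the pipelines ends their run loops
for t in threads:
    t.join()
```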

While a pipeline runs, you can monitor it to verify that it performs as expected. You can also define metric and data rules with alerts to let you know when certain thresholds are reached.
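A metric rule boils down to a threshold check against the pipeline's current metrics. The sketch below shows that idea with invented metric names and a single rule; Data Collector evaluates its rules internally and surfaces alerts in the UI.

```python
def check_alerts(metrics, rules):
    """Evaluate metric rules against current pipeline metrics and raise
    alerts when thresholds are crossed. Names and rules are hypothetical."""
    for rule in rules:
        value = metrics.get(rule["metric"], 0)
        if value > rule["threshold"]:
            print(f"ALERT: {rule['metric']} = {value} exceeds {rule['threshold']}")

metrics = {"error_records": 120, "records_per_second": 850}
rules = [{"metric": "error_records", "threshold": 100}]
check_alerts(metrics, rules)
```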