Kudu
The Kudu origin can only be used in a batch pipeline and does not track offsets. As a result, each time the pipeline runs, the origin reads all available data. The origin can read all of the columns from a table or only the columns that you specify.
When you configure the Kudu origin, you specify the connection information for one or more Kudu masters. You configure the table to read, and optionally define the columns to read from the table. When needed, you can specify a maximum batch size for the origin.
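The behavior described above can be illustrated with a minimal Python sketch. It does not use a Kudu client or the origin's actual implementation: `TABLE` and `read_batches` are hypothetical names, and an in-memory list stands in for the Kudu table. The sketch only shows the three ideas from the text: no offsets are kept, so every run reads all rows; an optional column list restricts what is read; and an optional maximum batch size caps the rows per batch.

```python
from typing import Dict, Iterator, List, Optional

Row = Dict[str, object]

# Hypothetical in-memory stand-in for a Kudu table (not a real Kudu client).
TABLE: List[Row] = [
    {"id": 1, "name": "a", "score": 10},
    {"id": 2, "name": "b", "score": 20},
    {"id": 3, "name": "c", "score": 30},
]


def read_batches(
    table: List[Row],
    columns: Optional[List[str]] = None,
    max_batch_size: Optional[int] = None,
) -> Iterator[List[Row]]:
    """Yield all rows of the table in batches.

    No offset is stored between calls, so every call starts from the
    first row, mirroring an origin that rereads all data on each run.
    """
    # Keep every column when none are specified; otherwise project.
    rows = [
        dict(row) if columns is None else {c: row[c] for c in columns}
        for row in table
    ]
    # Without a maximum batch size, return everything in one batch.
    size = max_batch_size or len(rows) or 1
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


# Two "pipeline runs" with the same configuration return the same data.
first_run = list(read_batches(TABLE, columns=["id", "name"], max_batch_size=2))
second_run = list(read_batches(TABLE, columns=["id", "name"], max_batch_size=2))
```

Because nothing is remembered between calls, `first_run` and `second_run` hold the same rows. A pipeline that needs incremental reads would have to handle that outside this origin.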
You can also use a connection to configure the origin.
You can configure the origin to load data only once and cache the data for reuse throughout the pipeline run. Or, you can configure the origin to cache each batch of data so the data can be passed to multiple downstream batches efficiently.