JDBC Table

The JDBC Table origin reads data from a database table. Use the JDBC Table origin to process data from a database that is not natively supported.

Transformer provides database-specific origins, such as Google BigQuery, Oracle JDBC Table, and Snowflake. When possible, StreamSets recommends using the available database-specific origins. To read from one or more tables using a custom query, use the JDBC Query origin instead.

The origin can read all columns from a table or only specified columns. In each batch, the origin reads a specified number of rows, distributing those rows uniformly across the specified partitions. When reading the last row in a batch, the origin saves the value from a specified offset column. In the subsequent batch, the origin uses that offset to locate the last row read and starts reading from the following row.
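This offset mechanism resembles keyset pagination in plain JDBC. The following minimal Java sketch illustrates the idea under hypothetical assumptions: an orders table, an order_id offset column, a PostgreSQL URL, and a made-up batch size. It reads up to a fixed number of rows ordered by the offset column, remembers the last value, and would resume from there on the next pass.

```java
import java.sql.*;

public class OffsetPagingSketch {
    public static void main(String[] args) throws SQLException {
        // Hypothetical connection URL, table, and offset column.
        String url = "jdbc:postgresql://localhost:5432/demo";
        long lastOffset = 0L;           // offset saved from the previous batch
        final int maxRowsPerBatch = 1000;

        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             PreparedStatement stmt = conn.prepareStatement(
                 // Resume from the row after the last saved offset.
                 "SELECT order_id, amount FROM orders "
                 + "WHERE order_id > ? ORDER BY order_id LIMIT " + maxRowsPerBatch)) {
            stmt.setLong(1, lastOffset);
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    // Process the row, then save its offset for the next batch.
                    lastOffset = rs.getLong("order_id");
                }
            }
        }
    }
}
```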

When you configure the JDBC Table origin, you specify the database connection information and any additional JDBC configuration properties you want to use. You configure the table to read, and can optionally specify the columns to read from that table. You can also specify an additional predicate with any conditions that you would otherwise place in a WHERE clause.
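As a rough analogy in plain JDBC, the extra driver properties and the additional predicate map onto a Properties object and a WHERE clause. The sketch below is not Transformer's API; the property names, credentials, table, and the status = 'SHIPPED' predicate are all hypothetical.

```java
import java.sql.*;
import java.util.Properties;

public class PredicateSketch {
    public static void main(String[] args) throws SQLException {
        Properties props = new Properties();
        props.setProperty("user", "etl_user");       // hypothetical credentials
        props.setProperty("password", "secret");
        props.setProperty("connectTimeout", "10");   // example driver-specific property

        String url = "jdbc:postgresql://localhost:5432/demo";
        try (Connection conn = DriverManager.getConnection(url, props);
             Statement stmt = conn.createStatement();
             // Only the listed columns are read, and the additional predicate
             // acts like an extra condition in the WHERE clause.
             ResultSet rs = stmt.executeQuery(
                 "SELECT order_id, amount FROM orders WHERE status = 'SHIPPED'")) {
            while (rs.next()) {
                System.out.println(rs.getLong("order_id") + " " + rs.getBigDecimal("amount"));
            }
        }
    }
}
```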

You can also use a connection to configure the origin.

You define the offset column, the maximum number of rows to include in each batch, and the number of partitions used to read from the database table. You can optionally configure advanced properties related to the JDBC driver.
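Partitioned reads of this kind are commonly implemented by splitting the offset column's value range into evenly sized slices, one per partition. The sketch below shows one plausible way to compute such boundaries; the table, column, value range, and partition count are hypothetical, and Transformer's actual partitioning logic may differ.

```java
// A minimal sketch of range-based partitioning over an offset column.
// All names and values (orders, order_id, 4 partitions) are hypothetical.
public class PartitionSketch {
    public static void main(String[] args) {
        long min = 1, max = 100_000;   // assume the MIN/MAX of the offset column
        int partitions = 4;
        long stride = (max - min + 1) / partitions;

        for (int i = 0; i < partitions; i++) {
            long lower = min + i * stride;
            // The last partition absorbs any remainder.
            long upper = (i == partitions - 1) ? max : lower + stride - 1;
            // Each partition would then run a bounded query like:
            System.out.printf(
                "SELECT * FROM orders WHERE order_id BETWEEN %d AND %d%n", lower, upper);
        }
    }
}
```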

You can configure the origin to load data only once and cache the data for reuse throughout the pipeline run. Or, you can configure the origin to cache each batch of data so the data can be passed to multiple downstream batches efficiently. You can also configure the origin to skip tracking offsets, which enables reading the entire data set each time you start the pipeline.

Before using the JDBC Table origin, verify whether you need to install a JDBC driver.