JDBC Query

The JDBC Query origin reads data from a database based on the specified query.

Use the JDBC Query origin in batch pipelines only. In a batch pipeline, the origin reads all data returned by the specified query, and then the pipeline stops. The origin does not support streaming execution mode, does not save offsets, and does not support partitioning.

Use this origin to read from a database when you require a specific query. To read from a partitioned table or to read in streaming execution mode, use the JDBC Table origin.
Note: To read from most database vendors, the origin requires that Apache Spark version 2.4.0 or later is installed on the Transformer machine and on each node in the cluster. To read from Oracle databases, the JDBC Query origin requires Spark version 2.4.4 or later.

When you configure the JDBC Query origin, you specify the database connection information and any additional JDBC configuration properties you want to use. You can also use a connection to configure the origin.
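
As a rough illustration only, the connection information and pass-through JDBC configuration properties typically take a form like the following. The URL, user name, and property names are placeholders chosen for this sketch, not values from this documentation; the properties your driver accepts depend on the database vendor.

    // Illustrative connection details only; actual values depend on your database.
    // Additional JDBC configuration properties are passed through to the driver.
    val connectionOptions = Map(
      "url"      -> "jdbc:postgresql://db.example.com:5432/sales", // placeholder JDBC URL
      "user"     -> "pipeline_user",                               // placeholder credentials
      "password" -> "********",
      "ssl"      -> "true"                                         // example driver-specific property
    )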

You specify the SQL query to use for the read. You can specify a separate query to use when previewing data for pipeline development and testing. You can define a fetch size that is used by both queries.
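Because the origin runs on Spark, the read it performs is conceptually similar to a standalone Spark JDBC read that passes the query and fetch size to the driver. The following sketch is illustrative only; the connection URL, credentials, and query text are placeholders and are not the origin's actual configuration.

    import org.apache.spark.sql.SparkSession

    // Conceptual sketch of a Spark JDBC read driven by a query and a fetch size.
    // All connection details and the query text below are placeholders.
    val spark = SparkSession.builder().appName("jdbc-query-sketch").getOrCreate()

    val orders = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://db.example.com:5432/sales")
      .option("user", "pipeline_user")
      .option("password", "********")
      .option("query", "SELECT id, amount, updated_at FROM orders WHERE amount > 100")
      .option("fetchsize", "1000") // rows fetched per round trip; the fetch size applies to both the main and preview queries
      .load()

    orders.show(5)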

You can configure the origin to cache the data for reuse throughout the pipeline run. You can also specify the JDBC driver to include with the pipeline.
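When caching is enabled, downstream stages reuse the origin's output instead of re-running the query. Continuing the sketch above, and assuming that caching behaves like persisting the resulting DataFrame (an assumption for illustration, not a documented detail):

    // Assumption: caching the origin output is conceptually like persisting the DataFrame
    // so that multiple downstream stages reuse one result set rather than re-querying.
    val cachedOrders = orders.cache()
    cachedOrders.count() // the first action materializes the cached data
    // Subsequent stages (filters, joins, destinations) then read from the cached result.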

Before using the JDBC Query origin, verify whether you need to install a JDBC driver.