JDBC Query Consumer

Data Collector

The JDBC Query Consumer origin reads database data through a JDBC connection using a user-defined SQL query. The origin returns data as a map of column names to field values. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.
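To illustrate the shape of the returned data, the following minimal sketch uses Python's built-in sqlite3 module as a stand-in for a JDBC connection. This is an assumption for illustration only; Data Collector itself connects through a JDBC driver. The sketch runs a user-defined query and turns each result row into a map of column names to field values. The helper name run_query and the invoice table are hypothetical.

```python
import sqlite3
from typing import Any, Dict, List


def run_query(conn: sqlite3.Connection, sql: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: run a user-defined query, return rows as column->value maps.

    sqlite3 stands in for a JDBC connection in this sketch.
    """
    cursor = conn.execute(sql)
    # cursor.description has one entry per result column; element 0 is the column name.
    columns = [desc[0] for desc in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Usage with a hypothetical invoice table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO invoice (id, customer, total) VALUES (?, ?, ?)",
    [(1, "acme", 10.5), (2, "globex", 99.0)],
)
print(run_query(conn, "SELECT id, customer, total FROM invoice ORDER BY id"))
# → [{'id': 1, 'customer': 'acme', 'total': 10.5}, {'id': 2, 'customer': 'globex', 'total': 99.0}]
```

Because this sketch keys each map by column name, a join that returns two columns with the same name keeps only the last one; aliasing the columns in the query avoids the collision.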

Data Collector includes database-specific origins, such as the Oracle Bulkload origin. When available, StreamSets recommends using a database-specific origin. Data Collector also provides CDC origins to process changed data and the JDBC Multitable Consumer origin to perform database replication or to read from multiple tables in the same database.

The ability to process Microsoft SQL Server CDC data is deprecated in this origin and will be removed in a future release. To process data from Microsoft SQL Server CDC tables, use the SQL Server CDC Client origin. To process data from Microsoft SQL Server change tracking tables, use the SQL Server Change Tracking origin.

Important: This stage does not support connecting to non-RDBMS systems, including Hive, Impala, Kudu, or Snowflake. Support for untested systems is not guaranteed. For a list of tested systems, see "Database Vendors and Drivers".

When you configure the JDBC Query Consumer origin, you define the SQL query that the origin uses to read data from a single table or from a join of tables.

You also specify connection information, the query interval, and custom JDBC configuration properties to determine how the origin connects to the database. You configure the query mode and SQL query to define the data returned by the database. In full query mode, when reading from certain databases, you can use a stored procedure instead of a SQL query. When the source database has high-precision timestamps, such as IBM Db2 TIMESTAMP(9) fields, you can configure the origin to write strings rather than datetime values to maintain the precision.
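In an incremental-style query mode, each run should read only the rows added since the previous run, whereas a full-mode query is simply rerun unchanged. The sketch below assumes, as in Data Collector's incremental mode, that the query compares an offset column to an ${OFFSET} placeholder and orders by that column, so the last row returned supplies the offset for the next run. It again uses sqlite3 in place of JDBC. The customer table, the LIMIT, and the function names are hypothetical, and sleeping between runs stands in for the query interval.

```python
import sqlite3
import time
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]

# Incremental-style query (hypothetical table): compare the offset column to
# ${OFFSET} and order by it, so the last row of each run holds the next offset.
QUERY = "SELECT id, name FROM customer WHERE id > ${OFFSET} ORDER BY id LIMIT 2"


def poll_once(conn: sqlite3.Connection, query: str, offset: Any,
              offset_column: str) -> Tuple[List[Row], Any]:
    # Bind the saved offset as a parameter instead of pasting it into the SQL text.
    cursor = conn.execute(query.replace("${OFFSET}", "?"), (offset,))
    columns = [d[0] for d in cursor.description]
    rows = [dict(zip(columns, r)) for r in cursor.fetchall()]
    # Keep the previous offset when nothing new arrived.
    new_offset = rows[-1][offset_column] if rows else offset
    return rows, new_offset


def run_incremental(conn: sqlite3.Connection, query: str, initial_offset: Any,
                    offset_column: str, interval_s: float = 0.0,
                    max_polls: int = 10) -> Tuple[List[List[Row]], Any]:
    """Poll repeatedly, waiting interval_s between runs.

    Stops when a run returns no rows; a long-running origin would keep polling.
    """
    offset = initial_offset
    batches: List[List[Row]] = []
    for _ in range(max_polls):
        rows, offset = poll_once(conn, query, offset, offset_column)
        if not rows:
            break
        batches.append(rows)
        time.sleep(interval_s)  # stand-in for the query interval
    return batches, offset


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(i, f"c{i}") for i in range(1, 6)])
batches, offset = run_incremental(conn, QUERY, initial_offset=0, offset_column="id")
print([[row["id"] for row in batch] for batch in batches], offset)  # → [[1, 2], [3, 4], [5]] 5
```

Ordering by the offset column matters: without it, the last row returned is not guaranteed to carry the highest offset, and later runs could skip or repeat rows.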

You can configure JDBC Query Consumer to perform change data capture for databases that store change information in a table. You can also specify what the origin does when it encounters an unsupported data type.
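Handling an unsupported data type comes down to failing fast or degrading gracefully. This minimal sketch models two such behaviours: stopping with an error, or converting the value to a string. The enum and exception names, the set of supported Python types, and the use of Decimal as the "unsupported" type are all assumptions for illustration, not the origin's actual option names.

```python
from decimal import Decimal
from enum import Enum
from typing import Any, Dict


class OnUnknownType(Enum):
    """Hypothetical policy names for what to do with an unsupported value."""
    STOP = "stop"
    CONVERT_TO_STRING = "convert_to_string"


class UnsupportedTypeError(Exception):
    """Raised under the STOP policy when a value has no supported mapping."""


# Assumed set of types the downstream record format can hold directly.
SUPPORTED_TYPES = (type(None), bool, int, float, str, bytes)


def normalize_row(row: Dict[str, Any], policy: OnUnknownType) -> Dict[str, Any]:
    out: Dict[str, Any] = {}
    for column, value in row.items():
        if isinstance(value, SUPPORTED_TYPES):
            out[column] = value
        elif policy is OnUnknownType.CONVERT_TO_STRING:
            out[column] = str(value)  # keep the data, lose the original type
        else:
            raise UnsupportedTypeError(
                f"column {column!r} has unsupported type {type(value).__name__}"
            )
    return out


# Decimal stands in for an unsupported type in this sketch.
print(normalize_row({"id": 1, "amount": Decimal("1.50")}, OnUnknownType.CONVERT_TO_STRING))
# → {'id': 1, 'amount': '1.50'}
```

Converting to a string keeps the pipeline running but pushes type handling downstream; stopping surfaces the problem immediately.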

You can specify custom properties that your driver requires and configure advanced connection properties. To use a driver that supports a JDBC version older than 4.0, specify the driver class name and define a health check query.
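JDBC 4.0 introduced automatic driver loading and `Connection.isValid()`, which is why older drivers need an explicit driver class name and a health check query. The sketch below shows the health check idea in Python, with sqlite3 standing in for JDBC: before reusing a connection, run a cheap query and reconnect if it fails. The query `SELECT 1` and the function names are assumptions; use a query that your database accepts.

```python
import sqlite3
from typing import Callable, Optional

# Assumed health check query; use one that your database accepts.
HEALTH_CHECK_QUERY = "SELECT 1"


def is_healthy(conn: sqlite3.Connection, query: str = HEALTH_CHECK_QUERY) -> bool:
    """Return True if the (hypothetical) health check query succeeds on conn."""
    try:
        conn.execute(query).fetchone()
        return True
    except sqlite3.Error:
        # Closed or broken connections raise a subclass of sqlite3.Error.
        return False


def get_valid_connection(
    conn: Optional[sqlite3.Connection],
    connect: Callable[[], sqlite3.Connection],
    query: str = HEALTH_CHECK_QUERY,
) -> sqlite3.Connection:
    """Reuse conn if it passes the health check; otherwise open a new connection."""
    if conn is not None and is_healthy(conn, query):
        return conn
    return connect()


stale = sqlite3.connect(":memory:")
stale.close()  # simulate a connection that went bad between queries
fresh = get_valid_connection(stale, lambda: sqlite3.connect(":memory:"))
print(fresh is stale, is_healthy(fresh))  # → False True
```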

By default, the origin generates JDBC record header and field attributes that provide additional information about each record and field.
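Header and field attributes carry metadata alongside the values themselves. This sketch, again using sqlite3 in place of JDBC, attaches the source table as a record header attribute and each column's declared SQL type as a field attribute. The attribute names `jdbc.tables` and `jdbc.declaredType` and the record layout are illustrative assumptions; consult the Data Collector documentation for the attribute names that the origin actually generates.

```python
import sqlite3
from typing import Any, Dict


def build_record(conn: sqlite3.Connection, table: str, row: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap a row with illustrative header and field attributes."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    # The table name is interpolated directly, so it must be trusted input here.
    declared = {info[1]: info[2] for info in conn.execute(f"PRAGMA table_info({table})")}
    return {
        "header": {"jdbc.tables": table},  # hypothetical header attribute name
        "fields": {
            column: {
                "value": value,
                # hypothetical field attribute name
                "attributes": {"jdbc.declaredType": declared.get(column, "")},
            }
            for column, value in row.items()
        },
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER, price REAL)")
record = build_record(conn, "item", {"id": 7, "price": 1.5})
print(record["fields"]["price"]["attributes"])  # → {'jdbc.declaredType': 'REAL'}
```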

You can also use a connection to configure the origin.

The origin can generate events for an event stream. For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.
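An event stream lets downstream logic react to what the origin did. The sketch below emits one event per query run, reporting success with a row count or failure with the error message. The event type strings and fields are hypothetical, sqlite3 stands in for JDBC, and the `emit` callback stands in for whatever consumes the event stream.

```python
import sqlite3
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


def query_with_events(conn: sqlite3.Connection, sql: str,
                      emit: Callable[[Event], None]) -> List[tuple]:
    """Run sql and emit a hypothetical success or failure event."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error as exc:
        emit({"type": "query-failure", "query": sql, "error": str(exc)})
        return []
    emit({"type": "query-success", "query": sql, "rows": len(rows)})
    return rows


events: List[Event] = []
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")
query_with_events(conn, "SELECT a FROM t", events.append)
query_with_events(conn, "SELECT a FROM missing_table", events.append)
print([e["type"] for e in events])  # → ['query-success', 'query-failure']
```

Emitting events through a callback keeps the query logic independent of what consumes them, such as a step that sends a notification after each successful query.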