JDBC Multitable Consumer

Supported pipeline types:
  • Data Collector

The JDBC Multitable Consumer origin reads database data from multiple tables and multiple schemas through a JDBC connection. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.

Use the origin to read multiple tables from one or more schemas in the same database. For example, you might use the origin to perform database replication.

Data Collector includes database-specific origins, such as the Oracle Bulkload and SQL Server 2019 BDC Multitable Consumer origins. When available, StreamSets recommends using a database-specific origin. Data Collector also provides CDC origins to process changed data, and the JDBC Query Consumer origin to use a custom SQL query for processing.
Important: This stage does not support connecting to non-RDBMS systems, including Hive, Impala, Kudu, or Snowflake. Support for untested systems is not guaranteed. For a list of tested systems, see "Database Vendors and Drivers".

When you configure the origin, you specify connection information and custom JDBC configuration properties to determine how the origin connects to the database. When the source database has high-precision timestamps, such as IBM Db2 TIMESTAMP(9) fields, you can configure the origin to write strings rather than datetime values to maintain the precision.
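To see why writing strings can preserve precision, consider a minimal sketch in which Python's datetime type (microsecond precision) stands in for any datetime type that holds fewer fractional digits than the source column. The function names, the as_string flag, and the sample value are hypothetical and are not part of the origin's configuration.

```python
from datetime import datetime

def to_datetime_truncated(raw):
    """Parse 'YYYY-MM-DD HH:MM:SS.fffffffff', keeping only 6 fractional digits."""
    head, _, frac = raw.partition(".")
    micros = int(frac[:6].ljust(6, "0")) if frac else 0
    return datetime.strptime(head, "%Y-%m-%d %H:%M:%S").replace(microsecond=micros)

def read_timestamp(raw, as_string):
    """Return the raw string unchanged, or a datetime that may lose digits."""
    return raw if as_string else to_datetime_truncated(raw)

# Made-up value shaped like a TIMESTAMP(9) column (nanosecond precision).
source_value = "2024-03-01 12:30:45.123456789"

as_datetime = read_timestamp(source_value, as_string=False)
as_text = read_timestamp(source_value, as_string=True)

print(as_datetime.isoformat(sep=" "))  # the last three fractional digits are dropped
print(as_text)                         # all nine fractional digits survive
```

The string form keeps every digit but leaves any later date arithmetic to downstream stages.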

You define groups of database tables to read. The origin generates SQL queries based on the table configurations that you define, and then returns data as a map with column names and field values.
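The idea of turning a table configuration into queries and map-shaped records can be sketched as follows. This is an illustration only: it uses Python's built-in sqlite3 module instead of JDBC, assumes a LIKE-style table name pattern, and the function read_tables and the sample tables are hypothetical. The SQL that the origin actually generates may differ.

```python
import sqlite3

def read_tables(conn, name_pattern):
    """Yield (table, record) pairs for each table whose name matches a LIKE pattern."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name LIKE ? ORDER BY name",
        (name_pattern,),
    )]
    for table in names:
        # One generated query per matching table.
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        columns = [col[0] for col in cursor.description]
        for row in cursor.fetchall():
            # Each record is a map of column name to field value.
            yield table, dict(zip(columns, row))

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_eu (id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE sales_us (id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE audit_log (id INTEGER PRIMARY KEY, note TEXT);
    INSERT INTO sales_eu VALUES (1, 10.5);
    INSERT INTO sales_us VALUES (1, 20.0), (2, 7.25);
    INSERT INTO audit_log VALUES (1, 'not selected');
""")

records = list(read_tables(conn, "sales_%"))
for table, record in records:
    print(table, record)
```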

When you define the table configurations, you can optionally override the default key column and specify the initial offset to use. By default, the origin processes tables incrementally, using primary key columns or user-defined offset columns to track its progress. You can configure the origin to perform non-incremental processing to enable it to also process tables that do not have a key or offset column.
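Incremental processing with an offset column can be pictured with the sketch below. It is not the origin's implementation: it uses sqlite3 for a self-contained demo, the helper read_incremental and the orders table are hypothetical, and treating the stored offset as exclusive (reading rows with values greater than it) is an assumption.

```python
import sqlite3

def read_incremental(conn, table, offset_column, last_offset, batch_size):
    """Return (rows, new_offset) for the next batch of rows past last_offset."""
    query = (
        f'SELECT * FROM "{table}" '
        f'WHERE "{offset_column}" > ? '
        f'ORDER BY "{offset_column}" LIMIT ?'
    )
    cursor = conn.execute(query, (last_offset, batch_size))
    columns = [col[0] for col in cursor.description]
    rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
    # Advance the offset only when the batch returned rows.
    new_offset = rows[-1][offset_column] if rows else last_offset
    return rows, new_offset

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(1, 6)],
)

offset = 0  # hypothetical initial offset
batches = []
while True:
    rows, offset = read_incremental(conn, "orders", "id", offset, batch_size=2)
    if not rows:
        break
    batches.append([row["id"] for row in rows])

print(batches, offset)
```

Because each query orders by the offset column and filters on the last value seen, rows that arrive later with higher values are picked up by the next query without rereading earlier rows.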

You can configure the origin to perform multithreaded partition processing, multithreaded table processing, or the default, which is a mix of both. You also specify the batch strategy to use for processing. When configuring partitions, you can configure the offset size, number of active partitions, and offset conditions.
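As a rough picture of partition processing, the sketch below splits an offset range into fixed-size partitions and hands them to a small thread pool. The names partition_ranges and read_partition, the in-memory ROWS list, and the half-open range convention are assumptions made for illustration. The origin's actual partitioning logic is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor

def partition_ranges(min_offset, max_offset, partition_size):
    """Split [min_offset, max_offset] into half-open ranges of partition_size."""
    ranges = []
    start = min_offset
    while start <= max_offset:
        ranges.append((start, start + partition_size))
        start += partition_size
    return ranges

# Stand-in for a table with an integer offset column named "id".
ROWS = [{"id": i} for i in range(1, 11)]

def read_partition(bounds):
    """Return the ids of rows whose offset falls in [low, high)."""
    low, high = bounds
    return [row["id"] for row in ROWS if low <= row["id"] < high]

ranges = partition_ranges(1, 10, partition_size=4)

# max_workers plays the role of the thread count; pool.map keeps range order.
with ThreadPoolExecutor(max_workers=2) as pool:
    parts = list(pool.map(read_partition, ranges))

print(ranges)
print(parts)
```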

You can configure advanced properties, such as the initial order to read from tables, connection-related properties, and transaction isolation. You can also specify what the origin does when it encounters an unsupported data type: convert the data to a string or stop the pipeline.
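The two choices for unsupported data types can be modeled with a short sketch. Everything named here is hypothetical: the option strings CONVERT_TO_STRING and STOP_PIPELINE, the set of supported types, and the GeoPoint class all stand in for whatever the database and origin actually use.

```python
class UnsupportedTypeError(Exception):
    """Models the pipeline stopping on an unsupported data type."""

# Hypothetical set of types the reader knows how to pass through.
SUPPORTED_TYPES = (int, float, str, bytes, bool, type(None))

def convert_field(name, value, on_unknown_type):
    """Pass supported values through; otherwise stringify or stop."""
    if isinstance(value, SUPPORTED_TYPES):
        return value
    if on_unknown_type == "CONVERT_TO_STRING":
        return str(value)
    if on_unknown_type == "STOP_PIPELINE":
        raise UnsupportedTypeError(
            f"field {name!r} has unsupported type {type(value).__name__}"
        )
    raise ValueError(f"unknown option: {on_unknown_type}")

class GeoPoint:
    """Stand-in for a vendor-specific column type."""
    def __init__(self, lat, lon):
        self.lat = lat
        self.lon = lon

    def __str__(self):
        return f"POINT({self.lon} {self.lat})"

row = {"id": 7, "location": GeoPoint(52.5, 13.4)}
converted = {
    name: convert_field(name, value, "CONVERT_TO_STRING")
    for name, value in row.items()
}
print(converted)
```

Converting to a string keeps the pipeline running at the cost of losing the type, while stopping surfaces the problem immediately.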

When the pipeline stops, the JDBC Multitable Consumer origin notes where it stops reading. When the pipeline starts again, the origin continues processing from where it stopped by default. You can reset the origin to process all available data, using any initial offsets that you defined.
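The stop, resume, and reset behavior can be sketched with a small offset store. How Data Collector actually persists offsets is not described here, so the JSON file, the OffsetStore class, and the table name are all assumptions for illustration.

```python
import json
import os
import tempfile

class OffsetStore:
    """Keeps the last offset per table, falling back to initial offsets."""

    def __init__(self, path, initial_offsets):
        self.path = path
        self.initial_offsets = dict(initial_offsets)

    def load(self):
        """Return saved offsets, using initial offsets for tables never read."""
        saved = {}
        if os.path.exists(self.path):
            with open(self.path) as handle:
                saved = json.load(handle)
        return {**self.initial_offsets, **saved}

    def save(self, offsets):
        """Record where reading stopped."""
        with open(self.path, "w") as handle:
            json.dump(offsets, handle)

    def reset(self):
        """Forget saved progress so the next load returns the initial offsets."""
        if os.path.exists(self.path):
            os.remove(self.path)

state_path = os.path.join(tempfile.mkdtemp(), "offsets.json")
store = OffsetStore(state_path, {"orders": 100})

first_start = store.load()    # nothing saved yet: initial offset applies
store.save({"orders": 250})   # pipeline stops after reading up to 250
after_restart = store.load()  # restart continues from the saved offset
store.reset()
after_reset = store.load()    # reset falls back to the initial offset

print(first_start, after_restart, after_reset)
```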

By default, the origin generates JDBC record header and field attributes that provide additional information about each record and field.

You can configure advanced connection properties. To use a JDBC version older than 4.0, you specify the driver class name and define a health check query.

You can also use a connection to configure the origin.

The origin can generate events for an event stream. For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.