SQL Server CDC Client

The SQL Server CDC Client origin processes data in Microsoft SQL Server change data capture (CDC) tables. The origin fetches changes in time windows and uses multiple threads to enable parallel processing of data. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.

Use the SQL Server CDC Client origin to generate records from CDC tables. To read data from Microsoft SQL Server change tracking tables, use the SQL Server Change Tracking origin. For more information about the differences between CDC and change tracking data, see the Microsoft SQL Server documentation. To read data from SQL Server temporal tables, use the JDBC Multitable Consumer origin or the JDBC Query Consumer origin. For more information about temporal tables, see the Microsoft documentation.

By default, the SQL Server CDC Client origin generates a record for each record in the CDC table. You can configure the origin to combine the two update records that SQL Server generates for each update. This changes the structure of the generated records and produces additional record header attributes.
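
As a rough illustration of what combining update records means, the sketch below pairs the two rows that SQL Server writes to a change table for each update: a before image (`__$operation` = 3) and an after image (`__$operation` = 4). It is not the origin's implementation. The function name and the `Data`/`OldData` output layout are hypothetical, and the code assumes that rows arrive in change-table order so that a before image is immediately followed by its after image.

```python
from typing import Any, Dict, Iterable, List, Optional

# __$operation codes used in SQL Server CDC change tables.
OP_UPDATE_BEFORE = 3  # row image before the update
OP_UPDATE_AFTER = 4   # row image after the update


def combine_update_rows(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Merge each before/after update pair into a single record.

    Assumption: a before-image row is immediately followed by its
    after-image row. The "Data"/"OldData" layout is hypothetical.
    """
    combined: List[Dict[str, Any]] = []
    pending_before: Optional[Dict[str, Any]] = None
    for row in rows:
        op = row["__$operation"]
        if op == OP_UPDATE_AFTER and pending_before is not None:
            # Matching pair: emit one record that carries both images.
            combined.append({"Data": row, "OldData": pending_before})
            pending_before = None
            continue
        if pending_before is not None:
            # Unpaired before image: emit it on its own.
            combined.append({"Data": pending_before})
            pending_before = None
        if op == OP_UPDATE_BEFORE:
            # Hold the before image until the next row is seen.
            pending_before = row
        else:
            # Inserts, deletes, and unpaired after images pass through.
            combined.append({"Data": row})
    if pending_before is not None:
        combined.append({"Data": pending_before})
    return combined
```

With four change rows (an insert, a before image, an after image, and a delete), the sketch returns three records, and only the middle one carries an `OldData` entry.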

The SQL Server CDC Client origin includes the CRUD operation type in a record header attribute so generated records can be easily processed by CRUD-enabled destinations. For an overview of Data Collector changed data processing and a list of CRUD-enabled destinations, see Processing Changed Data.
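
The following sketch shows one way a CDC operation code could be translated into a CRUD operation header. The attribute name `sdc.operation.type`, its numeric codes, and the treatment of before-update rows as unsupported are assumptions that this section does not state. Verify them against the record header attribute reference before relying on them. The function name is hypothetical.

```python
from typing import Dict

# SQL Server CDC __$operation codes.
CDC_DELETE, CDC_INSERT, CDC_UPDATE_BEFORE, CDC_UPDATE_AFTER = 1, 2, 3, 4

# Assumed Data Collector CRUD codes (not confirmed by this section).
SDC_INSERT, SDC_DELETE, SDC_UPDATE, SDC_UNSUPPORTED = 1, 2, 3, 5

OPERATION_MAP: Dict[int, int] = {
    CDC_INSERT: SDC_INSERT,
    CDC_DELETE: SDC_DELETE,
    CDC_UPDATE_AFTER: SDC_UPDATE,
    # Assumption: the before image of an update is not a CRUD operation
    # that a destination can apply, so it is flagged as unsupported.
    CDC_UPDATE_BEFORE: SDC_UNSUPPORTED,
}


def crud_header(cdc_operation: int) -> Dict[str, str]:
    """Return a header attribute dict for the given __$operation code."""
    code = OPERATION_MAP.get(cdc_operation, SDC_UNSUPPORTED)
    # Header attribute values are represented as strings in this sketch.
    return {"sdc.operation.type": str(code)}
```

A CRUD-enabled destination can then branch on the header value instead of inspecting the CDC columns in the record body.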

You might use this origin to perform database replication. You can use a separate pipeline with the JDBC Query Consumer or JDBC Multitable Consumer origin to read existing data. Then start a pipeline with the SQL Server CDC Client origin to process subsequent changes.

When you configure the origin, you specify the SQL Server capture instance names, and the origin processes the related CDC tables. You can define groups of tables in the same database and any initial offsets to use. When you omit initial offsets, the origin processes all available data in the CDC tables.

You can enable late table processing to allow the origin to process tables that appear after the pipeline starts. You can also configure the origin to check for schema changes in processed tables and to generate an event after discovering a change.

To determine how the origin connects to the database, you specify connection information, a query interval, number of retries, and any custom JDBC configuration properties that you need. You can configure advanced connection properties. You can also use a connection to configure the origin.
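
For orientation, the connection information typically takes the form of a JDBC connection string. The sketch below assembles one in the URL format used by the Microsoft JDBC Driver for SQL Server, with additional properties appended as `key=value` pairs. The helper name, host, database, and property values are hypothetical placeholders, and the sketch does not escape special characters in values.

```python
from typing import Dict, Optional


def sqlserver_jdbc_url(
    host: str,
    port: int = 1433,  # default SQL Server TCP port
    database: Optional[str] = None,
    properties: Optional[Dict[str, str]] = None,
) -> str:
    """Build a jdbc:sqlserver:// URL from its parts (no value escaping)."""
    url = f"jdbc:sqlserver://{host}:{port}"
    if database:
        url += f";databaseName={database}"
    # Extra driver properties are appended as ;key=value pairs.
    for key, value in (properties or {}).items():
        url += f";{key}={value}"
    return url


# Hypothetical host and database names, for illustration only.
print(sqlserver_jdbc_url("db.example.com", database="sales",
                         properties={"encrypt": "true"}))
```

Properties passed in the URL this way could equally be supplied as custom JDBC configuration properties. Which approach suits a given deployment is a judgment call.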

The origin can generate events for an event stream. For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.

When a pipeline stops, the SQL Server CDC Client origin notes where it stops reading. When the pipeline starts again, the origin continues processing from where it stopped by default. You can reset the origin to process all requested data.
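
The stop, restart, and reset behavior described above can be pictured with a small offset store. This is a conceptual sketch only, not how Data Collector persists offsets. The class name, method names, table names, and offset strings are all hypothetical.

```python
from typing import Dict, Optional


class OffsetStore:
    """Tracks the last processed position for each CDC table."""

    def __init__(self) -> None:
        self._offsets: Dict[str, str] = {}

    def commit(self, table: str, offset: str) -> None:
        # Record where reading stopped for this table.
        self._offsets[table] = offset

    def resume_point(self, table: str,
                     initial_offset: Optional[str] = None) -> Optional[str]:
        # A saved offset wins. Otherwise fall back to the configured
        # initial offset. None means "process all available data".
        return self._offsets.get(table, initial_offset)

    def reset(self) -> None:
        # Forget saved positions so all requested data is processed again.
        self._offsets.clear()


store = OffsetStore()
store.commit("dbo_orders", "lsn-0042")      # pipeline stops here
print(store.resume_point("dbo_orders"))     # restart continues from lsn-0042
store.reset()                               # reset the origin
print(store.resume_point("dbo_orders"))     # None: read everything again
```

After a reset, a configured initial offset would apply again, because the store no longer holds a saved position for the table.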