Oracle CDC Client

The Oracle CDC Client origin processes change data capture (CDC) information provided by Oracle LogMiner redo logs. This is the older Oracle change data capture origin. For new development, use the Oracle CDC origin. For information about supported versions, see Supported Systems and Versions.

You might use this origin to perform database replication. You can use a separate pipeline with the JDBC Query Consumer or JDBC Multitable Consumer origin to read existing data. Then start a pipeline with the Oracle CDC Client origin to process subsequent changes.

Oracle CDC Client processes data based on the commit number, in ascending order.

To read the redo logs, Oracle CDC Client requires the LogMiner dictionary. The origin can use the dictionary in redo logs or in an online catalog. When using the dictionary in redo logs, the origin captures and adjusts to schema changes. The origin can also generate events when using redo log dictionaries.
Note: To use Oracle CDC Client, you must enable LogMiner for the database that you want to use and complete the necessary prerequisite tasks. The origin uses JDBC to access the database.

The origin includes CDC and CRUD information in record header attributes so generated records can be easily processed by CRUD-enabled destinations. For an overview of Data Collector changed data processing and a list of CRUD-enabled destinations, see Processing Changed Data.

When you configure Oracle CDC Client, you configure change data capture details, such as the schema and tables to read from, how to read the initial change, the dictionary source location, and the operations to include. You also specify the transaction window and LogMiner session windows to use.

You can configure the origin to buffer records locally or to use database buffers. Before using local buffers, verify that the required resources are available and specify the action to take for uncommitted transactions.

You can specify the behavior when the origin encounters an unsupported data type, and you can configure the origin to pass null values when it receives them from supplemental logging data. When the source database has high-precision timestamps, you can configure the origin to write string values rather than datetime values to maintain the precision.

You can specify JDBC connection information and user credentials. You can also use a connection to configure the origin. If the schema was created in a pluggable database, state the pluggable database name. You can configure custom properties that the driver requires.

You can configure advanced connection properties. To use a JDBC version older than 4.0, you specify the driver class name and define a health check query.

The origin can generate events for an event stream. For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.

When concerned about performance, as when processing very wide tables, you can consider several alternatives to the default origin behavior. You can use the alternate PEG parser or use multiple threads for parsing. Or, you can configure the origin to not parse SQL query statements so you can pass the queries to the SQL Parser processor to be parsed.

For a list of supported operations and a description of generated records, see Parse SQL, Supported Operations, and Generated Records.

LogMiner Dictionary Source

LogMiner provides dictionaries to help process redo logs. LogMiner can store dictionaries in several locations.

The Oracle CDC Client can use the following dictionary source locations:
  • Online catalog - Use the online catalog when table structures are not expected to change.
  • Redo logs - Use redo logs when table structures are expected to change. When reading the dictionary from redo logs, the Oracle CDC Client origin determines when schema changes occur and refreshes the schema that it uses to create records. The origin can also generate events for each DDL it reads in the redo logs.
    Important: When using the dictionary in redo logs, make sure to extract the latest dictionary to the redo logs each time table structures change. For more information, see Task 4. Extract a LogMiner Dictionary (Redo Logs).

Note that using the dictionary in redo logs can have significantly higher latency than using the dictionary in the online catalog. But using the online catalog does not allow for schema changes.

For more information about dictionary options and configuring LogMiner, see the Oracle LogMiner documentation.

Oracle CDC Client Prerequisites

Before using the Oracle CDC Client origin, complete the following tasks:
  1. Configure the database archiving mode.
  2. Enable supplemental logging for the database or tables.
  3. Create a user account with the required roles and privileges.
  4. To use the dictionary in redo logs, extract the LogMiner dictionary.
  5. Install the Oracle JDBC driver.
Note: This documentation includes commands that might help you perform the required tasks. However, the commands that you need might differ depending on your Oracle environment and configuration. For definitive steps and commands, see the Oracle documentation and contact your database administrator (DBA).
This documentation uses the following terms, which comply with the Oracle documentation at this time:
  • CDB or multitenant database - A multitenant container database (CDB) that includes one or more pluggable databases (PDB).
  • PDB or pluggable database - A database that is part of a CDB.
  • Non-CDB database - A standalone database created on earlier versions of Oracle.

Task 1. Enable Database Archiving Mode

LogMiner provides access to database changes stored in redo log files that summarize database activity. The origin uses this information to generate records. LogMiner requires that the database be open, writable, and in ARCHIVELOG mode with archiving enabled.

You configure the database archiving mode differently based on whether you use an Oracle Standard, Oracle RAC, or Oracle Exadata database:
Oracle Standard databases
Use the following steps to determine the status of the database and to configure the database archiving mode:
  1. In a SQL shell, log into the database as a user with DBA privileges.
  2. Check the database logging mode.

    For example:

    select log_mode from v$database;

    If the command returns ARCHIVELOG, you can go to "Task 2. Enable Supplemental Logging".

    If the command returns NOARCHIVELOG, continue with the following steps.

  3. Shut down the database.

    For example:

    shutdown immediate;
  4. Start up and mount the database.

    For example:

    startup mount;
  5. Enable archiving, open the database, and make it writable.

    For example:

    alter database archivelog;
    alter database open read write;
Oracle RAC and Exadata databases
Use the following steps to configure the database archiving mode and verify the status of the database:
  1. In a Terminal session on one of the database nodes, stop and mount the database.

    For example:

    srvctl stop database -d <database name>
    srvctl start database -d <database name> -o mount
  2. In a SQL shell, log into the database as a user with DBA privileges and configure the database.

    For example:

    alter database archivelog;
  3. In the Terminal session, restart the database.

    For example:

    srvctl stop database -d <database name>
    srvctl start database -d <database name>
  4. In the SQL shell, check if the database is in logging mode.

    For example:

    select log_mode from v$database;

Task 2. Enable Supplemental Logging

To retrieve data from redo logs, LogMiner requires supplemental logging at a database level, table level, or a combination of both.

Enable at least primary key or "identification key" logging at a table level for each table that you want to use. With identification key logging, records include only the primary key and changed fields.

To enable supplemental logging for a table, you must first enable minimum supplemental logging for the database.

To include all fields in the records the origin generates, enable full supplemental logging at a table or database level. Full supplemental logging provides data from all columns, those with unchanged data as well as the primary key and changed columns. For details on the data included in records based on the supplemental logging type, see Parse SQL, Supported Operations, and Generated Records.
  1. Determine the logging that you want to enable.
  2. Before enabling supplemental logging, check if it is enabled for the database.

    For example, you might run the following command on your CDB, then the pluggable database:

    select supplemental_log_data_min, supplemental_log_data_pk, supplemental_log_data_all from v$database;

    If the command returns Yes or Implicit for all three columns, then minimal supplemental logging is enabled, as well as identification key and full supplemental logging. You can skip to "Task 3. Create a User Account."

    If the command returns Yes or Implicit for the first two columns, then supplemental and identification key logging are both enabled. If this is what you want, you can skip to "Task 3. Create a User Account."

  3. Enable minimal supplemental logging.
    For example, you can run the following command from your CDB or from your non-CDB database:
    alter database add supplemental log data;
  4. Optionally, enable identification key logging, if needed.

    When enabling identification key logging for a database, do so for the database that contains the schemas to be monitored.

    To enable identification key logging for a pluggable database, you must also enable it for the CDB.

    When enabling identification key logging for individual tables, you don’t need to enable it for the CDB. According to Oracle, best practice is to enable logging for individual tables, rather than the entire CDB. For more information, see the Oracle LogMiner documentation.

    For example, you can run the following command to apply the changes to just the container:
    alter session set container=<pdb>;
    Enable primary key or full supplemental identification key logging to retrieve data from redo logs. You do not need to enable both:
    To enable primary key logging
    You can enable primary key identification key logging for individual tables or all tables in the database:
    • For individual tables

      For example, you might run the following commands to enable minimal supplemental logging for a pluggable database, and then enable primary key logging for each table that you want to use:

      alter table <schema name>.<table name> add supplemental log data (primary key) columns;
    • For all tables

      For example, you might run the following commands to enable primary key logging for an entire pluggable database:

      alter database add supplemental log data (primary key) columns;
      alter database add supplemental log data (primary key);
    To enable full supplemental logging
    You can enable full supplemental identification key logging for individual tables or all tables in the database:
    • For individual tables

      For example, you might run the following commands to enable minimal supplemental logging for a pluggable database, and then enable full supplemental logging for each table that you want to use:
      alter table <schema name>.<table name> add supplemental log data (all) columns;
    • For all tables

      For example, you might run the following command to enable full supplemental logging for an entire pluggable database:
      alter database add supplemental log data (primary key) columns;
      alter database add supplemental log data (all) columns;
  5. Submit the changes.

    For example:

    alter system archive log current;

Task 3. Create a User Account

Create a user account to use with the origin. You need the account to access the database through JDBC.

You create accounts differently based on the Oracle version that you use:
12c, 18c, 19c, or 21c CDB databases
For Oracle, Oracle RAC, and Oracle Exadata 12c, 18c, 19c, and 21c CDB databases, create a common user account. Common user accounts are created in cdb$root and must use the convention: c##<name>.
  1. In a SQL shell, log into the database as a user with DBA privileges.
  2. Create the common user account.

    For example:

    alter session set container=cdb$root;
    
    create user <user name> identified by <password>;
    grant create session, alter session, set container, logmining, select any dictionary, select_catalog_role, execute_catalog_role to <user name> container=all;
    
    grant select on database_properties to <user name>;
    grant select on all_objects to <user name>;
    grant select on all_tables to <user name>;
    grant select on all_tab_columns to <user name>;
    grant select on all_constraints to <user name>;
    grant select on all_cons_columns to <user name>;
    
    grant select on v_$database to <user name>;
    grant select on gv_$database to <user name>;
    grant select on v_$instance to <user name>;
    grant select on gv_$instance to <user name>;
    grant select on v_$database_incarnation to <user name>;
    grant select on gv_$database_incarnation to <user name>;
    
    grant select on v_$log to <user name>;
    grant select on gv_$log to <user name>;
    grant select on v_$logfile to <user name>;
    grant select on gv_$logfile to <user name>;
    grant select on v_$archived_log to <user name>;
    grant select on gv_$archived_log to <user name>;
    grant select on v_$standby_log to <user name>;
    grant select on gv_$standby_log to <user name>;
    
    grant select on v_$logmnr_parameters to <user name>;
    grant select on gv_$logmnr_parameters to <user name>;
    grant select on v_$logmnr_logs to <user name>;
    grant select on gv_$logmnr_logs to <user name>;
    grant select on v_$logmnr_contents to <user name>;
    grant select on gv_$logmnr_contents to <user name>;
    
    alter session set container=<pdb>;
    
    grant select <db>.<table> to <user name>; 

    Repeat the final command for each table that you want to monitor.

When you configure the origin, use this user account for the JDBC credentials. Use the entire user name, including the c##, as the JDBC user name.

12c, 18c, 19c, or 21c non-CDB databases
For Oracle, Oracle RAC, and Oracle Exadata 12c, 18c, 19c, and 21c non-CDB databases, create a user account with the necessary privileges:
  1. In a SQL shell, log into the database as a user with DBA privileges.
  2. Create the user account.

    For example:

    create user <user name> identified by <password>;
    grant create session, alter session, set container, logmining, select any dictionary, select_catalog_role, execute_catalog_role to <user name>;
    
    grant select on database_properties to <user name>;
    grant select on all_objects to <user name>;
    grant select on all_tables to <user name>;
    grant select on all_tab_columns to <user name>;
    grant select on all_constraints to <user name>;
    grant select on all_cons_columns to <user name>;
    
    grant select on v_$database to <user name>;
    grant select on gv_$database to <user name>;
    grant select on v_$instance to <user name>;
    grant select on gv_$instance to <user name>;
    grant select on v_$database_incarnation to <user name>;
    grant select on gv_$database_incarnation to <user name>;
    
    grant select on v_$log to <user name>;
    grant select on gv_$log to <user name>;
    grant select on v_$logfile to <user name>;
    grant select on gv_$logfile to <user name>;
    grant select on v_$archived_log to <user name>;
    grant select on gv_$archived_log to <user name>;
    grant select on v_$standby_log to <user name>;
    grant select on gv_$standby_log to <user name>;
    
    grant select on v_$logmnr_parameters to <user name>;
    grant select on gv_$logmnr_parameters to <user name>;
    grant select on v_$logmnr_logs to <user name>;
    grant select on gv_$logmnr_logs to <user name>;
    grant select on v_$logmnr_contents to <user name>;
    grant select on gv_$logmnr_contents to <user name>;
    
    grant select <db>.<table> to <user name>;

    Repeat the final command for each table that you want to monitor.

When you configure the origin, use this user account for the JDBC credentials.

11g databases
For Oracle 11g databases, create a user account with the necessary privileges:
  1. In a SQL shell, log into the database as a user with DBA privileges.
  2. Create the user account.

    For example:

    create user <user name> identified by <password>;
    grant create session, alter session, execute_catalog_role, select any transaction, select any table to <user name>;
    grant select on gv_$database to <user name>;
    grant select on gv_$archived_log to <user name>;
    grant select on gv_$instance to <user name>;
    grant select on gv_$log to <user name>; 
    grant select on v_$database to <user name>;
    grant select on v_$database_incarnation to <user name>;
    grant select on v_$logmnr_contents to <user name>;
    grant select on v_$archived_log to <user name>;
    grant select on v_$instance to <user name>;
    grant select on v_$log to <user name>;
    grant select on v_$logfile to <user name>;
    grant select on v_$logmnr_logs to <user name>;
    grant select on <db>.<table> to <user name>;

    Repeat the final command for each table that you want to mine.

When you configure the origin, use this user account for the JDBC credentials.

Task 4. Extract a LogMiner Dictionary (Redo Logs)

When using redo logs as the dictionary source, you must extract the LogMiner dictionary to the redo logs before you start the pipeline. Repeat this step periodically to ensure that the redo logs that contain the dictionary are still available.

Oracle recommends that you extract the dictionary only at off-peak hours since the extraction can consume database resources.

To extract the dictionary for Oracle, Oracle RAC, and Oracle Exadata 12c, 18c, 19c, or 21c CDB databases, you might use the following commands:
alter session set container=cdb$root;
execute dbms_logmnr_d.build(options=> dbms_logmnr_d.store_in_redo_logs);
alter system archive log current;
To extract the dictionary for all Oracle, Oracle RAC, and Oracle Exadata non-CDB databases, you might use the following commands:
execute dbms_logmnr_d.build(options=> dbms_logmnr_d.store_in_redo_logs);
alter system archive log current;

Task 5. Install the Driver

The Oracle CDC Client origin connects to Oracle through JDBC. You cannot access the database until you install the required driver.

Install the Oracle JDBC driver for the Oracle database version that you use. Install the driver into the JDBC stage library, streamsets-datacollector-jdbc-lib, which includes the origin.

To use the JDBC driver with multiple stage libraries, install the driver into each stage library associated with the stages.

For information about installing additional drivers, see Install External Libraries.

Available Parsers

The Oracle CDC Client origin provides two parsers that you can choose from to process data:
Adaptive LL(*) parser
Also known as the ALL(*) parser, this parser uses Context-Free Grammar (CFG) to process data. This is the default parser.
For more information about ALL(*) processing, see https://www.antlr.org.
Recursive descent parser
Also known as the PEG parser, this parser uses Parsing Expression Grammar (PEG) to process data. Due to the simplicity of redo log statements, this parser can generally provide better performance.
To use the PEG parser, enable the Parse SQL Query property on the Oracle CDC tab, and then enable the Use PEG Parser property on the Advanced tab.
For more information about PEG processing, see https://en.wikipedia.org/wiki/Parsing_expression_grammar.

Table Configuration

When you configure the Oracle CDC Client origin, you specify the tables with the change capture data that you want to process.

A table configuration defines a group of tables with the same table name pattern, that are from one or more schemas with the same name pattern.

You can define one or more table configurations.

Schema, Table Name, and Exclusion Patterns

To specify the tables, you define the schema, a table name pattern, and an optional table exclusion pattern.

When defining the schema and table name pattern, you can use LIKE syntax to define a set of tables within a schema or across multiple schemas. For more information about valid patterns for the LIKE syntax, see the Oracle documentation.

When needed, you can also use a regular expression as an exclusion pattern to exclude a subset of tables from the larger set.

For example, say you want to process change data capture data for all tables in the sales schema that start with SALES while excluding those that end with a dash (-) and single-character suffix. You can use the following configuration to specify the tables to process:
  • Schema: sales
  • Table Name Pattern: SALES%
  • Exclusion Pattern: SALES.*-.

Initial Change

The initial change is the point in the LogMiner redo logs where you want to start processing. When you start the pipeline for the first time, the origin starts processing from the specified initial change. The origin only uses the specified initial change again when you reset the origin.

Note that Oracle CDC Client processes only change capture data. If you need existing data, you might use a JDBC Query Consumer or a JDBC Multitable Consumer in a separate pipeline to read table data before you start an Oracle CDC Client pipeline.

Oracle CDC Client provides several ways to configure the initial change:
From the latest change
The origin processes all changes that occur after you start the pipeline.
From a specified datetime
The origin processes all changes that occurred at the specified datetime and later. Use the following format: DD-MM-YYYY HH24:MI:SS.
Note: The DB Time Zone property must be set correctly to ensure that the appropriate changes are processed.
From a specified system change number (SCN)
The origin processes all changes that occurred in the specified SCN and later. When using the specified SCN, the origin starts processing with the timestamp associated with the SCN. If the SCN cannot be found in the redo logs, the origin continues reading from the next higher SCN that is available in the redo logs.
Typically, a database admin can provide the SCN to use.

Example

You want to process all existing data in the Orders table and then capture changed data, writing all data to Amazon S3. To read the existing data, you use a pipeline with the JDBC Query Consumer and Amazon S3 destination as follows:

Once all existing data is read, you stop the JDBC Query Consumer pipeline and start the following Oracle CDC Client pipeline. This pipeline is configured to pick up changes that occur after you start the pipeline, but if you wanted to prevent any chance of data loss, you could configure the initial change for an exact datetime or earlier SCN:

Choosing Buffers

When processing data, the Oracle CDC Client can buffer data locally on the Data Collector machine or use Oracle LogMiner buffers:

Local buffers

When using local buffers, the origin requests the transactions for the relevant tables and time period. The origin buffers the resulting LogMiner redo SQL statements until it verifies a commit for a transaction. After seeing a commit, it parses and processes the committed data.

The origin can buffer the redo SQL statements completely in memory or write them primarily to disk while using a small amount of memory for tracking.

By default, the origin uses local buffers. In general, using local buffers should provide better performance than Oracle LogMiner buffers.

Use local buffers to process large transactions or to avoid monopolizing the Oracle PGA. Buffer information in memory for better performance when Data Collector resources allow. Buffer information to disk to avoid monopolizing Data Collector resources.

Oracle LogMiner buffers
When using Oracle LogMiner buffers, the origin requests data from Oracle LogMiner for a particular time period. LogMiner then buffers all transactions for that time period for all tables in the database, rather than only the tables needed by the origin.
LogMiner keeps the transaction information in the buffers until it reads commits in the logs, then it passes the committed data to the Oracle CDC Client origin.
Depending on the size and number of the transactions and the time period being buffered, buffering the transactions can monopolize the PGA, the private memory region for the Oracle server process.
Use LogMiner buffers when you do not expect the volume of transactions to overtax Oracle resources.

Local Buffer Resource Requirements

Before using local buffers, you should verify that the allocated resources are sufficient for the needs of the pipeline.

Use the following guidelines based on the local buffering option that you use:
In memory
When buffering in memory, the origin buffers the LogMiner redo SQL statements returned by Oracle. It processes the data after receiving a commit for the statement.
Before buffering in memory, increase the Data Collector Java heap size to accommodate the information that you expect the pipeline to receive.
Use the following equation to estimate the Java heap size requirements:
Required Memory = (L * (30 + S) * T * 1.5) bytes
  • L - Maximum number of statements LogMiner generates for each transaction.

    Each changed row generates one statement with all field names and values.

  • S - Maximum length in characters of each redo SQL statement generated by LogMiner, including all field names and values.
    For example, the following SQL statement contains 92 characters, including spaces and punctuation.
    insert into "SYS"."Y"("ID","D") values ('690054',TO_TIMESTAMP('2017-07-18 02:00:00.389390'))
  • T - Maximum number of transactions in flight for the tables at any given time.
30 represents the number of bytes in the statement ID, which is also stored.
To disk
When buffering to disk, the origin stores only the statement ID for each SQL query in memory. Then it saves the queries to disk.

By default the origin buffers data to the directory specified by the system property, java.io.tmpdir. You can configure the origin to buffer data to an existing directory on the Data Collector machine. The user who starts the pipeline must have read and write access to the directory.

Note that when buffering to disk, Data Collector clears the buffered data when the pipeline starts and stops. When you restart a pipeline, Data Collector uses the last-saved offset unless you reset the origin.
Before buffering to disk, ensure that enough space is available on the local disk. You might also increase the Data Collector Java heap size, but to a lesser degree than when buffering fully in memory.
Use the following calculation to determine the amount of disk space that the pipeline requires:
Required Disk Space = (L * S * T * 1.5) bytes
  • L - Maximum number of statements LogMiner generates for each transaction.

    Each changed row generates one statement with all field names and values.

  • S - Maximum length in characters of each redo SQL statement generated by LogMiner, including all field names and values.
    For example, the following SQL statement contains 92 characters, including spaces and punctuation.
    insert into "SYS"."Y"("ID","D") values ('690054',TO_TIMESTAMP('2017-07-18 02:00:00.389390'))
  • T - Maximum number of transactions in flight for the tables at any given time.
Use the following equation to estimate the Java heap size required by the pipeline:
Required Memory = (L * T * 30 * 1.5) bytes
  • L - Maximum number of statements LogMiner generates for each transaction.

    Each changed row generates one statement with all field names and values.

  • T: Maximum number of transactions in flight for the tables at any given time.

For information about configuring the Data Collector heap size, see Java Heap Size.

Uncommitted Transaction Handling

You can configure how the Oracle CDC Client origin handles old uncommitted transactions when the origin uses local buffers.

When using local buffers, the Oracle CDC Client origin clears the buffers of old uncommitted transactions after each LogMiner session. Old uncommitted transactions are those that are older than the time specified for the Maximum Transaction Length property.

By default, the origin generates error records for the old uncommitted transactions. The origin converts each LogMiner redo SQL statement to a record and passes the record to the stage for error handling.

If you don't need error records for old uncommitted transactions, you can configure the origin to discard uncommitted transactions. This also reduces the overhead used to generate the error records.

For example, say the Maximum Transaction Length and the LogMiner Session Window properties are both set to one hour, and a transaction arrives in the last ten minutes of a LogMiner session. If the transaction remains uncommitted when the LogMiner session closes and the next session begins, the Oracle CDC origin retains the transaction for the next session. This is because the transaction is only ten minutes old and the maximum transaction length is an hour.

Now say the transaction is still uncommitted an hour later, when the next LogMiner session closes. Then, the transaction is considered an old uncommitted transaction because it is older than the maximum transaction length. By default, the origin generates an error record for the transaction. However, if you configure the origin to discard old uncommitted transactions, the origin simply discards the transaction instead.

To discard old uncommitted transactions, select the Discard Old Uncommitted Transactions property. This property is only available when you buffer changes locally.

JDBC Fetch Options

The Oracle CDC Client origin provides the following fetch size properties that determine the maximum number of rows that the origin fetches at one time:
  • JDBC Fetch Size for Current Window - Maximum number of rows to fetch and process together from an executed statement when the LogMiner session includes the current time.
  • JDBC Fetch Size for Past Windows - Maximum number of rows to fetch and process together from an executed statement when the LogMiner session is entirely in the past.
As a best practice, configure these properties based on whether the CONTINUOUS_MINE option is enabled for LogMiner:
CONTINUOUS_MINE enabled
When LogMiner has CONTINUOUS_MINE enabled, set both fetch size properties to as small a value as possible.
When using CONTINUOUS_MINE, LogMiner sends packets of the specified fetch size, with no wait timeout. So setting large fetch sizes can cause LogMiner to wait indefinitely for data. As a result, smaller fetch sizes result in a closer to real-time experience.
When possible, set both fetch size properties to 1.
CONTINUOUS_MINE disabled
When LogMiner has CONTINUOUS_MINE disabled, set both fetch size properties to as large a value as your execution environment allows. For example, you might typically set both properties to between 10,000 - 20,000 or higher.
When processing a large volume of data, using larger fetch sizes can improve performance by reducing the number of database hits. When processing a large volume of existing data, you might set JDBC Fetch Size for Past Windows higher for better performance.
When setting high fetch sizes, make sure that Data Collector has sufficient resources to handle the load. For information about configuring the Data Collector heap size, see Java Heap Size.

Include Nulls

When the Oracle LogMiner performs full supplemental logging, the resulting data includes all columns in the table with null values where no changes occurred. When the Oracle CDC Client processes this data, by default, it ignores null values when generating records.

You can configure the origin to include the null values in the record. You might need to include the null values when the destination system has required fields. To include null values, enable the Include Nulls property on the Oracle CDC tab.

Unsupported Data Types

You can configure how the origin handles records that contain unsupported data types. The origin can perform the following actions:

  • Pass the record to the pipeline without the unsupported data types.
  • Pass the record to error without the unsupported data types.
  • Discard the record.

You can configure the origin to include the unsupported data types in the record. When you include unsupported types, the origin includes the field names and passes the data as unparsed strings, when possible.

The following Oracle data types are not supported by the Oracle CDC Client origin:

  • Array
  • Datalink
  • Distinct
  • Interval
  • Java Object
  • Nclob
  • Other
  • Ref
  • Ref Cursor
  • SQLXML
  • Struct
  • Time with Timezone

Conditional Data Type Support

Note the following information about conditionally supported Oracle data types:
  • You can configure advanced properties to enable the origin to process Oracle Blob and Clob data. The origin converts Blob data to the Data Collector Byte Array data type, and Clob data to the Data Collector String data type.
  • The origin treats Oracle Raw data as a Data Collector Byte Array data type.
  • The origin converts Oracle Timestamp with Timezone data to the Data Collector Zoned Datetime data type. To maximize efficiency while providing more precision, the origin includes only the UTC offset with the data. It omits the time zone ID.

Processing Blob and Clob Columns

Though the Oracle CDC Client origin does not process Blob or Clob columns by default, you can configure the origin to process Blob and Clob columns when it buffers changes locally. You can also specify a limit to the size of the resulting Blob and Clob fields.

When you specify a limit to the field size, the origin applies the limit as bytes to Blob fields and characters to Clob fields. For example, if you specify 100 as a limit, Blob fields are limited to 100 bytes and Clob fields are limited to 100 characters. Data over the specified limit is replaced with a null value.
Tip: Be sure to specify a size limit or ensure that Data Collector has enough memory to process large Blob and Clob values before enabling Blob and Clob processing.

When processing Blob or Clob data, the origin treats Oracle operations on Blob or Clob columns as an update record. For example, when Clob data is trimmed in Oracle, the Oracle CDC Client origin generates a corresponding update record for the change.

The origin generates update records when the following Oracle operations are performed on Blob or Clob columns:

  • LOB_WRITE
  • LOB_ERASE
  • LOB_TRIM
The update record includes additional information about the change in the following record header attributes:
  • oracle.cdc.operation.lobColumnName - Name of the updated Blob or Clob field.
  • oracle.cdc.operation.lobOperationName - Operation performed on the Blob or Clob field. Appears only when the field has been erased or trimmed. Possible values are ERASE or TRIM.
  • oracle.cdc.operation.lobLength - Contains different information depending on the operation performed:
    • When new data is written or trimmed, contains the new length of the Blob or Clob data.
    • When data is erased, contains the length of the segment of the Blob or Clob data that was erased.

    The length is in bytes for Blob fields and characters for Clob fields.

  • oracle.cdc.operation.lobOffset - Offset of the data that was erased, in bytes for Blob fields and characters for Clob fields. Appears only when the field has been erased.
For example, say a Clob field is named ClobData and the original data, abcdefg, has cde erased, leaving an updated value of abfg. Then, the Oracle CDC Client origin generates an update record and includes the following details in record header attributes:
  • oracle.cdc.operation.lobColumnName set to ClobData
  • oracle.cdc.operation.lobOperationName set to ERASE
  • oracle.cdc.operation.lobLength set to 3
  • oracle.cdc.operation.lobOffset set to 2

To enable the origin to process Blob or Clob data, on the Advanced tab, click the Enable Blob and Clob Column Processing property.

To set a limit on the resulting field size, specify the maximum number of bytes and characters to use in the Maximum Lob Size property.

Parse SQL, Supported Operations, and Generated Records

The Oracle CDC Client origin generates records for the operation types that you specify. However, the origin reads all operation types from redo logs to ensure proper transaction handling.

Generated records contain fields in the same order as the corresponding columns in the database tables. The information included in generated records differs depending on whether the origin is configured to parse redo log SQL statements:

Parsing SQL statements
When the origin parses redo SQL statements, you can configure it to create records for the following operations:
  • INSERT
  • DELETE
  • UPDATE
The origin generates records based on the operation type, the logging level and type enabled for the database and tables, and the operations specified in the Operations stage property. For example, if you specify Insert operations, the origin generates only Insert records, ignoring all Update and Delete operations.
The following table describes how the origin generates record data:
Oplog Operation Identification/Primary Key Logging Only Full Supplemental Logging
INSERT All fields that contain data, ignoring fields with null values. All fields.
UPDATE Primary key fields and fields with updated values. All fields.
DELETE Primary key fields. All fields.
When parsing redo SQL statements, the origin also includes CDC and CRUD information in record header attributes.
To parse the SQL statements, on the Oracle CDC tab, select the Parse SQL Query property. This property is enabled by default.
Then, configure the Operations property to specify the operations that you want to process.
Not parsing SQL statements
When the origin does not parse redo SQL statements, it writes each LogMiner SQL statement to a field named sql. The origin also generates field attributes that provide additional information about this field.
You might write SQL statements to generate records to pass the queries to the SQL Parser processor to be parsed or to archive that information.
When not parsing SQL statements, you can configure the origin to create records for the following operations:
  • INSERT
  • DELETE
  • UPDATE
  • SELECT_FOR_UPDATE
To write LogMiner SQL statements to generated records, on the Oracle CDC tab, clear the Parse SQL Query property.
Then, configure the Operations property to specify the operations that you want to process.

CRUD Operation Header Attributes

When the Oracle CDC Client is configured to parse SQL statements, it specifies the operation type in the following record header attributes when generating records
sdc.operation.type
The origin evaluates the Oplog operation type associated with each entry that it processes. When appropriate, it writes the operation type to the sdc.operation.type record header attribute.
The origin uses the following values in the attribute to represent the operation type:
  • 1 for INSERT
  • 2 for DELETE
  • 3 for UPDATE and SELECT_FOR_UPDATE

If you use a CRUD-enabled destination in the pipeline such as JDBC Producer or Elasticsearch, the destination can use the operation type when writing to destination systems. When necessary, you can use an Expression Evaluator processor or any scripting processor to manipulate the value in the header attribute. For an overview of Data Collector changed data processing and a list of CRUD-enabled destinations, see Processing Changed Data.

When using CRUD-enabled destinations, the destination looks for the operation type in the sdc.operation.type attribute before checking the oracle.cdc.operation attribute.
oracle.cdc.operation
The origin also writes the CRUD operation type to an oracle.cdc.operation attribute. This attribute was implemented in an earlier release and is supported for backward compatibility.
The origin uses the following values in the attribute to represent the operation type:
  • INSERT
  • UPDATE
  • SELECT_FOR_ UPDATE
  • DELETE
CRUD-enabled destinations check this attribute for the operation type only if the sdc.operation.type attribute is not set.

CDC Header Attributes

The Oracle CDC Client origin always provides the following CDC record header attributes for each record:
  • oracle.cdc.sequence.oracle - Includes sequence numbers that indicate the order in which statements were processed within the transaction. The sequence number is generated by Oracle. Use to keep statements in order within a single transaction.
    Note: When the origin uses local buffering, Oracle provides the default value of 1 for all records, so should not be used to order statements. Use the oracle.cdc.sequence.internal attribute when using local buffering.
  • oracle.cdc.sequence.internal - Includes sequence numbers equivalent to those in the attribute above, but created by the origin. The sequence starts with 0 each time the pipeline starts.

    Like the attribute above, you can use these values to keep statements in order within a single transaction.

    Best practice is to use this attribute only when using local buffering, when the oracle.cdc.sequence.oracle attribute does not provide the expected data.

SQL Query Header Attributes

When configured to parse the SQL query, the origin also provides the following CDC record header attributes for each record:
  • oracle.cdc.operation
  • oracle.cdc.precisionTimestamp - Timestamp, to the nanosecond, when the database change occurred.

    Used when the database provides the timestamp in nanoseconds.

  • oracle.cdc.query
  • oracle.cdc.redoValue
  • oracle.cdc.rowId
  • oracle.cdc.scn
  • oracle.cdc.table
  • oracle.cdc.timestamp - Timestamp, to the second, when the change occurred.

    Used when the database provides the timestamp in seconds.

  • oracle.cdc.undoValue
  • oracle.cdc.user

Blob and Clob Header Attributes

When configured to process Blob and Clob data and a record includes an update to a Blob or Clob field, the origin also includes the following record header attributes for each record:
  • oracle.cdc.operation.lobColumnName - Name of the updated Blob or Clob field.
  • oracle.cdc.operation.lobOperationName - Operation performed on the Blob or Clob field. Appears only when the field has been erased or trimmed. Possible values are ERASE or TRIM.
  • oracle.cdc.operation.lobLength - Contains different information depending on the operation performed:
    • When new data is written or trimmed, contains the new length of the Blob or Clob data.
    • When data is erased, contains the length of the segment of the Blob or Clob data that was erased.

    The length is in bytes for Blob fields and characters for Clob fields.

  • oracle.cdc.operation.lobOffset - Offset of the data that was erased, in bytes for Blob fields and characters for Clob fields. Appears only when the field has been erased.

Other Header Attributes

When parsing the SQL query, the Oracle CDC Client origin can also provide record header attributes for records of tables with a primary key and for decimal fields:
Primary keys
When a table contains a primary key, the origin includes the following record header attribute:
  • jdbc.primaryKeySpecification - Provides a JSON-formatted string that lists the columns that form the primary key in the table and the metadata for those columns.
    For example, a table with a composite primary key contains the following attribute:
    jdbc.primaryKeySpecification = 
         {{"<primary key column 1 name>":
             {"type": <type>, 
              "datatype": "<data type>", 
              "size": <size>, 
              "precision": <precision>, 
              "scale": <scale>, 
              "signed": <Boolean>,  
              "currency": <Boolean> }},
              ...,
         {"<primary key column N name>":
             {"type": <type>, 
              "datatype": "<data type>", 
              "size": <size>, 
              "precision": <precision>, 
              "scale": <scale>, 
              "signed": <Boolean>,  
              "currency": <Boolean> } } }
    A table without a primary key contains the attribute with an empty value:
    jdbc.primaryKeySpecification = {} 
For an update operation on a table with a primary key, the origin includes the following record header attributes:
  • jdbc.primaryKey.before.<primary key column name> - Provides the old value for the specified primary key column.
  • jdbc.primaryKey.after.<primary key column name> - Provides the new value for the specified primary key column.
Note: The origin provides the new and old values of the primary key columns regardless of whether the value changes.
Decimal fields
When a record includes decimal fields, the origin includes the following record header attributes for each decimal field in the record:
  • jdbc.<fieldname>.precision
  • jdbc.<fieldname>.scale

You can use the record:attribute or record:attributeOrDefault functions to access the information in the attributes. For more information about working with record header attributes, see Working with Header Attributes.

Field Attributes

The Oracle CDC Client origin generates field attributes for columns converted to the Decimal or Datetime data types in Data Collector. The attributes provide additional information about each field.

The following data type conversions do not include all information in the corresponding Data Collector type:
  • The Oracle Number data type is converted to the Data Collector Decimal data type, which does not store scale and precision.
  • The Oracle Timestamp data type is converted to the Data Collector Datetime data type, which does not store nanoseconds.
To preserve this information during data type conversion, the origin generates the following field attributes for these Data Collector data types:
Data Collector Data Type Generated Field Attribute Description
Decimal precision Provides the original precision for every number column.
Decimal scale Provides the original scale for every number column.
Datetime nanoSeconds Provides the original nanoseconds for every timestamp column.

You can use the record:fieldAttribute or record:fieldAttributeOrDefault functions to access the information in the attributes. For more information about working with field attributes, see Field Attributes.

Database Time Zone

When you configure the origin, you must specify the time zone that the database operates in. Select one of the following options for the DB Time Zone property on the Oracle CDC tab:
  • Same as Data Collector - Select when the database operates in the same time zone as Data Collector.
  • Specific time zone - Select when the database operates in a different time zone from Data Collector.

The property must be set correctly to ensure that the appropriate changes are processed.

Change in Daylight Saving Time

A database time zone can be affected by a change in daylight saving time, based on whether the database uses an offset or a named time zone. An offset time zone is defined with a numeric value, such as -08:00, and is not affected by a change in daylight saving time. A named time zone is defined with a name, such as Brazil, America/Vancouver, or CET. Some named time zones are affected by a change in daylight saving time.

If the database operates in a time zone that is affected by daylight saving time, then the pipeline pauses processing during the time change window to ensure that all data is correctly processed. Pipelines can pause from 1.5 to 2.5 hours during the time change, depending on whether the daylight saving time change is forward or backward. After the time change completes, the pipeline resumes processing at the last-saved offset.

During the time change window, the pipeline state remains running, but the internal mining state of the pipeline is paused.

When you monitor a running Oracle CDC Client pipeline, you can view the internal mining state when you select the origin. The mining state can have one of the following values:
  • Active - Pipeline is running and actively processing changes.
  • Paused - Pipeline is running but paused during a time change window. The state lists the remaining paused time.

As you monitor a running Oracle CDC Client pipeline, you can also view the configured database time zone, and whether the time zone is affected by daylight saving time, as indicated by true or false. For example, the following stage statistics indicate that the pipeline has a Paused mining state, the remaining paused time is 3,600 seconds or one hour, and the origin is using the Eastern Standard Time (EST) time zone which does use daylight saving time:

Event Generation

The Oracle CDC Client origin can generate events that you can use in an event stream when the origin uses redo logs as the dictionary source. The origin does not generate events when using the online catalog as the dictionary source.

When you use redo logs as the dictionary source and enable event generation, the Oracle CDC Client generates events when it reads DDL statements. It generates events for ALTER, CREATE, DROP, and TRUNCATE statements.

When you start the pipeline, the origin queries the database and caches the schemas for all tables listed in the origin, and then generates an initial event record for each table. Each event record describes the current schema for each table. The origin uses the cached schemas to generate records when processing data-related redo log entries.

The origin then generates an event record for each DDL statement it encounters in the redo logs. Each event record includes the DDL statement and the related table in record header attributes.

The origin includes table schema information in the event record for new and updated tables. When the origin encounters an ALTER or CREATE statement, it queries the database for the latest schema for the table.

If the ALTER statement is an update to a cached schema, the origin updates the cache and includes the updated table schema in the event record. If the ALTER statement is older than the cached schema, the origin does not include the table schema in the event record. Similarly, if the CREATE statement is for a "new" table, the origin caches the new table and includes the table schema in the event record. Because the origin verifies that all specified tables exist when the pipeline starts, this can occur only when the table is dropped and created after the pipeline starts. If the CREATE statement is for a table that is already cached, the origin does not include the table schema in the event record.

Oracle CDC Client events can be used in any logical way. For example:

For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.

Event Records

Event records generated by the Oracle CDC Client origin have the following event-related record header attributes. Record header attributes are stored as strings:
Record Header Attribute Description
sdc.event.type Event type. Uses one of the following types:
  • STARTUP
  • ALTER
  • CREATE
  • DROP
  • TRUNCATE
sdc.event.version Integer that indicates the version of the event record type.
sdc.event.creation_timestamp Epoch timestamp when the stage created the event.
oracle.cdc.table Name of the Oracle database table that changed.
oracle.cdc.ddl The DDL statement that triggered the event.
STARTUP event records are generated when the pipeline starts. The origin creates one event record for each table. The event record includes the current table schema.

DROP and TRUNCATE event records include just the record header attributes listed above.

CREATE event records include the schema for the new table when the table has been dropped and recreated. ALTER event records include the table schema when the statement updates a cached schema. For more information about the behavior for CREATE and ALTER statements, see Event Generation.

For example, the following ALTER event record displays the three fields in an updated schema - NICK, ID, and NAME:

In the list of record header attributes, notice the DDL statement that added the NICK field, the name of the updated table, and the ALTER event type.

Multithreaded Parsing

When you configure the origin to use local buffering and to parse the SQL query, you can configure the Oracle CDC Client origin to use multiple threads to parse transactions. You can use multithreaded parsing with both the default Oracle CDC Client parser and the alternate PEG parser.

When performing multithreaded parsing, the origin uses multiple threads to generate records from committed SQL statements in a transaction. It does not perform multithreaded processing of the resulting records.

The Oracle CDC Client origin uses multiple threads for parsing based on the Parse Thread Pool Size property. When you start the pipeline, the origin creates the number of threads specified in the property. The origin connects to Oracle, creates a LogMiner session, and processes

When the origin processes a transaction, it reads and buffers all SQL statements in the transaction to an in-memory queue and waits for statements to be committed before processing them. Once committed, the SQL statements are parsed using all available threads and the original order of the SQL statements is retained.

The resulting records are passed to the rest of the pipeline. Note that enabling multithreaded parsing does not enable multithreaded processing – the pipeline uses a single thread for reading data.

Working with the SQL Parser Processor

For very wide tables, those with hundreds of columns, reading the redo logs and parsing the SQL query using the Oracle CDC Client origin may take longer than expected and can cause the Oracle redo logs to be rotated out before they are read. When this happens, data is lost.

To avoid this, you can use multiple pipelines and the SQL Parser processor. The first pipeline contains the Oracle CDC Client and an intermediate endpoint. Configure the origin to not parse the SQL query. The second pipeline passes records from the intermediate endpoint to the SQL Parser processor to both parse the SQL query and to update the fields. Using this approach, the origin can read the redo logs without waiting for the SQL Parser to finish and therefore no data is lost.

The Oracle CDC Client continues to perform the following tasks:
  • Reads the change data logs.
  • Generates records that contain only the SQL query.
  • Generates events.
The SQL Parser processor performs the following tasks:
  • Parses the SQL query.
  • Generates CDC and CRUD record header attributes.

For more information about the SQL Parser processor, see SQL Parser.

Working with the Drift Synchronization Solution

If you use the Oracle CDC Client origin as part of a Drift Synchronization Solution for Hive pipeline, make sure to pass only records flagged for Insert to the Hive Metadata processor.

The Drift Synchronization Solution for Hive enables automatic updates of Hive tables based on incoming data. The Hive Metadata processor, the first stage in the solution, expects Insert records only. Below are some recommended methods to ensure that the processor receives only Insert records:
  • Configure the Oracle CDC Client to process only Insert records.
  • If you want to process additional record types in the pipeline, use a Stream Selector to route only Insert records to the Hive Metadata processor.

Data Preview with Oracle CDC Client

When using data preview with the Oracle CDC Client origin, note the following details:
  • You might need to increase the Preview Timeout data preview property.

    Due to the complex nature of the Oracle CDC Client origin, initiating preview can take longer than expected. If preview times out, try increasing the Preview Timeout property incrementally to allow the origin time to connect.
  • To generate more than one preview record, set the Max Batch Wait Time property for the origin to a value greater than 0.

Configuring an Oracle CDC Client Origin

Configure an Oracle CDC Client origin to process LogMiner change data capture information from an Oracle database.

Before you use the origin, complete the prerequisite tasks. For more information, see Oracle CDC Client Prerequisites.

  1. In the Properties panel, on the General tab, configure the following properties:
    General Property Description
    Name Stage name.
    Description Optional description.
    Produce Events When the origin uses redo logs as the dictionary source, can generate event records when the origin reads DDL statements. Use for event handling.
    On Record Error Error record handling for the stage:
    • Discard - Discards the record.
    • Send to Error - Sends the record to the pipeline for error handling.
    • Stop Pipeline - Stops the pipeline.
    Note: The origin does not apply this error record handling to the following error codes:
    • JDBC_85 - The record has fields with unsupported field types. Configure the Unsupported Field Types property to determine how the origin handles these records.

    • ORACLE_CDC_CLIENT_LOG_MINER_18 - The transaction started before the set of transactions being considered. To discard these records, enable the Discard Old Uncommittted Transactions property.

  2. On the Oracle CDC tab, configure the following change data capture properties:
    Oracle CDC Property Description
    Tables Tables to track. Specify related properties as needed.

    Using simple or bulk edit mode, click the Add icon to define another table configuration.

    Schema Name Schema to use. You can enter a schema name or use LIKE syntax to specify a set of schemas.

    The origin submits the schema name in all caps by default. To use a lower or mixed-case name, select the Case-Sensitive Names property.

    Table Name Pattern Table name pattern that specifies the tables to track. You can enter a table name or use LIKE syntax to specify a set of tables. The origin submits table names in all caps by default. To use lower or mixed-case names, select the Case-Sensitive Names property.
    Exclusion Pattern Optional table exclusion pattern to define a subset of tables to exclude. You can enter a table name or use a regular expression to specify a subset of tables to exclude.

    To view and configure this option, click Show Advanced Options.

    Case-Sensitive Names Submits case-sensitive schema, table, and column names. When not selected, the origin submits names in all caps.

    Oracle uses all caps for schema, table, and column names by default. Names can be lower- or mixed-case only if the schema, table, or column was created with quotation marks around the name.

    Initial Change Starting point for the read. When you start the pipeline for the first time, the origin starts processing from the specified initial change. The origin only uses the specified initial change again when you reset the origin.

    Use one of the following options:

    • From Latest Change - Processes changes that arrive after you start the pipeline.
    • From Date - Processes changes starting from the specified date.
      Note: The DB Time Zone property must be set correctly to ensure that the appropriate changes are processed.
    • From SCN - Processes changes starting from the specified system change number (SCN).
    Start Date Datetime to read from when you start the pipeline. For a date-based initial change.

    Use the following format: DD-MM-YYYY HH24:MI:SS.

    Start SCN System change number to start reading from when you start the pipeline. If the SCN cannot be found in the redo logs, the origin continues reading from the next higher SCN that is available in the redo logs.

    For an SCN-based initial change.

    Operations Operations to include when creating records. All unlisted operations are ignored.

    Operations are supported based on whether the Parse SQL Query property is selected. For more information, see Parse SQL, Supported Operations, and Generated Records.

    Dictionary Source Location of the LogMiner dictionary:
    • Redo logs - Use when schemas can change. Allows the origin to adapt to schema changes and to generate events for DDL statements.
    • Online catalog - Use for better performance when schemas are not expected to change.
    Buffer Changes Locally Determines the buffers that the origin uses. Select to use local Data Collector buffers. Clear to use Oracle LogMiner buffers. Generally, using local buffers will enhance pipeline performance.

    By default, the origin uses local buffers.

    Buffer Location Location of local buffers:
    • In Memory
    • On Disk

    Before running the pipeline, note the local buffer resource requirements. For more information, see Local Buffer Resource Requirements.

    Available when using local buffers.

    Buffer Directory Directory where origin buffers data. Specify an existing directory on the Data Collector. The user who starts the pipeline must have read and write access to the directory.

    If you do not specify a directory, the origin uses the directory specified in the system property, java.io.tmpdir.

    Available when buffer location is on disk.

    Discard Old Uncommitted Transactions Discards old uncommitted transactions at the end of a LogMiner session instead of processing them into error records.

    Old uncommitted transactions are those that are older than the time specified for the Maximum Transaction Length property.

    Available when using local buffers.

    Unsupported Field Type Action taken when the origin encounters unsupported data types in the record:
    • Ignore and Send Record to Pipeline - The origin ignores unsupported data types and passes the record with only supported data types to the pipeline.
    • Send Record to Error - The origin handles the record based on the error record handling configured for the stage. The error record contains only the supported data types.
    • Discard Record - The origin discards the record.

    For a list of unsupported data types, see Unsupported Data Types.

    Add Unsupported Fields to Records Includes fields with unsupported data types in the record. Includes the field names and the unparsed string values of the unsupported fields, when possible.
    Include Nulls Includes null values in records generated from full supplemental logging that include null values. By default, the origin generates a record without null values.
    Convert Timestamp To String Enables the origin to write timestamps as string values rather than datetime values. Strings maintain the precision stored in the source system. For example, strings can maintain the precision of a high-precision Timestamp(6) field.

    When writing timestamps to Data Collector date or time data types that do not store nanoseconds, the origin stores any nanoseconds from the timestamp in a field attribute.

    Maximum Transaction Length Time in seconds to wait for changes for a transaction. Enter the longest period of time that you expect a transaction to require.

    Transactions older than the specified time are treated as old uncommitted transactions each time a LogMiner session closes.

    Default is ${ 1 * HOURS } which is 3600 seconds.

    Used for local buffers only.

    LogMiner Session Window Time in seconds to keep a LogMiner session open. Set to larger than the maximum transaction length. Reduce when not using local buffering to reduce LogMiner resource use.

    Default is ${ 2 * HOURS } which is 7200 seconds.

    Parse SQL Query Parses SQL queries to generate records.

    If set to false, the origin writes the SQL query to a sql field that can be parsed later by the SQL Parser processor.

    For more information, see Parse SQL, Supported Operations, and Generated Records.

    By default, the origin parses the SQL queries.

    Send Redo Query in Headers Includes the LogMiner redo query in the oracle.cdc.query record header attribute.
    DB Time Zone Time zone of the database:
    • Same as Data Collector - Select when the database operates in the same time zone as Data Collector.
    • Specific time zone - Select when the database operates in a different time zone from Data Collector.

    The property must be set correctly to ensure that the appropriate changes are processed.

  3. On the JDBC tab, configure the following JDBC properties:
    JDBC Property Description
    Connection Connection that defines the information required to connect to an external system.

    To connect to an external system, you can select a connection that contains the details, or you can directly enter the details in the pipeline. When you select a connection, Control Hub hides other properties so that you cannot directly enter connection details in the pipeline.

    To create a new connection, click the Add New Connection icon: . To view and edit the details of the selected connection, click the Edit Connection icon: .

    JDBC Connection String Connection string used to connect to the database.
    Note: If you include the JDBC credentials in the connection string, use the user account created for the origin. Common user accounts for Oracle and Oracle RAC 12c, 18c, 19c, and 21c CDB databases start with c##. For more information, see Task 3. Create a User Account.
    Max Batch Size (records) Maximum number of records processed at one time. Honors values up to the Data Collector maximum batch size.

    Default is 1000. The Data Collector default is 1000.

    Batch Wait Time (secs) Number of seconds to wait before sending a partial or empty batch.
    PDB Name of the pluggable database that contains the schema you want to use. Use only when the schema was created in a pluggable database.

    Required for schemas created in pluggable databases.

    Use Credentials Enables entering credentials. Use when you do not include credentials in the JDBC connection string.
    Additional JDBC Configuration Properties Additional JDBC configuration properties to use. To add properties, click Add and define the JDBC property name and value.

    Use the property names and values as expected by JDBC.

    LogMiner Query Timeout (seconds) Maximum number of seconds that the origin waits for a LogMiner query to complete.

    Use 0 for no limit.

  4. To enter JDBC credentials separately from the JDBC connection string, on the Credentials tab, configure the following properties:
    Credentials Property Description
    Username User name for the JDBC connection. The user account must have the correct permissions or privileges in the database.

    Use the user account created for the origin. Common user accounts for Oracle and Oracle RAC 12c, 18c, 19c, and 21c CDB databases start with c##.

    For more information, see Task 3. Create a User Account.

    Password Password for the account.
    Tip: To secure sensitive information such as user names and passwords, you can use runtime resources or credential stores.
  5. When using JDBC versions older than 4.0, on the Legacy Drivers tab, optionally configure the following properties:
    Legacy Drivers Property Description
    JDBC Class Driver Name Class name for the JDBC driver. Required for JDBC versions older than version 4.0.
    Connection Health Test Query Optional query to test the health of a connection. Recommended only when the JDBC version is older than 4.0.
  6. On the Advanced tab, optionally configure advanced properties.
    The defaults for these properties should work in most cases:
    Advanced Property Description
    JDBC Fetch Size for Current Window Maximum number of rows to fetch and process together from an executed statement when the LogMiner session includes the current time.

    Configure this property based on whether the CONTINUOUS_MINE option is enabled for LogMiner.

    Default is 1.

    JDBC Fetch Size for Past Windows Maximum number of rows to fetch and process together from an executed statement when the LogMiner session is entirely in the past.

    Configure this property based on whether the CONTINUOUS_MINE option is enabled for LogMiner.

    Default is 1.

    Time between Session Windows (ms) Milliseconds to wait after a LogMiner session has been completely processed, ensuring a minimum size for LogMiner session windows.

    Default is 5000.

    Time after Session Window Start (ms) Milliseconds to wait after a LogMiner session window starts, allowing Oracle to finish setting up before processing begins.

    Default is 5000.

    Minimum Lag from Database Time (sec)

    Minimum number of seconds between the time an operation occurs and the database time, before the origin processes the operation.

    Default is ${1 * SECONDS}.

    Use PEG Parser Enables use of the PEG parser instead of the default ALL(*) parser. Can typically improve performance.
    Parse Thread Pool Size Number of threads that the origin uses for multithreaded parsing. Available only when performing local buffering and parsing the SQL query.

    Can be used with both the default parser and PEG parser.

    Disable Continuous Mine Disables the LogMiner CONTINUOUS_MINE option. The CONTINUOUS_MINE option is available in Oracle and Oracle RAC 18c and earlier.
    Enable BLOB and CLOB Column Processing When the origin buffers changes locally, enables processing data in Blob and Clob columns.
    Tip: Be sure to specify a size limit or ensure that Data Collector has enough memory to process large Blob and Clob values before enabling Blob and Clob processing.
    Maximum LOB Size Maximum size for each Blob or Clob field. Allows up to the specified number of bytes for Blob fields and characters for Clob fields.

    Data over the specified size is replaced by a null value.

    Scan Unsupported Operations Returns an Unsupported error message when LogMiner encounters an unsupported operation. When this property is disabled, unsupported operations are ignored without raising an error.

    Default is disabled.

    Maximum Pool Size Maximum number of connections to create.

    Default is 1. The recommended value is 1.

    Minimum Idle Connections Minimum number of connections to create and maintain. To define a fixed connection pool, set to the same value as Maximum Pool Size.

    Default is 1.

    Connection Timeout (seconds) Maximum time to wait for a connection. Use a time constant in an expression to define the time increment.
    Default is 30 seconds, defined as follows:
    ${30 * SECONDS}
    Idle Timeout (seconds) Maximum time to allow a connection to idle. Use a time constant in an expression to define the time increment.

    Use 0 to avoid removing any idle connections.

    When the entered value is close to or more than the maximum lifetime for a connection, Data Collector ignores the idle timeout.

    Default is 10 minutes, defined as follows:
    ${10 * MINUTES}
    Max Connection Lifetime (seconds) Maximum lifetime for a connection. Use a time constant in an expression to define the time increment.

    Use 0 to set no maximum lifetime.

    When a maximum lifetime is set, the minimum valid value is 30 minutes.

    Default is 30 minutes, defined as follows:
    ${30 * MINUTES}
    Enforce Read-only Connection Creates read-only connections to avoid any type of write.

    Default is enabled. Disabling this property is not recommended.

    Transaction Isolation Transaction isolation level used to connect to the database.

    Default is the default transaction isolation level set for the database. You can override the database default by setting the level to any of the following:

    • Read committed
    • Read uncommitted
    • Repeatable read
    • Serializable