Oracle CDC

The Oracle CDC origin processes change data capture (CDC) information stored in redo logs and accessed using Oracle LogMiner. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.

The Oracle CDC origin provides the same functionality as the original Oracle CDC Client origin in a more efficient package. It also provides functionality not available in the Oracle CDC Client origin. For more information, see Comparing Oracle CDC Origins.

The Oracle CDC origin can process changes for primary and standby databases. It processes data based on the system change number, in ascending order, and buffers changes in memory. The origin captures and adjusts to schema changes.

The origin includes CDC and CRUD information in record header attributes so generated records can be easily processed by CRUD-enabled destinations. For an overview of Data Collector changed data processing and a list of CRUD-enabled destinations, see Processing Changed Data.

You might use this origin to track the data that has changed in your database in order to run tasks using the changed data. With this origin, you can implement data integration processes based on the identification, capture, and delivery of the changes made to the database.

When you configure the origin, you specify change data capture details, such as the schemas and tables to read from, how to read the initial change, and the operations to include. You specify an initial change or time to start processing, transaction details, and uncommitted transaction handling.

Note: Oracle's dictionary tracking does not support virtual or hidden (invisible) columns. As a result, Data Collector does not process virtual columns, and hidden columns are only taken into account when they are initially loaded from the Oracle dictionary catalog.

You specify the record format to generate: basic or rich. You can optionally include a wide range of information in the record or record header, such as the redo statement, null columns, and old and new values for the record. You can also define resilience properties, such as how many retries to perform for a task and how long to wait between retries.

You can specify JDBC connection information and user credentials. You can also use a connection to configure the origin. If the schema was created in a pluggable database, you state the pluggable database name. You can also define custom properties that the driver requires or supports.

The origin can generate events for an event stream. For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.

Before using the Oracle CDC origin, you must complete the required prerequisite tasks.
Note: This origin is designed to process large volumes of data with high throughput and minimal latency. To achieve this, the origin consumes more memory and CPU than most other stages. When needed, you can configure tuning properties to reduce resource usage, at the cost of some performance.

Comparing Oracle CDC Origins

Data Collector provides two origins that use LogMiner to process Oracle change data. The Oracle CDC Client origin was the first origin. This new Oracle CDC origin includes key functionality available in the older origin, as well as improvements and additional new features.

If you are new to Data Collector or building a new pipeline that processes Oracle change data, use the newer Oracle CDC origin for a wider range of functionality.

If you already have pipelines that use the older Oracle CDC Client origin, you can continue using that origin. However, you might consider updating the pipeline to use the newer Oracle CDC origin for access to additional functionality.

The following lists summarize some similarities and differences between the two origins:
Similarities
Key differences
  • The newer Oracle CDC origin can process changes from primary and standby databases. The older Oracle CDC Client origin only supports primary databases.
  • The newer Oracle CDC origin receives raw values from LogMiner that it then efficiently converts to the corresponding data values. As a result, the newer origin does not support processing encrypted data. When Transparent Data Encryption is enabled for columns or tables in the Oracle database, that data remains encrypted when processed by the newer origin.

    The older Oracle CDC Client origin can process encrypted data because it receives the decrypted values from Oracle.

  • The newer origin provides two record formats for processed data.
  • The following options and properties from the older origin are not included in the newer Oracle CDC origin. Rather than offering a choice between optimal and sub-optimal options, the newer origin simply uses the most efficient one:
    • Parser type - The older origin uses the ALL(*) parser by default because it was the only parser available when the origin was created. When the more performant PEG parser became available, it was added to the older origin as an advanced option.

      The newer origin uses the more efficient PEG parser.

    • Buffer type and location - The older origin provides several buffering options.

      The newer origin is more efficient than the older origin, so it always buffers data locally, in memory. However, you can configure the maximum buffer size and a buffer wait strategy.

    • Dictionary source - The older origin lets Oracle retrieve dictionary definitions from the online catalog or from redo logs, both of which can be problematic.

      The newer origin loads dictionary definitions from the Oracle catalog. While the pipeline runs, it tracks dictionary changes from redo logs. To support pipeline stops and starts, the origin stores the dictionary state in the offset so there is no dependency on dictionaries extracted to redo logs or on LogMiner transforming the original redo log data. This allows for consistent matching from column values to column data types across time.

    • Schema and table specification - When specifying the schemas and tables to monitor, the older origin supports SQL LIKE syntax for inclusion patterns and Java regular expression syntax for exclusion patterns.

      The newer origin uses REGEXP_LIKE syntax for all schema and table patterns to provide a richer pattern specification model. This also avoids problems with commonly used characters in schema and table names, such as the underscore ( _ ), that double as SQL LIKE wildcards.

    • Truncate operations - The older origin only generates events for Truncate operations.

      The newer Oracle CDC origin also generates events for Truncate operations. In addition, when configured to use the rich record format, the origin can create records for Truncate operations. This enables you to understand when the truncation occurred in relation to processed data.

Updating Oracle CDC Client Pipelines

The newer Oracle CDC origin provides the same functionality as the older Oracle CDC Client origin, with additional enhancements. As a result, you can update existing pipelines to use the new origin if you like. The older origin remains available, so updating existing pipelines is not required.

Note: To process data from columns or tables with Transparent Data Encryption enabled, you must use the older Oracle CDC Client origin. The newer origin does not support processing encrypted data.

As you configure the replacement origin, you might reconsider how you want to process Oracle change data. However, you can configure the newer origin to generate the same records as the older one, so you don't have to update the rest of the pipeline when you replace the origin.

Below are some configuration issues to consider when updating existing Oracle CDC Client pipelines to use the newer origin:

  • The newer Oracle CDC origin supports processing changes from primary or standby databases.

    The older origin only supports primary databases.

    To configure the Oracle CDC origin to use a primary database, on the Oracle tab, set the Database Role property to Primary.

  • The newer Oracle CDC origin can generate two different record formats: basic or rich.

    To generate records like those generated by the Oracle CDC Client origin, on the Data tab, set the Record Format property to Basic. Then, keep the default values for the custom record options on the same tab.

  • The newer Oracle CDC origin supports REGEXP_LIKE syntax for specifying schemas and tables, while the older origin supports SQL LIKE syntax for inclusion patterns and Java regular expression syntax for exclusion patterns.
    Update the schema and table patterns from the older origin to use the appropriate syntax. The following tips might help:
    • Inclusion patterns - If you used SQL LIKE syntax to define inclusion patterns, replace all occurrences of the percent character ( % ) with a dot and asterisk ( .* ).

      Also, replace all underscores ( _ ) that represent a single character with a dot ( . ).

    • Exclusion patterns - If you used Java regular expression syntax, you generally don't need to make any changes.
    • Java regular expressions - You can use Java regular expressions for both patterns. However, you might need to anchor them using a caret character ( ^ ) at the start of the expression and a dollar sign ( $ ) at the end of the expression.
  • By default, the newer Oracle CDC origin logs details about LogMiner sessions in the Data Collector log, just like the older origin. However, the newer origin can also generate events when new LogMiner sessions are created.

    If you monitor Data Collector log entries to determine when the origin creates new LogMiner sessions, consider configuring the Oracle CDC origin to produce events instead. Data Collector logs are not a reliable source for automated processes.

    To produce events for new LogMiner sessions, on the General tab, select Produce Events. Then, on the Oracle tab, enable the Track LogMiner Sessions property.
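The inclusion-pattern conversion tips above are mechanical enough to script. The following Python sketch (hypothetical, not part of Data Collector) converts a SQL LIKE pattern into an anchored regular expression following those rules; it assumes the LIKE pattern contains no escaped wildcards:

```python
import re

def like_to_regex(like_pattern):
    # Escape any regex metacharacters in the literal parts of the name.
    # re.escape leaves % and _ untouched, so the LIKE wildcards survive.
    escaped = re.escape(like_pattern)
    # % (any sequence) becomes .* and _ (any single character) becomes .
    regex = escaped.replace("%", ".*").replace("_", ".")
    # Anchor the expression so it matches the whole name.
    return "^" + regex + "$"

print(like_to_regex("SALES%"))       # ^SALES.*$
print(like_to_regex("ORDERS_2024"))  # ^ORDERS.2024$
```

Note that this treats every underscore as a single-character wildcard; an underscore that was meant literally in the original pattern must be handled separately, as described in the tips above.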

Prerequisites

Before using the Oracle CDC origin, complete the prerequisite tasks.
Note: This documentation includes commands that might help you perform the required tasks. However, the commands that you need might differ depending on your Oracle environment and configuration. For definitive steps and commands, see the Oracle documentation and contact your database administrator (DBA).
When using a primary database, complete the following tasks on the primary database. When using a standby database, complete the tasks on the associated primary database, then ensure that those configurations are replicated to the standby database:
  1. Configure the database archiving mode.
  2. Enable supplemental logging for the database or tables.
  3. Create a user account with the required roles and privileges.
  4. Install required libraries on Data Collector.
This documentation uses the following terms, which comply with the Oracle documentation at this time:
  • CDB or multitenant database - A multitenant container database (CDB) that includes one or more pluggable databases (PDB).
  • PDB or pluggable database - A database that is part of a CDB.
  • Non-CDB database - A standalone database created on earlier versions of Oracle.

Task 1. Configure the Database Archiving Mode

LogMiner provides access to database changes stored in redo log files that summarize database activity. The origin uses this information to generate records. LogMiner requires that the database be open and in ARCHIVELOG mode with archiving enabled.

Note: Enabling ARCHIVELOG mode also requires defining policies to handle archived redo logs. This step describes the minimum requirements. See your DBA for recommendations for production-ready configuration.

When reading from a primary database, the database must also be writable. When reading from a standby database, you might archive logs at a higher frequency if you want to approximate real-time processing. For more information, see Primary and Standby Databases.

You configure the database archiving mode differently based on whether you use an Oracle Standard, Oracle RAC, or Oracle Exadata database:

Oracle Standard databases
Use the following steps to determine the status of the database and to configure the database archiving mode:
  1. In a SQL shell, log into the database as a user with DBA privileges.
  2. Check the database logging mode.

    For example:

    select log_mode from v$database;

    If the command returns ARCHIVELOG, you can go to "Task 2. Enable Supplemental Logging".

    If the command returns NOARCHIVELOG, continue with the following steps.

  3. Shut down the database.

    For example:

    shutdown immediate;
  4. Start up and mount the database.

    For example:

    startup mount;
  5. Enable archiving, open the database, and make it writable.

    For example:

    alter database archivelog;
    alter database open read write;
Oracle RAC and Exadata databases
Use the following steps to configure the database archiving mode and verify the status of the database:
  1. In a Terminal session on one of the database nodes, stop and mount the database.

    For example:

    srvctl stop database -d <database name>
    srvctl start database -d <database name> -o mount
  2. In a SQL shell, log into the database as a user with DBA privileges and configure the database.

    For example:

    alter database archivelog;
  3. In the Terminal session, restart the database.

    For example:

    srvctl stop database -d <database name>
    srvctl start database -d <database name>
  4. In the SQL shell, check if the database is in logging mode.

    For example:

    select log_mode from v$database;

Task 2. Enable Supplemental Logging

To retrieve data from redo logs, LogMiner requires supplemental logging at a database level, table level, or a combination of both.

The logging that you enable depends on the data that you want to process and how you plan to use the data:
Minimal supplemental logging
You must enable minimal supplemental logging to use the Oracle CDC origin. Enable it in your CDB for a multitenant database or in your non-CDB database.
Identification key logging
Enable identification key logging based on your needs:
  • Identification key logging is optional depending on the data that you want to process and how you plan to use the data.
  • Primary key identification key logging is required to enable proper primary key handling and change tracking.

    Depending on the structure of each table, primary key logging provides primary key columns, first unique index columns, or all columns to the origin for record generation.

  • Full supplemental identification key logging is recommended to provide data from all columns, those with unchanged data as well as the primary key and changed columns.

    Enable full supplemental logging to ensure that the origin can generate records that include all fields, regardless of whether they have been changed.

For details on the data included in records based on the supplemental logging type, see Parse SQL Statements, Supported Operations, and Record Attributes.

  1. Determine the logging that you want to enable.
  2. Before enabling supplemental logging, check if it is enabled for the database.

    For example, you might run the following command on your CDB, then the pluggable database:

    select supplemental_log_data_min, supplemental_log_data_pk, supplemental_log_data_all from v$database;

    If the command returns Yes or Implicit for all three columns, then minimal supplemental logging is enabled, as well as identification key and full supplemental logging. You can skip to "Task 3. Create a User Account."

    If the command returns Yes or Implicit for only the first two columns, then minimal supplemental logging and primary key identification key logging are enabled. If this is what you want, you can skip to "Task 3. Create a User Account."

  3. Enable minimal supplemental logging.
    For example, you can run the following command from your CDB or from your non-CDB database:
    alter database add supplemental log data;
  4. Optionally, enable identification key logging, if needed.

    When enabling identification key logging for a database, do so for the database that contains the schemas to be monitored.

    To enable identification key logging for a pluggable database, you must also enable it for the CDB.

    When enabling identification key logging for individual tables, you don’t need to enable it for the CDB. According to Oracle, best practice is to enable logging for individual tables, rather than the entire CDB. For more information, see the Oracle LogMiner documentation.

    For example, you can run the following command to switch to the pluggable database so that the changes that follow apply only to that container:
    alter session set container=<pdb>;
    Enable primary key or full supplemental identification key logging to retrieve data from redo logs. You do not need to enable both:
    To enable primary key logging
    You can enable primary key identification key logging for individual tables or all tables in the database:
    • For individual tables

      For example, you might run the following command for each table that you want to monitor to enable primary key logging:

      alter table <schema name>.<table name> add supplemental log data (primary key) columns;
    • For all tables

      For example, you might run the following command to enable primary key logging for an entire pluggable database:

      alter database add supplemental log data (primary key) columns;
    To enable full supplemental logging
    You can enable full supplemental identification key logging for individual tables or all tables in the database:
    • For individual tables

      For example, you might run the following command for each table that you want to monitor to enable full supplemental logging:
      alter table <schema name>.<table name> add supplemental log data (all) columns;
    • For all tables

      For example, you might run the following command to enable full supplemental logging for an entire pluggable database:
      alter database add supplemental log data (all) columns;
  5. Submit the changes.

    For example:

    alter system archive log current;
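For reference, the three flags returned by the v$database query in step 2 can be read as a simple decision. This hypothetical Python helper (for illustration only, not part of Data Collector) sketches the interpretation described above:

```python
def supplemental_logging_status(min_flag, pk_flag, all_flag):
    """Interpret the v$database columns supplemental_log_data_min,
    supplemental_log_data_pk, and supplemental_log_data_all."""
    # YES or Implicit both indicate that the logging level is active.
    enabled = lambda flag: flag.upper() in ("YES", "IMPLICIT")
    if not enabled(min_flag):
        return "enable minimal supplemental logging first"
    if enabled(all_flag):
        return "full supplemental logging enabled"
    if enabled(pk_flag):
        return "primary key identification key logging enabled"
    return "minimal supplemental logging only"

print(supplemental_logging_status("YES", "IMPLICIT", "NO"))
# primary key identification key logging enabled
```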

Task 3. Create a User Account

Create a user account to use with the origin. You need the account to access the database through JDBC.

You create accounts differently based on the Oracle version that you use:
12c, 18c, 19c, or 21c CDB databases
For Oracle, Oracle RAC, and Oracle Exadata 12c, 18c, 19c, and 21c CDB databases, create a common user account. Common user accounts are created in cdb$root and must use the convention: c##<name>.
  1. In a SQL shell, log into the database as a user with DBA privileges.
  2. Create the common user account.

    For example:

    alter session set container=cdb$root;
    
    create user <user name> identified by <password>;
    grant create session, alter session, set container, logmining, select any dictionary, select_catalog_role, execute_catalog_role to <user name> container=all;
    
    grant select on database_properties to <user name>;
    grant select on all_objects to <user name>;
    grant select on all_tables to <user name>;
    grant select on all_tab_columns to <user name>;
    grant select on all_tab_cols to <user name>;
    grant select on all_constraints to <user name>;
    grant select on all_cons_columns to <user name>;
    
    grant select on v_$database to <user name>;
    grant select on gv_$database to <user name>;
    grant select on v_$instance to <user name>;
    grant select on gv_$instance to <user name>;
    grant select on v_$database_incarnation to <user name>;
    grant select on gv_$database_incarnation to <user name>;
    
    grant select on v_$log to <user name>;
    grant select on gv_$log to <user name>;
    grant select on v_$logfile to <user name>;
    grant select on gv_$logfile to <user name>;
    grant select on v_$archived_log to <user name>;
    grant select on gv_$archived_log to <user name>;
    grant select on v_$standby_log to <user name>;
    grant select on gv_$standby_log to <user name>;
    
    grant select on v_$logmnr_parameters to <user name>;
    grant select on gv_$logmnr_parameters to <user name>;
    grant select on v_$logmnr_logs to <user name>;
    grant select on gv_$logmnr_logs to <user name>;
    grant select on v_$logmnr_contents to <user name>;
    grant select on gv_$logmnr_contents to <user name>;
    grant select on v_$containers to <user name>;
    
    alter session set container=<pdb>;
    
    grant select on v_$containers to <user name>;
    
    grant select on <schema name>.<table name> to <user name>;

    Repeat the final command for each table that you want to monitor.

When you configure the origin, use this user account for the JDBC credentials. Use the entire user name, including the c##, as the JDBC user name.

12c, 18c, 19c, or 21c non-CDB databases
For Oracle, Oracle RAC, and Oracle Exadata 12c, 18c, 19c, and 21c non-CDB databases, create a user account with the necessary privileges:
  1. In a SQL shell, log into the database as a user with DBA privileges.
  2. Create the user account.

    For example:

    create user <user name> identified by <password>;
    grant create session, alter session, set container, logmining, select any dictionary, select_catalog_role, execute_catalog_role to <user name>;
    
    grant select on database_properties to <user name>;
    grant select on all_objects to <user name>;
    grant select on all_tables to <user name>;
    grant select on all_tab_columns to <user name>;
    grant select on all_tab_cols to <user name>;
    grant select on all_constraints to <user name>;
    grant select on all_cons_columns to <user name>;
    
    grant select on v_$database to <user name>;
    grant select on gv_$database to <user name>;
    grant select on v_$instance to <user name>;
    grant select on gv_$instance to <user name>;
    grant select on v_$database_incarnation to <user name>;
    grant select on gv_$database_incarnation to <user name>;
    
    grant select on v_$log to <user name>;
    grant select on gv_$log to <user name>;
    grant select on v_$logfile to <user name>;
    grant select on gv_$logfile to <user name>;
    grant select on v_$archived_log to <user name>;
    grant select on gv_$archived_log to <user name>;
    grant select on v_$standby_log to <user name>;
    grant select on gv_$standby_log to <user name>;
    
    grant select on v_$logmnr_parameters to <user name>;
    grant select on gv_$logmnr_parameters to <user name>;
    grant select on v_$logmnr_logs to <user name>;
    grant select on gv_$logmnr_logs to <user name>;
    grant select on v_$logmnr_contents to <user name>;
    grant select on gv_$logmnr_contents to <user name>;
    grant select on v_$containers to <user name>;
    
    grant select on <schema name>.<table name> to <user name>;

    Repeat the final command for each table that you want to monitor.

When you configure the origin, use this user account for the JDBC credentials.

Task 4. Install Required Libraries on Data Collector

The Oracle CDC origin connects to Oracle through JDBC. You cannot access the database until you install the required Oracle JDBC libraries on Data Collector.

Install the following Oracle libraries:
  • ojdbc8.jar - Oracle JDBC driver. The latest driver version is recommended, though you can use any earlier version that your database supports. When Data Collector runs on Java 11 or higher, you can alternatively use ojdbc11.jar.

    The drivers are also available in ojdbc8-full.tar.gz and ojdbc11-full.tar.gz, respectively. Extract the library before installation.

    These drivers and archives are available on the Oracle driver downloads page.

  • orai18n.jar - Oracle globalization library. Use the latest version of this library to ensure accurate processing of international character sets.

    This library is available in the ojdbc8-full.tar.gz archive. When Data Collector runs on Java 11 or higher, you can alternatively use the ojdbc11-full.tar.gz archive.

    The archives are available on the Oracle driver downloads page. Extract the library before installation.

  • orai18n-mapping.jar - Oracle globalization library. Use the latest version of this library to ensure accurate processing of international character sets.

    This library is available within a JDBC Supplement Package (ZIP) through the Oracle Instant Client downloads page. Extract the library before installation.

Install the libraries as external libraries into the JDBC Oracle stage library, streamsets-datacollector-jdbc-oracle-lib, which includes the origin. For more information, see Install External Libraries in the Data Collector documentation.

Primary and Standby Databases

The Oracle CDC origin can process changes from the following types of databases:
Primary database

You can use the Oracle CDC origin to process changes from a primary database. When processing changes from a primary database, LogMiner can process changes directly from online redo logs or from archived redo logs.

The Oracle CDC origin processes changes from both online and archived redo logs as they arrive, providing near real-time processing.
Perform all prerequisite tasks for the primary database as described in Prerequisites.
Standby database
You can use the Oracle CDC origin to process changes from a standby database when archived redo logs are regularly replicated. Standby databases are replica databases managed by Oracle Data Guard on physical or logical databases, or other available Oracle tools. When processing changes from a standby database, LogMiner can only process changes from archived redo logs.
As a result, the Oracle CDC origin processes change data as replica redo logs are archived. The resulting processing occurs in chunks as new archives become available, with breaks while waiting for additional archives to be created.
When desired, you can approximate near real-time processing by configuring Oracle to archive redo logs frequently. You can accomplish this by configuring a small online redo log size, scheduling periodic redo log switches, or by any other means available to the DBA. Oracle recommends no more than 5 redo log rotations an hour.
Perform all prerequisite tasks on the associated primary database, then ensure that those configurations are replicated to the standby database.

Table Filters

When you configure the Oracle CDC origin, you specify table filters that define the schemas and tables with the change capture data that you want to process.

When you configure a table filter, you can specify a schema inclusion and exclusion pattern, and a table inclusion and exclusion pattern. This defines a group of schemas and tables to read from. You can define one or more table filters. When you specify several table filters, a table is tracked if it fulfills the conditions of at least one filter.
Important: When specifying patterns for a CDB, note that Oracle cannot distinguish tables with the same name in different pluggable databases if they reside in schemas of the same name.
When defining the schema and table name patterns, you can use Oracle REGEXP_LIKE syntax to define a set of tables within a schema or across multiple schemas. For more information about REGEXP_LIKE syntax, see the Oracle documentation.
For example, say you want to process change data capture data for all tables in the sales schema that start with SALES while excluding those that end with a dash (-) and single-character suffix. You can use the following filter configuration to specify the tables to process:
  • Schema: sales
  • Table Name Pattern: ^SALES.*$
  • Exclusion Pattern: ^SALES.*-.$
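Oracle's REGEXP_LIKE follows POSIX-style regular expression syntax, so for simple patterns like these you can sanity-check the filter logic with ordinary regex tooling before configuring the origin. A rough approximation in Python (hypothetical table names; matching is case-sensitive, as REGEXP_LIKE is by default):

```python
import re

inclusion = re.compile(r"^SALES.*$")    # Table Name Pattern
exclusion = re.compile(r"^SALES.*-.$")  # Exclusion Pattern

tables = ["SALES_2023", "SALES_EU", "SALES-A", "SALES_EU-X", "ORDERS"]
for table in tables:
    # A table is processed when it matches the inclusion pattern
    # and does not match the exclusion pattern.
    tracked = bool(inclusion.fullmatch(table)) and not exclusion.fullmatch(table)
    print(table, "->", "tracked" if tracked else "skipped")
# SALES_2023 and SALES_EU are tracked; the rest are skipped.
```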

Start Mode

You indicate where the Oracle CDC origin should start processing by configuring the Start Mode property and related settings. You can configure the origin to start at the current change or instant in time so that it processes all available changes. Or, you can specify a specific change or time to start processing.

The origin only uses the Start Mode and any specified initial change or time when you start the pipeline for the first time, or after you reset the origin. At other times, the origin uses its last saved offset to ensure that it processes all appropriate data.

Record Formats

You can configure the Oracle CDC origin to generate records with different formats. Both record formats can be customized to include additional information in record fields or the record header. The origin can generate the following record formats:
Basic
Includes root-level record fields that represent the changes defined in the Oracle redo log, and record header attributes for additional metadata.
By default, basic records are the same as those generated by the older Oracle CDC Client origin and other origins, though they can be customized to some degree.
As a result, database destinations, such as the JDBC Producer and Snowflake destinations, can process basic records without any changes.
In most cases, you should use the basic record format to process changes from redo logs. When replacing the Oracle CDC Client origin in existing pipelines, use the basic record format.
Rich
Has a record structure that allows the inclusion of the old values for changed data in the record, in addition to the new values. Also includes more metadata than the basic record format.
Unlike the basic records, rich records do not include changes from the Oracle redo log in root-level record fields. Instead, the rich records include change data and related metadata in a List-Map field named newValues. Rich records can also contain a List-Map field named oldValues that includes the old values and related metadata for all changed or deleted fields.
As a result, database destinations, such as the JDBC Producer and Snowflake destinations, cannot process records that use the rich format at this time. When using a database destination to write Oracle CDC data, use the basic record format.
You can configure custom record options to include a wide range of additional information in record headers or fields for rich records.
Use the rich record format when your pipeline logic requires both old and new values for changed data.
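As a rough sketch of the structural difference between the two formats for a single UPDATE (hypothetical column names and values; real records carry additional metadata and header attributes):

```python
# Basic format: changed values appear as root-level record fields.
basic_record = {
    "ID": 42,
    "STATUS": "SHIPPED",
}

# Rich format: new values live in the newValues List-Map field, and
# old values for changed or deleted fields live in oldValues.
rich_record = {
    "newValues": {"ID": 42, "STATUS": "SHIPPED"},
    "oldValues": {"STATUS": "PENDING"},
}

print(sorted(rich_record["oldValues"]))  # the fields that changed
```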

Custom Record Options

The Oracle CDC origin provides a wide range of properties that allow you to customize the data and metadata included in generated records. For example, you can place redo and undo statements, primary key and table definitions, and the precision and scale for numeric fields in record headers or in the record body, or omit that information from the record entirely.

When the origin parses redo statements and generates rich records, you can include information such as new and old field values, hexadecimal values, or raw values in record fields or omit the information from the record.

Tip: The data that the origin reads from LogMiner is in a hexadecimal format, which the origin then processes. If the data types do not convert as expected, you can use the hexadecimal or raw data to troubleshoot the conversion.
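For example, if a character column does not convert as expected, you can decode the raw hexadecimal value by hand to see what LogMiner delivered. A minimal sketch, assuming a UTF-8 encoded column (the actual character set depends on your database configuration):

```python
raw_hex = "48656C6C6F"  # hypothetical raw value captured from a record
decoded = bytes.fromhex(raw_hex).decode("utf-8")
print(decoded)  # Hello
```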

Define record customization on the Data tab of the origin. The default values for the properties generate a record similar to the Oracle CDC Client origin.

Parse SQL Statements, Supported Operations, and Record Attributes

The Oracle CDC origin generates records for the operation types that you specify. However, the origin reads all operation types from redo logs to ensure proper transaction handling.

Generated records contain fields in the same order as the corresponding columns in the database tables. The information included in generated records differs depending on whether the origin is configured to parse redo log SQL statements:
Parsing SQL statements
When the origin parses redo SQL statements, you can configure it to create records for the following operations:
  • INSERT
  • DELETE
  • UPDATE
When the origin generates rich records, the origin can create records or events for Truncate operations based on the Table Truncate Handling property on the Oracle tab. When the origin generates basic records, the origin can create events for Truncate operations.
The origin generates records based on the operation type, the logging level and type enabled for the database and tables, and the operations specified in the SQL Operations property. For example, if you specify Insert operations, the origin generates only Insert records, ignoring all Update and Delete operations.
The following table describes how the origin generates record data:
Operation Identification/Primary Key Logging Only Full Supplemental Logging
INSERT All fields that contain data, ignoring fields with null values. All fields.
UPDATE Primary key fields and fields with updated values. All fields.
DELETE Primary key fields. All fields.
TRUNCATE N/A N/A
When parsing redo SQL statements, the origin also includes CDC and CRUD information in record header attributes by default. It also generates default field attributes and can be configured to include additional field attributes, as needed.
To parse SQL statements, on the Data tab, select the Parse Statements property. This property is enabled by default.
Then on the Oracle tab, configure the SQL Operations property to specify the operations that you want to process.
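As a rough illustration of the logging-level behavior described in the table above, the following Python sketch shows which fields reach an Update record. This is a conceptual model, not Data Collector code; the table and column names are hypothetical.

```python
# Which columns appear in an UPDATE record, by database logging level.
full_row = {"ID": 42, "NAME": "Sandra", "CITY": "Lisbon"}
primary_key = {"ID"}
updated_columns = {"NAME"}

# Identification/primary key logging only: primary key fields
# plus fields with updated values.
identification_only = {
    col: val
    for col, val in full_row.items()
    if col in primary_key or col in updated_columns
}

# Full supplemental logging: all fields.
full_supplemental = dict(full_row)

print(sorted(identification_only))  # ['ID', 'NAME']
print(sorted(full_supplemental))    # ['CITY', 'ID', 'NAME']
```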
Not parsing SQL statements
When the origin does not parse redo SQL statements, it writes each SQL statement to a field. When the origin generates basic records, it writes SQL statements to an oracle.cdc.query field. When generating rich records, it writes SQL statements to a jdbc.cdc.oracle.redoStatement field. The origin also generates field attributes that provide additional information about this field.
When not parsing SQL statements, you can configure the origin to create records for the following operations:
  • INSERT
  • DELETE
  • UPDATE
When the origin generates rich records, the origin can create records or events for Truncate operations based on the Table Truncate Handling property on the Oracle tab. When the origin generates basic records, the origin can create events for Truncate operations.
To write only LogMiner SQL statements to generated records without parsing them, on the Data tab, clear the Parse Statements property.
Then on the Oracle tab, configure the SQL Operations property to specify the operations that you want to process.

CRUD Operation Header Attributes

The Oracle CDC origin specifies the operation type for generated records in the following record header attributes:
sdc.operation.type
The origin evaluates the operation type associated with each entry that it processes. When appropriate, it writes the operation type to the sdc.operation.type record header attribute.
The origin uses the following values in the attribute to represent the operation type:
  • 1 for INSERT
  • 2 for DELETE
  • 3 for UPDATE
  • 512 for TRUNCATE when using the rich record format

If you use a CRUD-enabled destination in the pipeline such as JDBC Producer or Elasticsearch, the destination can use the operation type when writing to destination systems. When necessary, you can use an Expression Evaluator processor or any scripting processor to manipulate the value in the header attribute. For an overview of Data Collector changed data processing and a list of CRUD-enabled destinations, see Processing Changed Data.
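The operation codes listed above can be resolved in a scripting processor or in downstream logic. The following Python sketch maps the codes to operation names; the record-attribute dictionary shape is an assumption for illustration, not a Data Collector API.

```python
# Map sdc.operation.type codes to CRUD operations, per the list above.
OPERATION_NAMES = {1: "INSERT", 2: "DELETE", 3: "UPDATE", 512: "TRUNCATE"}

def operation_name(record_attributes):
    """Resolve the CRUD operation for a record from its header attributes.

    Header attribute values are strings, so convert before lookup.
    """
    code = int(record_attributes["sdc.operation.type"])
    return OPERATION_NAMES[code]

print(operation_name({"sdc.operation.type": "3"}))  # UPDATE
```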

When using CRUD-enabled destinations, the destination looks for the operation type in this attribute before checking the following attribute.
oracle.cdc.operation
The origin also writes the CRUD operation type to an oracle.cdc.operation attribute. This attribute was implemented in an earlier release and is supported for backward compatibility.
CRUD-enabled destinations check this attribute for the operation type only if the sdc.operation.type attribute is not set.

CDC Header Attributes

By default, the Oracle CDC origin provides the following CDC record header attributes for each record. The attribute names differ based on the record format the origin uses:
Basic Record CDC Attribute Rich Record CDC Attribute Description
oracle.cdc.sequence.oracle jdbc.cdc.oracle.ordinalOracle Includes sequence numbers that indicate the order in which statements were processed within the transaction. The sequence number is generated by Oracle.

Not used at this time.

oracle.cdc.sequence.internal jdbc.cdc.oracle.ordinalCollector Includes sequence numbers equivalent to those in the attribute above, but created by the origin. The sequence starts with 0 each time the pipeline starts.

You can use these values to keep statements in order within a single transaction.
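Ordering statements by the origin-generated sequence attribute can be sketched as follows. The record shape is an assumption for illustration; only the oracle.cdc.sequence.internal attribute name comes from the table above. Attribute values are strings, so convert them before sorting.

```python
# Re-order statements within a single transaction using the
# origin-generated sequence attribute (attribute values are strings).
batch = [
    {"attributes": {"oracle.cdc.sequence.internal": "2"}, "statement": "third"},
    {"attributes": {"oracle.cdc.sequence.internal": "0"}, "statement": "first"},
    {"attributes": {"oracle.cdc.sequence.internal": "1"}, "statement": "second"},
]
ordered = sorted(
    batch, key=lambda r: int(r["attributes"]["oracle.cdc.sequence.internal"])
)
print([r["statement"] for r in ordered])  # ['first', 'second', 'third']
```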

Other Header Attributes

When parsing SQL statements, the Oracle CDC origin can also provide record header attributes for the following information:
Primary keys
When a table contains a primary key, the origin includes the following record header attribute:
  • jdbc.primaryKeySpecification - Provides a JSON-formatted string that lists the columns that form the primary key in the table and the metadata for those columns.
    For example, a table with a composite primary key contains the following attribute:
    jdbc.primaryKeySpecification =
         {"<primary key column 1 name>":
             {"type": <type>,
              "datatype": "<data type>",
              "size": <size>,
              "precision": <precision>,
              "scale": <scale>,
              "signed": <Boolean>,
              "currency": <Boolean>},
          ...,
          "<primary key column N name>":
             {"type": <type>,
              "datatype": "<data type>",
              "size": <size>,
              "precision": <precision>,
              "scale": <scale>,
              "signed": <Boolean>,
              "currency": <Boolean>}}
    A table without a primary key contains the attribute with an empty value:
    jdbc.primaryKeySpecification = {}
For an update operation on a table with a primary key, the origin includes the following record header attributes:
  • jdbc.primaryKey.before.<primary key column name> - Provides the old value for the specified primary key column.
  • jdbc.primaryKey.after.<primary key column name> - Provides the new value for the specified primary key column.
Note: The origin provides the new and old values of the primary key columns regardless of whether the value changes.
Decimal fields
When a record includes decimal fields, the origin includes the following record header attributes for each decimal field in the record:
  • jdbc.<fieldname>.precision
  • jdbc.<fieldname>.scale

You can use the record:attribute or record:attributeOrDefault functions to access the information in the attributes. For more information about working with record header attributes, see Working with Header Attributes.
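Outside the expression language, for example in a scripting processor, the primary key attributes described above can be consumed as ordinary strings. The attribute values below are hypothetical; only the attribute names and the JSON shape come from this section.

```python
import json

# A hypothetical jdbc.primaryKeySpecification value for a single-column key.
spec = (
    '{"ID": {"type": 2, "datatype": "NUMBER", "size": 22, '
    '"precision": 10, "scale": 0, "signed": true, "currency": false}}'
)
pk_columns = list(json.loads(spec))
print(pk_columns)  # ['ID']

# For updates, the before/after attributes reveal whether the key changed.
# The origin provides both values even when the key is unchanged.
attributes = {"jdbc.primaryKey.before.ID": "42", "jdbc.primaryKey.after.ID": "42"}
key_changed = any(
    attributes["jdbc.primaryKey.before." + col]
    != attributes["jdbc.primaryKey.after." + col]
    for col in pk_columns
)
print(key_changed)  # False
```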

Default Field Attributes

The Oracle CDC origin generates field attributes for columns converted to the Decimal or Datetime data types in Data Collector. The attributes provide additional information about each field.

The following data type conversions do not include all information in the corresponding Data Collector type:
  • The Oracle Number data type is converted to the Data Collector Decimal data type, which does not store scale and precision.
  • The Oracle Timestamp data type is converted to the Data Collector Datetime data type, which does not store nanoseconds.
To preserve this information during data type conversion, the origin generates the following field attributes for these Data Collector data types:
Data Collector Data Type Generated Field Attribute Description
Decimal precision Provides the original precision for every number column.
Decimal scale Provides the original scale for every number column.
Datetime nanoSeconds Provides the original nanoseconds for every timestamp column.

You can use the record:fieldAttribute or record:fieldAttributeOrDefault functions to access the information in the attributes. For more information about working with field attributes, see Field Attributes.

Annotation Field Attributes

You can configure the Oracle CDC origin to generate field attributes that provide additional information about each field in the record. These attributes can help you determine why a field contains a null value.

The origin can generate the following attributes for each field:
Annotation Attribute Name Description
supported Indicates if the original column type of the field is supported by the origin. Possible values include:
  • yes - The original column type was converted to a Data Collector type as expected.
  • no - The original column type is not supported by the origin, and the origin is configured to include unsupported types in records.
presence Indicates if the corresponding column for the field was included in the redo SQL statement.

Possible values include:

  • missing - The corresponding column was not in the SQL statement. Can appear when a field contains a null value.
  • void - The corresponding column was in the SQL statement, but had no value. Can appear when a field contains a null value.
  • empty - The corresponding column was in the SQL statement, but the value was empty. Can appear when a field contains a null value.
  • present - The corresponding column was in the SQL statement with a value. Appears when a field contains a non-null value.
To generate these field attributes, on the Data tab, enable the Annotate Column Availability property.
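The four presence values can be modeled conceptually as follows. This is a sketch of the distinction the attribute draws, not the origin's implementation; the sentinel and value conventions are assumptions for illustration.

```python
# Conceptual model of the presence annotation: distinguish a column absent
# from the SQL statement, present with no value, present but empty,
# and present with data.
MISSING = object()  # sentinel: column not in the SQL statement

def presence(value):
    if value is MISSING:
        return "missing"   # column not in the statement
    if value is None:
        return "void"      # in the statement, but had no value
    if value == "":
        return "empty"     # in the statement, but the value was empty
    return "present"       # in the statement with a value

print([presence(v) for v in (MISSING, None, "", "Sandra")])
```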

Multithreaded Parsing

By default, the Oracle CDC origin uses multiple threads to parse SQL statements. When performing multithreaded parsing, the origin uses multiple threads to generate records from committed SQL statements in a transaction. It does not perform multithreaded processing of the resulting records.

The Oracle CDC origin uses multiple threads for parsing based on the SQL Parser Threads property. When you start the pipeline, the origin creates the number of threads specified in the property. The origin connects to Oracle, creates a LogMiner session, and processes all transactions that contain changes for monitored tables.

When the origin processes a transaction, it reads and buffers all SQL statements in the transaction to an in-memory queue and waits for statements to be committed before processing them. Once committed, the SQL statements are parsed using all available threads and the original order of the SQL statements is retained.

The resulting records are passed to the rest of the pipeline. Note that enabling multithreaded parsing does not enable multithreaded record processing; the pipeline uses a single thread for reading data.

Note: When configuring the SQL Parser Threads property, consider the number of cores on the Data Collector machine. A good initial value might be four times the number of cores. Set to 0 to use as many threads as available processing cores.

Query Fetch Size

The Oracle CDC origin provides a Query Fetch Size property that determines the maximum number of rows that the origin fetches at one time. When processing a large volume of data, using larger fetch sizes will improve performance by reducing the number of database hits.

As a best practice, set the query fetch size to as large a value as your execution environment allows: at least 10,000 rows and, when possible, 25,000 rows or higher.

When setting high fetch sizes, make sure that Data Collector has sufficient resources to handle the load. For information about configuring the Data Collector heap size, see Java Heap Size in the Data Collector documentation.
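A back-of-the-envelope estimate can help size the heap for a given fetch size. The average row size below is an assumption; actual memory use depends on your tables and on Data Collector internals.

```python
# Rough memory estimate for one fetch buffer (average row size is assumed).
fetch_size = 25_000        # rows fetched at one time
avg_row_bytes = 2_048      # assumed average row size
buffer_mb = fetch_size * avg_row_bytes / (1024 * 1024)
print(round(buffer_mb, 1))  # 48.8
```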

Uncommitted Transaction Handling

You can configure how the Oracle CDC origin handles long-running, stale, or abandoned uncommitted transactions. These are transactions with some operations on monitored tables but whose commit or rollback takes longer than expected to occur.

The following properties define how the origin handles uncommitted transactions:
  • Transaction Maximum Duration - The maximum time one transaction is expected to last, from transaction start to commit. If a commit or rollback is not received within the specified duration, the transaction is marked for eviction.

    A moderately large setting for this property is recommended to allow for long-running batch processes.

  • Operations Maximum Gap - The maximum amount of time that can pass between two consecutive operations within the same transaction. If the specified time passes after the last operation without the arrival of a new operation, control, or rollback, the transaction is marked for eviction.

    A relatively small value for this property is recommended to ensure that zombie transactions do not consume memory.
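The interaction of the two properties can be sketched as a simple check: a transaction is marked for eviction when either limit is exceeded. This is a conceptual model of the behavior described above, not the origin's implementation; times are in seconds for simplicity.

```python
# A transaction is marked for eviction when either the total duration
# or the gap since the last operation exceeds its configured maximum.
def marked_for_eviction(start, last_operation, now, max_duration, max_gap):
    exceeded_duration = (now - start) > max_duration
    exceeded_gap = (now - last_operation) > max_gap
    return exceeded_duration or exceeded_gap

# Transaction open 30 min, last operation 12 min ago,
# with limits of 1 hour (duration) and 10 min (gap):
print(marked_for_eviction(0, 18 * 60, 30 * 60, 60 * 60, 10 * 60))  # True
```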

Evicted Transaction Handling

You can configure how the Oracle CDC origin handles evicted transactions and the operations that arrive for a transaction after it is evicted. An evicted transaction is an uncommitted transaction that has expired based on the configuration of the Transaction Maximum Duration or Operations Maximum Gap properties.

The following properties define how the origin handles evicted transactions and related operations:
  • Transaction Maximum Retention - The maximum time that the origin keeps an internal reference to the evicted transaction. After the time elapses, the internal reference is removed.

    If the origin receives additional operations for an evicted transaction before the internal reference is removed, the origin silently discards these operations. If the origin receives additional operations for an evicted transaction after the internal reference is removed, the origin passes the operations to the pipeline when a commit is received, treating them as if they are from a new transaction.

    A moderate amount of time is recommended for this property to prevent operations from being incorrectly passed to the pipeline, without allowing unnecessary references to evicted transactions to consume memory.

  • Evicted Transactions - Defines how the origin handles operations that arrive for an evicted transaction before the internal reference is removed:
    • Send Transaction Operations to Pipeline - Sends operations to the pipeline like normal records. Use to treat the operations as if the transaction was committed.

      This option can create inconsistencies in your data. Use with caution.

    • Discard Transaction Operations - Discards operations. Use to treat the operations as if the transaction was rolled back.
    • Send Transaction Operations to Error - Sends operations to the stage for error handling. Use to treat the operations like error records.
    • Abort Pipeline - Stops the pipeline. Use to review the error and address underlying issues.

Event Generation

The Oracle CDC origin can generate events that you can use in an event stream when the pipeline starts, when a LogMiner session starts, and when changes occur to monitored tables.

Oracle CDC events can be used in any logical way.

For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.

Event Records

Event records generated by the Oracle CDC origin include the following event-related record header attributes:
Record Header Attribute Description
sdc.event.type Event type.

Uses one of the following types. The names of some event types differ depending on the record format the origin uses.

  • initial-database-state - Generated when the origin first connects to Oracle.
  • logminer-session-start - Generated when the origin starts a new LogMiner session.
  • STARTUP for basic records and mined-table-loaded for rich records - Generated for each table when the origin loads the table.
  • CREATE for basic records and mined-table-created for rich records - Generated when a monitored table is newly created.
  • ALTER for basic records and mined-table-altered for rich records - Generated when a monitored table is updated.
  • DROP for basic records and mined-table-dropped for rich records - Generated when a monitored table is dropped.
  • TRUNCATE for basic records or mined-table-truncated for rich records - Generated when a monitored table is truncated.
sdc.event.version Integer that indicates the version of the event record type.
sdc.event.creation_timestamp Epoch timestamp when the stage created the event.

The Oracle CDC origin can generate the following types of event records:

initial-database-state
After the pipeline starts and the origin queries the database, the origin generates an event that describes the state of the database.
The event records have the sdc.event.type attribute set to initial-database-state and include the following fields:
Event Record Field Description
catalog Database catalog for the database connections.
root-instance Instance of the root connection.
owner-instance Instance of the owner connection.
incarnation Incarnation of the database.
system-change-number System change number of the database.
database-time-zone System time zone of the database.
session-time-zone Session time zone of the database.
system-date Oracle system sysdate.
system-timestamp Oracle system systimestamp.
current-date Oracle session current_date.
local-timestamp Oracle session localtimestamp.
current-timestamp Oracle session current_timestamp.
language Oracle session nls_language.
territory Oracle session nls_territory.
sort Oracle session nls_sort.
numeric-characters Oracle session nls_numeric_characters.
calendar Oracle session nls_calendar.
date-language Oracle session nls_date_language.
date-format Oracle session nls_date_format.
time-format Oracle session nls_time_format.
timestamp-format Oracle session nls_timestamp_format.
time-zoned-format Oracle session nls_time_tz_format.
comparison Oracle session nls_comp.
length-semantics Oracle session nls_length_semantics.
nchar-conversion-exception Oracle session nls_nchar_conv_excp.
currency Oracle session nls_currency.
iso-currency Oracle session nls_iso_currency.
dual-currency Oracle session nls_dual_currency.
nls_characterset Oracle session nls_character_set.
nls_nchar_characterset Oracle session nls_nchar_characterset.
instant Event instant.
logminer-session-start
When the origin starts a LogMiner session, the origin generates an event.
To generate this event, enable the Track LogMiner Sessions property on the Oracle tab, in addition to the Produce Events property on the General tab.
The event records have the sdc.event.type attribute set to logminer-session-start and include the following fields:
Event Record Field Description
startDate Start date of the LogMiner session.
requiredStartDate Required date to start the session if DDL tracking is enabled.
endDate End date of the session.
startChange First change processed in the session.
requiredStartChange System change number to start the session.
endChange System change number to end the session.
options Options for the current LogMiner session.
info According to Oracle, this column is always null.
status According to Oracle, this field is always set to 0.
container ID of the container for the data.
logFiles List of log files used in the session.
table loaded, created, changed, and removed
When a monitored table has been loaded, created, changed, or removed, the origin generates an event. The event records have the sdc.event.type attribute set to different values depending on the record format that the origin uses:
  • Table loaded - STARTUP for basic records and mined-table-loaded for rich records.
  • Table created - CREATE for basic records and mined-table-created for rich records.
  • Table changed - ALTER for basic records and mined-table-altered for rich records.
  • Table removed - DROP for basic records and mined-table-dropped for rich records.
The event records have the table, columns, and primary key in the body of the record. They also include the following additional record header attributes. The attribute names differ depending on the record format that the origin uses:
Header Attribute - Basic Header Attribute - Rich Description
oracle.cdc.action action Identified action:
  • insert - Table inserted in the local dictionary.
  • change - Table changed in the local dictionary.
  • rename - Table renamed in the local dictionary.
  • remove - Table removed from the local dictionary.
oracle.cdc.origin origin Source of the event:
  • catalog - Table loaded at first pipeline run or when reset from Oracle dictionary tables.
  • offset - Table loaded from offset of previous run.
  • log - Table loaded at first pipeline run or when reset from Oracle dictionary in redo log. Not used at this time.
  • tracking - Table loaded from tracked change in a redo log.
oracle.cdc.timestamp instant Oracle timestamp when the action occurred.
oracle.cdc.change change SCN when the action occurred.
oracle.cdc.ddl statement Statement executed by the action.
oracle.cdc.schema schema Schema of the table.
oracle.cdc.table table Name of the table.
oracle.cdc.object object Object ID for the table.
oracle.cdc.time time Server time when the event was created.
table truncated
When a monitored table has been truncated, the origin can generate an event. To generate these events, in addition to enabling the Produce Events property on the General tab, you must configure the Table Truncate Handling property on the Oracle tab appropriately.
The event records have the sdc.event.type attribute set to TRUNCATE for basic records or mined-table-truncated for rich records.
The event records have no fields in the body of the record, but include the following additional record header attributes. The attribute names differ depending on the record format that the origin uses:
Header Attribute - Basic Header Attribute - Rich Description
oracle.cdc.action action Identified action. Set to clear for a truncated table.
oracle.cdc.origin origin Source of the event:
  • catalog - Table loaded at first pipeline run or when reset from Oracle dictionary tables.
  • offset - Table loaded from offset of previous run.
  • log - Table loaded at first pipeline run or when reset from Oracle dictionary in redo log. Not used at this time.
  • tracking - Table loaded from tracked change in a redo log.
oracle.cdc.timestamp instant Oracle timestamp when the action occurred.
oracle.cdc.change change SCN when the action occurred.
oracle.cdc.ddl statement Statement executed by the action.
oracle.cdc.schema schema Schema of the table.
oracle.cdc.table table Name of the table.
oracle.cdc.object object Object ID for the table.
oracle.cdc.time time Server time when the event was created.

Oracle Data Types

The Oracle CDC origin converts Oracle data types to Data Collector data types.

The following tables list the data types that the Oracle CDC origin supports and the Data Collector data types that the origin converts them to.
Supported built-in data types
The origin supports the following Oracle built-in data types:
Built-In Data Type Data Collector Data Type
Binary_Double Double
Binary_Float Float
Char String
Date Date
Float Float
NChar, NVarchar String
Number Varies depending on the precision and scale. For details, see the numeric conversion table.
Timestamp Datetime
Timestamp with Local Timezone, Timestamp with Timezone Zoned_Datetime
Varchar, Varchar2 String
Supported ANSI data types
The origin supports the following ANSI data types:
ANSI Data Type Data Collector Data Type
Char Varying, Character, Character Varying String
Decimal Decimal
Double Precision Double
Int, Integer, SmallInt Varies depending on the precision and scale. For details, see the numeric conversion table.
National Char, National Char Varying, National Character, National Character Varying String
NChar Varying String
Numeric Varies depending on the precision and scale. For details, see the numeric conversion table.
Real Varies depending on the precision and scale. For details, see the numeric conversion table.
Numeric data type conversions
Oracle numeric types are converted to Data Collector data types based on the precision and scale of the Oracle numeric type. The following table describes how the types are converted:
Precision Scale Data Collector Data Type
Not specified Not specified Decimal
Not specified 0 Long
Not specified Greater than 0 Decimal
Less than 3 0 Byte
3-4 0 Short
5-9 0 Integer
10 or greater 0 Long
Any specified value Greater than 0 Decimal
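The numeric conversion table above can be expressed as a function. This is a restatement of the table for illustration, using None to represent an unspecified precision or scale.

```python
# Data Collector type for an Oracle numeric column, per the table above.
# None means the precision or scale is not specified.
def collector_type(precision, scale):
    if scale is not None and scale > 0:
        return "Decimal"           # any precision, scale greater than 0
    if precision is None:
        # Unspecified precision: Decimal when scale is also unspecified,
        # Long when scale is 0.
        return "Decimal" if scale is None else "Long"
    if precision < 3:
        return "Byte"
    if precision <= 4:
        return "Short"
    if precision <= 9:
        return "Integer"
    return "Long"                  # precision 10 or greater, scale 0

print(collector_type(None, None), collector_type(7, 0), collector_type(12, 2))
```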

Unsupported Data Types

You can configure how the origin handles unsupported data types at the record or field level.

The Unsupported Column Data Types property defines how the origin handles records that contain unsupported data types. The property provides the following options:

  • Pass the record to the pipeline - Passes the record to the pipeline. You can configure how the origin handles the fields that contain unsupported data types using the Unsupported Columns property. You can configure the origin to remove empty fields using the Null Columns property.
  • Discard the record - Drops the record from the pipeline.
  • Send record to error - Passes records to the stage for error handling without the unsupported data type fields.
  • Abort pipeline - Stops the pipeline with an error. Use to review errors and address underlying issues.
When you configure the Unsupported Column Data Types property to pass records to the pipeline, the Unsupported Columns property defines how the origin handles fields that contain unsupported data types. The property provides the following options:
  • Keep Column with Raw Value - Includes the column as a record field with its original raw value.
  • Keep Column with Null Value - Includes the column as a record field with an empty value.
  • Drop Column - Omits the column from the record.
The Oracle CDC origin does not support the following data types:
Unsupported built-in data types
The origin does not support the following built-in data types:
  • BFile
  • Blob
  • Clob
  • Interval Day to Second
  • Interval Year to Month
  • JSON
  • Long
  • Long Raw
  • NClob
  • Raw
Unsupported ANSI data type
The origin does not support the following ANSI data type:
  • Long Varchar

Unsupported data type categories
The origin does not support the following categories of data types:
  • Any
  • Rowid
  • Spatial
  • Supplied
  • User Defined
  • XML Data

For example, this means that the following data types are not supported:

  • DBUritype
  • HTTPUritype
  • Rowid
  • SDO_Geometry
  • SDO_Georaster
  • SDO_Topo_Geometry
  • URowid
  • XDBUritype
  • XMLType

Data Preview with Oracle CDC

When using data preview with the Oracle CDC origin, note the following details:
  • Preview generates a maximum of 10 records, ignoring the Preview Batch Size property when set to a higher value.
  • Preview allows up to 5 seconds for the origin to receive records, which starts after connecting to Oracle. As a result, when previewing a pipeline with an Oracle CDC origin, preview ignores the configured Preview Timeout property when set to a higher value.

    For other origins, the preview timeout period begins when you start preview and includes the connection time to external systems.

Working with the Drift Synchronization Solution

If you use the Oracle CDC origin as part of a Drift Synchronization Solution for Hive pipeline, make sure to pass only records flagged for Insert to the Hive Metadata processor.

The Drift Synchronization Solution for Hive enables automatic updates of Hive tables based on incoming data. The Hive Metadata processor, the first stage in the solution, expects Insert records only. Below are some recommended methods to ensure that the processor receives only Insert records:
  • Configure the Oracle CDC origin to process only Insert records.
  • If you want to process additional record types in the pipeline, use a Stream Selector processor to route only Insert records to the Hive Metadata processor.

Configuring an Oracle CDC Origin

Configure an Oracle CDC origin to use LogMiner to process change data capture information from an Oracle database. Before you use the origin, complete the prerequisite tasks.

  1. In the Properties panel, on the General tab, configure the following properties:
    General Property Description
    Name Stage name.
    Description Optional description.
    Produce Events Generates event records when events occur. Use for event handling.
    On Record Error Error record handling for the stage:
    • Discard - Discards the record.
    • Send to Error - Sends the record to the pipeline for error handling.
    • Stop Pipeline - Stops the pipeline.
  2. On the Connection tab, configure the following properties:
    Connection Property Description
    Connection Connection that defines the information required to connect to an external system.

    To connect to an external system, you can select a connection that contains the details, or you can directly enter the details in the pipeline. When you select a connection, Control Hub hides other properties so that you cannot directly enter connection details in the pipeline.

    Connection Type Type of connection to use:
    • Thin Style - Service Name
    • Thin Style - System Identifier
    • Net Connection Descriptor
    • Net Connection Alias
    • Easy Connect
    Host Host name or IP address of the database.

    Available for Thin Style connections.

    Port Port number to use.

    Available for Thin Style connections.

    Service Name Listener-registered service name.

    Available for Thin Style - Service Name connections.

    System Identifier Unique system identifier for the database.

    Available for Thin Style - System Identifier connections.

    TNS Connection Connection specification using a TNSNames pattern.

    Available for Net Connection Descriptor connections.

    TNS Alias

    Connection alias, as defined in your local tnsnames.ora file.

    The tnsnames.ora file must be available in the expected Oracle location on the Data Collector machine, typically ${ORACLE_HOME}/network/admin.

    Available for Net Connection Descriptor connections.

    Connection URL URL for the connection. Use standard JDBC formatting.

    Available for Easy Connect connections.

    Pluggable Database Name of the pluggable database that contains the schema you want to use. Use when the schema was created in a pluggable database.

    Required for schemas created in pluggable databases.

    Use Credentials Enables specifying credentials.
    Username User account to connect to the database.

    Use the user account created for the origin. Common user accounts for Oracle, Oracle RAC, and Oracle Exadata 12c, 18c, 19c, and 21c CDB databases start with c##. For more information, see Task 3. Create a User Account.

    Password Password for the user account.
    Connection Timeout Maximum milliseconds to wait for a connection. Use a time constant in an expression to define the time increment. Default is 30 seconds, defined as follows: ${30 * SECONDS * 1000}.
    Connection Properties Additional connection properties to use. To add properties, click Add and define the property name and value.

    Use the property names and values as expected by Oracle.

  3. On the Oracle tab, configure the following properties:
    Data Property Description
    Tables Filter Specify inclusion and exclusion properties as needed to define the schemas and tables to monitor. You can use REGEXP_LIKE syntax to define the patterns:
    • Schema Inclusion Pattern - Pattern describing the schema names to include in processing.

      Default is ^.*$.

    • Schema Exclusion Pattern - Pattern describing the schema names to exclude from processing.

      Default is ^$.

    • Table Inclusion Pattern - Pattern describing the table names to include in processing.

      Default is ^.*$.

    • Table Exclusion Pattern - Pattern describing the table names to exclude from processing.

      Default is ^$.

    Using simple or bulk edit mode, click the Add icon to define another table configuration.
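
    The four patterns combine as follows: a table is monitored only when its schema matches the schema inclusion pattern and not the schema exclusion pattern, and its name matches the table inclusion pattern and not the table exclusion pattern. A minimal sketch of that logic, using Python's `re` module to stand in for Oracle's `REGEXP_LIKE` (the function name is ours, for illustration only):

```python
import re

def is_monitored(schema, table,
                 schema_incl=r"^.*$", schema_excl=r"^$",
                 table_incl=r"^.*$", table_excl=r"^$"):
    """Return True when the schema/table pair passes all four filters.

    REGEXP_LIKE performs substring matching, so patterns are normally
    anchored with ^ and $. The defaults mirror the origin's defaults:
    include everything (^.*$) and exclude nothing (^$ matches only
    the empty string).
    """
    return bool(re.search(schema_incl, schema)) \
        and not re.search(schema_excl, schema) \
        and bool(re.search(table_incl, table)) \
        and not re.search(table_excl, table)

# With the defaults, every table is monitored:
print(is_monitored("SALES", "ORDERS"))       # True
# Excluding staging tables by name:
print(is_monitored("SALES", "ORDERS_STG",
                   table_excl=r"^.*_STG$"))  # False
```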

    Start Mode Starting point for processing. When you start the pipeline for the first time, the origin starts processing from the specified initial change or time. After that, the origin uses the specified initial change or time again only when you reset the origin.

    Use one of the following options:

    • From Current Change/Instant - Processes changes that arrive after you start the pipeline.
    • From Specified Change - Processes changes starting from the specified system change number (SCN).
    • From Specified Instant - Processes changes starting from the specified date.

      You cannot use this mode when your database observes daylight saving time.

    Initial System Change Number System change number (SCN) to start reading from when you start the pipeline. If the SCN cannot be found in the redo logs, the origin continues reading from the next higher SCN that is available in the redo logs.

    Available when Start Mode is set to From Specified Change.

    Initial Instant Datetime to start reading from when you start the pipeline, for a date-based initial change. Use the following format:

    YYYY-MM-DD HH24:MI:SS

    You cannot use this mode when your database observes daylight saving time.

    Available when Start Mode is set to From Specified Instant.
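
    The Oracle-style format mask above corresponds to `%Y-%m-%d %H:%M:%S` in Python's `strptime`. A quick way to check that a candidate initial instant parses correctly before configuring it (the value shown is a hypothetical example):

```python
from datetime import datetime

# Hypothetical initial instant: date, then a 24-hour time.
initial_instant = "2024-03-15 14:30:00"

# %H is the 24-hour hour (Oracle's HH24); %M is minutes; %S is seconds.
parsed = datetime.strptime(initial_instant, "%Y-%m-%d %H:%M:%S")
print(parsed.isoformat(sep=" "))  # 2024-03-15 14:30:00
```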

    Window Mode Method used to advance the LogMiner window:
    • Amount of Changes
    • Amount of Time
    Window Length (changes) Number of changes to use as the LogMiner window length.

    Available when using the Amount of Changes window mode.

    Window Length (time) Amount of time to use as the LogMiner window length, in milliseconds.

    Available when using the Amount of Time window mode.
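
    As a rough illustration of the two modes (the function names, and the treatment of SCNs as a change count, are our assumptions, not the origin's internals), each LogMiner window covers either a fixed span of system change numbers or a fixed span of time:

```python
def next_window_by_changes(start_scn, window_length_changes):
    """Amount of Changes mode: the window's upper bound advances
    by a fixed number of changes past the starting SCN."""
    return (start_scn, start_scn + window_length_changes)

def next_window_by_time(start_ms, window_length_ms):
    """Amount of Time mode: the window's upper bound advances
    by a fixed number of milliseconds past the starting instant."""
    return (start_ms, start_ms + window_length_ms)

print(next_window_by_changes(5_000_000, 10_000))  # (5000000, 5010000)
print(next_window_by_time(1_700_000_000_000, 60_000))
```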

    Transaction Maximum Duration (ms) Maximum duration to allow from the transaction start time until the transaction is evicted, in milliseconds.

    Affects how the origin handles uncommitted transactions.

    Set to -1 for no limit.

    Transaction Maximum Retention (ms) Maximum time to retain an evicted transaction, in milliseconds.

    Affects how the origin handles evicted transactions.

    Set to -1 for no limit.

    Operations Maximum Gap (ms) Maximum time to allow between consecutive operations before a transaction is evicted, in milliseconds.

    Affects how the origin handles uncommitted transactions.

    Set to -1 for no limit.
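
    Taken together, the timing properties above determine when an uncommitted transaction is evicted: once its total duration exceeds Transaction Maximum Duration, or once the gap since its last operation exceeds Operations Maximum Gap; an evicted transaction is then retained only up to Transaction Maximum Retention. A minimal sketch of the eviction decision, assuming -1 disables a limit as described above (the function is ours, for illustration only):

```python
def should_evict(now_ms, start_ms, last_op_ms,
                 max_duration_ms=-1, max_gap_ms=-1):
    """Decide whether an uncommitted transaction should be evicted.

    A value of -1 disables the corresponding limit.
    """
    too_long = max_duration_ms != -1 and now_ms - start_ms > max_duration_ms
    too_quiet = max_gap_ms != -1 and now_ms - last_op_ms > max_gap_ms
    return too_long or too_quiet

# Transaction open for 90s with a 60s maximum duration: evicted.
print(should_evict(now_ms=90_000, start_ms=0, last_op_ms=85_000,
                   max_duration_ms=60_000))              # True
# No limits configured: never evicted on timing alone.
print(should_evict(now_ms=90_000, start_ms=0, last_op_ms=0))  # False
```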

    Maximum Wait for Archive Logs (ms)

    Maximum time to wait for archive logs.

    Available when Database Role is set to Standby.

    Database Role Role of the database: primary or standby.
    SQL Operations SQL operations to generate records for.
    Table Truncate Handling Method for handling Truncate statements in redo logs:
    • As Record - Generates a record, like a regular operation.
    • As Event - Generates an event if the origin is configured to generate events.
    • As Record and Event - Generates a record. Also generates an event if the origin is configured to generate events.
    • Ignore - Ignores Truncate statements.

    When using the Basic record format, As Event is the only valid option.

    Default is As Event.

    Case Sensitive Names Submits case-sensitive schema, table, and column names. When not selected, the origin submits names in all caps.

    Oracle uses all caps for schema, table, and column names by default. Names can be lower- or mixed-case only if the schema, table, or column was created with quotation marks around the name.

    Track LogMiner Sessions Includes LogMiner session details and events in Data Collector log files.

    When the origin is configured to generate events, the origin also generates a logminer-session-start event each time a new LogMiner session starts.

  4. On the Data tab, you can configure the following options.
    The default values produce a record similar to that of the older Oracle CDC Client origin. For more information, see Comparing Oracle CDC Origins and Updating Oracle CDC Client Pipelines.
    Options Property Description
    Record Format Record format to use.
    • Basic - Includes data and metadata about processed changes.

      Provides a record similar to that of the Oracle CDC Client origin, which can be written using CRUD-enabled destinations.

    • Rich - Includes new and old values for data and additional metadata about processed changes.

      Use when your pipeline logic requires both old and new values for changed data.

    Default is Basic.

    Evicted Transactions Behavior for evicted transactions:
    • Send Transaction Operations to Pipeline - Sends operations to the pipeline like normal records. Use to treat the operations as if the transaction was committed.

      This option can create inconsistencies in your data. Use with caution.

    • Discard Transaction Operations - Discards operations. Use to treat the operations as if the transaction was rolled back.
    • Send Transaction Operations to Error - Sends operations to the stage for error handling. Use to treat the operations like error records.
    • Abort Pipeline - Stops the pipeline. Use to review the error and address underlying issues.

    Default is Send Transaction Operations to Error.

    Available when Parse Statements is enabled.

    Parse Statements Parses redo SQL statements and places resulting values in records.

    This property is enabled by default.

    Unreadable Column Values Behavior when column values cannot be parsed:
    • Send Record to Pipeline - Includes the field names for unreadable columns in records, with no data. You can configure the origin to remove empty fields using the Null Columns property.
    • Discard Record - Drops the record from the pipeline.
    • Send Record to Error - Passes records to the stage for error handling without the unreadable columns.
    • Abort Pipeline - Stops the pipeline with an error. Use to review errors and address underlying issues.

    Default is Send Record to Error.

    Available when Parse Statements is enabled.

    Unsupported Column Data Types Behavior for columns with unsupported data types:
    • Pass the record to the pipeline - Passes the record to the pipeline. You can configure how the origin handles the fields that contain unsupported data types using the Unsupported Columns property. You can configure the origin to remove empty fields using the Null Columns property.
    • Discard the record - Drops the record from the pipeline.
    • Send record to error - Passes records to the stage for error handling without the unsupported data type fields.
    • Abort pipeline - Stops the pipeline with an error. Use to review errors and address underlying issues.

    Default is Send Record to Error.

    Available when Parse Statements is enabled.

    Missing Columns Behavior when columns are missing from the redo SQL statement:
    • Send Empty Column - Includes empty fields in records.
    • Discard Column - Omits fields from records.

    Default is Discard Column.

    Available when Parse Statements is enabled.

    Null Columns Behavior when fields include null or missing values:
    • Send Empty Column - Includes empty fields in records.
    • Discard Column - Omits fields from records.

    Default is Discard Column.

    Available when Parse Statements is enabled.

    Unsupported Columns Action to take on columns that contain unsupported data types:
    • Keep Column with Raw Value - Includes the column as a record field with its original raw value.
    • Keep Column with Null Value - Includes the column as a record field with an empty value.
    • Drop Column - Omits the column from the record.

    Default is Keep Column with Raw Value.

    Available when Parse Statements is enabled.

    Only used when Unsupported Column Data Types is set to Pass the record to the pipeline.

    Annotate Column Availability Includes additional attributes about column availability and type support.

    This property is not selected by default.

    Available when Parse Statements is enabled.

    Change Control Values Location of change control value information, such as the operation type, in records:
    • In Body
    • In Header - Use to allow CRUD-enabled destinations to write records based on the operation type.
    • Omit

    Default is In Header.

    Redo Statement Location of the redo statement in records:
    • In Body
    • In Header
    • Omit

    Default is In Header.

    Undo Statement Location of the undo statement in records:
    • In Body
    • In Header
    • Omit

    Default is In Header.

    Table Definition Location of table definitions in records:
    • In Body - Valid only when Record Format is set to Rich.
    • In Header
    • Omit

    Default is Omit.

    Primary Key Definition Location of primary key definitions in records:
    • In Body
    • In Header
    • Omit

    Default is In Header.

    Numeric Columns Digit Structure Location of precision and scale for numeric fields in records:
    • In Header and Body
    • In Body
    • In Header
    • Omit

    Default is In Header and Body.

    New Values Location of new values in records:
    • In Body - Use to include changed and new values in records.
    • Omit

    Default is In Body.

    Available when Parse Statements is enabled.

    Old Values Location of old values in records:
    • In Body - Valid only when Record Format is set to Rich.
    • Omit

    Default is Omit.

    Available when Parse Statements is enabled.

    New Hexadecimal Values Location of hexadecimal values of all new or changed data in records.

    Default is Omit.

    Available when Parse Statements is enabled.

    Old Hexadecimal Values Location of old hexadecimal values in records.

    Default is Omit.

    Available when Parse Statements is enabled.

    New Raw Values Location of raw values of all new or changed data in records.

    Default is Omit.

    Available when Parse Statements is enabled.

    Old Raw Values Location of old raw values in records.

    Default is Omit.

    Available when Parse Statements is enabled.

    Primary Key New Values Location of new primary key values in records:
    • In Body
    • In Header
    • Omit

    Default is In Header.

    Available when Parse Statements is enabled.
    Primary Key Old Values Location of old primary key values in records:
    • In Body - Valid only when Record Format is set to Rich.
    • In Header
    • Omit
    Default is In Header.

    Available when Parse Statements is enabled.

    Oracle's Pseudo Columns New Values Location of new values for pseudocolumns in records:
    • In Extended Body - New pseudocolumn values are included in a /newPseudocolumnValues field.

      Valid only when Record Format is set to Rich.

    • In Body - New pseudocolumn values are included in the /newValues field.

      Valid only when Record Format is set to Rich.

    • In Header
    • Omit

    Default is In Header.

    Available when Parse Statements is enabled.

    Oracle's Pseudo Columns Old Values Location of old values for pseudocolumns in records:
    • In Extended Body - Old pseudocolumn values are included in a /oldPseudocolumnValues field.

      Valid only when Record Format is set to Rich.

    • In Body - Old pseudocolumn values are included in the /oldValues field.

      Valid only when Record Format is set to Rich.

    • In Header - Valid only when Record Format is set to Rich.
    • Omit

    Default is Omit.

    Available when Parse Statements is enabled.

  5. On the Resilience tab, configure the following properties, as needed:
    Resilience Property Description
    Database Reconnect Interval (ms) Milliseconds to wait between trying to reconnect to Oracle.
    Database Reconnect Tries Maximum number of times to try to connect to Oracle.
    LogMiner Session Restart Interval (ms) Milliseconds to wait between trying to create a LogMiner session.
    LogMiner Session Restart Tries Maximum number of times to try to create a LogMiner session.
    Wait Time Before Session Start (ms) Milliseconds to wait before starting a LogMiner session.
    Wait Time After Session Start (ms) Milliseconds to wait before sending a query after a LogMiner session starts.
    Wait Time After Session End (ms) Milliseconds to wait after a LogMiner session ends before starting a new LogMiner session.
    Orphan Operations Event Scan Interval (ms) Milliseconds to wait between scans when the origin detects a transaction with data that is not yet persisted to the redo log.

    Default is ${5 * SECONDS * 1000}.

    Orphan Operations Event Scan Tries Maximum number of times to rescan when the origin detects a transaction with data that is not yet persisted to the redo log.

    Default is 12.
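
    With the defaults above, the origin rescans every 5 seconds up to 12 times, so it waits up to a minute for data to reach the redo log before giving up. A small sketch of that retry loop (the `data_persisted` callback is a hypothetical stand-in for the origin's internal check):

```python
import time

def wait_for_persisted(data_persisted, interval_ms=5_000, max_tries=12,
                       sleep=time.sleep):
    """Poll until the check succeeds or the tries are exhausted.

    Defaults mirror the documented defaults: ${5 * SECONDS * 1000}
    between scans and 12 tries, for up to one minute of waiting.
    """
    for _ in range(max_tries):
        if data_persisted():
            return True
        sleep(interval_ms / 1000)
    return False

# Succeeds on the third scan; the no-op sleep keeps the demo instant.
scans = iter([False, False, True])
print(wait_for_persisted(lambda: next(scans), sleep=lambda s: None))  # True
```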

    Minimum Lag from Database Time (sec)

    Minimum number of seconds between the time an operation occurs and the database time, before the origin processes the operation.

    Default is ${1 * SECONDS}.

  6. Optionally, on the Tuning tab, configure properties to tune the performance of the origin:
    Tuning Property Description
    Max Batch Size (records) Maximum number of records processed at one time. Honors values up to the Data Collector maximum batch size.

      Default is 1000. The Data Collector default maximum batch size is also 1000.

    Max Batch Wait Time (ms) Number of milliseconds to wait before sending a partial or empty batch.

    Specify a value greater than 0, or set to -1 for no limit.

    Max Batch Vault Size (batches) Maximum number of batches the origin pre-generates while the pipeline is processing other batches.

    Limiting the number of pre-generated batches can reduce the amount of memory used by the origin. When the Data Collector machine has severe memory constraints, StreamSets recommends setting the value to 1.

    Default is 64.

    Archived Logs Availability Check Frequency (ms) Frequency at which the origin checks for archived logs, in milliseconds.

    Used for standby databases only.

    Eviction Check Frequency (ms) Frequency at which the origin checks for transaction evictions, in milliseconds.
    Buffer Size Size of the internal buffer queues. This number correlates to capacity, not to memory size.

    Specify an integer that is a power of 2.

    Default is 8192.
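
    A power-of-two size requirement like this is typical of ring-buffer queues, where index wrapping can then use a cheap bit mask instead of a modulo (this rationale is our inference, not stated by the origin's documentation). A quick way to validate a candidate Buffer Size before configuring it:

```python
def is_power_of_two(n: int) -> bool:
    # A positive power of two has exactly one bit set,
    # so n & (n - 1) clears that bit and yields zero.
    return n > 0 and (n & (n - 1)) == 0

print(is_power_of_two(8192))   # True  (the default, 2**13)
print(is_power_of_two(10000))  # False (not a valid Buffer Size)
```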

    Buffer Wait Strategy Strategy to use when waiting for available buffers:
    • Blocking - Based on thread lock and wake-up. This is the slowest strategy, but the most conservative and consistent for the CPU.
    • Sleeping - Based on busy wait loops, using thread pause. This is a slightly faster strategy that is also CPU conservative, but with lower latency.
    • Yielding - Based on thread yield. This is a high-performing strategy aiming for low latencies. Recommended when hyper-threading is enabled.
    • Busy Spin - Based on busy spin. This is the highest performing strategy that works better with Java 11 or higher. Recommended when hyper-threading is disabled.
    LogMiner Contents Query Strategy Strategy used to query LogMiner. Use the table name or object ID.
    Note: Due to the way the origin interacts with LogMiner, the table name strategy uses the following format instead of the actual table name: OBJ# <table object id>.
    Query Fetch Size Number of rows to fetch with each query. Specify a larger number to reduce the number of times the origin queries the database.

    Larger query fetch sizes can require more memory.

    Query Timeout (ms) Maximum milliseconds to wait for a query to complete. Set to -1 for no limit.

    Default is 10 minutes.

    SQL Parser Threads Maximum number of threads to use to parse SQL in parallel. Set to 0 for no limit.

    Default is 32.

    Summary Update Frequency (ms) Frequency to update information when you monitor the pipeline.