Oracle

The Oracle destination writes data to tables in a Oracle database. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.

You can use the Oracle destination to write new data or change data capture (CDC) data to Oracle. The destination writes data from record fields to table columns based on matching names. The destination can compensate for data drift by creating new columns and tables in Oracle when new fields and table references appear in records.

The Oracle destination uses a JDBC connection string or Oracle host and service name to connect to the Oracle database. When you configure the destination, you specify the connection string or host and service name, and credentials to use to connect to the database. You can also use a connection to configure the destination.

You configure the Oracle database and tables to use, and specify error handling.

You can optionally enable data drift. You can have the destination create new tables, as needed. You can also specify whether to create all new columns as String instead of inferring the type.

Before you can use the Oracle destination, you must complete a prerequisite task.

Prerequisite

Before using the Oracle destination, install the Oracle JDBC driver for the database. The destination cannot access the database until you install this driver.

For information about installing additional drivers, see Install External Libraries in the Data Collector documentation.
  1. Download the Oracle JDBC driver from the Oracle website.
    Note: Writing XML data to Oracle requires installing the Oracle Data Integrator Driver for XML. For more information, see the Oracle documentation.
  2. Install the driver as an external library into the JDBC branded Oracle stage library, streamsets-datacollector-jdbc-branded-oracle-lib, which includes the origin.

Enabling Data Drift Handling

The Oracle destination can automatically compensate for changes in column or table requirements, also known as data drift.

Note: If you change the destination table schema manually rather than enabling data drift handling, you must restart the pipeline to allow the destination to discover schema changes.
The destination can handle data drift in the following ways:
Create new columns

The destination can create new columns in Oracle tables when new fields appear in records. For example, if a record suddenly includes a new Address2 field, the destination creates a new Address2 column in the target table.

By default, the destination creates new columns based on the data in the new fields, such as creating a Double column for decimal data. You can, however, configure the destination to create all new columns as String.

To enable the automatic creation of new columns, select the Enable Data Drift property on the JDBC tab. To create new columns as String, select the Create New Columns as String property.
Create new tables
The destination can create new tables as needed. For example, say the destination writes data to tables based on the region name in the Region field. When a new SW-3 region shows up in a record, the destination creates a new SW-3 table in Oracle and writes the record to the new table.

You can use this functionality to create all necessary tables in an empty Oracle database schema.

To enable the creation of new tables, select the Auto Create Table property.

Generated Data Types

When creating new tables or creating new columns in existing tables, the Oracle destination uses field names to generate the new column names.

You can configure the destination to create all new columns as String. However, by default, the Oracle destination creates columns as follows:
Record Field Data Type Oracle Column Data Type
Byte Array Binary Float
Char Char
String Varchar2
Decimal Number
Long Long
Float Float
Integer Number
Short Number
Date Date
Datetime Timestamp
Zoned Datetime Timestamp With Time Zone
Double Float

The destination does not support nested Data Collector data types: List, List-Map, and Map. By default, the destination treats fields with invalid data types as an error record. You can configure the destination to ignore fields with invalid data types, replacing them with an empty value.

Tip: To process fields in a record including nested list, list-map or map fields, use a Field Flattener processor to flatten the entire record to produce a record with no nested fields.

Creating Tables

If you configure the Oracle destination to handle data drift, you can also configure the destination to create tables. Once configured, the destination creates tables when the specified or needed tables do not exist in Oracle.

With the Auto Create Table property enabled on the JDBC tab, the destination creates tables and columns as follows:
Table
The destination creates a table if the table specified in the Table Name property does not exist.
While handling data drift, the destination creates tables needed to write records to tables that do not exist.
Table columns
In the created tables, the destination determines the columns from the first batch of data processed. The destination infers the data types from the data in the batch.
The columns in the created tables are nullable.

CRUD Operation Processing

The Oracle destination can insert, update, upsert, or delete data when you configure the destination to process CDC data.

When writing data, the Oracle destination uses the CRUD operation specified in the sdc.operation.type record header attribute. The destination performs operations based on the following numeric values:
  • 1 for INSERT
  • 2 for DELETE
  • 3 for UPDATE
  • 4 for UPSERT

If your pipeline has a CRUD-enabled origin that processes changed data, the destination simply reads the operation type from the sdc.operation.type header attribute that the origin generates. If your pipeline has a non-CDC origin, you can use the Expression Evaluator processor or a scripting processor to define the record header attribute. For more information about Data Collector changed data processing and a list of CDC-enabled origins, see Processing Changed Data.

Configuring an Oracle Destination

Configure an Oracle destination to write data to Oracle tables.

  1. In the Properties panel, on the General tab, configure the following properties:
    General Property Description
    Name Stage name.
    Description Optional description.
    Required Fields Fields that must include data for the record to be passed into the stage.
    Tip: You might include fields that the stage uses.

    Records that do not include all required fields are processed based on the error handling configured for the pipeline.

    Preconditions Conditions that must evaluate to TRUE to allow a record to enter the stage for processing. Click Add to create additional preconditions.

    Records that do not meet all preconditions are processed based on the error handling configured for the stage.

    On Record Error Error record handling for the stage:
    • Discard - Discards the record.
    • Send to Error - Sends the record to the pipeline for error handling.
    • Stop Pipeline - Stops the pipeline.
  2. On the JDBC tab, configure the following properties:
    JDBC Property Description
    Connection Connection that defines the information required to connect to an external system.

    To connect to an external system, you can select a connection that contains the details, or you can directly enter the details in the pipeline. When you select a connection, Control Hub hides other properties so that you cannot directly enter connection details in the pipeline.

    Hostname Host name of the database.
    Port Port number to use.
    Service Name Listener-registered service name.
    Use Connection String Use a JDBC connection string to connect to the Oracle database.
    Oracle JDBC Connection String Connection string used to connect to the database.

    Enter the connection
 string in the following format: jdbc:oracle:<driver_type>:@<host>:<port>:<service_name>

    You can optionally include the user name and password in the connection string.

    Required when Use Connection String is enabled.

    Schema Name Optional database or schema name to use.
    Use when the database requires a fully-qualified table name.
    Tip: Oracle uses all caps for schema, table, and column names by default. Names can be lower- or mixed-case only if the schema, table, or column was created with quotation marks around the name.

    To use a lower- or mixed-case schema name, enter the name and enable the Enclosed Object Names property.

    Table Name Database table name to use.
    Tip: Oracle uses all caps for schema, table, and column names by default. Names can be lower- or mixed-case only if the schema, table, or column was created with quotation marks around the name.

    To use a lower- or mixed-case table name, enter the name and enable the Enclosed Object Names property.

    Enable Data Drift Enables the stage to create new columns in Oracle tables when new fields appear in records.
    Auto Create Table Automatically creates tables when needed.

    Available when data drift is enabled.

    Use Credentials Enables entering credentials on the Credentials tab. Select when you do not include credentials in the JDBC connection string.
    Field to Column Mapping Use to override the default field to column mappings. By default, fields are written to columns of the same name.
    When you override the mappings, you can define parameterized values to apply SQL functions to the field values before writing them to columns. For example, to convert a field value to an integer, enter the following for the parameterized value:
    CAST(? AS INTEGER)

    The question mark (?) is substituted with the value of the field. Leave the default value of ? if you do not need to apply a SQL function.

    Using simple or bulk edit mode, click the Add icon to create additional field to column mappings.

    Change Log Format Format of change capture data log produced by the CDC-enabled origin. Used to process change capture data.
    Default Operation Default CRUD operation to perform if the sdc.operation.type record header attribute is not set.
    Unsupported Operation Handling Action to take when the CRUD operation type defined in the sdc.operation.type record header attribute is not supported:
    • Discard - Discards the record.
    • Send to Error - Sends the record to the pipeline for error handling.
    • Use Default Operation - Writes the record to the destination system using the default operation.
    Rollback Batch On Error Rolls back the entire batch when an error occurs within the batch.
    Use Quoted Identifiers Encloses table and column names in quotation marks.

    When this property is disabled, the destination uses the standard naming conventions for Oracle.

    Create New Columns as String Creates all new columns as String. By default, the destination creates new columns based on the type of data in the field.

    Available when data drift is enabled.

    Additional JDBC Configuration Properties Additional JDBC configuration properties to use. To add properties, click Add and define the JDBC property name and value.

    Use the property names and values as expected by JDBC.

  3. If you configured the destination to use credentials separately from the Oracle JDBC connection string on the JDBC tab, then configure the following properties on the Credentials tab:
    Credentials Property Description
    Username User name for the JDBC connection.

    The user account must have the correct permissions or privileges in the database.

    Password Password for the JDBC user name.
    Tip: To secure sensitive information such as user names and passwords, you can use runtime resources or credential stores. For more information about credential stores, see Credential Stores in the Data Collector documentation.
  4. On the Advanced tab, optionally configure advanced properties.
    The defaults for these properties should work in most cases:
    Advanced Property Description
    Maximum Pool Size Maximum number of connections to create.

    Default is 1. The recommended value is 1.

    Minimum Idle Connections Minimum number of connections to create and maintain. To define a fixed connection pool, set to the same value as Maximum Pool Size.

    Default is 1.

    Connection Timeout (seconds) Maximum time to wait for a connection. Use a time constant in an expression to define the time increment.
    Default is 30 seconds, defined as follows:
    ${30 * SECONDS}
    Idle Timeout (seconds) Maximum time to allow a connection to idle. Use a time constant in an expression to define the time increment.

    Use 0 to avoid removing any idle connections.

    When the entered value is close to or more than the maximum lifetime for a connection, Data Collector ignores the idle timeout.

    Default is 10 minutes, defined as follows:
    ${10 * MINUTES}
    Max Connection Lifetime (seconds) Maximum lifetime for a connection. Use a time constant in an expression to define the time increment.

    Use 0 to set no maximum lifetime.

    When a maximum lifetime is set, the minimum valid value is 30 minutes.

    Default is 30 minutes, defined as follows:
    ${30 * MINUTES}
    Transaction Isolation Transaction isolation level used to connect to the database.

    Default is the default transaction isolation level set for the database. You can override the database default by setting the level to any of the following:

    • Read committed
    • Read uncommitted
    • Repeatable read
    • Serializable
    Init Query SQL query to perform immediately after the stage connects to the database. Use to set up the database session as needed.

    The query is performed after each connection to the database. If the stage disconnects from the database during the pipeline run, for example if a network timeout occurrs, the stage performs the query again when it reconnects to the database.

    For example, in case of Oracle, the following query returns 1 to verify that the stage is connected to the database: Select 1 from dual;

    Data SQLSTATE Codes List of SQLSTATE codes to treat as data errors. The destination applies error record handling to records that trigger a listed code.

    When a record triggers a SQLSTATE code not listed, the destination generates a stage error that stops the pipeline.

    To add a code, click Add and enter the code.

  5. On the Data tab, configure the following properties:
    Data Property Description
    Column Fields to Ignore A list of fields to ignore when writing to the destination. You can enter a comma-separated list of first-level fields to ignore.
    Ignore Fields with Invalid Type Allows replacing fields that contain data of an invalid type with an empty value.

    When not enabled, records with data of invalid types are treated as error records.

    Merge CDC Data Enables performing CRUD operations and using the MERGE command to write to Oracle tables. Select to process CDC data.
    Important: To maintain the original order of data, do not use multiple threads or cluster execution mode when processing CDC data.
  6. On the SSL/TLS Encryption tab, configure the following properties:
    SSL/TLS Property Description
    Encrypt Connection Encrypt the connection using SSL.
    Server Certificate PEM Server certificate in PEM format used to verify the SSL/TLS certificate of the Oracle server.

    Use a text editor to open the PEM encoded certificate, and then copy and paste the full contents of the file into the property, including the header and footer.

    Cipher Suites Cipher suites to use. To use a cipher suite that is not a part of the default set, click the Add icon and enter the name of the cipher suite. You can use simple or bulk edit mode to add cipher suites.

    Enter the Java Secure Socket Extension (JSSE) name for the additional cipher suites that you want to use.

    Verify Hostname Verify that the host name specified in the connection string matches the certificate name in the server certificate.
    SSL Distinguished Name Distinguished Name (DN) of the server. This must match the DN in the server certificate.