SAP HANA Query Consumer

The SAP HANA Query Consumer origin reads from an SAP HANA database using the specified SQL query. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.

The SQL query can read data from a single table or from a join of tables. The origin returns data as a map with column names and field values.

When you configure SAP HANA Query Consumer, you specify connection information and credentials that determine how the origin connects to the database. You configure the query mode, SQL query and related information to define the data returned by the database. You can call stored procedures from the SQL query.

You can enable SAP HANA split batch commands, which allow parallel execution of the query on partitioned tables. You can specify custom properties that your driver requires. You can also specify what the origin does when it encounters an unsupported data type.

By default, the origin generates JDBC record header attributes and JDBC field attributes that provide additional information about each record and field. You can configure the origin to generate SAP HANA record header attributes that provide details about the connection.

The origin can generate events for an event stream. For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.

Before you use the origin, you must install a JDBC driver.

Installing the JDBC Driver

Before you use the SAP HANA Query Consumer origin, install the JDBC driver for the database. You cannot access the database until you install the required driver.

You install the driver into the JDBC SAP HANA stage library, streamsets-datacollector-jdbc-sap-hana-lib, which includes the origin.

To use the JDBC driver with multiple stage libraries, install the driver into each stage library associated with the stages.

For information about installing additional drivers, see Install External Libraries in the Data Collector documentation.

Offset Column and Offset Value

The SAP HANA Query Consumer origin uses an offset column and initial offset value to determine where to start reading data within a table. Include both the offset column and the offset value in the WHERE clause of the SQL query.

The offset column must be a column in the table with unique non-null values, such as a primary key or indexed column. The initial offset value is a value within the offset column where you want the origin to start reading.

When the origin performs an incremental query, you must configure the offset column and offset value. For full queries, you can optionally configure them.

Full and Incremental Mode

The SAP HANA Query Consumer origin can perform queries in two modes:

Incremental mode
To use incremental mode, you must select the Incremental Mode property and configure an offset column and initial offset value for the origin. When you define the SQL query, you must use the ${OFFSET} parameter to represent the offset value in the WHERE clause.
When the origin performs an incremental query, it uses the initial offset value in place of the ${OFFSET} parameter in the first SQL query. As the origin completes processing the results of the first query, it saves the last offset value that it processes. Then it waits the specified query interval before performing a subsequent query.
When the origin performs a subsequent query, it uses the last-saved offset value in place of the ${OFFSET} parameter in the query. When needed, you can reset the origin to use the initial offset value.
Use incremental mode for append-only tables or when you do not need to capture changes to older rows. By default, the origin uses incremental mode.
For more SQL query guidelines, see SQL Query for Incremental Mode.
Full mode
To use full mode, you must clear the Incremental Mode property for the origin. You can optionally configure an offset column and initial offset value and can define any type of SQL query.
When the origin performs a full query, it runs the specified SQL query. If you configure the offset column and initial offset value, the origin uses the initial offset as the offset value in the SQL query each time it requests data.
When the origin completes processing the results of the full query, it waits the specified query interval, and then performs the same query again.
Use full mode to capture all row updates. You might use a Record Deduplicator processor in the pipeline to minimize repeated rows. Full mode is not ideal for large tables.
Tip: If you want to process the results from a single full query and then stop the pipeline, you can enable the origin to generate events and use the Pipeline Finisher executor to stop the pipeline automatically. For more information, see Event Generation.
For more SQL query guidelines, see SQL Query for Full Mode.
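
The difference between the two modes can be sketched in Python, using an in-memory SQLite table as a stand-in for SAP HANA. The helper names (run_query, poll_incremental, poll_full) and the sample table are hypothetical. The sketch only imitates the offset handling described above and is not the origin's implementation.

```python
import sqlite3

def run_query(conn, template, offset):
    # Put the current offset value in place of the ${OFFSET} parameter.
    sql = template.replace("${OFFSET}", str(offset))
    return conn.execute(sql).fetchall()

def poll_incremental(conn, template, initial_offset, polls, offset_index=0):
    """Run the query `polls` times, carrying the last-saved offset forward."""
    offset = initial_offset
    batches = []
    for _ in range(polls):
        rows = run_query(conn, template, offset)
        if rows:
            # Save the offset column value of the last processed row.
            offset = rows[-1][offset_index]
        batches.append(rows)
    return batches, offset

def poll_full(conn, template, initial_offset, polls):
    """Run the same query every time; the offset never advances."""
    return [run_query(conn, template, initial_offset) for _ in range(polls)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO invoice VALUES (?, ?)",
                 [(1, 10.0), (2, 20.0), (3, 30.0)])

query = "SELECT id, amount FROM invoice WHERE id > ${OFFSET} ORDER BY id"
incremental_batches, last_offset = poll_incremental(conn, query, 0, polls=2)
full_batches = poll_full(conn, query, 0, polls=2)
```

In this sketch the second incremental poll returns no rows because the saved offset has moved past the last row, while every full poll returns the whole table again. That repetition is why a Record Deduplicator can be useful in full mode.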

Recovery

The SAP HANA Query Consumer origin supports recovery after a deliberate or unexpected stop when it performs incremental queries. Recovery is not supported for full queries.

In incremental mode, the origin uses offset values in the offset column to determine where to continue processing after a deliberate or unexpected stop. To ensure seamless recovery in incremental mode, use a primary key or indexed column as the offset column. As the SAP HANA Query Consumer origin processes data, it tracks the offset value internally. When the pipeline stops, the origin notes where it stopped processing data. When you restart the pipeline, it continues from the last-saved offset.

When the origin performs full queries, the origin runs the full query again after you restart the pipeline.
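
The recovery behavior for incremental mode can be illustrated with a small Python sketch. The OffsetStore class, the JSON file, and the process function are hypothetical stand-ins for the internal offset tracking. They show the idea of resuming from a last-saved offset, not how Data Collector actually stores it.

```python
import json
import os
import tempfile

class OffsetStore:
    """Hypothetical file-backed store for the last-saved offset."""

    def __init__(self, path):
        self.path = path

    def load(self, initial_offset):
        # Fall back to the initial offset when nothing has been saved yet.
        if os.path.exists(self.path):
            with open(self.path) as f:
                return json.load(f)["offset"]
        return initial_offset

    def save(self, offset):
        with open(self.path, "w") as f:
            json.dump({"offset": offset}, f)

def process(row_ids, store, initial_offset, stop_after=None):
    """Process ids above the stored offset; optionally stop early."""
    offset = store.load(initial_offset)
    seen = []
    for row_id in sorted(r for r in row_ids if r > offset):
        if stop_after is not None and len(seen) == stop_after:
            break  # simulate a deliberate or unexpected pipeline stop
        seen.append(row_id)
        store.save(row_id)
    return seen

path = os.path.join(tempfile.mkdtemp(), "offset.json")
store = OffsetStore(path)
table_ids = [1, 2, 3, 4, 5]

first_run = process(table_ids, store, initial_offset=0, stop_after=2)
second_run = process(table_ids, store, initial_offset=0)
```

The second run picks up after the last id saved by the first run, so no row is processed twice. A unique, ordered offset column is what makes this resumption unambiguous.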

SQL Query

The SQL query defines the data returned from the database.

You define the query in the SQL Query property on the JDBC tab. Or, you can define the query in a runtime resource, and then use the runtime:loadResource function in the SQL Query property to load the query from the resource file at runtime. For example, you might enter the following expression for the property:

${runtime:loadResource("myquery.sql", false)}
The SQL query guidelines that you use depend on whether you configure the origin to perform an incremental or full query.
Note: SAP HANA uses all caps for schema, table, and column names by default. Names can be lower- or mixed-case only if the schema, table, or column was created with quotation marks around the name.

SQL Query for Incremental Mode

When you define the SQL query for incremental mode, the SAP HANA Query Consumer origin requires a WHERE and ORDER BY clause in the query.

Use the following guidelines when you define the WHERE and ORDER BY clauses in the query:

In the WHERE clause, include the offset column and the offset value
The origin uses an offset column and value to determine the data that is returned. Include both in the WHERE clause of the query.
Use the OFFSET parameter to represent the offset value
In the WHERE clause, use ${OFFSET} to represent the offset value.
For example, when you start a pipeline, the following query returns all data from the table where the data in the offset column is greater than the initial offset value:
SELECT * FROM <tablename> WHERE <offset column> > ${OFFSET}
Tip: When the offset values are strings, enclose ${OFFSET} in single quotation marks.
In the ORDER BY clause, include the offset column as the first column
To avoid returning duplicate data, use the offset column as the first column in the ORDER BY clause.
Note: Using a column that is not a primary key or indexed column in the ORDER BY clause can slow performance.
For example, the following query for incremental mode returns data from an invoice table where the ID column is the offset column. The query returns all data where the ID is greater than the offset and orders the data by the ID:
SELECT * FROM invoice WHERE id > ${OFFSET} ORDER BY id
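
The guidelines above can be expressed as a rough pre-flight check. The following Python sketch is illustrative only: check_incremental_query and resolve_offset are hypothetical helpers, the pattern matching is deliberately simple (it ignores subqueries and quoted identifiers), and it is not the validation that the origin performs.

```python
import re

def check_incremental_query(sql, offset_column):
    """Return a list of guideline violations for an incremental-mode query."""
    problems = []
    upper = sql.upper()
    if "${OFFSET}" not in sql:
        problems.append("missing ${OFFSET} parameter")
    if not re.search(r"\bWHERE\b", upper):
        problems.append("missing WHERE clause")
    # Capture the first column named after ORDER BY.
    order_by = re.search(r"\bORDER\s+BY\s+([^\s,]+)", upper)
    if order_by is None:
        problems.append("missing ORDER BY clause")
    elif order_by.group(1) != offset_column.upper():
        problems.append("offset column is not first in ORDER BY")
    return problems

def resolve_offset(sql, offset):
    """Replace the ${OFFSET} parameter with a concrete offset value."""
    return sql.replace("${OFFSET}", str(offset))

numeric_query = "SELECT * FROM invoice WHERE id > ${OFFSET} ORDER BY id"
# String offsets: the query itself wraps ${OFFSET} in single quotation marks.
string_query = "SELECT * FROM invoice WHERE code > '${OFFSET}' ORDER BY code"

numeric_sql = resolve_offset(numeric_query, 1000)
string_sql = resolve_offset(string_query, "A100")
```

Here numeric_sql compares id against 1000, and string_sql compares code against the quoted literal 'A100'. Without the quotation marks in the query, the string offset would be read as an identifier rather than a value.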

SQL Query for Full Mode

You can define any type of SQL query for full mode.

For example, you can run the following query to return all data from an invoice table:
SELECT * FROM invoice

When you define the SQL query for full mode, you can optionally include the WHERE and ORDER BY clauses using the same guidelines as for incremental mode. However, using these clauses to read from large tables can cause performance issues.

JDBC Attributes

The SAP HANA Query Consumer origin generates record header attributes and field attributes that provide additional information about each record and field.

The origin receives these details from the JDBC driver.

JDBC Header Attributes

By default, the SAP HANA Query Consumer origin generates JDBC record header attributes that provide additional information about each record, such as the original data type of a field or the source tables for the record. The origin receives these details from the JDBC driver.

You can use the record:attribute or record:attributeOrDefault functions to access the information in the attributes. For more information about working with record header attributes, see Working with Header Attributes.

JDBC header attributes include a user-defined prefix to differentiate the JDBC header attributes from other record header attributes. By default, the prefix is jdbc.

On the Advanced tab, you can change the prefix with the JDBC Header Prefix property, or configure the origin not to create JDBC header attributes by clearing the Create JDBC Header Attributes property.

The origin can provide the following JDBC header attributes:
JDBC Header Attribute Description
<JDBC prefix>.tables Provides a comma-separated list of source tables for the fields in the record. Note: Not all JDBC drivers provide this information.
<JDBC prefix>.<column name>.jdbcType Provides the numeric value of the original SQL data type for each field in the record. See the Java documentation for a list of the data types that correspond to numeric values.
<JDBC prefix>.<column name>.precision Provides the original precision for all numeric and decimal fields.
<JDBC prefix>.<column name>.scale Provides the original scale for all numeric and decimal fields.

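As an illustration of how the JDBC header attributes might be consumed downstream, the following Python sketch groups them by column and translates the numeric jdbcType into a name. The type numbers are a subset of the constants defined in java.sql.Types. The describe_columns function and the sample attribute values are hypothetical, and the sketch assumes that column names contain no periods.

```python
# Subset of the constants defined in java.sql.Types.
JDBC_TYPE_NAMES = {
    -5: "BIGINT", 2: "NUMERIC", 3: "DECIMAL", 4: "INTEGER", 8: "DOUBLE",
    12: "VARCHAR", 91: "DATE", 92: "TIME", 93: "TIMESTAMP",
    2004: "BLOB", 2005: "CLOB",
}

def describe_columns(header_attributes, prefix="jdbc"):
    """Group '<prefix>.<column>.<detail>' attributes by column name."""
    columns = {}
    for name, value in header_attributes.items():
        parts = name.split(".")
        if len(parts) != 3 or parts[0] != prefix:
            continue  # skips <prefix>.tables and unrelated attributes
        _, column, detail = parts
        info = columns.setdefault(column, {})
        if detail == "jdbcType":
            info["type"] = JDBC_TYPE_NAMES.get(int(value), "UNKNOWN(%s)" % value)
        elif detail in ("precision", "scale"):
            info[detail] = int(value)
    return columns

# Hypothetical attributes for a record read from an INVOICE table.
headers = {
    "jdbc.tables": "INVOICE",
    "jdbc.AMOUNT.jdbcType": "3",
    "jdbc.AMOUNT.precision": "10",
    "jdbc.AMOUNT.scale": "2",
    "jdbc.ID.jdbcType": "4",
}
columns = describe_columns(headers)
```

Within a pipeline, the same values are read one at a time with the record:attribute function. The sketch only shows how the naming pattern fits together.
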
JDBC Field Attributes

The SAP HANA Query Consumer origin generates field attributes for columns converted to the Decimal or Datetime data types in Data Collector. The attributes provide additional information about each field.

The following data type conversions lose some information because the corresponding Data Collector data type cannot store it:
  • Decimal and Numeric data types are converted to the Data Collector Decimal data type, which does not store scale and precision.
  • The Timestamp data type is converted to the Data Collector Datetime data type, which does not store nanoseconds.
To preserve this information during data type conversion, the origin generates the following field attributes for these Data Collector data types:
Data Collector Data Type Generated Field Attribute Description
Decimal precision Provides the original precision for every decimal or numeric column.
Decimal scale Provides the original scale for every decimal or numeric column.
Datetime nanoSeconds Provides the original nanoseconds for every timestamp column.

You can use the record:fieldAttribute or record:fieldAttributeOrDefault functions to access the information in the attributes. For more information about working with field attributes, see Field Attributes.
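
To show why the precision and scale attributes can matter, the following Python sketch re-applies an original column scale to a decimal value and checks it against the original precision. The apply_scale and fits_precision helpers and the sample attribute values are hypothetical. The sketch assumes that the attribute values arrive as strings.

```python
from decimal import Decimal

def apply_scale(value, field_attributes):
    """Re-apply the original column scale recorded in the 'scale' attribute."""
    scale = int(field_attributes["scale"])
    # Decimal(1).scaleb(-2) is Decimal('0.01'), the quantum for two decimals.
    return value.quantize(Decimal(1).scaleb(-scale))

def fits_precision(value, field_attributes):
    """Check that the value has no more digits than the original precision."""
    precision = int(field_attributes["precision"])
    return len(value.as_tuple().digits) <= precision

# Hypothetical attributes for a column defined as DECIMAL(10, 2).
attributes = {"precision": "10", "scale": "2"}
amount = apply_scale(Decimal("12.5"), attributes)
```

With the scale restored, the value renders as 12.50 instead of 12.5, which can matter when a destination expects the same column definition as the source.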

SAP HANA Header Attributes

The SAP HANA Query Consumer origin can include SAP HANA connection information, such as the driver version or application name, in record header attributes. The origin receives these details from the JDBC driver.

The attributes are named SapHANA.<attribute>.

For example, the driver might include the following record header attributes:
SapHANA.APPLICATIONUSER:<user name>
SapHANA.DRIVERVERSION:2.4.76-7ca985c0cc5ea9fa063ab376dab1bf7b859dd9cc
SapHANA.APPLICATION:com.streamsets.pipeline.BootstrapMain

Use the Include SAP HANA Connection Details property on the SAP HANA tab to enable generating the SAP HANA header attributes.

Event Generation

The SAP HANA Query Consumer origin can generate events that you can use in an event stream. When you enable event generation, the origin generates an event when it completes processing the data returned by the specified query. The origin also generates an event when a query completes successfully and when it fails to complete.

Events generated by the origin can be used in any logical way. For example:
  • With the Pipeline Finisher executor to stop the pipeline and transition the pipeline to a Finished state when the origin completes processing available data.

    When you restart a pipeline stopped by the Pipeline Finisher executor, the origin processes data based on how you configured the origin. For example, if you configure the origin to run in incremental mode, the origin saves the offset when the executor stops the pipeline. When it restarts, the origin continues processing from the last-saved offset. In contrast, if you configure the origin to run in full mode, when you restart the pipeline, the origin uses the initial offset, if specified.

    For an example, see Stopping a Pipeline After Processing All Available Data.

  • With the Email executor to send a custom email after receiving an event.

    For an example, see Sending Email During Pipeline Processing.

For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.

Event Record

Event records generated by the SAP HANA Query Consumer origin have the following event-related record header attributes:
Record Header Attribute Description
sdc.event.type Event type. Uses one of the following types:
  • no-more-data - Generated when the origin completes processing all data returned by a query.
  • jdbc-query-success - Generated when the origin successfully completes a query.
  • jdbc-query-failure - Generated when the origin fails to complete a query.
sdc.event.version Integer that indicates the version of the event record type.
sdc.event.creation_timestamp Epoch timestamp when the stage created the event.
The origin can generate the following types of event records:
No-more-data
The origin generates a no-more-data event record when it completes processing all data returned by a query.
When necessary, you can configure the origin to delay the generation of the no-more-data event by a specified number of seconds. You might configure a delay to ensure that the query success or query failure events are generated and delivered to the pipeline before the no-more-data event record. To use a delay, configure the No-more-data Event Generation Delay property on the JDBC tab.
No-more-data event records generated by the origin have the sdc.event.type set to no-more-data and include the following field:
Event Record Field Description
record-count Number of records successfully generated since the pipeline started or since the last no-more-data event was created.
Query success
The origin generates a query success event record when it completes processing the data returned from a query.
The query success event records have the sdc.event.type record header attribute set to jdbc-query-success and include the following fields:
Field Description
query Query that completed successfully.
timestamp Timestamp when the query completed.
row-count Number of processed rows.
source-offset Offset after the query completed.
Query failure
The origin generates a query failure event record when it fails to complete processing the data returned from a query.
The query failure event records have the sdc.event.type record header attribute set to jdbc-query-failure and include the following fields:
Field Description
query Query that failed to complete.
timestamp Timestamp when the query failed to complete.
row-count Number of records from the query that were processed.
source-offset Origin offset after query failure.
error First error message.
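
The three event types can be told apart by the sdc.event.type header attribute. The following Python sketch shows one way a downstream consumer might branch on it. The route_event function, the returned messages, and the sample values are hypothetical. In a pipeline, this kind of routing is typically done with stages in the event stream rather than with custom code.

```python
def route_event(header_attributes, fields):
    """Return a short action description for an event record."""
    event_type = header_attributes.get("sdc.event.type")
    if event_type == "no-more-data":
        return "finish pipeline after %d records" % fields["record-count"]
    if event_type == "jdbc-query-success":
        return "query ok: %d rows, offset %s" % (
            fields["row-count"], fields["source-offset"])
    if event_type == "jdbc-query-failure":
        return "query failed: %s" % fields["error"]
    return "ignore"

# Hypothetical event records.
success = route_event(
    {"sdc.event.type": "jdbc-query-success"},
    {"query": "SELECT * FROM invoice WHERE id > 0 ORDER BY id",
     "row-count": 3, "source-offset": "3"},
)
failure = route_event(
    {"sdc.event.type": "jdbc-query-failure"},
    {"row-count": 0, "source-offset": "3", "error": "connection reset"},
)
done = route_event({"sdc.event.type": "no-more-data"}, {"record-count": 3})
```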

Configuring an SAP HANA Query Consumer Origin

Configure an SAP HANA Query Consumer origin to read from an SAP HANA database using the specified SQL query. Before you use the origin, you must install a JDBC driver.

  1. In the Properties panel, on the General tab, configure the following properties:
    General Property Description
    Name Stage name.
    Description Optional description.
    Produce Events Generates event records when events occur. Use for event handling.
    On Record Error Error record handling for the stage:
    • Discard - Discards the record.
    • Send to Error - Sends the record to the pipeline for error handling.
    • Stop Pipeline - Stops the pipeline.
  2. On the JDBC tab, configure the following properties:
    JDBC Property Description
    Host Name of the host to connect to.
    Port Port number to use.
    Database Name of the database to connect to.
    SQL Query SQL query to use when reading data from the database.

    Define the query in the property. Or, define the query in a runtime resource, and then use the runtime:loadResource function in the property to load the query from the resource file at runtime.

    Initial Offset Offset value to use when the pipeline starts.

    Required in incremental mode.

    Offset Column Column to use for the offset value.

    As a best practice, an offset column should be an incremental and unique column that does not contain null values. Having an index on this column is strongly encouraged since the underlying query uses an ORDER BY and inequality operators on this column.

    Required in incremental mode.

    Use Credentials Enables entering credentials on the Credentials tab.
    Incremental Mode Defines how the origin queries the database. Select to perform incremental queries. Clear to perform full queries.

    Default is incremental mode.

    Root Field Type Root field type to use for generated records. Use the default List-Map option unless using the origin in a pipeline built with Data Collector version 1.1.0 or earlier.
    Query Interval Amount of time to wait between queries. Enter an expression based on a unit of time. You can use SECONDS, MINUTES, or HOURS.

    Default is 10 seconds: ${10 * SECONDS}.

    Max Batch Size (records) Maximum number of records to include in a batch.
    Max Clob Size (characters) Maximum number of characters to be read in a Clob field. Larger data is truncated.
    Max Blob Size (bytes) Maximum number of bytes to be read in a Blob field.

    Number of Retries on SQL Error Number of times the origin tries to execute the query after receiving an SQL error. After retrying this number of times, the origin handles the error based on the error handling configured for the origin.

    Use to handle transient network or connection issues that prevent the origin from submitting a query.

    Default is 0.

    Convert Timestamp to String Enables the origin to write timestamps as string values rather than datetime values. Strings maintain the precision stored in the source system.

    When writing timestamps to Data Collector date or time data types that do not store nanoseconds, the origin stores any nanoseconds from the timestamp in a field attribute.

    Additional JDBC Configuration Properties Additional JDBC configuration properties to use. To add properties, click Add and define the JDBC property name and value.

    Use the property names and values as expected by JDBC.

  3. On the Credentials tab, configure the following properties:
    Credentials Property Description
    Username User name for the JDBC connection.

    The user account must have the correct permissions or privileges in the database.

    Password Password for the JDBC user name.
    Tip: To secure sensitive information such as user names and passwords, you can use runtime resources or credential stores. For more information about credential stores, see Credential Stores in the Data Collector documentation.
  4. On the Advanced tab, optionally configure advanced properties.
    The defaults for these properties should work in most cases:
    Advanced Property Description
    Maximum Pool Size Maximum number of connections to create.

    Default is 1. The recommended value is 1.

    Minimum Idle Connections Minimum number of connections to create and maintain. To define a fixed connection pool, set to the same value as Maximum Pool Size.

    Default is 1.

    Connection Timeout (seconds) Maximum time to wait for a connection. Use a time constant in an expression to define the time increment.
    Default is 30 seconds, defined as follows:
    ${30 * SECONDS}
    Idle Timeout (seconds) Maximum time to allow a connection to idle. Use a time constant in an expression to define the time increment.

    Use 0 to avoid removing any idle connections.

    When the entered value is close to or more than the maximum lifetime for a connection, Data Collector ignores the idle timeout.

    Default is 10 minutes, defined as follows:
    ${10 * MINUTES}
    Max Connection Lifetime (seconds) Maximum lifetime for a connection. Use a time constant in an expression to define the time increment.

    Use 0 to set no maximum lifetime.

    When a maximum lifetime is set, the minimum valid value is 30 minutes.

    Default is 30 minutes, defined as follows:
    ${30 * MINUTES}
    Auto Commit Determines if auto-commit mode is enabled. In auto-commit mode, the database commits the data for each record.

    Default is disabled.

    Enforce Read-only Connection Creates read-only connections to avoid any type of write.

    Default is enabled. Disabling this property is not recommended.

    Transaction Isolation Transaction isolation level used to connect to the database.

    Default is the default transaction isolation level set for the database. You can override the database default by setting the level to any of the following:

    • Read committed
    • Read uncommitted
    • Repeatable read
    • Serializable
    Init Query SQL query to perform immediately after the stage connects to the database. Use to set up the database session as needed.

    The query is performed after each connection to the database. If the stage disconnects from the database during the pipeline run, for example if a network timeout occurs, the stage performs the query again when it reconnects to the database.

    Create JDBC Header Attributes Adds JDBC header attributes to records. The origin creates JDBC header attributes by default.
    JDBC Header Prefix Prefix for JDBC header attributes.
    Disable Query Validation Disables the query validation that occurs by default. Use to avoid time-consuming query validation.
    Warning: Query validation prevents running a pipeline with invalid queries. Use this option with care.
    On Unknown Type Action to take when encountering an unsupported data type:
    • Stop Pipeline - Stops the pipeline after completing the processing of the previous records.
    • Convert to String - When possible, converts the data to string and continues processing.
  5. On the SAP HANA tab, configure the following properties:
    SAP HANA Property Description
    Split Batch Commands Enables reading from multiple partitions at the same time. For more information, see the SAP HANA documentation.
    Include SAP HANA Connection Details Includes the information used to connect to the database in SAP HANA record header attributes.
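
Several of the properties above take time expressions such as ${10 * SECONDS} or ${30 * MINUTES}. The following Python sketch shows how such expressions could resolve to a number of seconds, assuming that each time constant stands for its length in seconds. That assumption, the to_seconds parser, and the check_max_lifetime helper are illustrative and are not part of Data Collector.

```python
import re

# Assumed values: each time constant expressed in seconds.
TIME_CONSTANTS = {"SECONDS": 1, "MINUTES": 60, "HOURS": 3600}
_PATTERN = re.compile(r"^\$\{\s*(\d+)\s*\*\s*(SECONDS|MINUTES|HOURS)\s*\}$")

def to_seconds(expression):
    """Resolve an expression of the form ${<number> * <UNIT>} to seconds."""
    match = _PATTERN.match(expression.strip())
    if match is None:
        raise ValueError("unsupported expression: %r" % expression)
    return int(match.group(1)) * TIME_CONSTANTS[match.group(2)]

def check_max_lifetime(seconds):
    """Max Connection Lifetime: 0 means no limit, otherwise at least 30 minutes."""
    return seconds == 0 or seconds >= 30 * 60

query_interval = to_seconds("${10 * SECONDS}")
idle_timeout = to_seconds("${10 * MINUTES}")
max_lifetime = to_seconds("${30 * MINUTES}")
```

Under that assumption, the default idle timeout resolves to 600 seconds and the default maximum lifetime to 1800 seconds, which is also the smallest nonzero lifetime that the property accepts.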