CONNX
The CONNX origin uses a SQL query to read mainframe data from a CONNX server. To read mainframe change data from CONNX, use the CONNX CDC origin. Connecting to CONNX requires a Mainframe Collector license. The license allows reading mainframe data from CONNX and writing the data to cloud destinations. Contact your StreamSets account manager for more information. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.
The CONNX server connects to different types of mainframes. It uses a data dictionary to present the data like a database table, using a typical database structure: database, schema, table or view. The CONNX origin uses a SQL query to retrieve the data that you want to process.
When the pipeline stops, the CONNX origin notes where it stops reading. When the pipeline starts again, the origin continues processing from where it stopped by default. You can reset the origin to process all requested data.
When you configure the CONNX origin, you define the query mode and SQL query to define the data returned by CONNX. The query can read data from a single table or from a join of tables. When in full query mode and reading from certain systems, you can use a stored procedure instead of a SQL query.
You specify connection information, query interval, and custom JDBC configuration properties to determine how the origin connects to CONNX. You can also use a connection to configure the origin.
When the source data includes high-precision timestamps, you can configure the origin to write strings rather than datetime values to maintain the precision.
You can specify custom properties that your driver requires and configure advanced properties. And you can specify what the origin does when encountering an unsupported data type. By default, the origin generates JDBC record header and field attributes that provide additional information about each record and field.
The origin can generate events for an event stream. For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.
Prerequisites
Connecting to CONNX requires a Mainframe Collector license. The license allows reading mainframe data from CONNX and writing the data to cloud destinations. Contact your StreamSets account manager for more information.
Before using the CONNX origin, you must install the CONNX driver.
Before you build a CONNX pipeline, you must install the CONNX SQL engine local to the mainframe machine. For information about installing a CONNX SQL engine, see the CONNX documentation.
Installing the CONNX JDBC Driver
Before using the CONNX origin, install the CONNX JDBC driver for the server. The origin cannot access the server until you install this driver.
For information about installing additional drivers, see Install External Libraries in the Data Collector documentation.
- Download the CONNX JDBC driver from the CONNX website.
-
Install the driver as an external library into the CONNX stage library,
streamsets-datacollector-connx-lib
, which includes the origin.
Supported Destinations
The destinations that you can use in a CONNX pipeline depend on your Mainframe Collector license. For more information, see Prerequisites.
- Amazon S3
- Azure Data Lake Storage Gen2
- Azure Synapse
- Databricks Delta Lake
- JDBC Producer to write to Amazon Redshift
- Google BigQuery
- Google Cloud Storage
- Snowflake
Offset Column and Offset Value
The CONNX origin uses an offset column and initial offset value to determine where to start reading data within a table. Include both the offset column and the offset value in the WHERE clause of the SQL query.
The offset column must be a column in the table with unique non-null values, such as a primary key or indexed column. The initial offset value is a value within the offset column where you want the origin to start reading.
When the origin performs an incremental query, you must configure the offset column and offset value. For full queries, you can optionally configure them.
Full and Incremental Mode
The CONNX origin can perform queries in two modes:
- Incremental mode
- To use incremental mode, you must select the Incremental Mode property and configure an offset column and initial offset value for the origin. When you define the SQL query, you must use the ${OFFSET} parameter to represent the offset value in the WHERE clause.
- Full mode
- To use full mode, you must clear the Incremental Mode property for the origin. You can optionally configure an offset column and initial offset value and can define any type of SQL query.
Recovery
The CONNX origin supports recovery after a deliberate or unexpected stop when it performs incremental queries. Recovery is not supported for full queries.
In incremental mode, the origin uses offset values in the offset column to determine where to continue processing after a deliberate or unexpected stop. To ensure seamless recovery in incremental mode, use a primary key or indexed column as the offset column. As the CONNX origin processes data, it tracks the offset value internally. When the pipeline stops, the origin notes where it stopped processing data. When you restart the pipeline, it continues from the last-saved offset.
When the CONNX origin performs full queries, the origin runs the full query again after you restart the pipeline.
SQL Query
A SQL query defines the data returned from the CONNX server. You define the query in the SQL Query property on the JDBC tab.
runtime:loadResource
function in the SQL Query property to load the query from the resource file at runtime.
For example, you might enter the following expression for the
property:${runtime:loadResource("myquery.sql", false)}
When running the origin in full query mode and reading from certain systems, you can define a stored procedure, then call the stored procedure using the SQL Query property.
The SQL query guidelines that you use depend on whether you configure the origin to perform an incremental or full query.
SQL Query for Incremental Mode
When you define the SQL query for incremental mode, the CONNX origin requires a WHERE and ORDER BY clause in the query.
Use the following guidelines when you define the WHERE and ORDER BY clauses in the query:
- In the WHERE clause, include the offset column and the offset value
- The origin uses an offset column and value to determine the data that is returned. Include both in the WHERE clause of the query.
- Use the OFFSET parameter to represent the offset value
- In the WHERE clause, use ${OFFSET} to represent the offset value.
- In the ORDER BY clause, include the offset column as the first column
- To avoid returning duplicate data, use the offset column as the first column in the ORDER BY clause.
invoice
table where the ID column is the offset column. The query
returns all data where the ID is greater than the offset and orders the data by the
ID: SELECT * FROM invoice WHERE id > ${OFFSET} ORDER BY id
SQL Query for Full Mode
You can define any type of SQL query for full mode.
invoice
table:SELECT * FROM invoice
When you define the SQL query for full mode, you can optionally include the WHERE and ORDER BY clauses using the same guidelines as for incremental mode. However, using these clauses to read from large tables can cause performance issues.
Stored Procedure in Full Mode
You can call a stored procedure from the CONNX origin when using the origin in full mode and when reading from a DRDA-compliant (distributed relational database) source system.
- In your source system, define the stored procedure.
- Import the stored procedure to CONNX.
For more information, see the CONNX documentation.
- In the origin, on the JDBC tab, configure the SQL Query property to call the stored procedure. Use the appropriate syntax for your source system.
- Also on the JDBC tab, clear the Incremental Mode property, which is selected by default.
- Test the pipeline to ensure that the procedure performs as expected.
JDBC Attributes
The CONNX origin generates record header attributes and field attributes that provide additional information about each record and field. The origin receives these details from the JDBC driver.
JDBC Header Attributes
By default, the CONNX origin generates JDBC record header attributes that provide additional information about each record, such as the original data type of a field or the source tables for the record. The origin receives these details from the JDBC driver.
You can use the record:attribute
or
record:attributeOrDefault
functions to access the information
in the attributes. For more information about working with record header attributes,
see Working with Header Attributes.
JDBC header attributes include a user-defined prefix to
differentiate the JDBC header attributes from other record header attributes. By
default, the prefix is jdbc
.
You can change the prefix that the origin uses and you can configure the origin not to create JDBC header attributes with the Create JDBC Header Attributes and JDBC Header Prefix properties on the Advanced tab.
JDBC Header Attribute | Description |
---|---|
<JDBC prefix>.tables |
Provides a
comma-separated list of source tables for the fields in the
record.
Note: Not all JDBC drivers
provide this information. |
<JDBC prefix>.<column name>.jdbcType | Provides the numeric value of the original SQL data type for each field in the record. See the Java documentation for a list of the data types that correspond to numeric values. |
<JDBC prefix>.<column name>.precision | Provides the original precision for all numeric and decimal fields. |
<JDBC prefix>.<column name>.scale | Provides the original scale for all numeric and decimal fields. |
Header Attributes with the Drift Synchronization Solution
When you use the CONNX origin with the Drift Synchronization Solution, ensure that the origin creates JDBC header attributes. JDBC header attributes allow the Hive Metadata processor to use the precision and scale information in the attributes to define decimal fields. The origin creates JDBC header attributes, by default.
- In the origin, on the Advanced tab, make sure that the Create JDBC Header Attributes property is selected.
- On the same tab, you can optionally change the default for the JDBC Header Prefix property.
- If you changed the default value for the JDBC Header Prefix property, then on
the Hive tab of the Hive Metadata processor, configure
the Decimal Precision Expression and Decimal
Scale Expression properties. Update the
jdbc.
string in each property to match the specified JDBC header prefix.If you did not change the JDBC Header Prefix default value, then use the default expressions for the properties.
JDBC Field Attributes
The CONNX origin generates field attributes for columns converted to the Decimal or Datetime data types in Data Collector. The attributes provide additional information about each field.
- Decimal and Numeric data types are converted to the Data Collector Decimal data type, which does not store scale and precision.
- The Timestamp data type is converted to the Data Collector Datetime data type, which does not store nanoseconds.
Data Collector Data Type | Generated Field Attribute | Description |
---|---|---|
Decimal | precision | Provides the original precision for every decimal or numeric column. |
Decimal | scale | Provides the original scale for every decimal or numeric column. |
Datetime | nanoSeconds | Provides the original nanoseconds for every timestamp column. |
You can use the record:fieldAttribute
or
record:fieldAttributeOrDefault
functions to access the information
in the attributes. For more information about working with field attributes, see Field Attributes.
Event Generation
The CONNX origin can generate events that you can use in an event stream. When you enable event generation, the origin generates an event when it completes processing the data returned by the specified query. The origin also generates an event when a query completes successfully and when it fails to complete.
- With the Pipeline Finisher executor to stop the pipeline and
transition the pipeline to a Finished state when the origin completes processing
available data.
When you restart a pipeline stopped by the Pipeline Finisher executor, the origin processes data based on how you configured the origin. For example, if you configure the origin to run in incremental mode, the origin saves the offset when the executor stops the pipeline. When it restarts, the origin continues processing from the last-saved offset. In contrast, if you configure the origin to run in full mode, when you restart the pipeline, the origin uses the initial offset, if specified.
For an example, see Stopping a Pipeline After Processing All Available Data.
- With the Email executor to send a custom email
after receiving an event.
For an example, see Sending Email During Pipeline Processing.
-
With a destination to store information about completed queries.
For an example, see Preserving an Audit Trail of Events.
For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.
Event Records
Record Header Attribute | Description |
---|---|
sdc.event.type | Event type. Uses one of the following types:
|
sdc.event.version | Integer that indicates the version of the event record type. |
sdc.event.creation_timestamp | Epoch timestamp when the stage created the event. |
- No-more-data
- The origin generates a no-more-data event record when it completes processing all data returned by a query.
- Query success
- The origin generates a query success event record when it completes processing the data returned from a query.
- Query failure
- The origin generates a query failure event record when it fails to complete processing the data returned from a query.
CONNX SQL Data Types
CONNX SQL Data Type | Data Collector Data Type |
---|---|
BigInt | Long |
Binary | Byte Array |
Bit | Boolean |
Char, Nchar | String |
Date | Date |
Decimal | Decimal |
Double | Double |
Float, QFloat | Float |
Integer | Int |
Numeric | Decimal |
Real | Real |
Smallint, Tinyint | Short |
Time | Time |
Timestamp | Datetime |
Varbinary, Longvarbinary | Byte Array |
Varchar, Longvarchar | String |
Varnchar, Longnvarchar | String |
Unsupported Data Types
- Stops the pipeline
- If the stage encounters an unsupported data type, the stage stops the
pipeline after completing the processing of the previous records and
displays the following
error:
JDBC_37 - Unsupported type 1111 for column.
- Converts to string
- If the stage encounters an unsupported data type, the stage converts the data to string when possible, and then continues processing. Not all unsupported data types can successfully be converted to string. When using this option, verify that the data is converted to string as expected.
Configuring a CONNX Origin
Configure a CONNX origin to use a SQL query to read mainframe data from a CONNX server. Before using the origin in a pipeline, perform the prerequisite tasks.
-
In the Properties panel, on the General tab, configure the
following properties:
General Property Description Name Stage name. Description Optional description. Produce Events Generates event records when events occur. Use for event handling. On Record Error Error record handling for the stage: - Discard - Discards the record.
- Send to Error - Sends the record to the pipeline for error handling.
- Stop Pipeline - Stops the pipeline.
-
On the JDBC tab, configure the following properties:
JDBC Property Description Connection Connection that defines the information required to connect to an external system. To connect to an external system, you can select a connection that contains the details, or you can directly enter the details in the pipeline. When you select a connection, Control Hub hides other properties so that you cannot directly enter connection details in the pipeline.
Use Connection String Enables specifying a connection string instead of defining connection details. CONNX JDBC Connection String JDBC connection string. Use the following format: jdbc:connx:DD=<dsn>;Gateway=<gateway>;Port=<port>
The CONNX JDBC driver uses port 7500 by default.
Gateway CONNX JDBC server gateway. Port Port number to use. Default is 7500.
Database Name Name of the database to read from. Use SSL Uses SSL/TLS to connect to CONNX. Be sure to perform the required tasks to enable SSL. For more information, see SSL/TLS in CONNX Stages.
SQL Query SQL query to use when reading data from the CONNX server. You can use several methods to specify the query: - Define the query in the property.
- Define the query in a runtime resource, and then use the
runtime:loadResource
function in the property to load the query from the resource file at runtime. - When in full query mode, you can define a stored procedure in the source system, then call the stored procedure from the property.
Initial Offset Offset value to use when the pipeline starts. When you start the pipeline for the first time, the origin starts processing from the specified initial offset. The origin only uses the specified initial offset again when you reset the origin. Required in incremental mode.
Offset Column Column to use for the offset value. As a best practice, an offset column should be an incremental and unique column that does not contain null values. Having an index on this column is strongly encouraged since the underlying query uses an ORDER BY and inequality operators on this column.
Required in incremental mode.
Use Credentials Enables entering credentials on the Credentials tab. Select when you do not include credentials in the JDBC connection string. Incremental Mode Defines how the origin queries the CONNX server. Select to perform incremental queries. Clear to perform full queries. Default is incremental mode.
Root Field Type Root field type to use for generated records. Use the default List-Map option unless using the origin in a pipeline built with Data Collector version 1.1.0 or earlier. Query Interval Amount of time to wait between queries. Enter an expression based on a unit of time. You can use SECONDS, MINUTES, or HOURS. Default is 10 seconds: ${10 * SECONDS}.
Max Batch Size (records) Maximum number of records to include in a batch. Max Clob Size (characters) Maximum number of characters to be read in a Clob field. Larger data is truncated. Max Blob Size (bytes) Maximum number of bytes to be read in a Blob field.
Number of Retries on SQL Error Maximum number of times the origin tries to execute the query after encountering a SQL error. After retrying this number of times, the origin handles the error based on the error handling configured for the origin. Use to handle transient network or connection issues that prevent the origin from submitting a query.
Default is 0.
No-more-data Event Generation Delay Number of seconds to delay generation of the no-more-data event after processing all rows. Use to allow time for additional data to arrive before generating the no-more-data event. Convert Timestamp To String Enables the origin to write timestamps as string values rather than datetime values. Strings maintain the precision stored in the source system. For example, strings can maintain the precision of a high-precision IBM Db2 TIMESTAMP(9) field. When writing timestamps to Data Collector date or time data types that do not store nanoseconds, the origin stores any nanoseconds from the timestamp in a field attribute.
Additional JDBC Configuration Properties Additional JDBC configuration properties to use. To add properties, click Add and define the JDBC property name and value. Use the property names and values as expected by JDBC.
-
If you configured the origin to enter JDBC credentials
separately from the JDBC connection string on the JDBC tab,
then configure the following properties on the Credentials
tab:
Credentials Property Description Username User name for the JDBC connection. The user account must have the correct permissions or privileges in the database.
Password Password for the JDBC user name. Tip: To secure sensitive information such as user names and passwords, you can use runtime resources or credential stores. For more information about credential stores, see Credential Stores in the Data Collector documentation. -
On the Advanced tab, optionally configure advanced
properties.
The defaults for these properties should work in most cases:
Advanced Property Description Maximum Pool Size Maximum number of connections to create. Default is 1. The recommended value is 1.
Minimum Idle Connections Minimum number of connections to create and maintain. To define a fixed connection pool, set to the same value as Maximum Pool Size. Default is 1.
Connection Timeout (seconds) Maximum time to wait for a connection. Use a time constant in an expression to define the time increment. Default is 30 seconds, defined as follows:${30 * SECONDS}
Idle Timeout (seconds) Maximum time to allow a connection to idle. Use a time constant in an expression to define the time increment. Use 0 to avoid removing any idle connections.
When the entered value is close to or more than the maximum lifetime for a connection, Data Collector ignores the idle timeout.
Default is 10 minutes, defined as follows:${10 * MINUTES}
Max Connection Lifetime (seconds) Maximum lifetime for a connection. Use a time constant in an expression to define the time increment. Use 0 to set no maximum lifetime.
When a maximum lifetime is set, the minimum valid value is 30 minutes.
Default is 30 minutes, defined as follows:${30 * MINUTES}
Auto Commit Determines if auto-commit mode is enabled. In auto-commit mode, the database commits the data for each record. Default is disabled.
Enforce Read-only Connection Creates read-only connections to avoid any type of write. Default is enabled. Disabling this property is not recommended.
Transaction Isolation Transaction isolation level used to connect to the database. Default is the default transaction isolation level set for the database. You can override the database default by setting the level to any of the following:
- Read committed
- Read uncommitted
- Repeatable read
- Serializable
Init Query SQL query to perform immediately after the stage connects to the database. Use to set up the database session as needed. The query is performed after each connection to the database. If the stage disconnects from the database during the pipeline run, for example if a network timeout occurrs, the stage performs the query again when it reconnects to the database.
Create JDBC Header Attributes Adds JDBC header attributes to records. The origin creates JDBC header attributes by default. Note: When using the origin with a Drift Synchronization Solution, make sure this property is selected.JDBC Header Prefix Prefix for JDBC header attributes. Disable Query Validation Disables the query validation that occurs by default. Use to avoid time consuming query validation situations. Warning: Query validation prevents running a pipeline with invalid queries. Use this option with care.On Unknown Type Action to take when encountering an unsupported data type: - Stop Pipeline - Stops the pipeline after completing the processing of the previous records.
- Convert to String - When possible, converts the data to string and continues processing.