PostgreSQL Metadata
The PostgreSQL Metadata processor determines the PostgreSQL table where each record should be written, compares the record structure against the table structure, then creates or alters the table as needed. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.
Use the PostgreSQL Metadata processor as part of the Drift Synchronization Solution for PostgreSQL.
When processing data, the PostgreSQL Metadata processor uses a table name expression to determine the name of the target table to use for each record. If the target table is not in the processor's cache, the processor queries the database for table information and caches the results. When the target table is in the cache, the processor compares the record structure against cached table structure.
When a record includes fields that do not exist in the table, the PostgreSQL Metadata processor alters the table as needed, then updates the table information in the cache. When a record should be written to a table that does not exist, the processor creates the table based on the fields in the record.
Like other database-related stages, when you configure the PostgreSQL Metadata processor, you can specify custom JDBC properties, enter connection credentials, and configure advanced properties such as an initial query and timeouts.
For more information about the Drift Synchronization Solution for PostgreSQL and a case study, see Drift Synchronization Solution for PostgreSQL.
JDBC Driver
When connecting to a PostgreSQL database, you do not need to install a JDBC driver. Data Collector includes the JDBC driver required for PostgreSQL.
Schema and Table Names
When you configure the schemas and tables where records should be written, you can use the actual schema and table names or expressions that resolve to the schemas and tables to use.
Use names when all data can be written to the same schema or table. Use expressions to use information in the record to determine the schema or table to write to.
For example, the JDBC Multitable Consumer origin writes the
originating table name in the jdbc.tables
record header
attribute.
If you want to write records to tables of the
same name, you can use ${record:attribute('jdbc.tables')}
for the table name property.
Similarly, the JDBC Query Consumer writes the originating table name
in a <user-defined prefix>.tables
record header
attribute when the origin is configured to create record header
attributes.
So if you want to write records to tables of the
same name, you can use ${record:attribute('<user defined
prefix>.tables')}
for the table name property.
-
If all records are to be written to a single schema or table, you can enter the schema or table name instead of an expression.
-
If the schema or table name can be extrapolated from record data or header attributes, you can use an expression that evaluates to the schema or table name.
- When necessary, you can use an Expression Evaluator earlier in the pipeline to perform calculations and write the results to a new field or a header attribute. Then, configure the PostgreSQL Metadata processor to use that information.
Decimal Precision and Scale Field Attributes
Use the Decimal Precision Attribute and Decimal Scale Attribute properties to specify the precision and scale for the Decimal columns that the PostgreSQL Metadata processor creates.
While other data types have hard coded definitions that the processor uses to create columns in database tables, Decimal columns require a specified precision and scale.
When processing data from the JDBC Query Consumer or JDBC Multitable Consumer origins, use the default attribute names, "precision" and "scale". Both origins store the precision and scale of Decimal columns in "precision" and "scale" field attributes for each Decimal field.
When processing data from other origins, you can use the Expression Evaluator processor earlier in the pipeline to create precision and scale field attributes for Decimal fields.
Caching Information
When processing records, the PostgreSQL Metadata processor queries the database for the necessary table information and caches the results. After creating or altering a table, it updates the table information in the cache. The processor uses the cache for record comparison when possible, to avoid unnecessary queries.
PostgreSQL Data Types
Data Collector Data Type | PostgreSQL Data Type |
---|---|
Boolean | Boolean |
Byte Array | Bytea |
Char | Character |
Date | Date |
Datetime | Timestamp without time zone |
Decimal | Numeric |
Double | Double precision |
Float | Real |
Integer | Integer |
Long | Bigint |
List | Not supported |
List-Map | Not supported |
Map | Not supported |
Short | Smallint |
String | Character varying |
Time | Time without time zone |
Zoned Datetime | Timestamp with time zone |
Configuring a PostgreSQL Metadata Processor
Configure a PostgreSQL Metadata processor as part of the Drift Synchronization Solution for PostgreSQL.
-
In the Properties panel, on the General tab, configure the
following properties:
General Property Description Name Stage name. Description Optional description. Required Fields Fields that must include data for the record to be passed into the stage. Tip: You might include fields that the stage uses.Records that do not include all required fields are processed based on the error handling configured for the pipeline.
Preconditions Conditions that must evaluate to TRUE to allow a record to enter the stage for processing. Click Add to create additional preconditions. Records that do not meet all preconditions are processed based on the error handling configured for the stage.
On Record Error Error record handling for the stage: - Discard - Discards the record.
- Send to Error - Sends the record to the pipeline for error handling.
- Stop Pipeline - Stops the pipeline. Not valid for cluster pipelines.
-
On the JDBC tab, configure the following properties:
JDBC Property Description JDBC Connection String Connection string to use to connect to the database. Use the following format: jdbc:postgresql://<host>:<port>/<database>
Schema Name Name of the schema to use. You can enter an expression that evaluates to the schema name. Table Name Name of the database table to use.
Enter one of the following:- Name of an existing database table.
- An expression that evaluates to the table name.
For example, to use the table name in a
"tableName" record header attribute, enter the
following
expression:
${record:attribute('tableName')}
The processor creates the table if the table does not exist and updates the table when needed.
Decimal Scale Attribute The field attribute that contains the scale for Decimal fields. Use the default for data generated by the JDBC Query Consumer or the JDBC Multitable Consumer origins.
Decimal Precision Attribute The field attribute that contains the precision for Decimal fields. Use the default for data generated by the JDBC Query Consumer or the JDBC Multitable Consumer origins.
Omit Constraints when Creating Tables Ignores primary key and not-null constraints from the original table when creating new tables. To use the primary key and not-null constraints from the data received from the origin, clear this option.
Note: Changes to this property only affect new tables and do not change the constraints on tables that already exist in the database.Default is true.
Use Credentials Enables entering credentials on the Credentials tab. Select when you do not include credentials in the JDBC connection string. The user account used for the connection credentials must have both the Create Table and Alter Table permissions on the database.
Additional JDBC Configuration Properties Additional JDBC configuration properties to use. To add properties, click Add and define the JDBC property name and value. Use the property names and values as expected by JDBC.
-
If you configured the origin to enter JDBC credentials
separately from the JDBC connection string on the JDBC tab,
then configure the following properties on the Credentials
tab:
Credentials Property Description Username User name for the JDBC connection. The user account must have the correct permissions or privileges in the database.
Password Password for the JDBC user name. Tip: To secure sensitive information such as user names and passwords, you can use runtime resources or credential stores. For more information about credential stores, see Credential Stores in the Data Collector documentation. -
When using JDBC versions older than 4.0, on the Legacy
Drivers tab, optionally configure the following
properties:
Legacy Drivers Property Description JDBC Class Driver Name Class name for the JDBC driver. Required for JDBC versions older than version 4.0. Connection Health Test Query Optional query to test the health of a connection. Recommended only when the JDBC version is older than 4.0. -
On the Advanced tab, optionally configure advanced
properties.
The defaults for these properties should work in most cases:
Advanced Property Description Maximum Pool Size Maximum number of connections to create. Default is 1. The recommended value is 1.
Minimum Idle Connections Minimum number of connections to create and maintain. To define a fixed connection pool, set to the same value as Maximum Pool Size. Default is 1.
Connection Timeout (seconds) Maximum time to wait for a connection. Use a time constant in an expression to define the time increment. Default is 30 seconds, defined as follows:${30 * SECONDS}
Idle Timeout (seconds) Maximum time to allow a connection to idle. Use a time constant in an expression to define the time increment. Use 0 to avoid removing any idle connections.
When the entered value is close to or more than the maximum lifetime for a connection, Data Collector ignores the idle timeout.
Default is 10 minutes, defined as follows:${10 * MINUTES}
Max Connection Lifetime (seconds) Maximum lifetime for a connection. Use a time constant in an expression to define the time increment. Use 0 to set no maximum lifetime.
When a maximum lifetime is set, the minimum valid value is 30 minutes.
Default is 30 minutes, defined as follows:${30 * MINUTES}
Transaction Isolation Transaction isolation level used to connect to the database. Default is the default transaction isolation level set for the database. You can override the database default by setting the level to any of the following:
- Read committed
- Read uncommitted
- Repeatable read
- Serializable
Init Query SQL query to perform immediately after the stage connects to the database. Use to set up the database session as needed. The query is performed after each connection to the database. If the stage disconnects from the database during the pipeline run, for example if a network timeout occurrs, the stage performs the query again when it reconnects to the database.
Lowercase Column Names Creates columns with lowercased names. By default, the processor uses the column names as they appear in the record. Select this option to create all new columns with lowercased names.