GPSS Producer (deprecated)
When you configure the GPSS Producer destination, you specify the connection information for a Greenplum Database master and a Greenplum Stream Server, define the table to use, and optionally define field mappings. By default, the destination writes field data to columns with matching names.
The GPSS Producer destination can use CRUD operations defined in the sdc.operation.type record header attribute to write data. You can define a default operation for records without the header attribute or value. You can also configure how to handle records with unsupported operations.
For information about Data Collector change data processing and a list of CDC-enabled origins, see Processing Changed Data.
Before you use the GPSS Producer destination, you must install the GPSS stage library as a custom stage library and complete the other prerequisite tasks.
Prerequisites
Install the GPSS Stage Library as a Custom Stage Library
You must install the GPSS stage library before using the GPSS Producer destination. You can install the GPSS stage library as a custom stage library for a tarball, RPM, or Cloudera Manager Data Collector installation.
- To download the stage library, go to the StreamSets archives page.
- Under StreamSets Enterprise Connectors, click Enterprise Connectors.
- Click the stage library name and version that you want to download. The stage library downloads.
- Install and manage the stage library as a custom stage library. For more information, see Custom Stage Libraries.
Install, Configure, and Start GPSS in Greenplum Database
The Greenplum Stream Server (GPSS) manages communication and data transfer between the GPSS Producer destination and Greenplum Database. Before using the destination, you must install, configure, and start GPSS in the Greenplum Database cluster. For more information, see the Pivotal Greenplum documentation.
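After starting GPSS, it can help to confirm that the server accepts TCP connections from the Data Collector machine before you configure the destination. The following is a minimal sketch using only the Python standard library. The function name `is_reachable` and the host and port in the usage comment are hypothetical placeholders, not values taken from this documentation. A successful connection shows only that something is listening on the port, not that GPSS is configured correctly.

```python
import socket


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example usage (hypothetical host and port; substitute your own values):
# print(is_reachable("gpss.example.com", 5000))
```

Run the check from the same host that runs Data Collector so that it exercises the same network path the destination will use.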
CRUD Operation Processing
The GPSS Producer destination can insert, update, or merge data. The destination writes the records based on the CRUD operation defined in a CRUD operation header attribute or in operation-related stage properties.
The destination uses the header attribute and stage properties as follows:
- CRUD operation header attribute - The destination looks for the CRUD operation in the sdc.operation.type record header attribute.
- Operation stage properties - If there is no CRUD operation in the sdc.operation.type record header attribute, the destination uses the operation configured in the Default Operation property.
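The resolution order above, together with the Unsupported Operation Handling property, can be modeled in a few lines. This is an illustrative sketch, not the destination's actual implementation. The numeric codes (1 for INSERT, 2 for DELETE, 3 for UPDATE, 8 for MERGE) are assumed from the general Data Collector CRUD conventions and should be checked against your version. The function name and the handling labels `"DISCARD"`, `"SEND_TO_ERROR"`, and `"USE_DEFAULT"` are hypothetical.

```python
# Assumed sdc.operation.type codes; verify against your Data Collector version.
INSERT, DELETE, UPDATE, MERGE = 1, 2, 3, 8

# Operations this destination can perform: insert, update, or merge.
SUPPORTED = {INSERT: "INSERT", UPDATE: "UPDATE", MERGE: "MERGE"}


def resolve_operation(header_attributes,
                      default_operation="INSERT",
                      unsupported_handling="SEND_TO_ERROR"):
    """Return the operation to apply, or "DISCARD" / "ERROR" for the record."""
    raw = header_attributes.get("sdc.operation.type")

    # No header attribute or empty value: fall back to the default operation.
    if raw is None or str(raw).strip() == "":
        return default_operation

    try:
        code = int(raw)
    except (ValueError, TypeError):
        code = None  # Non-numeric values are treated as unsupported.

    if code in SUPPORTED:
        return SUPPORTED[code]

    # The header is set but names an operation the destination cannot perform.
    if unsupported_handling == "USE_DEFAULT":
        return default_operation
    if unsupported_handling == "DISCARD":
        return "DISCARD"
    return "ERROR"


# Example usage:
print(resolve_operation({"sdc.operation.type": "3"}))
print(resolve_operation({}))
print(resolve_operation({"sdc.operation.type": "2"}, unsupported_handling="USE_DEFAULT"))
```

In this model a delete record (code 2) is unsupported, so the outcome depends entirely on the handling option: the record is discarded, sent to error handling, or written with the default operation.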
Configuring a GPSS Producer Destination
Before you use the GPSS Producer destination in a pipeline, complete the prerequisite tasks.
- In the Properties panel, on the General tab, configure the following properties:
  - Name - Stage name.
  - Description - Optional description.
  - Required Fields - Fields that must include data for the record to be passed into the stage. Tip: You might include fields that the stage uses. Records that do not include all required fields are processed based on the error handling configured for the pipeline.
  - Preconditions - Conditions that must evaluate to TRUE to allow a record to enter the stage for processing. Click Add to create additional preconditions. Records that do not meet all preconditions are processed based on the error handling configured for the stage.
  - On Record Error - Error record handling for the stage:
    - Discard - Discards the record.
    - Send to Error - Sends the record to the pipeline for error handling.
    - Stop Pipeline - Stops the pipeline. Not valid for cluster pipelines.
- On the GPSS tab, configure the following properties:
  - Greenplum Database Host - Host name of the Greenplum Database master that the Greenplum Stream Server connects to.
  - Greenplum Database Port - Port that the Greenplum Stream Server uses to connect with the Greenplum Database master.
  - GPSS Host - Host name of the Greenplum Stream Server.
  - GPSS Port - Port that the destination uses to connect with the Greenplum Stream Server.
  - Schema Name - Name of the schema that contains the table to write data to.
  - Database Name - Name of the database that contains the table to write data to.
  - Table Name - Name of the table to write data to.
  - Unsupported Operation Handling - Action to take when the CRUD operation type defined in the sdc.operation.type record header attribute is not supported:
    - Discard - Discards the record.
    - Send to Error - Sends the record to the pipeline for error handling.
    - Use Default Operation - Writes the record to the destination system using the default operation.
  - Default Operation - Default CRUD operation to perform if the sdc.operation.type record header attribute is not set.
  - Field to Column Mapping - Mappings between record fields and database table columns. By default, the destination maps fields to columns with the same name. Specify the following properties:
    - Column Name - Name of a column in the database table.
    - SDC Field - Field in the Data Collector record.
    - Default Value - Value written when the record contains no value.
    - Greenplum Data Type - Data type to write. If not specified, writes the data type specified in the schema for the column.
  - Primary Key Fields - List of table columns that designate the primary key. The destination updates or merges the database row with data from the record when values in the mapped record fields match values in the listed columns.
- On the Credentials tab, configure the following properties:
  - Greenplum Username - User name to access the Greenplum Stream Server and Greenplum Database.
  - Greenplum Password - Password for the user name. Tip: To secure sensitive information such as user names and passwords, you can use runtime resources or credential stores.
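To make the Field to Column Mapping behavior concrete, the sketch below models it with plain dictionaries: fields are first matched to columns of the same name, then explicit mappings are applied, and the default value is used when the record has no value for a mapped field. It illustrates the described behavior under simplifying assumptions and is not the destination's code. The function name, the mapping keys (`column`, `field`, `default`), and the sample record are hypothetical. Real SDC fields are addressed by field path rather than by dictionary key, and data type conversion is omitted.

```python
def map_record_to_row(record, table_columns, mappings=None):
    """Build a {column: value} row from a record.

    record        -- dict of field name to value (stand-in for an SDC record)
    table_columns -- column names of the target table
    mappings      -- optional list of dicts with keys "column", "field",
                     and optionally "default" (hypothetical structure)
    """
    # Default behavior: a field is written to the column with the same name.
    row = {col: record[col] for col in table_columns if col in record}

    # Explicit mappings override the same-name default for their columns.
    for mapping in mappings or []:
        value = record.get(mapping["field"])
        if value is None:
            # The record has no value for the field: use the default value.
            value = mapping.get("default")
        row[mapping["column"]] = value
    return row


# Example usage with a hypothetical record and table:
record = {"id": 7, "name": "Ada", "cty": None}
mappings = [{"column": "city", "field": "cty", "default": "unknown"}]
print(map_record_to_row(record, ["id", "name", "city"], mappings))
# {'id': 7, 'name': 'Ada', 'city': 'unknown'}
```

Fields without a matching column or mapping, such as `cty` above, are not written under their own name in this model.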