GPSS Producer (deprecated)
When you configure the GPSS Producer destination, you specify the connection information for a Greenplum Database master and a Greenplum Stream Server, define the table to use, and optionally define field mappings. By default, the destination writes field data to columns with matching names.
The GPSS Producer destination can use CRUD operations defined in the sdc.operation.type record header attribute to write data. You can define a default operation for records without the header attribute or value. You can also configure how to handle records with unsupported operations.
For information about Data Collector change data processing and a list of CDC-enabled origins, see Processing Changed Data.
Before you use the GPSS Producer destination, you must install the GPSS stage library and complete the other prerequisite tasks. The GPSS stage library is an Enterprise stage library. Releases of Enterprise stage libraries occur separately from Data Collector releases. For more information, see Enterprise Stage Libraries in the Data Collector documentation.
Prerequisites
Install the GPSS Stage Library
You must install the GPSS stage library before using the GPSS Producer destination.
You can install Enterprise stage libraries using Package Manager for a tarball Data Collector installation or as custom stage libraries for a tarball, RPM, or Cloudera Manager Data Collector installation.
Supported Versions
Data Collector Version | Supported Stage Library Version |
---|---|
Data Collector 3.8.2 and later | GPSS Enterprise Library 1.0.x |
Installing with Package Manager
You can use Package Manager to install the GPSS stage library on a tarball Data Collector installation.
- Click the Package Manager icon.
- In the Navigation panel, click Enterprise Stage Libraries.
- Select GPSS Enterprise Library, then click the Install icon.
- Click Install. Data Collector installs the selected stage library.
- Restart Data Collector.
Installing as a Custom Stage Library
You can install the GPSS Enterprise stage library as a custom stage library on a tarball, RPM, or Cloudera Manager Data Collector installation.
- To download the stage library, go to the StreamSets archives page.
- Under StreamSets Enterprise Connectors, click Enterprise Connectors.
- Click the Enterprise stage library name and version that you want to download. The stage library downloads.
- Install and manage the Enterprise stage library as a custom stage library. For more information, see Custom Stage Libraries in the Data Collector documentation.
Install, Configure, and Start GPSS in Greenplum Database
The Greenplum Stream Server (GPSS) manages communication and data transfer between the GPSS Producer destination and Greenplum Database. Before using the destination, you must install, configure, and start GPSS in the Greenplum Database cluster. For more information, see the Pivotal Greenplum documentation.
CRUD Operation Processing
The GPSS Producer destination can insert, update, or merge data. The destination writes the records based on the CRUD operation defined in a CRUD operation header attribute or in operation-related stage properties.
The destination uses the header attribute and stage properties as follows:
- CRUD operation header attribute - The destination looks for the CRUD operation in the sdc.operation.type record header attribute.
- Operation stage properties - If there is no CRUD operation in the sdc.operation.type record header attribute, the destination uses the operation configured in the Default Operation property.
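The lookup order described above, combined with the Unsupported Operation Handling property, can be sketched roughly as follows. This is an illustrative sketch only, not the destination's actual implementation. The function name, the `Unsupported` enum, and the return shape are hypothetical. The numeric codes (1 for insert, 3 for update, 8 for merge) are assumed from the general Data Collector CDC conventions and should be checked against Processing Changed Data.

```python
from enum import Enum
from typing import Dict, Optional, Tuple

# Assumed numeric codes for sdc.operation.type (verify against the
# Data Collector documentation); only the operations that this
# destination supports are listed.
SUPPORTED = {1: "insert", 3: "update", 8: "merge"}


class Unsupported(Enum):
    """Hypothetical stand-in for the Unsupported Operation Handling property."""
    DISCARD = "discard"
    SEND_TO_ERROR = "send_to_error"
    USE_DEFAULT = "use_default"


def resolve_operation(
    headers: Dict[str, object],
    default_op: str,
    on_unsupported: Unsupported,
) -> Tuple[str, Optional[str]]:
    """Return (action, operation) for one record.

    action is "write", "error", or "discard"; operation is set only
    when the record should be written.
    """
    raw = headers.get("sdc.operation.type")

    # No header attribute or an empty value: use the Default Operation.
    if raw is None or str(raw).strip() == "":
        return ("write", default_op)

    # Non-numeric values are treated as unsupported in this sketch.
    try:
        code = int(raw)
    except (TypeError, ValueError):
        code = None

    if code in SUPPORTED:
        return ("write", SUPPORTED[code])

    # Unsupported operation: apply the configured handling.
    if on_unsupported is Unsupported.USE_DEFAULT:
        return ("write", default_op)
    if on_unsupported is Unsupported.SEND_TO_ERROR:
        return ("error", None)
    return ("discard", None)
```

For example, under these assumptions a record with the header value 2 (delete) and Unsupported Operation Handling set to Use Default Operation would be written with whatever operation is configured as the default.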
Configuring a GPSS Producer Destination
Before you use the GPSS Producer destination in a pipeline, complete the prerequisite tasks.
- In the Properties panel, on the General tab, configure the following properties:
General Property | Description |
---|---|
Name | Stage name. |
Description | Optional description. |
Required Fields | Fields that must include data for the record to be passed into the stage. Tip: You might include fields that the stage uses. Records that do not include all required fields are processed based on the error handling configured for the pipeline. |
Preconditions | Conditions that must evaluate to TRUE to allow a record to enter the stage for processing. Click Add to create additional preconditions. Records that do not meet all preconditions are processed based on the error handling configured for the stage. |
On Record Error | Error record handling for the stage: Discard - Discards the record. Send to Error - Sends the record to the pipeline for error handling. Stop Pipeline - Stops the pipeline. Not valid for cluster pipelines. |
- On the GPSS tab, configure the following properties:
GPSS Property | Description |
---|---|
Greenplum Database Host | Host name of the Greenplum Database master that the Greenplum Stream Server connects to. |
Greenplum Database Port | Port that the Greenplum Stream Server uses to connect with the Greenplum Database master. |
GPSS Host | Host name of the Greenplum Stream Server. |
GPSS Port | Port that the destination uses to connect with the Greenplum Stream Server. |
Schema Name | Name of the schema that contains the table to write data to. |
Database Name | Name of the database that contains the table to write data to. |
Table Name | Name of the table to write data to. |
Unsupported Operation Handling | Action to take when the CRUD operation type defined in the sdc.operation.type record header attribute is not supported: Discard - Discards the record. Send to Error - Sends the record to the pipeline for error handling. Use Default Operation - Writes the record to the destination system using the default operation. |
Default Operation | Default CRUD operation to perform if the sdc.operation.type record header attribute is not set. |
Field to Column Mapping | Mappings between record fields and database table columns. By default, the destination maps fields to columns with the same name. Specify the following properties: Column Name - Name of a column in the database table. SDC Field - Field in the Data Collector record. Default Value - Value written when the record contains no value. Greenplum Data Type - Data type to write. If not specified, the destination writes the data type specified in the schema for the column. |
Primary Key Fields | List of table columns that designate the primary key. The destination updates or merges the database row with data from the record when values in the mapped record fields match values in the listed columns. |
- On the Credentials tab, configure the following properties:
Credentials Property | Description |
---|---|
Greenplum Username | User name to access the Greenplum Stream Server and Greenplum Database. |
Greenplum Password | Password for the user name. Tip: To secure sensitive information such as user names and passwords, you can use runtime resources or credential stores. For more information about credential stores, see Credential Stores in the Data Collector documentation. |
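To make the Field to Column Mapping and Primary Key Fields properties more concrete, here is a minimal sketch that models them with plain Python dictionaries. It is an illustration under stated assumptions, not the destination's code. The function names, the `sdc_field` and `default_value` keys, and the in-memory table are all hypothetical. Real Data Collector records address fields by path rather than by plain dictionary key. The merge behavior shown (update on a primary-key match, insert otherwise) is the usual meaning of a merge and is assumed here.

```python
from typing import Any, Dict, List, Optional


def build_row(
    record: Dict[str, Any],
    columns: List[str],
    mappings: Optional[Dict[str, Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Map record fields to table columns.

    Without an explicit mapping, a column takes the value of the field
    with the same name. A mapping may name a different field and may
    supply a default used when the record contains no value.
    """
    mappings = mappings or {}
    row: Dict[str, Any] = {}
    for column in columns:
        mapping = mappings.get(column, {})
        field = mapping.get("sdc_field", column)  # default: same name
        value = record.get(field)
        if value is None:
            value = mapping.get("default_value")
        row[column] = value
    return row


def merge_row(
    table: List[Dict[str, Any]],
    row: Dict[str, Any],
    primary_keys: List[str],
) -> str:
    """Update the row whose primary-key columns match, else append it."""
    for existing in table:
        if all(existing.get(key) == row.get(key) for key in primary_keys):
            existing.update(row)
            return "updated"
    table.append(dict(row))
    return "inserted"


# Example: "name" is read from the "full_name" field, and "status"
# falls back to a default because the record has no such field.
record = {"id": 7, "full_name": "Ada"}
mappings = {
    "name": {"sdc_field": "full_name"},
    "status": {"default_value": "new"},
}
row = build_row(record, ["id", "name", "status"], mappings)
```

In this sketch, listing `id` as the only primary key means a later record with the same `id` overwrites the existing row, while a record with a new `id` is added as a new row.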