GPSS Producer (deprecated)

The GPSS Producer destination writes data to Greenplum Database through a Greenplum Stream Server (GPSS). For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.
Important: This stage is deprecated and may be removed in a future release.

When you configure the GPSS Producer destination, you specify the connection information for a Greenplum Database master and a Greenplum Stream Server, define the table to use, and optionally define field mappings. By default, the destination writes field data to columns with matching names.
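For example, because the destination maps fields to columns with matching names by default, the simplest setup is a target table whose column names mirror the incoming record field names. The DDL below is an illustrative sketch only; the schema, table, and column names are hypothetical, and the primary key is included only because update and merge operations rely on the columns listed in the Primary Key Fields property.

    -- Illustrative sketch only: schema, table, and column names are hypothetical.
    -- Column names match the record field names, so no explicit field-to-column
    -- mappings are required in the destination.
    CREATE TABLE sales.orders (
        order_id  integer,
        customer  varchar(64),
        amount    numeric(10,2),
        PRIMARY KEY (order_id)
    )
    DISTRIBUTED BY (order_id);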

The GPSS Producer destination can use CRUD operations defined in the sdc.operation.type record header attribute to write data. You can define a default operation for records without the header attribute or value. You can also configure how to handle records with unsupported operations. For information about Data Collector change data processing and a list of CDC-enabled origins, see Processing Changed Data.

Before you use the GPSS Producer destination, you must install the GPSS stage library and complete the other prerequisite tasks. The GPSS stage library is an Enterprise stage library. Releases of Enterprise stage libraries occur separately from Data Collector releases. For more information, see Enterprise Stage Libraries in the Data Collector documentation.

Prerequisites

Before using the GPSS Producer destination, complete the following prerequisites:

Install the GPSS Stage Library

You must install the GPSS stage library before using the GPSS Producer destination.

The GPSS stage library is an Enterprise stage library. Releases of Enterprise stage libraries occur separately from Data Collector releases. As a result, you must install Enterprise stage libraries on all Data Collector installations.
Note: Data Collector accessed through a cloud service provider marketplace automatically includes the latest version of this Enterprise stage library.

You can install Enterprise stage libraries using Package Manager for a tarball Data Collector installation or as custom stage libraries for a tarball, RPM, or Cloudera Manager Data Collector installation.

Supported Versions

The following table lists the versions of the GPSS Enterprise stage library to use with specific Data Collector versions:
Data Collector Version | Supported Stage Library Version
Data Collector 3.8.2 and later | GPSS Enterprise Library 1.0.x

Installing with Package Manager

You can use Package Manager to install the GPSS stage library on a tarball Data Collector installation.

  1. Click the Package Manager icon.
  2. In the Navigation panel, click Enterprise Stage Libraries.
  3. Select GPSS Enterprise Library, then click the Install icon.
  4. Click Install.
    Data Collector installs the selected stage library.
  5. Restart Data Collector.

Installing as a Custom Stage Library

You can install the GPSS Enterprise stage library as a custom stage library on a tarball, RPM, or Cloudera Manager Data Collector installation.

  1. To download the stage library, go to the StreamSets archives page.
  2. Under StreamSets Enterprise Connectors, click Enterprise Connectors.
  3. Click the Enterprise stage library name and version that you want to download.
    The stage library downloads.
  4. Install and manage the Enterprise stage library as a custom stage library.
    For more information, see Custom Stage Libraries in the Data Collector documentation.
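As a rough illustration for a tarball Data Collector installation, the command below unpacks a downloaded stage library archive into a custom stage library directory. The archive name and target directory are hypothetical; use the locations and additional steps described in Custom Stage Libraries in the Data Collector documentation, then restart Data Collector.

    # Illustrative sketch only: the archive name and install directory are
    # hypothetical; follow Custom Stage Libraries in the Data Collector
    # documentation for the supported locations for your installation type.
    tar -xzf streamsets-datacollector-gpss-lib-1.0.0.tgz \
        -C /opt/streamsets-datacollector/user-libs/
    # Restart Data Collector so that it picks up the new stage library.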

Install, Configure, and Start GPSS in Greenplum Database

The Greenplum Stream Server (GPSS) manages communication and data transfer between the GPSS Producer destination and Greenplum Database. Before using the destination, you must install, configure, and start GPSS in the Greenplum Database cluster. For more information, see the Pivotal Greenplum documentation.
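As a rough sketch, GPSS is typically driven by a JSON configuration file and started with the gpss utility on a Greenplum host. The file below is illustrative only and mirrors the layout shown in the Greenplum GPSS documentation; verify the exact keys, ports, and command options against your GPSS version.

    {
        "ListenAddress": {
            "Host": "",
            "Port": 5000
        },
        "Gpfdist": {
            "Host": "",
            "Port": 8080
        }
    }

You might then start the server with a command such as gpss gpss.json --log-dir ./gpsslogs, where the file name and log directory are placeholders. The ListenAddress port is typically the value you enter in the destination's GPSS Port property.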

CRUD Operation Processing

The GPSS Producer destination can insert, update, or merge data. The destination writes the records based on the CRUD operation defined in a CRUD operation header attribute or in operation-related stage properties.

The destination uses the header attribute and stage properties as follows:

CRUD operation header attribute
The destination looks for the CRUD operation in the sdc.operation.type record header attribute.
The attribute can contain one of the following numeric values:
  • 1 for INSERT
  • 3 for UPDATE
  • 8 for MERGE
If your pipeline has a CRUD-enabled origin that processes changed data, the destination simply reads the operation type from the sdc.operation.type header attribute that the origin generates. If your pipeline has a non-CDC origin, you can use the Expression Evaluator processor or a scripting processor to define the record header attribute, as shown in the sketch after this list. For more information about Data Collector changed data processing and a list of CDC-enabled origins, see Processing Changed Data.
Operation stage properties
If there is no CRUD operation in the sdc.operation.type record header attribute, the destination uses the operation configured in the Default Operation property.
If the sdc.operation.type record header attribute contains an unsupported value, the destination takes the action configured in the Unsupported Operation Handling property. The destination can discard the record, send the record for error handling, or write the record using the default operation.
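For example, if every record from a non-CDC origin should be written as an update, a scripting processor such as the Jython Evaluator can set the header attribute before the record reaches the destination. This is a minimal sketch, assuming the standard scripting-processor bindings (records, output, error) and the numeric operation codes listed above:

    # Minimal Jython Evaluator sketch: set the CRUD operation header attribute
    # on every record. '3' is the numeric code for UPDATE; use '1' for INSERT
    # or '8' for MERGE.
    for record in records:
        try:
            record.attributes['sdc.operation.type'] = '3'
            output.write(record)
        except Exception as e:
            error.write(record, str(e))

Alternatively, an Expression Evaluator can add a header attribute named sdc.operation.type with a constant value or a conditional expression.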

Configuring a GPSS Producer Destination

Configure the GPSS Producer destination to insert, update, or merge data in Greenplum Database through a Greenplum Stream Server (GPSS).
Important: This stage is deprecated and may be removed in a future release.

Before you use the GPSS Producer destination in a pipeline, complete the prerequisite tasks.

  1. In the Properties panel, on the General tab, configure the following properties:
    General Property | Description
    Name | Stage name.
    Description | Optional description.
    Required Fields | Fields that must include data for the record to be passed into the stage.
    Tip: You might include fields that the stage uses.
    Records that do not include all required fields are processed based on the error handling configured for the pipeline.
    Preconditions | Conditions that must evaluate to TRUE to allow a record to enter the stage for processing. Click Add to create additional preconditions.
    Records that do not meet all preconditions are processed based on the error handling configured for the stage.
    On Record Error | Error record handling for the stage:
    • Discard - Discards the record.
    • Send to Error - Sends the record to the pipeline for error handling.
    • Stop Pipeline - Stops the pipeline. Not valid for cluster pipelines.
  2. On the GPSS tab, configure the following properties:
    GPSS Property | Description
    Greenplum Database Host | Host name of the Greenplum Database master that the Greenplum Stream Server connects to.
    Greenplum Database Port | Port that the Greenplum Stream Server uses to connect to the Greenplum Database master.
    GPSS Host | Host name of the Greenplum Stream Server.
    GPSS Port | Port that the destination uses to connect to the Greenplum Stream Server.
    Schema Name | Name of the schema that contains the table to write data to.
    Database Name | Name of the database that contains the table to write data to.
    Table Name | Name of the table to write data to.
    Unsupported Operation Handling | Action to take when the CRUD operation type defined in the sdc.operation.type record header attribute is not supported:
    • Discard - Discards the record.
    • Send to Error - Sends the record to the pipeline for error handling.
    • Use Default Operation - Writes the record to the destination system using the default operation.
    Default Operation | Default CRUD operation to perform if the sdc.operation.type record header attribute is not set.
    Field to Column Mapping | Mappings between record fields and database table columns. By default, the destination maps fields to columns with the same name. Specify the following properties:
    • Column Name - Name of a column in the database table.
    • SDC Field - Field in the Data Collector record.
    • Default Value - Value to write when the record contains no value for the field.
    • Greenplum Data Type - Data type to write. If not specified, the destination writes the data type defined for the column in the table schema.
    Primary Key Fields | List of table columns that make up the primary key. The destination updates or merges a database row with data from a record when values in the mapped record fields match values in the listed columns.
  3. On the Credentials tab, configure the following properties:
    Credentials Property | Description
    Greenplum Username | User name to access the Greenplum Stream Server and Greenplum Database.
    Greenplum Password | Password for the user name.
    Tip: To secure sensitive information such as user names and passwords, you can use runtime resources or credential stores. For more information about credential stores, see Credential Stores in the Data Collector documentation.
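For example, instead of entering the password directly, the Greenplum Password property can reference a runtime resource or a credential store through the Data Collector expression language. The function names below are standard Data Collector EL functions; the resource file name, credential store ID, group, and secret name are placeholders:

    Runtime resource:  ${runtime:loadResource("greenplumPassword.txt", true)}
    Credential store:  ${credential:get("jks", "all", "greenplum/password")}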