GPSS Producer (deprecated)

The GPSS Producer destination writes data to Greenplum Database through a Greenplum Stream Server (GPSS). For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.
Important: This stage is deprecated and may be removed in a future release.

When you configure the GPSS Producer destination, you specify the connection information for a Greenplum Database master and a Greenplum Stream Server, define the table to use, and optionally define field mappings. By default, the destination writes field data to columns with matching names.
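For example, because the destination maps fields to columns with matching names by default, the simplest setup is a target table whose column names mirror the incoming record field names. The DDL below is an illustrative sketch only; the schema, table, and column names are hypothetical, and the primary key is included only because update and merge operations rely on the columns listed in the Primary Key Fields property.

    -- Illustrative sketch only: schema, table, and column names are hypothetical.
    -- Column names match the record field names, so no explicit field-to-column
    -- mappings are required in the destination.
    CREATE TABLE sales.orders (
        order_id  integer,
        customer  varchar(64),
        amount    numeric(10,2),
        PRIMARY KEY (order_id)
    )
    DISTRIBUTED BY (order_id);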

The GPSS Producer destination can use CRUD operations defined in the sdc.operation.type record header attribute to write data. You can define a default operation for records without the header attribute or value. You can also configure how to handle records with unsupported operations. For information about Data Collector change data processing and a list of CDC-enabled origins, see Processing Changed Data.

Before you use the GPSS Producer destination, you must install the GPSS stage library and complete the other prerequisite tasks. The GPSS stage library is an Enterprise stage library. Releases of Enterprise stage libraries occur separately from Data Collector releases. For more information, see Enterprise Stage Libraries in the Data Collector documentation.

Prerequisites

Before using the GPSS Producer destination, complete the following prerequisites:

Install the GPSS Stage Library

You must install the GPSS stage library before using the GPSS Producer destination.

The GPSS stage library is an Enterprise stage library. Releases of Enterprise stage libraries occur separately from Data Collector releases. As a result, you must install Enterprise stage libraries on all Data Collector installations.
Note: Data Collector accessed through a cloud service provider marketplace automatically includes the latest version of this Enterprise stage library.

You can install Enterprise stage libraries using Package Manager for a tarball Data Collector installation or as custom stage libraries for a tarball, RPM, or Cloudera Manager Data Collector installation.

Supported Versions

The following table lists the versions of the GPSS Enterprise stage library to use with specific Data Collector versions:
Data Collector Version | Supported Stage Library Version
Data Collector 3.8.2 and later | GPSS Enterprise Library 1.0.x

Installing with Package Manager

You can use Package Manager to install the GPSS stage library on a tarball Data Collector installation.

  1. Click the Package Manager icon.
  2. In the Navigation panel, click Enterprise Stage Libraries.
  3. Select GPSS Enterprise Library, then click the Install icon.
  4. Click Install.
    Data Collector installs the selected stage library.
  5. Restart Data Collector.

Installing as a Custom Stage Library

You can install the GPSS Enterprise stage library as a custom stage library on a tarball, RPM, or Cloudera Manager Data Collector installation.

  1. To download the stage library, go to the StreamSets archives page.
  2. Under StreamSets Enterprise Connectors, click Enterprise Connectors.
  3. Click the Enterprise stage library name and version that you want to download.
    The stage library downloads.
  4. Install and manage the Enterprise stage library as a custom stage library.
    For more information, see Custom Stage Libraries in the Data Collector documentation.
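As a rough illustration for a tarball Data Collector installation, the command below unpacks a downloaded stage library archive into a custom stage library directory. The archive name and target directory are hypothetical; use the locations and additional steps described in Custom Stage Libraries in the Data Collector documentation, then restart Data Collector.

    # Illustrative sketch only: the archive name and install directory are
    # hypothetical; follow Custom Stage Libraries in the Data Collector
    # documentation for the supported locations for your installation type.
    tar -xzf streamsets-datacollector-gpss-lib-1.0.0.tgz \
        -C /opt/streamsets-datacollector/user-libs/
    # Restart Data Collector so that it picks up the new stage library.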

Install, Configure, and Start GPSS in Greenplum Database

The Greenplum Stream Server (GPSS) manages communication and data transfer between the GPSS Producer destination and Greenplum Database. Before using the destination, you must install, configure, and start GPSS in the Greenplum Database cluster. For more information, see the Pivotal Greenplum documentation.
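As a rough sketch, GPSS is typically driven by a JSON configuration file and started with the gpss utility on a Greenplum host. The file below is illustrative only and mirrors the layout shown in the Greenplum GPSS documentation; verify the exact keys, ports, and command options against your GPSS version.

    {
        "ListenAddress": {
            "Host": "",
            "Port": 5000
        },
        "Gpfdist": {
            "Host": "",
            "Port": 8080
        }
    }

You might then start the server with a command such as gpss gpss.json --log-dir ./gpsslogs, where the file name and log directory are placeholders. The ListenAddress port is typically the value you enter in the destination's GPSS Port property.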

CRUD Operation Processing

The GPSS Producer destination can insert, update, or merge data. The destination writes the records based on the CRUD operation defined in a CRUD operation header attribute or in operation-related stage properties.

The destination uses the header attribute and stage properties as follows:

CRUD operation header attribute
The destination looks for the CRUD operation in the sdc.operation.type record header attribute.
The attribute can contain one of the following numeric values:
  • 1 for INSERT
  • 3 for UPDATE
  • 8 for MERGE
If your pipeline has a CRUD-enabled origin that processes changed data, the destination simply reads the operation type from the sdc.operation.type header attribute that the origin generates. If your pipeline has a non-CDC origin, you can use the Expression Evaluator processor or a scripting processor to define the record header attribute, as shown in the sketch after this list. For more information about Data Collector changed data processing and a list of CDC-enabled origins, see Processing Changed Data.
Operation stage properties
If there is no CRUD operation in the sdc.operation.type record header attribute, the destination uses the operation configured in the Default Operation property.
If the sdc.operation.type record header attribute contains an unsupported value, the destination takes the action configured in the Unsupported Operation Handling property. The destination can discard the record, send the record for error handling, or write the record using the default operation.
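For example, if every record from a non-CDC origin should be written as an update, a scripting processor such as the Jython Evaluator can set the header attribute before the record reaches the destination. This is a minimal sketch, assuming the standard scripting-processor bindings (records, output, error) and the numeric operation codes listed above:

    # Minimal Jython Evaluator sketch: set the CRUD operation header attribute
    # on every record. '3' is the numeric code for UPDATE; use '1' for INSERT
    # or '8' for MERGE.
    for record in records:
        try:
            record.attributes['sdc.operation.type'] = '3'
            output.write(record)
        except Exception as e:
            error.write(record, str(e))

Alternatively, an Expression Evaluator can add a header attribute named sdc.operation.type with a constant value or a conditional expression.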

Configuring a GPSS Producer Destination

Configure the GPSS Producer destination to insert, update, or merge data in Greenplum Database through a Greenplum Stream Server (GPSS).
Important: This stage is deprecated and may be removed in a future release.

Before you use the GPSS Producer destination in a pipeline, complete the prerequisite tasks.

  1. In the Properties panel, on the General tab, configure the following properties:
    General Property | Description
    Name | Stage name.
    Description | Optional description.
    Required Fields | Fields that must include data for the record to be passed into the stage.
    Tip: You might include fields that the stage uses.
    Records that do not include all required fields are processed based on the error handling configured for the pipeline.
    Preconditions | Conditions that must evaluate to TRUE to allow a record to enter the stage for processing. Click Add to create additional preconditions.
    Records that do not meet all preconditions are processed based on the error handling configured for the stage.
    On Record Error | Error record handling for the stage:
    • Discard - Discards the record.
    • Send to Error - Sends the record to the pipeline for error handling.
    • Stop Pipeline - Stops the pipeline. Not valid for cluster pipelines.
  2. On the GPSS tab, configure the following properties:
    GPSS Property | Description
    Greenplum Database Host | Host name of the Greenplum Database master that the Greenplum Stream Server connects to.
    Greenplum Database Port | Port that the Greenplum Stream Server uses to connect to the Greenplum Database master.
    GPSS Host | Host name of the Greenplum Stream Server.
    GPSS Port | Port that the destination uses to connect to the Greenplum Stream Server.
    Schema Name | Name of the schema that contains the table to write data to.
    Database Name | Name of the database that contains the table to write data to.
    Table Name | Name of the table to write data to.
    Unsupported Operation Handling | Action to take when the CRUD operation type defined in the sdc.operation.type record header attribute is not supported:
    • Discard - Discards the record.
    • Send to Error - Sends the record to the pipeline for error handling.
    • Use Default Operation - Writes the record to the destination system using the default operation.
    Default Operation | Default CRUD operation to perform if the sdc.operation.type record header attribute is not set.
    Field to Column Mapping | Mappings between record fields and database table columns. By default, the destination maps fields to columns with the same name. Specify the following properties:
    • Column Name - Name of a column in the database table.
    • SDC Field - Field in the Data Collector record.
    • Default Value - Value to write when the record contains no value for the field.
    • Greenplum Data Type - Data type to write. If not specified, the destination writes the data type defined for the column in the table schema.
    Primary Key Fields | List of table columns that make up the primary key. The destination updates or merges a database row with data from a record when values in the mapped record fields match values in the listed columns.
  3. On the Credentials tab, configure the following properties:
    Credentials Property | Description
    Greenplum Username | User name to access the Greenplum Stream Server and Greenplum Database.
    Greenplum Password | Password for the user name.
    Tip: To secure sensitive information such as user names and passwords, you can use runtime resources or credential stores. For more information about credential stores, see Credential Stores in the Data Collector documentation.
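For example, instead of entering the password directly, the Greenplum Password property can reference a runtime resource or a credential store through the Data Collector expression language. The function names below are standard Data Collector EL functions; the resource file name, credential store ID, group, and secret name are placeholders:

    Runtime resource:  ${runtime:loadResource("greenplumPassword.txt", true)}
    Credential store:  ${credential:get("jks", "all", "greenplum/password")}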