Google Cloud Storage

The Google Cloud Storage executor performs a task in Google Cloud Storage each time it receives an event. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.

Upon receiving an event, the executor can perform one of the following tasks:
  • Create a new object for the specified content.
  • Copy an existing object to another location in the same project.
  • Move an existing object to another location in the same project.
  • Add metadata to an existing object.

Each Google Cloud Storage executor can perform one type of task. To perform additional tasks, use additional executors.

Use the Google Cloud Storage executor as part of an event stream. You can use the executor in any logical way, such as moving objects after they are read by the Google Cloud Storage origin or adding metadata to objects after they are written by the Google Cloud Storage destination.

When you configure the Google Cloud Storage executor, you specify the project ID and the credentials to use to connect. You can also use a connection to configure the executor.

When creating new objects, you specify the location for the objects, and the content and optional metadata for the objects. When copying or moving objects, you specify the source and target location for the objects and optional metadata to add. When adding metadata to an existing object, you specify the metadata to use.

You can also configure the executor to generate events for another event stream. For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.

Credentials

To connect to Google Cloud Storage, the Google Cloud Storage executor must pass credentials to Google Cloud Storage.

You can provide credentials using one of the following options:
  • Google Cloud default credentials
  • Credentials in a file
  • Credentials in a stage property

For details on how to configure each option, see Security in Google Cloud Stages.
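
The executor handles authentication internally, but for illustration only, the three options map roughly to the following client setups in the google-cloud-storage Python library. The project ID, file path, and variable names shown are hypothetical:

import json
from google.cloud import storage
from google.oauth2 import service_account

# Option 1: Google Cloud default credentials (Application Default Credentials).
client = storage.Client(project="my-project")

# Option 2: credentials in a file (path to a service account JSON key file).
client = storage.Client.from_service_account_json(
    "/path/to/credentials.json", project="my-project")

# Option 3: credentials in a stage property (the JSON key content itself).
credentials_json = '...'  # placeholder for the JSON-formatted key content
info = json.loads(credentials_json)
creds = service_account.Credentials.from_service_account_info(info)
client = storage.Client(project="my-project", credentials=creds)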

Create New Objects

You can use the Google Cloud Storage executor to create a new Google Cloud Storage object and write the specified content to the object when the executor receives an event record.

When you create an object, you specify where to create the object and the content to write to the object. You can use an expression to represent both the location for the object and the content to use. You can also specify metadata to include with the object.

For example, say you want the executor to create a new Google Cloud Storage object for each object that the Google Cloud Storage destination writes, and to use the new object to store the record count information for each written object.

The destination generates a GCS Object Written event record each time that it writes an object. The record includes the bucket for the written object in a bucket field and the object path in an objectKey field. So, to create a new record-count object in the same bucket as the written object, you can use the following expression for the Object property:
${record:value('/bucket')}/${record:value('/objectKey')}.recordcount
The GCS Object Written event record also includes the number of records written to the object. So, to write this information to the new object, you can use the following expression for the Content property:
${record:value('/recordCount')}
Tip: Stage-generated event records differ from stage to stage. For a description of stage events, see "Event Record" in the documentation for the event-generating stage. For a description of pipeline events, see Pipeline Event Records.
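
For illustration, the create-object task in this example is roughly equivalent to the following google-cloud-storage call in Python, with hypothetical values standing in for the resolved expressions:

from google.cloud import storage

client = storage.Client(project="my-project")   # hypothetical project ID
bucket = client.bucket("sales-events")          # resolved from the /bucket field
# Object name as resolved from the Object property expression.
blob = bucket.blob("2024/orders.avro.recordcount")
blob.upload_from_string("1250")                 # resolved from the /recordCount field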

Copy or Move Objects

You can use the Google Cloud Storage executor to copy or move an object to another location when the executor receives an event record.

To copy or move objects, you specify the properties that define the location of the object to be copied, and the target location for the copy. The target location must be within the same project as the source location. You can use expressions to define both locations.

You can configure the executor to include metadata when copying or moving the object. If you configure the executor to include metadata while copying an object, only the copied object receives the metadata.

For example, you can use a Google Cloud Storage executor to move each object written by a Google Cloud Storage destination to a Completed directory after it is closed. To do this, you configure the Google Cloud Storage destination to generate events.

The destination generates a GCS Object Written event record each time that it writes an object. The record includes the bucket for the written object in a bucket field and the object path in an objectKey field. You can use this information to configure the source location properties in the executor as follows:
  • Source Bucket: ${record:value('/bucket')}
  • Source Object: ${record:value('/objectKey')}
Then, to move the object to a Completed directory and retain the same object name, you can configure the target location properties as follows:
  • Target Bucket: ${record:value('/bucket')}
  • Target Object: completed/${record:value('/objectKey')}

To do something more complicated, like move only the subset of objects with a _west suffix to a different location, you can add a Stream Selector processor to route only events where the objectKey field includes the suffix to the Google Cloud Storage executor.
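
For illustration, the copy and move tasks map roughly to the following google-cloud-storage calls in Python. A move is effectively a copy followed by deletion of the source object; all names shown are hypothetical:

from google.cloud import storage

client = storage.Client(project="my-project")
src_bucket = client.bucket("sales-events")        # resolved Source Bucket
dst_bucket = client.bucket("sales-events")        # resolved Target Bucket
src_blob = src_bucket.blob("2024/orders.avro")    # resolved Source Object

# Copy Object task: copy the source object to the target location.
dst_blob = src_bucket.copy_blob(
    src_blob, dst_bucket, "completed/2024/orders.avro")

# Move Object task: the copy above plus deletion of the source object.
src_blob.delete()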

Set Metadata

When you create, copy, or move objects, you can define metadata for the objects at the same time. You can also use the Set Metadata option to add metadata to existing Google Cloud Storage objects as the primary task.

When you define metadata, you specify one or more key-value pairs that you can use to categorize objects, such as product: <product>. You can use expressions to define the keys and the values.

For example, you can use an expression to specify the number of records that were written to an object based on the recordCount field in the event record, as follows:
key: processed records
value: ${record:value('/recordCount')}
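
For illustration, the Set Object Metadata task corresponds roughly to the following google-cloud-storage calls in Python, with hypothetical resolved values:

from google.cloud import storage

client = storage.Client(project="my-project")
blob = client.bucket("sales-events").blob("2024/orders.avro")
# Merge the key-value pair into the object's custom metadata.
blob.metadata = {"processed records": "1250"}  # value resolved from /recordCount
blob.patch()  # send the metadata update to Google Cloud Storage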

For more information about metadata, see the Google Cloud Storage documentation.

Event Generation

The Google Cloud Storage executor can generate events that you can use in an event stream. When you enable event generation, the executor generates events each time it performs a task, such as creating or moving an object.

Google Cloud Storage executor events can be used in any logical way.

For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.

Event Records

Event records generated by the Google Cloud Storage executor have the following event-related record header attributes. Record header attributes are stored as String values.
Record Header Attribute Description
sdc.event.type Event type. Uses the following event types:
  • gcs-object-created - Generated after the executor creates a new object.
  • gcs-object-copied - Generated after the executor copies an object to a new location.
  • gcs-object-moved - Generated after the executor moves an object to a new location.
  • gcs-object-changed - Generated after the executor adds metadata to an existing object as the primary task. Not generated when adding metadata while creating, copying, or moving objects.
sdc.event.version Integer that indicates the version of the event record type.
sdc.event.creation_timestamp Epoch timestamp when the stage created the event.
The Google Cloud Storage executor can generate the following types of event records:
gcs-object-created

The executor generates a gcs-object-created event record when it creates a new object.

These event records have the sdc.event.type record header attribute set to gcs-object-created and include the following fields:
Event Field Name Description
object.bucket Bucket where the object was created.
object.name Location and name of the object.
gcs-object-copied

The executor generates a gcs-object-copied event record after it copies an object.

These event records have the sdc.event.type record header attribute set to gcs-object-copied and include the following fields:
Event Field Name Description
source.object.bucket Bucket for the object that was copied.
source.object.name Location and name of the object that was copied.
target.object.bucket Bucket where the object was copied to.
target.object.name Target location and name for the copy.
gcs-object-moved

The executor generates a gcs-object-moved event record after it moves an object.

These event records have the sdc.event.type record header attribute set to gcs-object-moved and include the following fields:
Event Field Name Description
source.object.bucket Original bucket for the object that was moved.
source.object.name Original location and name of the object.
target.object.bucket Bucket where the object was moved to.
target.object.name New location and name for the object.
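For example, after the executor moves an object as described in Copy or Move Objects, the resulting event record might look like the following (all values hypothetical):
Record header attributes:
  sdc.event.type: gcs-object-moved
  sdc.event.version: 1
  sdc.event.creation_timestamp: 1704067200000
Record fields:
  source.object.bucket: sales-events
  source.object.name: 2024/orders.avro
  target.object.bucket: sales-events
  target.object.name: completed/2024/orders.avro
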
gcs-object-changed

The executor generates a gcs-object-changed event record after it adds metadata to an existing object as the primary task. This event record is not generated when adding metadata while creating, copying, or moving objects.

These event records have the sdc.event.type record header attribute set to gcs-object-changed and include the following fields:
Event Field Name Description
object.bucket Bucket of the object that was changed.
object.name Location and name of the object that was changed.

Configuring a Google Cloud Storage Executor

Configure a Google Cloud Storage executor to perform tasks in Google Cloud Storage upon receiving event records.

  1. In the Properties panel, on the General tab, configure the following properties:
    General Property Description
    Name Stage name.
    Description Optional description.
    Produce Events Generates event records when events occur. Use for event handling.
    Required Fields Fields that must include data for the record to be passed into the stage.
    Tip: You might include fields that the stage uses.

    Records that do not include all required fields are processed based on the error handling configured for the pipeline.

    Preconditions Conditions that must evaluate to TRUE to allow a record to enter the stage for processing. Click Add to create additional preconditions.

    Records that do not meet all preconditions are processed based on the error handling configured for the stage.

    On Record Error Error record handling for the stage:
    • Discard - Discards the record.
    • Send to Error - Sends the record to the pipeline for error handling.
    • Stop Pipeline - Stops the pipeline.
  2. On the Tasks tab, configure the following properties:
    Task Property Description
    Task Task to perform upon receiving an event record. Select one of the following options:
    • Create New Object - Use to create a new Google Cloud Storage object with the configured content.
    • Copy Object - Use to copy a Google Cloud Storage object to another location in the same project.
    • Move Object - Use to move a Google Cloud Storage object to another location in the same project.
    • Set Object Metadata - Use to add metadata to an existing Google Cloud Storage object.
    Bucket Bucket for an object. You can use an expression to define the bucket.

    Available when creating objects or adding metadata to existing objects.

    Object Path to the object to use. You can use an expression to define the object.

    For example, to use the object whose closure by the Google Cloud Storage destination generated the event record, use the following expression:

    ${record:value('/bucket')}/${record:value('/objectKey')}
    To use a whole file whose closure generated the event record, use the following expression:
    ${record:value('/targetFileInfo/bucket')}/${record:value('/targetFileInfo/objectKey')}

    Available when creating objects or adding metadata to existing objects as the primary task.

    Content The content to write to new objects. You can use an expression to represent the content to use.
    Source Bucket Bucket of the object to be copied or moved. You can use an expression to define the source bucket.
    Source Object Location and name of the object to be copied or moved. You can use an expression to define the source object.
    Target Bucket Bucket for an object to be copied or moved to. You can use an expression to define the target bucket.
    Target Object Target location and name for the object being copied or moved. You can use an expression to define the target object.
    Metadata Metadata to add to an existing object, or to add to an object while creating, copying, or moving the object.

    Using simple or bulk edit mode, click the Add icon to define the metadata to use.

    You can specify just the key or specify a key-value pair. You can use expressions to define the keys and values.

  3. On the Credentials tab, configure the following properties:
    Credentials Property Description
    Connection Connection that defines the information required to connect to an external system.

    To connect to an external system, you can select a connection that contains the details, or you can directly enter the details in the pipeline. When you select a connection, Control Hub hides other properties so that you cannot directly enter connection details in the pipeline.

    Project ID Google Cloud project ID to use.

    Credentials Provider Provider for Google Cloud credentials:
    • Default credentials provider - Uses Google Cloud default credentials.
    • Service account credentials file (JSON) - Uses credentials stored in a JSON service account credentials file.
    • Service account credentials (JSON) - Uses JSON-formatted credentials information from a service account credentials file.
    Credentials File Path (JSON) Path to the Google Cloud service account credentials file used to connect. The credentials file must be a JSON file.

    Enter a path relative to the Data Collector resources directory, $SDC_RESOURCES, or enter an absolute path.

    Credentials File Content (JSON) Contents of a Google Cloud service account credentials JSON file used to connect.

    Enter JSON-formatted credential information in plain text, or use an expression to call the information from runtime resources or a credential store. For more information about credential stores, see Credential Stores in the Data Collector documentation.
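
    For reference, a Google Cloud service account credentials file is a JSON document along the following lines; Google Cloud generates the actual file when you create a service account key, and the values here are placeholders:

    {
      "type": "service_account",
      "project_id": "my-project",
      "private_key_id": "<key id>",
      "private_key": "-----BEGIN PRIVATE KEY-----\n<key material>\n-----END PRIVATE KEY-----\n",
      "client_email": "sdc@my-project.iam.gserviceaccount.com",
      "client_id": "<client id>",
      "auth_uri": "https://accounts.google.com/o/oauth2/auth",
      "token_uri": "https://oauth2.googleapis.com/token"
    }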