Google Cloud Storage
The Google Cloud Storage executor performs a task in Google Cloud Storage each time it receives an event. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.
Upon receiving an event, the executor can perform one of the following tasks:
- Create a new object for the specified content.
- Copy an existing object to another location in the same project.
- Move an existing object to another location in the same project.
- Add metadata to an existing object.
Each Google Cloud Storage executor can perform one type of task. To perform additional tasks, use additional executors.
Use the Google Cloud Storage executor as part of an event stream. You can use the executor in any logical way, such as moving objects after they are read by the Google Cloud Storage origin or adding metadata to objects after they are written by the Google Cloud Storage destination.
When you configure the Google Cloud Storage executor, you specify the project ID and the credentials to use to connect. You can also use a connection to configure the executor.
When creating new objects, you specify the location for the objects, and the content and optional metadata for the objects. When copying or moving objects, you specify the source and target location for the objects and optional metadata to add. When adding metadata to an existing object, you specify the metadata to use.
You can also configure the executor to generate events for another event stream. For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.
Credentials
To connect to Google Cloud Storage, the Google Cloud Storage executor must pass credentials to Google Cloud Storage. You can provide credentials using one of the following options:
- Google Cloud default credentials
- Credentials in a file
- Credentials in a stage property
For details on how to configure each option, see Security in Google Cloud Stages.
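When using either service account option, the credentials are the standard Google Cloud service account key in JSON format, downloaded from the Google Cloud console. A minimal sketch of its shape, with placeholder values only:

```json
{
  "type": "service_account",
  "project_id": "your-project-id",
  "private_key_id": "placeholder-key-id",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "sdc-executor@your-project-id.iam.gserviceaccount.com",
  "client_id": "placeholder-client-id",
  "token_uri": "https://oauth2.googleapis.com/token"
}
```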
Create New Objects
You can use the Google Cloud Storage executor to create a new Google Cloud Storage object and write the specified content to the object when the executor receives an event record.
When you create an object, you specify where to create the object and the content to write to the object. You can use an expression to represent both the location for the object and the content to use. You can also specify metadata to include with the object.
For example, say you want the executor to create a new Google Cloud Storage object for each object that the Google Cloud Storage destination writes, and to use the new object to store the record count information for each written object.
When the destination finishes writing an object, it generates an event record that includes the bucket where the object was written in a bucket field and the object path in an objectKey field. So, to create a new record-count object in the same bucket as the written object, you can use the following expression for the Object property:
${record:value('/bucket')}/${record:value('/objectKey')}.recordcount
Then, because the event record includes the record count in a recordCount field, you can use the following expression for the Content property:
${record:value('/recordCount')}
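To make the expressions above concrete, here is a minimal sketch of how a `${record:value('/field')}` expression resolves against an object-written event record. This is an illustration only, using plain Python dicts and a simplified substitution, not the Data Collector expression language implementation:

```python
import re

def resolve(expression, record):
    """Replace each ${record:value('/field')} with the record's field value."""
    return re.sub(
        r"\$\{record:value\('/([^']+)'\)\}",
        lambda m: str(record[m.group(1)]),
        expression,
    )

# Sample object-written event record from the Google Cloud Storage destination.
event = {"bucket": "sales", "objectKey": "west/orders.json", "recordCount": 250}

# Object property: new record-count object in the same bucket.
object_path = resolve(
    "${record:value('/bucket')}/${record:value('/objectKey')}.recordcount", event
)
# Content property: the record count from the event record.
content = resolve("${record:value('/recordCount')}", event)

print(object_path)  # sales/west/orders.json.recordcount
print(content)      # 250
```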
Copy or Move Objects
You can use the Google Cloud Storage executor to copy or move an object to another location when the executor receives an event record.
To copy or move objects, you specify the properties that define the location of the object to be copied, and the target location for the copy. The target location must be within the same project as the source location. You can use expressions to define both locations.
You can configure the executor to include metadata when copying or moving the object. If you configure the executor to include metadata while copying an object, only the copied object receives the metadata.
For example, you can use a Google Cloud Storage executor to move each object written by a Google Cloud Storage destination to a Completed directory after it is closed. To do this, you configure the Google Cloud Storage destination to generate events.
When the destination closes an object, it generates an event record that includes the bucket where the object was written in a bucket field and the object path in an objectKey field. You can use this information to configure the source and target location properties in the executor as follows:
- Source Bucket: ${record:value('/bucket')}
- Source Object: ${record:value('/objectKey')}
- Target Bucket: ${record:value('/bucket')}
- Target Object: completed/${record:value('/objectKey')}
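The following sketch shows what those four properties evaluate to for a sample object-written event record. Field names follow this section; the bucket and object values are made up for illustration:

```python
# Sample event record generated when the destination closes an object.
event = {"bucket": "sales", "objectKey": "2024/orders.csv"}

source_bucket = event["bucket"]                    # ${record:value('/bucket')}
source_object = event["objectKey"]                 # ${record:value('/objectKey')}
target_bucket = event["bucket"]                    # ${record:value('/bucket')}
target_object = "completed/" + event["objectKey"]  # completed/${record:value('/objectKey')}

print(source_bucket, source_object)  # sales 2024/orders.csv
print(target_bucket, target_object)  # sales completed/2024/orders.csv
```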
To do something more complicated, like move only the subset of objects with a _west suffix to a different location, you can add a Stream Selector processor to route only events where the objectKey field includes the suffix to the Google Cloud Storage executor.
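The routing condition amounts to a simple predicate on the event record. A sketch, with the field name and suffix taken from the example above:

```python
def route_to_gcs_executor(event):
    # Route an event to the Google Cloud Storage executor only when
    # its objectKey includes the '_west' suffix.
    return "_west" in event["objectKey"]

print(route_to_gcs_executor({"objectKey": "sales_west/orders.csv"}))  # True
print(route_to_gcs_executor({"objectKey": "sales_east/orders.csv"}))  # False
```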
Set Metadata
When you create, copy, or move objects, you can define metadata for the objects at the same time. You can also use the Set Metadata option to add metadata to existing Google Cloud Storage objects as the primary task.
When you define metadata, you specify one or more key-value pairs that you can use to categorize objects, such as product: <product>. You can use expressions to define the keys and the values.
For example, to tag each object with the number of records it contains, you can use the recordCount field in the event record, as follows:
- key: processed records
- value: ${record:value('/recordCount')}
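The resulting metadata is a plain key-value mapping, with the value resolved from the event record. A sketch (in Google Cloud Storage terms, this becomes custom object metadata, e.g. what the Python client exposes as blob.metadata):

```python
# Sample event record with the record count for the written object.
event = {"recordCount": 250}

# key: processed records, value: ${record:value('/recordCount')}
metadata = {"processed records": str(event["recordCount"])}

print(metadata)  # {'processed records': '250'}
```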
For more information about metadata, see the Google Cloud Storage documentation.
Event Generation
The Google Cloud Storage executor can generate events that you can use in an event stream. When you enable event generation, the executor generates events each time it performs a task, such as creating or moving an object. Events generated by the executor can be used in any logical way. For example:
- With the Email executor to send a custom email after receiving an event.
For an example, see Sending Email During Pipeline Processing.
- With a destination to store event information.
For an example, see Preserving an Audit Trail of Events.
For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.
Event Records
Event records generated by the Google Cloud Storage executor have the following event-related record header attributes:

Record Header Attribute | Description |
---|---|
sdc.event.type | Event type. Uses one of the following event types: gcs-object-created, gcs-object-copied, gcs-object-moved, or gcs-object-changed. |
sdc.event.version | Integer that indicates the version of the event record type. |
sdc.event.creation_timestamp | Epoch timestamp when the stage created the event. |

The executor can generate the following types of event records:
- gcs-object-created

  The executor generates a gcs-object-created event record when it creates a new object. These event records have the sdc.event.type record header attribute set to gcs-object-created and include the following fields:

  Event Field Name | Description |
  ---|---|
  object.bucket | Bucket where the object was created. |
  object.name | Location and name of the object. |

- gcs-object-copied

  The executor generates a gcs-object-copied event record after it copies an object. These event records have the sdc.event.type record header attribute set to gcs-object-copied and include the following fields:

  Event Field Name | Description |
  ---|---|
  source.object.bucket | Bucket for the object that was copied. |
  source.object.name | Location and name of the object that was copied. |
  target.object.bucket | Bucket where the object was copied to. |
  target.object.name | Target location and name for the copy. |

- gcs-object-moved

  The executor generates a gcs-object-moved event record after it moves an object. These event records have the sdc.event.type record header attribute set to gcs-object-moved and include the following fields:

  Event Field Name | Description |
  ---|---|
  source.object.bucket | Original bucket for the object that was moved. |
  source.object.name | Original location and name of the object. |
  target.object.bucket | Bucket where the object was moved to. |
  target.object.name | New location and name for the object. |

- gcs-object-changed

  The executor generates a gcs-object-changed event record after it adds metadata to an existing object as the primary task. This event record is not generated when adding metadata while creating, copying, or moving objects. These event records have the sdc.event.type record header attribute set to gcs-object-changed and include the following fields:

  Event Field Name | Description |
  ---|---|
  object.bucket | Bucket of the object that was changed. |
  object.name | Location and name of the object that was changed. |
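Downstream stages typically branch on the sdc.event.type header attribute and then read the event fields listed above. A minimal sketch, using plain Python dicts to stand in for the header attributes and event fields (not Data Collector code):

```python
def describe_event(header, fields):
    """Return a human-readable summary of an executor-generated event."""
    etype = header["sdc.event.type"]
    if etype == "gcs-object-created":
        return f"created {fields['object.bucket']}/{fields['object.name']}"
    if etype in ("gcs-object-copied", "gcs-object-moved"):
        verb = "copied" if etype == "gcs-object-copied" else "moved"
        return (f"{verb} {fields['source.object.bucket']}/{fields['source.object.name']}"
                f" to {fields['target.object.bucket']}/{fields['target.object.name']}")
    if etype == "gcs-object-changed":
        return f"changed {fields['object.bucket']}/{fields['object.name']}"
    raise ValueError(f"unexpected event type: {etype}")

print(describe_event(
    {"sdc.event.type": "gcs-object-moved"},
    {"source.object.bucket": "sales", "source.object.name": "a.csv",
     "target.object.bucket": "sales", "target.object.name": "completed/a.csv"},
))  # moved sales/a.csv to sales/completed/a.csv
```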
Configuring a Google Cloud Storage Executor
Configure a Google Cloud Storage executor to perform tasks in Google Cloud Storage upon receiving event records.
- In the Properties panel, on the General tab, configure the following properties:

  General Property | Description |
  ---|---|
  Name | Stage name. |
  Description | Optional description. |
  Produce Events | Generates event records when events occur. Use for event handling. |
  Required Fields | Fields that must include data for the record to be passed into the stage. Tip: You might include fields that the stage uses. Records that do not include all required fields are processed based on the error handling configured for the pipeline. |
  Preconditions | Conditions that must evaluate to TRUE to allow a record to enter the stage for processing. Click Add to create additional preconditions. Records that do not meet all preconditions are processed based on the error handling configured for the stage. |
  On Record Error | Error record handling for the stage: Discard - Discards the record. Send to Error - Sends the record to the pipeline for error handling. Stop Pipeline - Stops the pipeline. |
- On the Tasks tab, configure the following properties:

  Task Property | Description |
  ---|---|
  Task | Task to perform upon receiving an event record. Select one of the following options: Create New Object - Use to create a new Google Cloud Storage object with the configured content. Copy Object - Use to copy a Google Cloud Storage object to another location in the same project. Move Object - Use to move a Google Cloud Storage object to another location in the same project. Set Object Metadata - Use to add metadata to an existing Google Cloud Storage object. |
  Bucket | Bucket for an object. You can use an expression to define the bucket. Available when creating objects or adding metadata to existing objects. |
  Object | Path to the object to use. You can use an expression to define the object. For example, to use the object whose closure by the Google Cloud Storage destination generated the event record, use the following expression: ${record:value('/bucket')}/${record:value('/objectKey')}. To use a whole file whose closure generated the event record, use the following expression: ${record:value('/targetFileInfo/bucket')}/${record:value('/targetFileInfo/objectKey')}. Available when creating objects or adding metadata to existing objects as the primary task. |
  Content | The content to write to new objects. You can use an expression to represent the content to use. |
  Source Bucket | Bucket of the object to be copied or moved. You can use an expression to define the source bucket. |
  Source Object | Location and name of the object to be copied or moved. You can use an expression to define the source object. |
  Target Bucket | Bucket for an object to be copied or moved to. You can use an expression to define the target bucket. |
  Target Object | Target location and name for the object being copied or moved. You can use an expression to define the target object. |
  Metadata | Metadata to add to an existing object, or to add to an object while creating, copying, or moving the object. Using simple or bulk edit mode, click the Add icon to define the metadata to use. You can specify just the key or specify a key-value pair. You can use expressions to define the keys and values. |
- On the Credentials tab, configure the following properties:

  Credentials Property | Description |
  ---|---|
  Connection | Connection that defines the information required to connect to an external system. To connect to an external system, you can select a connection that contains the details, or you can directly enter the details in the pipeline. When you select a connection, Control Hub hides other properties so that you cannot directly enter connection details in the pipeline. |
  Project ID | Google Cloud project ID to use. |
  Credentials Provider | Provider for Google Cloud credentials: Default credentials provider - Uses Google Cloud default credentials. Service account credentials file (JSON) - Uses credentials stored in a JSON service account credentials file. Service account credentials (JSON) - Uses JSON-formatted credentials information from a service account credentials file. |
  Credentials File Path (JSON) | Path to the Google Cloud service account credentials file used to connect. The credentials file must be a JSON file. Enter a path relative to the Data Collector resources directory, $SDC_RESOURCES, or enter an absolute path. |
  Credentials File Content (JSON) | Contents of a Google Cloud service account credentials JSON file used to connect. Enter JSON-formatted credential information in plain text, or use an expression to call the information from runtime resources or a credential store. For more information about credential stores, see Credential Stores in the Data Collector documentation. |