Google Cloud Storage

Supported pipeline types:
  • Data Collector

The Google Cloud Storage origin reads objects stored in Google Cloud Storage. The objects must be fully written and reside in a single bucket. The object names must share a prefix pattern. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.

With the Google Cloud Storage origin, you define the bucket, prefix pattern, and optional common prefix. These properties determine the objects that the origin processes.
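The following Python sketch shows how a bucket listing might be narrowed by a common prefix and a prefix pattern. It is an illustration only, not the origin's implementation. It assumes that the prefix pattern is evaluated relative to the common prefix and that the pattern uses simplified Ant-style wildcards: `*` matches within one path segment, `**/` matches zero or more directory levels, and `?` matches a single character. The function names `pattern_to_regex` and `select_objects` are hypothetical.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a simplified Ant-style pattern into a compiled regex.

    Assumed semantics (illustrative, not the origin's documented rules):
      '**/' -> zero or more directory levels
      '**'  -> any characters, including '/'
      '*'   -> any characters except '/'
      '?'   -> exactly one character except '/'
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def select_objects(names, common_prefix, prefix_pattern):
    """Return object names under common_prefix whose remainder matches prefix_pattern."""
    regex = pattern_to_regex(prefix_pattern)
    selected = []
    for name in names:
        # Objects outside the common prefix are never considered.
        if not name.startswith(common_prefix):
            continue
        relative = name[len(common_prefix):]
        # The whole remainder of the name must match the pattern.
        if regex.fullmatch(relative):
            selected.append(name)
    return selected


if __name__ == "__main__":
    listing = ["logs/2024/app.log", "logs/2024/01/app.log", "logs/readme.txt", "other/app.log"]
    print(select_objects(listing, "logs/", "**/*.log"))
```

With these assumptions, `**/*.log` selects log files at any depth under `logs/`, while a plain `*.log` would select only log files directly under `logs/`.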

You also define the project ID and credentials to use when connecting to Google Cloud Storage.

When an error occurs while processing an object, the origin can keep, archive, or delete the object. When archiving, the origin can copy or move the object.
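To make the keep, archive, and delete choices concrete, the following Python sketch models buckets as in-memory dictionaries and applies one error-handling choice to a failed object. The enum names, the `handle_error` function, and the idea of archiving to a bucket plus an optional name prefix are hypothetical illustrations of the behavior described above, not the origin's actual property names or code.

```python
from enum import Enum


class ErrorHandling(Enum):
    KEEP = "keep"        # leave the object where it is
    ARCHIVE = "archive"  # copy or move the object elsewhere
    DELETE = "delete"    # remove the object from the bucket


class Archiving(Enum):
    COPY = "copy"  # archived copy is written, original stays
    MOVE = "move"  # archived copy is written, original is removed


def handle_error(buckets, bucket, name, option,
                 archiving=None, archive_bucket=None, archive_prefix=""):
    """Apply an error-handling choice to buckets[bucket][name].

    buckets is a hypothetical in-memory model: {bucket_name: {object_name: bytes}}.
    """
    if option is ErrorHandling.KEEP:
        return
    if option is ErrorHandling.DELETE:
        del buckets[bucket][name]
        return
    # Archiving needs a mode and a destination bucket.
    if archiving is None or archive_bucket is None:
        raise ValueError("archiving requires a copy/move mode and an archive bucket")
    target = archive_prefix + name
    # Refuse to archive an object onto itself; a move would lose the data.
    if archive_bucket == bucket and target == name:
        raise ValueError("archive location must differ from the original object")
    data = buckets[bucket][name]
    buckets.setdefault(archive_bucket, {})[target] = data
    if archiving is Archiving.MOVE:
        del buckets[bucket][name]
```

Copying leaves the failed object in place for inspection, while moving clears it from the source so that it is not listed alongside unprocessed objects.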

When the pipeline stops, the Google Cloud Storage origin notes where it stopped reading. By default, when the pipeline starts again, the origin continues processing from that point. You can reset the origin to process all requested objects.
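The resume-and-reset behavior can be pictured as a stored offset. The following Python sketch assumes, purely for illustration, that objects are processed in order of last-modified time and then name, and that the offset records the last fully processed object. The `OffsetTracker` class and this ordering are hypothetical; the origin's real offset format is not described here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class OffsetTracker:
    # Hypothetical offset: (last-modified time, object name) of the last processed object.
    offset: Optional[Tuple[int, str]] = None

    def pending(self, objects: Dict[str, int]) -> List[str]:
        """Return object names still to process, ordered by (mtime, name).

        objects maps object name -> last-modified time (for example, epoch millis).
        """
        ordered = sorted(objects.items(), key=lambda item: (item[1], item[0]))
        return [name for name, mtime in ordered
                if self.offset is None or (mtime, name) > self.offset]

    def commit(self, name: str, mtime: int) -> None:
        """Record that an object was fully processed (kept across pipeline restarts)."""
        self.offset = (mtime, name)

    def reset(self) -> None:
        """Forget the stored position so that all requested objects are processed again."""
        self.offset = None
```

In this model, stopping the pipeline after committing `b` and starting it again yields only the objects after `b`, while calling `reset()` makes every object pending again.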

You can also use a connection to configure the origin.

The origin can generate events for an event stream. For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.