Google BigQuery (Legacy) (deprecated)

The Google BigQuery (Legacy) destination streams data into existing datasets and tables in Google BigQuery. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.
Important: This stage is deprecated and may be removed in a future release. To write to Google BigQuery, use the Google BigQuery destination, which can process change data capture (CDC) data and handle data drift.

When you configure the destination, you define the existing BigQuery dataset and table to stream data into. The destination streams each record into a row in a BigQuery table. You can optionally define an expression that evaluates to the insert ID to use when inserting or updating a row. The insert ID is a unique ID for each row. If you do not specify an insert ID, the destination inserts each record into a new row.
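Under the covers, this kind of streaming write corresponds to BigQuery's streaming insert (insertAll) API, where an optional insert ID attached to each row lets BigQuery de-duplicate rows from retried requests. The following sketch uses the Google Cloud BigQuery Java client to show that relationship. It is illustrative only, not the destination's internal code, and the dataset, table, field names, and insert ID value are hypothetical.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.InsertAllRequest;
import com.google.cloud.bigquery.InsertAllResponse;
import com.google.cloud.bigquery.TableId;
import java.util.Map;

public class StreamingInsertSketch {
  public static void main(String[] args) {
    // Client built from Google Cloud default credentials.
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    // Hypothetical dataset, table, and field names.
    TableId tableId = TableId.of("my_dataset", "my_table");
    Map<String, Object> row = Map.of("id", 1L, "name", "example");

    InsertAllRequest request = InsertAllRequest.newBuilder(tableId)
        // With an insert ID, BigQuery can de-duplicate rows from retried requests;
        // without one, every streamed record becomes a new row.
        .addRow(InsertAllRequest.RowToInsert.of("order-1", row))
        .build();

    InsertAllResponse response = bigquery.insertAll(request);
    if (response.hasErrors()) {
      // Per-row errors are keyed by the row's index in the request.
      response.getInsertErrors().forEach((index, errors) ->
          System.err.println("Row " + index + " failed: " + errors));
    }
  }
}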

The destination maps fields from records to BigQuery columns based on matching names and compatible data types. You can configure the destination to ignore invalid columns when the destination cannot map a field to an existing BigQuery column. You can also configure the table cache size for the destination.

You define the project ID and credentials to use when connecting to Google BigQuery.

You can also use a connection to configure the destination.

For more information about streaming data into Google BigQuery, see the Google BigQuery documentation.

BigQuery Data Types

The Google BigQuery (Legacy) destination maps fields from records to BigQuery columns in existing tables based on matching names and compatible data types. If needed, the destination converts Data Collector data types to BigQuery data types.

The following table lists the Data Collector data types and the BigQuery data types that the destination converts them to:

Data Collector Data Type     BigQuery Data Type
Boolean                      Boolean
Byte Array                   Bytes
Date                         Date
Datetime                     Datetime or Timestamp
Decimal                      Decimal
Double                       Float
Float                        Float
Integer                      Integer
List                         Array
List-Map                     Record with repeated fields
Long                         Integer
Map                          Record
Short                        Integer
String                       String
Time                         Time

The Google BigQuery (Legacy) destination cannot convert the following Data Collector data types:
  • Byte
  • Character

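To make the mapping concrete, the following sketch shows a single streamed row whose values correspond to several of the BigQuery types listed above, as they would be passed to the BigQuery streaming insert API. It is illustrative only, not the destination's conversion code, and the column names and values are hypothetical; with streaming inserts, date, datetime, and time values travel as formatted strings, byte arrays as base64-encoded strings, lists as repeated values, and maps as nested records.

import com.google.cloud.bigquery.InsertAllRequest;
import java.util.Base64;
import java.util.List;
import java.util.Map;

public class RowTypesSketch {
  static InsertAllRequest.RowToInsert exampleRow() {
    // Illustrative column names; each value lines up with a BigQuery type
    // from the table above as it would travel through a streaming insert.
    Map<String, Object> row = Map.of(
        "active", true,                                        // Boolean    -> Boolean
        "payload", Base64.getEncoder()
            .encodeToString(new byte[] {1, 2, 3}),             // Byte Array -> Bytes (base64 string)
        "created_date", "2024-01-15",                          // Date       -> Date (formatted string)
        "created_at", "2024-01-15 10:30:00",                   // Datetime   -> Datetime (formatted string)
        "score", 3.14,                                         // Double     -> Float
        "row_count", 42L,                                      // Long       -> Integer
        "tags", List.of("a", "b"),                             // List       -> Array (repeated field)
        "address", Map.of("city", "Berlin", "zip", "10115"),   // Map        -> Record (nested fields)
        "name", "example");                                    // String     -> String
    return InsertAllRequest.RowToInsert.of(row);
  }
}
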
Credentials

When the Google BigQuery (Legacy) destination streams data into Google BigQuery, the destination must pass credentials to BigQuery.

You can provide credentials using one of the following options:
  • Google Cloud default credentials
  • Credentials in a file
  • Credentials in a stage property

For details on how to configure each option, see Security in Google Cloud Stages.
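As background on the first two options, the following sketch shows how a BigQuery client can be built either from Google Cloud default credentials or from a service account credentials JSON file. It illustrates the underlying credential mechanism only; in the pipeline you simply set the Credentials properties described later in this topic. The file path and project ID are hypothetical.

import com.google.auth.oauth2.GoogleCredentials;
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import java.io.FileInputStream;
import java.io.IOException;

public class CredentialsSketch {
  public static void main(String[] args) throws IOException {
    // Google Cloud default credentials, resolved from the environment
    // (GOOGLE_APPLICATION_CREDENTIALS, gcloud, or the VM metadata server).
    BigQuery withDefaults = BigQueryOptions.getDefaultInstance().getService();

    // Credentials read from a service account JSON file; the path and
    // project ID here are hypothetical.
    GoogleCredentials fromFile;
    try (FileInputStream json = new FileInputStream("/path/to/service-account.json")) {
      fromFile = GoogleCredentials.fromStream(json);
    }
    BigQuery withServiceAccount = BigQueryOptions.newBuilder()
        .setProjectId("my-project")
        .setCredentials(fromFile)
        .build()
        .getService();
  }
}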

Configuring a Google BigQuery (Legacy) Destination

Configure a Google BigQuery (Legacy) destination to stream data into Google BigQuery.

  1. In the Properties panel, on the General tab, configure the following properties:
    General Property Description
    Name Stage name.
    Description Optional description.
    Required Fields Fields that must include data for the record to be passed into the stage.
    Tip: You might include fields that the stage uses.

    Records that do not include all required fields are processed based on the error handling configured for the pipeline.

    Preconditions Conditions that must evaluate to TRUE to allow a record to enter the stage for processing. Click Add to create additional preconditions.

    Records that do not meet all preconditions are processed based on the error handling configured for the stage.

    On Record Error Error record handling for the stage:
    • Discard - Discards the record.
    • Send to Error - Sends the record to the pipeline for error handling.
    • Stop Pipeline - Stops the pipeline.
  2. On the BigQuery tab, configure the following properties:
    BigQuery Property Description
    Dataset BigQuery dataset to write to.
    Enter the name of an existing dataset or an expression that evaluates to the name of an existing dataset. For example, if the dataset name is stored in the 'dataset' record attribute, enter the following expression:
    ${record:attribute('dataset')}
    Table Name Name of the BigQuery table to write to.
    Enter the name of an existing table or an expression that evaluates to the name of an existing table. For example, if the table name is stored in the 'table' record attribute, enter the following expression:
    ${record:attribute('table')}
Insert ID Expression Expression that evaluates to the BigQuery insert ID to use when inserting or updating a row. The insert ID is a unique ID for each row. Leave blank to insert each record into a new row.

    For more information about the insert ID property used to stream data into BigQuery, see the Google BigQuery documentation.

    Ignore Invalid Column Ignores invalid columns.

    If selected and the destination encounters a field path that cannot be mapped to a BigQuery column with the same name and a compatible data type, the destination ignores the invalid column and writes the remaining fields in the record to BigQuery. If cleared and the destination encounters an invalid column, the record is sent to the stage for error handling.

    Table Cache Size Maximum number of table ID entries to cache locally.

    When the destination evaluates the dataset and table name to write to, it checks that the table ID exists in BigQuery and then caches the table ID. When possible, the destination uses the cache to avoid unnecessary retrievals from BigQuery. When the cache reaches the maximum size, the oldest cached entries are evicted to allow for new data.
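For an illustrative sketch of this check-and-cache pattern, see the example after this configuration procedure.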

    Default is -1, an unlimited cache size.

  3. On the Credentials tab, configure the following properties:
    Credentials Property Description
    Connection Connection that defines the information required to connect to an external system.

    To connect to an external system, you can select a connection that contains the details, or you can directly enter the details in the pipeline. When you select a connection, Control Hub hides other properties so that you cannot directly enter connection details in the pipeline.

    Project ID Google Cloud project ID to use.

    Credentials Provider Provider for Google Cloud credentials:
    • Default credentials provider - Uses Google Cloud default credentials.
    • Service account credentials file (JSON) - Uses credentials stored in a JSON service account credentials file.
    • Service account credentials (JSON) - Uses JSON-formatted credentials information from a service account credentials file.
    Credentials File Path (JSON) Path to the Google Cloud service account credentials file used to connect. The credentials file must be a JSON file.

    Enter a path relative to the Data Collector resources directory, $SDC_RESOURCES, or enter an absolute path.

    Credentials File Content (JSON) Contents of a Google Cloud service account credentials JSON file used to connect.

    Enter JSON-formatted credential information in plain text, or use an expression to call the information from runtime resources or a credential store. For more information about credential stores, see Credential Stores in the Data Collector documentation.
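
The Table Cache Size property in step 2 bounds a local cache of table IDs that the destination has already verified in BigQuery. The following sketch shows one way such a check-and-cache lookup can be expressed with the BigQuery Java client and a Guava cache. It is an illustration of the pattern under stated assumptions, not the destination's actual implementation; the cache size, dataset, and table names are hypothetical.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.TableId;
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.util.concurrent.ExecutionException;

public class TableCacheSketch {
  private final BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

  // Bounded cache of table IDs that have already been verified to exist.
  // Once the limit is reached, older entries are evicted, mirroring the
  // Table Cache Size property (where -1 means unlimited). The size is hypothetical.
  private final Cache<TableId, Boolean> verifiedTables =
      CacheBuilder.newBuilder().maximumSize(500).build();

  boolean tableExists(String dataset, String table) throws ExecutionException {
    TableId tableId = TableId.of(dataset, table);
    // Only call BigQuery when the table ID is not already in the cache.
    return verifiedTables.get(tableId, () -> bigquery.getTable(tableId) != null);
  }
}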