Google BigQuery (Legacy) (deprecated)

The Google BigQuery (Legacy) destination streams data into existing datasets and tables in Google BigQuery. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.
Important: This stage is deprecated and may be removed in a future release. To write to Google BigQuery, use the Google BigQuery destination, which can process change data capture (CDC) data and handle data drift.

When you configure the destination, you define the existing BigQuery dataset and table to stream data into. The destination streams each record into a row in a BigQuery table. You can optionally define an expression that evaluates to the insert ID to use when inserting or updating a row. The insert ID is a unique ID for each row. If you do not specify an insert ID, the destination inserts each record into a new row.
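Under the covers, this kind of streaming write corresponds to BigQuery's streaming insert (insertAll) API, where an optional insert ID attached to each row lets BigQuery de-duplicate rows from retried requests. The following sketch uses the Google Cloud BigQuery Java client to show that relationship. It is illustrative only, not the destination's internal code, and the dataset, table, field names, and insert ID value are hypothetical.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.InsertAllRequest;
import com.google.cloud.bigquery.InsertAllResponse;
import com.google.cloud.bigquery.TableId;
import java.util.Map;

public class StreamingInsertSketch {
  public static void main(String[] args) {
    // Client built from Google Cloud default credentials.
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    // Hypothetical dataset, table, and field names.
    TableId tableId = TableId.of("my_dataset", "my_table");
    Map<String, Object> row = Map.of("id", 1L, "name", "example");

    InsertAllRequest request = InsertAllRequest.newBuilder(tableId)
        // With an insert ID, BigQuery can de-duplicate rows from retried requests;
        // without one, every streamed record becomes a new row.
        .addRow(InsertAllRequest.RowToInsert.of("order-1", row))
        .build();

    InsertAllResponse response = bigquery.insertAll(request);
    if (response.hasErrors()) {
      // Per-row errors are keyed by the row's index in the request.
      response.getInsertErrors().forEach((index, errors) ->
          System.err.println("Row " + index + " failed: " + errors));
    }
  }
}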

The destination maps fields from records to BigQuery columns based on matching names and compatible data types. You can configure the destination to ignore invalid columns when the destination cannot map a field to an existing BigQuery column. You can also configure the table cache size for the destination.

You define the project ID and credentials to use when connecting to Google BigQuery.

You can also use a connection to configure the destination.

For more information about streaming data into Google BigQuery, see the Google BigQuery documentation.

BigQuery Data Types

The Google BigQuery (Legacy) destination maps fields from records to BigQuery columns in existing tables based on matching names and compatible data types. If needed, the destination converts Data Collector data types to BigQuery data types.

The following table lists the Data Collector data types and the BigQuery data types that the destination converts them to:

Data Collector Data Type     BigQuery Data Type
Boolean                      Boolean
Byte Array                   Bytes
Date                         Date
Datetime                     Datetime or Timestamp
Decimal                      Decimal
Double                       Float
Float                        Float
Integer                      Integer
List                         Array
List-Map                     Record with repeated fields
Long                         Integer
Map                          Record
Short                        Integer
String                       String
Time                         Time

The Google BigQuery (Legacy) destination cannot convert the following Data Collector data types:
  • Byte
  • Character

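To make the mapping concrete, the following sketch shows a single streamed row whose values correspond to several of the BigQuery types listed above, as they would be passed to the BigQuery streaming insert API. It is illustrative only, not the destination's conversion code, and the column names and values are hypothetical; with streaming inserts, date, datetime, and time values travel as formatted strings, byte arrays as base64-encoded strings, lists as repeated values, and maps as nested records.

import com.google.cloud.bigquery.InsertAllRequest;
import java.util.Base64;
import java.util.List;
import java.util.Map;

public class RowTypesSketch {
  static InsertAllRequest.RowToInsert exampleRow() {
    // Illustrative column names; each value lines up with a BigQuery type
    // from the table above as it would travel through a streaming insert.
    Map<String, Object> row = Map.of(
        "active", true,                                        // Boolean    -> Boolean
        "payload", Base64.getEncoder()
            .encodeToString(new byte[] {1, 2, 3}),             // Byte Array -> Bytes (base64 string)
        "created_date", "2024-01-15",                          // Date       -> Date (formatted string)
        "created_at", "2024-01-15 10:30:00",                   // Datetime   -> Datetime (formatted string)
        "score", 3.14,                                         // Double     -> Float
        "row_count", 42L,                                      // Long       -> Integer
        "tags", List.of("a", "b"),                             // List       -> Array (repeated field)
        "address", Map.of("city", "Berlin", "zip", "10115"),   // Map        -> Record (nested fields)
        "name", "example");                                    // String     -> String
    return InsertAllRequest.RowToInsert.of(row);
  }
}
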
Credentials

When the Google BigQuery (Legacy) destination streams data into Google BigQuery, the destination must pass credentials to BigQuery.

You can provide credentials using one of the following options:
  • Google Cloud default credentials
  • Credentials in a file
  • Credentials in a stage property

For details on how to configure each option, see Security in Google Cloud Stages.
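As background on the first two options, the following sketch shows how a BigQuery client can be built either from Google Cloud default credentials or from a service account credentials JSON file. It illustrates the underlying credential mechanism only; in the pipeline you simply set the Credentials properties described later in this topic. The file path and project ID are hypothetical.

import com.google.auth.oauth2.GoogleCredentials;
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import java.io.FileInputStream;
import java.io.IOException;

public class CredentialsSketch {
  public static void main(String[] args) throws IOException {
    // Google Cloud default credentials, resolved from the environment
    // (GOOGLE_APPLICATION_CREDENTIALS, gcloud, or the VM metadata server).
    BigQuery withDefaults = BigQueryOptions.getDefaultInstance().getService();

    // Credentials read from a service account JSON file; the path and
    // project ID here are hypothetical.
    GoogleCredentials fromFile;
    try (FileInputStream json = new FileInputStream("/path/to/service-account.json")) {
      fromFile = GoogleCredentials.fromStream(json);
    }
    BigQuery withServiceAccount = BigQueryOptions.newBuilder()
        .setProjectId("my-project")
        .setCredentials(fromFile)
        .build()
        .getService();
  }
}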

Configuring a Google BigQuery (Legacy) Destination

Configure a Google BigQuery (Legacy) destination to stream data into Google BigQuery.

  1. In the Properties panel, on the General tab, configure the following properties:
    General Property Description
    Name Stage name.
    Description Optional description.
    Required Fields Fields that must include data for the record to be passed into the stage.
    Tip: You might include fields that the stage uses.

    Records that do not include all required fields are processed based on the error handling configured for the pipeline.

    Preconditions Conditions that must evaluate to TRUE to allow a record to enter the stage for processing. Click Add to create additional preconditions.

    Records that do not meet all preconditions are processed based on the error handling configured for the stage.

    On Record Error Error record handling for the stage:
    • Discard - Discards the record.
    • Send to Error - Sends the record to the pipeline for error handling.
    • Stop Pipeline - Stops the pipeline.
  2. On the BigQuery tab, configure the following properties:
    BigQuery Property Description
    Dataset BigQuery dataset to write to.
    Enter the name of an existing dataset or an expression that evaluates to the name of an existing dataset. For example, if the dataset name is stored in the 'dataset' record attribute, enter the following expression:
    ${record:attribute('dataset')}
    Table Name Name of the BigQuery table to write to.
    Enter the name of an existing table or an expression that evaluates to the name of an existing table. For example, if the table name is stored in the 'table' record attribute, enter the following expression:
    ${record:attribute('table')}
Insert ID Expression Expression that evaluates to the BigQuery insert ID to use when inserting or updating a row. The insert ID is a unique ID for each row. Leave blank to insert each record into a new row.

    For more information about the insert ID property used to stream data into BigQuery, see the Google BigQuery documentation.

    Ignore Invalid Column Ignores invalid columns.

    If selected and the destination encounters a field path that cannot be mapped to a BigQuery column with the same name and a compatible data type, the destination ignores the invalid column and writes the remaining fields in the record to BigQuery. If cleared and the destination encounters an invalid column, the record is sent to the stage for error handling.

    Table Cache Size Maximum number of table ID entries to cache locally.

    When the destination evaluates the dataset and table name to write to, it checks that the table ID exists in BigQuery and then caches the table ID. When possible, the destination uses the cache to avoid unnecessary retrievals from BigQuery. When the cache reaches the maximum size, the oldest cached entries are evicted to allow for new data.
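For an illustrative sketch of this check-and-cache pattern, see the example after this configuration procedure.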

    Default is -1, an unlimited cache size.

  3. On the Credentials tab, configure the following properties:
    Credentials Property Description
    Connection Connection that defines the information required to connect to an external system.

    To connect to an external system, you can select a connection that contains the details, or you can directly enter the details in the pipeline. When you select a connection, Control Hub hides other properties so that you cannot directly enter connection details in the pipeline.

    Project ID Google Cloud project ID to use.

    Credentials Provider Provider for Google Cloud credentials:
    • Default credentials provider - Uses Google Cloud default credentials.
    • Service account credentials file (JSON) - Uses credentials stored in a JSON service account credentials file.
    • Service account credentials (JSON) - Uses JSON-formatted credentials information from a service account credentials file.
    Credentials File Path (JSON) Path to the Google Cloud service account credentials file used to connect. The credentials file must be a JSON file.

    Enter a path relative to the Data Collector resources directory, $SDC_RESOURCES, or enter an absolute path.

    Credentials File Content (JSON) Contents of a Google Cloud service account credentials JSON file used to connect.

    Enter JSON-formatted credential information in plain text, or use an expression to call the information from runtime resources or a credential store. For more information about credential stores, see Credential Stores in the Data Collector documentation.
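
The Table Cache Size property in step 2 bounds a local cache of table IDs that the destination has already verified in BigQuery. The following sketch shows one way such a check-and-cache lookup can be expressed with the BigQuery Java client and a Guava cache. It is an illustration of the pattern under stated assumptions, not the destination's actual implementation; the cache size, dataset, and table names are hypothetical.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.TableId;
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.util.concurrent.ExecutionException;

public class TableCacheSketch {
  private final BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

  // Bounded cache of table IDs that have already been verified to exist.
  // Once the limit is reached, older entries are evicted, mirroring the
  // Table Cache Size property (where -1 means unlimited). The size is hypothetical.
  private final Cache<TableId, Boolean> verifiedTables =
      CacheBuilder.newBuilder().maximumSize(500).build();

  boolean tableExists(String dataset, String table) throws ExecutionException {
    TableId tableId = TableId.of(dataset, table);
    // Only call BigQuery when the table ID is not already in the cache.
    return verifiedTables.get(tableId, () -> bigquery.getTable(tableId) != null);
  }
}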