Google Big Query
The Google Big Query destination writes data to a Google BigQuery table. Use the destination in Databricks or Dataproc cluster pipelines only. To use the destination in Databricks clusters, you must configure specific Spark properties.
The Google Big Query destination stages data in a Google Cloud Storage bucket before writing it to BigQuery. The destination can write data to a new or existing BigQuery table. You can configure the destination to create a table if the specified table does not exist. You can configure the destination to truncate the table before writing each batch.
The Google Big Query destination writes to a BigQuery table based on the specified write mode. The destination can insert data into the table or merge data with existing data in the table. When inserting data, the destination can add new nullable columns to the table schema and can relax required fields to allow null values. When merging data, the destination can insert, update, and delete records based on one or more key columns and the specified merge conditions.
When you configure the Google Big Query destination, you specify the dataset, table, and temporary storage bucket. You specify whether to create the table if it does not exist and whether to truncate the table before writing each batch. You also select the write mode to use. When merging data, you specify the join keys, merge conditions, and operations.
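The schema changes described for inserts can be pictured with a small sketch. This is an illustration only, not the destination's implementation: the function name `plan_schema_changes`, the dictionary layout, and the sample column names are all hypothetical. It assumes a table schema expressed as column name to BigQuery mode (`REQUIRED` or `NULLABLE`) and a batch described as field name to whether the field can hold nulls.

```python
from typing import Dict, List, Tuple


def plan_schema_changes(
    table_schema: Dict[str, str],
    batch_nullability: Dict[str, bool],
) -> Tuple[List[str], List[str]]:
    """Return (columns to add as NULLABLE, REQUIRED columns to relax).

    Hypothetical helper: models the two schema changes the text describes.
    """
    # Fields present in the batch but missing from the table would be
    # added as new nullable columns.
    to_add = [name for name in batch_nullability if name not in table_schema]

    # REQUIRED columns whose incoming field can hold nulls would be
    # relaxed to NULLABLE.
    to_relax = [
        name
        for name, mode in table_schema.items()
        if mode == "REQUIRED" and batch_nullability.get(name, False)
    ]
    return to_add, to_relax


# Hypothetical table and batch, for illustration only.
table = {"id": "REQUIRED", "email": "REQUIRED", "name": "NULLABLE"}
batch = {"id": False, "email": True, "signup_source": True}

to_add, to_relax = plan_schema_changes(table, batch)
print("add as NULLABLE:", to_add)
print("relax to NULLABLE:", to_relax)
```

In this sample, `signup_source` is new to the table and `email` arrives as a nullable field, so those are the only two columns affected.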
Before you configure the Google Big Query destination, complete the prerequisite task.
Prerequisite
If necessary, create a bucket in Google Cloud Storage before you configure the Google Big Query destination. The destination temporarily stores data in the specified bucket before writing the data to BigQuery.
To ensure data integrity, use a separate bucket for every pipeline. Do not use the bucket for any other pipelines or processes.
Write Mode
- Insert
  - The destination inserts all data into the table.
- Merge
  - The destination merges data with existing data in the table. The destination performs inserts, updates, and deletes based on the specified merge properties.
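The difference between the two write modes can be sketched with plain Python lists standing in for the table and the batch. This is a simplified model under stated assumptions, not the destination's logic: the function names are hypothetical, and the `deleted` flag is an invented stand-in for a merge condition that triggers a delete.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def insert_rows(table: List[Row], batch: List[Row]) -> List[Row]:
    """Insert mode: every record in the batch is appended."""
    return table + batch


def merge_rows(
    table: List[Row],
    batch: List[Row],
    join_keys: List[str],
    delete_flag: str = "deleted",  # hypothetical stand-in for a merge condition
) -> List[Row]:
    """Merge mode: match on the join keys, then update, delete, or insert."""

    def key_of(row: Row) -> Tuple[object, ...]:
        return tuple(row[k] for k in join_keys)

    merged = {key_of(row): row for row in table}
    for record in batch:
        key = key_of(record)
        if record.get(delete_flag):
            # Matching row is removed; a non-matching delete is a no-op.
            merged.pop(key, None)
        else:
            data = {k: v for k, v in record.items() if k != delete_flag}
            # Update the matching row, or insert a new one if none matches.
            merged[key] = {**merged.get(key, {}), **data}
    return list(merged.values())


existing = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}]
incoming = [
    {"id": 2, "name": "Robert"},   # matches id 2: update
    {"id": 3, "name": "Cy"},       # no match: insert
    {"id": 1, "deleted": True},    # matches id 1: delete
]

inserted = insert_rows(existing, incoming)
merged = merge_rows(existing, incoming, join_keys=["id"])
print(len(inserted))
print(merged)
```

Insert mode leaves five rows in the modeled table, including two rows for `id` 2. Merge mode leaves two rows, with `id` 2 updated, `id` 3 added, and `id` 1 removed.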
Merge Properties
- Join Keys
  - One or more key columns in the table. Used to perform updates and deletes and to ensure that duplicate rows do not exist for inserts. Pipeline records must include a matching field name.
- Merge Configuration
  - Action that the destination performs when a record meets the specified conditions. You can specify multiple merge configurations for the destination to perform.
  - Important: The destination performs the writes in the specified order. Best practice is to list merge configurations with the smallest number of affected records first, progressing to the largest number of affected records. When defining multiple merge configurations of the same type, carefully consider the order that you use.
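To see why the order of merge configurations matters, it can help to look at how join keys and ordered configurations map onto a BigQuery MERGE statement. The sketch below is an assumption about the shape of such a statement, not the SQL that the destination generates. The function `build_merge_sql`, the table names, and the `op` column are hypothetical. In a BigQuery MERGE statement, the WHEN clauses are evaluated in order for each row, and only the first clause whose condition is met is applied.

```python
from typing import List, Tuple

# (when, condition, action); an empty condition means "always".
MergeConfig = Tuple[str, str, str]


def build_merge_sql(
    target: str,
    source: str,
    join_keys: List[str],
    merge_configs: List[MergeConfig],
) -> str:
    """Build a BigQuery MERGE statement, keeping the configurations in order."""
    on_clause = " AND ".join(f"T.{key} = S.{key}" for key in join_keys)
    lines = [f"MERGE `{target}` T", f"USING `{source}` S", f"ON {on_clause}"]
    for when, condition, action in merge_configs:
        clause = f"WHEN {when}"
        if condition:
            clause += f" AND {condition}"
        lines.append(f"{clause} THEN {action}")
    return "\n".join(lines)


# Hypothetical tables and columns. The narrow delete condition is listed
# before the unconditional update so that it is evaluated first.
sql = build_merge_sql(
    target="my_dataset.customers",
    source="my_dataset.customers_staging",
    join_keys=["id"],
    merge_configs=[
        ("MATCHED", "S.op = 'delete'", "DELETE"),
        ("MATCHED", "", "UPDATE SET name = S.name"),
        ("NOT MATCHED", "", "INSERT ROW"),
    ],
)
print(sql)
```

If the unconditional update were listed before the delete in this sketch, every matched row would be updated and the delete clause would never apply. Listing the configuration that affects the fewest records first avoids that kind of shadowing.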
Configuring a Google Big Query Destination
Use the Google Big Query destination to write to a BigQuery table. Include the destination in Databricks or Dataproc cluster pipelines only.
Before you configure the destination, complete the prerequisite task.