The Google BigQuery origin executes a query job and reads the result from Google BigQuery.
The origin submits the query that you define, and then Google BigQuery runs the query as an interactive query. When the query is complete, the origin reads the query results to generate records. The origin runs the query once and then the pipeline stops when it finishes reading all query results. If you start the pipeline again, the origin submits the query again.
When you configure the origin, you define the query to run using valid BigQuery standard SQL or legacy SQL syntax. By default, BigQuery writes all query results to a temporary, cached results table. You can choose to disable retrieving cached results and force BigQuery to compute the query result.
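For example, a standard SQL query against one of Google's public sample datasets might look like the following (the dataset, table, and column names here are only illustrative):

```sql
SELECT name, state, number
FROM `bigquery-public-data.usa_names.usa_1910_current`
WHERE state = 'CA'
LIMIT 100
```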
You also define the project and credentials provider to use to connect to Google BigQuery. The origin can retrieve credentials from the Google Application Default Credentials or from a Google Cloud service account credentials file.
The origin can generate events for an event stream. For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.
Before the Google BigQuery origin can execute a query job and read the results, it must pass credentials to Google BigQuery. Configure the origin to retrieve the credentials from the Google Application Default Credentials or from a Google Cloud service account credentials file.
When configured to use the Google Application Default Credentials, the origin checks for the credentials file defined in the GOOGLE_APPLICATION_CREDENTIALS environment variable. If the environment variable doesn't exist and Data Collector is running on a virtual machine (VM) in Google Cloud Platform (GCP), the origin uses the built-in service account associated with the virtual machine instance.
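The lookup order described above can be sketched as follows. This is a simplified illustration only; the real check is performed by the Google client libraries, not by code like this:

```python
import os

def resolve_credentials_source(running_on_gcp_vm: bool) -> str:
    """Simplified sketch of the Application Default Credentials lookup order."""
    path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if path:
        # The environment variable, when set, always wins.
        return f"credentials file: {path}"
    if running_on_gcp_vm:
        # On a GCP VM, fall back to the instance's built-in service account.
        return "built-in service account of the VM instance"
    raise RuntimeError("no Application Default Credentials available")

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/var/lib/sdc-resources/keyfile.json"
print(resolve_credentials_source(running_on_gcp_vm=True))
# -> credentials file: /var/lib/sdc-resources/keyfile.json

del os.environ["GOOGLE_APPLICATION_CREDENTIALS"]
print(resolve_credentials_source(running_on_gcp_vm=True))
# -> built-in service account of the VM instance
```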
For more information about the default credentials, see Google Application Default Credentials in the Google Developer documentation.
Complete the following steps to define the credentials file in the environment variable:

1. Use the Google Cloud Platform Console or the gcloud command-line tool to generate and download a credentials file. For example, with the gcloud tool:

   ```shell
   gcloud iam service-accounts create my-account
   gcloud iam service-accounts keys create key.json --iam-account=my-account@my-project.iam.gserviceaccount.com
   ```

2. Modify environment variables using the method required by your installation type. Set the environment variable as follows:

   ```shell
   export GOOGLE_APPLICATION_CREDENTIALS="/var/lib/sdc-resources/keyfile.json"
   ```
When configured to use the Google Cloud service account credentials file, the origin checks for the file defined in the origin properties.
Complete the following steps to use the service account credentials file:

1. Use the Google Cloud Platform Console or the gcloud command-line tool to generate and download the credentials file. For more information, see generating a service account credential in the Google Cloud Platform documentation.
2. Store the file on the Data Collector machine. As a best practice, store the file in the Data Collector resources directory, $SDC_RESOURCES.
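The steps above can be sketched in the shell as follows. All paths here are illustrative stand-ins, and the key file is faked rather than downloaded:

```shell
# Sketch: stage a downloaded service account key in the Data Collector
# resources directory. A real install defines $SDC_RESOURCES already;
# the fixed path below is a stand-in for this illustration.
SDC_RESOURCES="/tmp/sdc-resources-demo"
mkdir -p "$SDC_RESOURCES"
printf '{"type": "service_account"}' > /tmp/keyfile.json   # stand-in for the downloaded key
cp /tmp/keyfile.json "$SDC_RESOURCES/keyfile.json"
ls "$SDC_RESOURCES"
```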
The Google BigQuery origin converts the Google BigQuery data types to Data Collector data types.
The data types supported by Google BigQuery depend on whether you use standard SQL or legacy SQL syntax to define the query.
The following table lists the data types that Google BigQuery standard SQL supports and the Data Collector data types that the origin converts them to:
| BigQuery Standard SQL Data Type | Data Collector Data Type |
|---|---|
| Array | List |
| Boolean | Boolean |
| Bytes | Byte Array |
| Date | Date |
| Datetime | Datetime |
| Float | Double |
| Integer | Long |
| String | String |
| Struct | List-Map |
| Time | Datetime |
| Timestamp | Datetime |
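The standard SQL conversions above can be expressed as a simple lookup table, which can be handy when validating the field types a pipeline is expected to produce downstream:

```python
# Standard SQL -> Data Collector type conversions, per the table above.
BIGQUERY_STANDARD_SQL_TO_SDC = {
    "Array": "List",
    "Boolean": "Boolean",
    "Bytes": "Byte Array",
    "Date": "Date",
    "Datetime": "Datetime",
    "Float": "Double",
    "Integer": "Long",
    "String": "String",
    "Struct": "List-Map",
    "Time": "Datetime",
    "Timestamp": "Datetime",
}

print(BIGQUERY_STANDARD_SQL_TO_SDC["Struct"])  # -> List-Map
```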
The following table lists the data types that Google BigQuery legacy SQL supports and the Data Collector data types that the origin converts them to:
| BigQuery Legacy SQL Data Type | Data Collector Data Type |
|---|---|
| Boolean | Boolean |
| Bytes | Byte Array |
| Date | Date |
| Datetime | Datetime |
| Float | Double |
| Integer | Long |
| Record | List-Map |
| String | String |
| Time | Datetime |
| Timestamp | Datetime |
The Google BigQuery origin generates an event when a query completes successfully.

Google BigQuery origin events can be used in any logical way. For example:

- With the Email executor to send a custom email after receiving an event. For an example, see Case Study: Sending Email.
- With a destination to store information about completed queries. For an example, see Case Study: Event Storage.

For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.
Event records generated by the Google BigQuery origin have the following event-related record header attributes:
| Record Header Attribute | Description |
|---|---|
| sdc.event.type | Event type. Uses the following type: big-query-success - Generated when the origin successfully completes a query. |
| sdc.event.version | An integer that indicates the version of the event record type. |
| sdc.event.creation_timestamp | Epoch timestamp when the stage created the event. |
The query-success event record includes the following fields:

| Field | Description |
|---|---|
| query | Query that completed successfully. |
| timestamp | Timestamp when the query completed. |
| row-count | Number of processed rows. |
| source-offset | Offset after the query completed. |
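Putting the two tables together, a query-success event record can be pictured as follows. The attribute and field names come from the tables above; the concrete values are invented for illustration:

```python
# Shape of a query-success event record. Record header attributes in
# Data Collector are strings; the values below are made up.
event_header_attributes = {
    "sdc.event.type": "big-query-success",
    "sdc.event.version": "1",
    "sdc.event.creation_timestamp": "1533827790000",
}
event_fields = {
    "query": "SELECT name FROM `my-project.my_dataset.my_table`",  # hypothetical query
    "timestamp": 1533827790000,
    "row-count": 42,
    "source-offset": 42,
}

print(event_header_attributes["sdc.event.type"])  # -> big-query-success
```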
Configure a Google BigQuery origin to execute a query job and read the result from Google BigQuery.