Google Cloud
Google BigQuery Connection
Available when using an authoring Data Collector version 3.19.0 or later.
To create a Google BigQuery connection, the Google Cloud stage library, streamsets-datacollector-google-cloud-lib, must be installed on the selected authoring Data Collector.
For a description of the Google BigQuery connection properties, see Google Cloud Connection Properties.
Engine | Stages |
---|---|
Data Collector 5.3.0 or later |
|
Data Collector 3.20.0 to 5.2.x |
|
Data Collector 3.19.0 to 3.20.0 |
|
Google Cloud Storage Connection
Available when using an authoring Data Collector version 3.19.0 or later.
To create a Google Cloud Storage connection, the Google Cloud stage library, streamsets-datacollector-google-cloud-lib, must be installed on the selected authoring Data Collector.
For a description of the Google Cloud Storage connection properties, see Google Cloud Connection Properties.
Engine | Stages and Locations |
---|---|
Data Collector 4.1.0 or later |
|
Data Collector 3.20.0 or later |
|
Data Collector 3.19.0 or later |
|
Google Pub/Sub Connection
Available when using an authoring Data Collector version 3.19.0 or later.
To create a Google Pub/Sub connection, the Google Cloud stage library, streamsets-datacollector-google-cloud-lib, must be installed on the selected authoring Data Collector.
For a description of the Google Pub/Sub connection properties, see Google Cloud Connection Properties.
Engine | Stages and Locations |
---|---|
Data Collector 3.19.0 or later |
|
Google Cloud Credentials
Each Google Cloud connection must pass credentials to Google Cloud. Use one of the following options to provide credentials:
- Google Cloud default credentials
- Credentials in a file
- Credentials in a connection property
Default Credentials
You can configure the connection to use Google Cloud default credentials. When using Google Cloud default credentials, the pipeline checks for the credentials file defined in the GOOGLE_APPLICATION_CREDENTIALS environment variable.
Set the environment variable on the Data Collector machine. If you run Data Collector on a virtual machine (VM) in Google Cloud Platform (GCP), use an instance service account with access to Google Secret Manager.
For more information about the default credentials, see Finding credentials automatically in the Google Cloud documentation.
Complete the following steps to define the credentials file in the environment variable:
- Use the Google Cloud Platform Console or the gcloud command-line tool to create a Google service account and have your application use it for API access.
  For example, to use the command-line tool, run the following commands:
  gcloud iam service-accounts create my-account
  gcloud iam service-accounts keys create key.json --iam-account=my-account@my-project.iam.gserviceaccount.com
- Store the generated credentials file in a local directory external to the Data Collector installation directory.
  For example, if you installed Data Collector in the following directory:
  /opt/sdc/
  you might store the credentials file at:
  /opt/sdc-credentials
  Important: The file must exist in the same location on all execution engines that access the connection.
- For all registered Data Collectors that access the connection, add the GOOGLE_APPLICATION_CREDENTIALS environment variable to the appropriate file and point it to the credentials file.
  Modify environment variables using the method required by your installation type.
  Set the environment variable as follows:
  export GOOGLE_APPLICATION_CREDENTIALS="/var/lib/sdc-resources/keyfile.json"
- Restart each Data Collector to enable the changes.
- On the Credentials tab for the connection, for the Credentials Provider property, select Default Credentials Provider.
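After setting the variable and before restarting, a quick check along the following lines can confirm that GOOGLE_APPLICATION_CREDENTIALS is set and points to a readable file. This is only a sketch: the function name check_default_credentials is hypothetical and is not part of Data Collector or the gcloud tool. Run it as the user that runs Data Collector so that the same environment and file permissions apply.

```shell
#!/bin/sh
# Hypothetical helper, not part of Data Collector or gcloud.
# Reports whether GOOGLE_APPLICATION_CREDENTIALS is set and points
# to a regular file that the current user can read.
check_default_credentials() {
    if [ -z "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
        echo "GOOGLE_APPLICATION_CREDENTIALS is not set"
        return 1
    fi
    if [ ! -f "$GOOGLE_APPLICATION_CREDENTIALS" ] || [ ! -r "$GOOGLE_APPLICATION_CREDENTIALS" ]; then
        echo "credentials file is missing or not readable: $GOOGLE_APPLICATION_CREDENTIALS"
        return 1
    fi
    echo "credentials file found: $GOOGLE_APPLICATION_CREDENTIALS"
}
```

Calling check_default_credentials on each Data Collector machine is one way to confirm that the file exists in the same location everywhere.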
Credentials in a File
You can configure the connection to use credentials in a Google Cloud service account credentials JSON file.
Complete the following steps to use credentials in a file:
- Generate a service account credentials file in JSON format.
  Use the Google Cloud Platform Console or the gcloud command-line tool to generate and download the credentials file. For more information, see Generating a service account credential in the Google Cloud Platform documentation.
- Store the generated credentials file on the Data Collector machine. As a best practice, store the file in the Data Collector resources directory, $SDC_RESOURCES.
  Important: The file must exist in the same location on all execution engines that access the connection.
- On the Credentials tab for the connection, for the Credentials Provider property, select Service Account Credentials File. Then, enter the path to the credentials file.
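The storage step might look like the following sketch. It assumes that SDC_RESOURCES is set in the current shell to the Data Collector resources directory. The function name store_credentials_file and the owner-only file mode are illustrative choices, not requirements stated by Data Collector.

```shell
#!/bin/sh
# Hypothetical helper: copy a downloaded key file into the resources directory.
# Assumes SDC_RESOURCES is set to the Data Collector resources directory.
store_credentials_file() {
    src="$1"
    if [ -z "${SDC_RESOURCES:-}" ] || [ ! -d "$SDC_RESOURCES" ]; then
        echo "SDC_RESOURCES is not set or is not a directory" >&2
        return 1
    fi
    dest="$SDC_RESOURCES/$(basename "$src")"
    cp "$src" "$dest" || return 1
    # Owner-only access is an assumption; the file must remain readable
    # by the user that runs Data Collector.
    chmod 600 "$dest" || return 1
    # Print the stored location so it can be entered in the connection.
    echo "$dest"
}
```

Running the same command with the same file on every execution engine keeps the file in the same location on each of them.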
Credentials in a Connection Property
You can configure the connection to use credentials specified in a connection property. When using credentials in connection properties, you provide JSON-formatted credentials from a Google Cloud service account credential file.
You can enter credential details in plain text, but best practice is to secure the credential details using credential stores or runtime resources.
Complete the following steps to use credentials in a connection property:
- Generate a service account credentials file in JSON format.
  Use the Google Cloud Platform Console or the gcloud command-line tool to generate and download the credentials file. For more information, see Generating a service account credential in the Google Cloud Platform documentation.
- As a best practice, secure the credentials using credential stores or runtime resources.
- On the Credentials tab for the connection, for the Credentials Provider property, select Service Account Credentials. Then, enter the JSON-formatted credential details or an expression that calls the credentials from a credential store.
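Before copying the file contents into the connection property or a credential store, a rough check such as the following sketch can catch a wrong or truncated file. It relies on field names that Google service account key files normally contain (type, project_id, private_key, client_email). The function name looks_like_service_account_key is hypothetical, and the grep-based test is a heuristic, not a JSON parser.

```shell
#!/bin/sh
# Hypothetical helper: heuristic check that a file resembles a Google
# service account key. It does not validate JSON syntax or the key itself.
looks_like_service_account_key() {
    file="$1"
    [ -r "$file" ] || return 1
    # The "type" field of a service account key is "service_account".
    grep -Eq '"type"[[:space:]]*:[[:space:]]*"service_account"' "$file" || return 1
    # Fields assumed to be present in a service account key file.
    for field in project_id private_key client_email; do
        grep -q "\"$field\"" "$file" || return 1
    done
    return 0
}
```

The function returns success only when all of the expected fields appear, so it can guard a script that loads the contents into a credential store.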
Google Cloud Connection Properties
You configure similar properties for all of the Google Cloud connections.
Credentials Property | Description |
---|---|
Project ID | Google Cloud project ID to use. |
Credentials Provider | Provider for Google Cloud credentials: Default Credentials Provider, Service Account Credentials File, or Service Account Credentials. |
Credentials File Path (JSON) | Path to the Google Cloud service account credentials file used to connect. The credentials file must be a JSON file. Important: The file must exist in the same location on all execution engines that access the connection. Enter a path relative to the Data Collector resources directory, |
Credentials File Content (JSON) | Contents of a Google Cloud service account credentials JSON file used to connect. Enter JSON-formatted credential information in plain text, or use an expression to call the information from credential stores or runtime resources. |