GCP Environments

A Google Cloud Platform (GCP) environment represents the Google virtual private cloud (VPC) network in your Google Cloud account where engines are deployed.

Your GCP administrator must designate a project for the resources, create a VPC network in the project, and configure Google Cloud credentials for Control Hub to use. You then create a GCP environment in Control Hub that represents the VPC network. When you activate the environment, Control Hub connects to the project and VPC network using the configured credentials. When you then create and start a deployment for the environment, Control Hub provisions the Google Cloud resources needed to run engines and deploys engine instances to those resources.

While the environment is in an active state, Control Hub periodically verifies that the project and VPC network exist and that the credentials are valid. Control Hub does not provision resources in the VPC network until you create and start a deployment for this environment.

Before you create a GCP environment, your Google Cloud administrator must complete several prerequisites.
Note: Depending on your account agreement, GCP environments might be disabled for your organization. For more information, contact your StreamSets account team.

Feature Versions

At this time, GCP environments offer a single feature version, GCP_2021_06_01, which includes all available features for GCP environments and deployments.

Prerequisites

The prerequisites require logging into StreamSets to retrieve information generated for the organization.

You first must invite your Google Cloud administrator to join your StreamSets organization. You can invite the administrator using the default role assignments, or you can modify the role assignments to grant the administrator the Environment Manager role only.

After joining the StreamSets organization, your Google Cloud administrator must complete the following prerequisites:
  1. Designate a Google Cloud project for the Google Cloud resources that Control Hub provisions. Then, enable the required Google APIs on the project.
  2. Create a Google VPC network in the designated project for the StreamSets GCP environment to use.
  3. Configure the Google Cloud credentials that Control Hub uses to access and provision resources in your project.
  4. Create instance service accounts to associate with the provisioned VM instances.

Designate a Project and Enable Google APIs

Designate a Google Cloud project for the resources that Control Hub provisions. You can use an existing project or create a new project. You'll select this project when you create the StreamSets GCP environment.

For instructions on creating or managing Google Cloud projects, see the Google Cloud Resource Manager documentation.

The project must have the following Google APIs enabled:
  • Cloud Resource Manager API
  • Compute Engine API
  • Identity and Access Management (IAM) API
  • Cloud Deployment Manager V2 API
  • Secret Manager API
  • Service Usage API
Note: Google Cloud provides multiple methods for enabling Google APIs. These steps provide brief instructions using the gcloud command line tool. For instructions on using other methods, such as using the Google Cloud Console, see the Google Cloud Endpoints documentation.
  1. Run the following gcloud command to set your designated project as the current project:
    gcloud config set project <PROJECT_ID>
  2. Run the following command to view the list of Google APIs currently enabled on the project:
    gcloud services list
  3. Run the following command to enable the required APIs on the project:
    gcloud services enable cloudresourcemanager.googleapis.com compute.googleapis.com iam.googleapis.com deploymentmanager.googleapis.com secretmanager.googleapis.com serviceusage.googleapis.com
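
The check in step 2 can also be scripted. The following is a minimal sketch rather than part of the required steps: the missing_apis helper is a hypothetical name, and it assumes that the enabled service names are piped in one per line.

```shell
# Required APIs, taken from the list above.
REQUIRED_APIS="cloudresourcemanager.googleapis.com
compute.googleapis.com
iam.googleapis.com
deploymentmanager.googleapis.com
secretmanager.googleapis.com
serviceusage.googleapis.com"

# Hypothetical helper: reads enabled service names from stdin (one per line)
# and prints each required API that is not in that list.
missing_apis() {
  enabled="$(cat)"
  for api in $REQUIRED_APIS; do
    # -x matches the whole line, -F treats the name as a literal string
    printf '%s\n' "$enabled" | grep -qxF "$api" || printf '%s\n' "$api"
  done
}

# Example usage (assumes gcloud is installed and the project is set):
#   gcloud services list --enabled --format="value(config.name)" | missing_apis
```

If the helper prints nothing, all of the required APIs are already enabled and step 3 can be skipped.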

Create a Google VPC Network

Create a Google virtual private cloud (VPC) network in your designated Google Cloud project. Or, create a shared VPC network in a host project and then attach your designated project to the host project. For more information on shared VPC networks, see the Google Cloud VPC documentation.

You can use an existing VPC network. However, StreamSets recommends creating a new VPC network for the exclusive use of each StreamSets GCP environment.

You can use private or public subnets within the VPC network, as long as the subnets can send outbound traffic to the internet.

For instructions on creating a VPC network and on allowing subnets internet access, see the Google Cloud VPC documentation.
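
As an illustration only, creating a dedicated custom-mode VPC network with one subnet using the gcloud command line tool might look like the following sketch. The network name, subnet name, region, and IP range are hypothetical placeholders. The run wrapper is also an assumption of this example: it prints each command instead of executing it unless DRY_RUN is set to 0.

```shell
# Print the command when DRY_RUN is 1 (the default here); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1 = network name, $2 = subnet name, $3 = region, $4 = subnet IP range (CIDR)
create_vpc() {
  run gcloud compute networks create "$1" --subnet-mode=custom
  run gcloud compute networks subnets create "$2" \
    --network="$1" --region="$3" --range="$4"
}

# Hypothetical values; replace them with your own before setting DRY_RUN=0.
create_vpc streamsets-vpc streamsets-subnet us-central1 10.10.0.0/24
```

Reviewing the printed commands first makes it easier to confirm the region and IP range with your network team before anything is created.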

Firewall Rules

Define the required firewall rules for the VPC network.

Allow the following traffic:
  • Inbound and outbound connections required by StreamSets engines, as described in Firewall Configuration Overview.
  • Outbound connections to Google Secret Manager. Add the IP address of the https://secretmanager.googleapis.com host as an allowed destination.

    For the list of Google Cloud IP addresses, see this Google support article.

You can define the firewall rules for the entire VPC network. Or, you can apply network tags to the firewall rules. Network tags allow you to apply firewall rules to specific VM instances within the network.

When you configure a GCE deployment for the environment, you specify the network tags to use for the provisioned VM instances. For more information on configuring network tags, see the Google Cloud VPC documentation.
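
The outbound rule for Google Secret Manager could be sketched with gcloud as shown below. This is an assumption-laden example, not a StreamSets-provided command: the rule name, network name, and network tag are hypothetical, and the destination range 203.0.113.0/24 is a documentation placeholder that must be replaced with addresses from the published Google Cloud IP list. The run wrapper prints the command instead of executing it unless DRY_RUN is set to 0.

```shell
# Print the command when DRY_RUN is 1 (the default here); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Allow outbound HTTPS from VM instances that carry a network tag.
# $1 = rule name, $2 = VPC network, $3 = network tag, $4 = destination range(s)
allow_egress_https() {
  run gcloud compute firewall-rules create "$1" \
    --network="$2" --direction=EGRESS --action=ALLOW --rules=tcp:443 \
    --target-tags="$3" --destination-ranges="$4"
}

# Hypothetical names and a placeholder range; replace before setting DRY_RUN=0.
allow_egress_https streamsets-egress-secretmanager streamsets-vpc \
  streamsets-engine 203.0.113.0/24
```

Omitting the --target-tags flag would apply the rule to the entire VPC network instead of to tagged VM instances.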

Optionally Create a Cloud NAT Gateway

By default, Control Hub provisions Google Cloud VM instances with external IP addresses.

If your GCP project does not allow externally accessible IP addresses, create a Cloud NAT gateway for the VPC network in the region where you plan to provision Google Cloud resources. A Cloud NAT gateway provides outgoing connectivity for Compute Engine VM instances without external IP addresses. For more information on Cloud NAT gateways, see the Google Cloud NAT documentation.

When you configure a GCE deployment for the environment, use the Configure GCE SSH Access step to specify that the deployment provisions VM instances without external IP addresses.
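
As a hedged illustration, a Cloud NAT gateway is typically created by adding a Cloud Router to the VPC network and then attaching a NAT configuration to that router. In the sketch below, the router name, NAT name, network name, and region are hypothetical, and the run wrapper prints each command instead of executing it unless DRY_RUN is set to 0.

```shell
# Print the command when DRY_RUN is 1 (the default here); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1 = router name, $2 = NAT gateway name, $3 = VPC network, $4 = region
create_cloud_nat() {
  run gcloud compute routers create "$1" --network="$3" --region="$4"
  run gcloud compute routers nats create "$2" --router="$1" --region="$4" \
    --auto-allocate-nat-external-ips --nat-all-subnet-ip-ranges
}

# Hypothetical values; the region must match the region used by the deployment.
create_cloud_nat streamsets-router streamsets-nat streamsets-vpc us-central1
```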

Configure Google Cloud Credentials

You can grant Control Hub access to your Google Cloud project using service account impersonation or a service account key. Control Hub uses the credentials to access and provision resources in the project. StreamSets recommends that you use service account impersonation for production.

Important: Configuring Google Cloud credentials requires logging into StreamSets to retrieve information generated for the organization. If you have not yet joined the StreamSets organization, ask your organization administrator to invite you.
Complete the following steps to configure Google Cloud credentials for Control Hub:
  1. Create an IAM service account and add IAM roles to the account to delegate limited access to Control Hub. Create the service account with the same roles when using either credential type.
  2. Allow Control Hub to impersonate the service account, or create the service account key that Control Hub uses.

Create a Service Account

For either credential type, create a service account in Google Cloud that delegates limited access to Control Hub.

The service account requires the following IAM roles:

  • Compute Network Viewer role
  • Deployment Manager Editor role, with a condition that limits access to resources provisioned by Control Hub
  • Service Account User role
  • Secret Manager Admin role, with a condition that limits access to resources provisioned by Control Hub
Note: Google Cloud provides multiple methods for creating service accounts. These steps provide brief instructions to create a new service account and add roles to the account using the gcloud command line tool. For instructions on using other methods, such as using the Google Cloud Console, see the Google Cloud IAM documentation.
  1. To use service account impersonation, first retrieve the unique service account name generated for your Control Hub organization.
    Important: Using the generated service account name prevents the confused deputy problem and ensures that Control Hub can impersonate this service account only when acting on behalf of your organization.

    If using a service account key, skip this step.

    1. Log into StreamSets.
    2. In the Control Hub Navigation panel, click Set Up > Environments and then click the Create Environment icon.
    3. Enter a name for the environment, and then select Google Cloud Platform (GCP) for the Environment Type.
    4. Click Save & Next.
    5. Select Service Account Impersonation for the Credential Type.
    6. Copy the Service Account Name value.

      You can leave this page open in the browser, cancel the environment creation, or click Save & Exit to save an incomplete environment that you finish configuring after completing the prerequisites.

  2. Run the following gcloud command to create a new service account with a display name of StreamSets Service Account:
    gcloud iam service-accounts create <SA_NAME> --display-name="StreamSets Service Account"

    To use service account impersonation, replace the <SA_NAME> parameter with the generated service account name that you retrieved from Control Hub.

    To use a service account key, replace the <SA_NAME> parameter with an alphanumeric ID that meets the naming requirements for Google Cloud service accounts, as described in the Google Cloud IAM documentation. For example, you might enter the name streamsets-service-account.

  3. To grant the service account the Compute Network Viewer role on your project, run the following command:
    gcloud projects add-iam-policy-binding <PROJECT_ID> --member=serviceAccount:<SA_NAME>@<PROJECT_ID>.iam.gserviceaccount.com --role=roles/compute.networkViewer

    Replace the following parameters in the command:

    • <PROJECT_ID> - ID of the project designated for the StreamSets GCP environment.
    • <SA_NAME> - When using service account impersonation, the service account name that you retrieved from Control Hub. When using a service account key, the name that you used to create the service account.

  4. To grant the service account the Deployment Manager Editor role with the required condition, run the following command:
    gcloud projects add-iam-policy-binding <PROJECT_ID> --member=serviceAccount:<SA_NAME>@<PROJECT_ID>.iam.gserviceaccount.com --role=roles/deploymentmanager.editor --condition=expression=resource.name.startsWith\(\"projects/<PROJECT_ID>/global/deployments/streamsets-\"\),title="StreamSets Limited"

    Replace the parameters as described above.

  5. To grant the service account the Service Account User role, run the following command:
    gcloud projects add-iam-policy-binding <PROJECT_ID> --member=serviceAccount:<SA_NAME>@<PROJECT_ID>.iam.gserviceaccount.com --role=roles/iam.serviceAccountUser

    Replace the parameters as described above.

  6. To grant the service account the Secret Manager Admin role with the required condition, run the following command:
    gcloud projects add-iam-policy-binding <PROJECT_ID> --member=serviceAccount:<SA_NAME>@<PROJECT_ID>.iam.gserviceaccount.com --role=roles/secretmanager.admin --condition=expression=resource.name.startsWith\(\"projects/<PROJECT_NUMBER>/secrets/StreamSets-Deployment-Token-\"\),title="StreamSets Limited"

    Replace the parameters as described above. Replace the <PROJECT_NUMBER> parameter with the number of the project designated for the StreamSets GCP environment.
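
The four role bindings in steps 3 through 6 differ only in the role and the optional condition, so they can be generated from one set of values. The following sketch prints the commands for review and does not run them. The print_bindings helper and the sample values are hypothetical, and the conditions are written in single quotes as an equivalent alternative to the backslash escaping used above.

```shell
# Hypothetical helper: prints the role-binding commands from steps 3 to 6.
# $1 = project ID, $2 = project number, $3 = service account name
print_bindings() {
  member="serviceAccount:$3@$1.iam.gserviceaccount.com"
  base="gcloud projects add-iam-policy-binding $1 --member=$member"
  dm_prefix="projects/$1/global/deployments/streamsets-"
  sm_prefix="projects/$2/secrets/StreamSets-Deployment-Token-"

  printf '%s\n' "$base --role=roles/compute.networkViewer"
  printf '%s\n' "$base --role=roles/deploymentmanager.editor --condition='expression=resource.name.startsWith(\"$dm_prefix\"),title=StreamSets Limited'"
  printf '%s\n' "$base --role=roles/iam.serviceAccountUser"
  printf '%s\n' "$base --role=roles/secretmanager.admin --condition='expression=resource.name.startsWith(\"$sm_prefix\"),title=StreamSets Limited'"
}

# Sample values are placeholders. The project number can usually be read with:
#   gcloud projects describe <PROJECT_ID> --format="value(projectNumber)"
print_bindings my-project 123456789012 streamsets-sa
```

Note that the Deployment Manager condition uses the project ID, while the Secret Manager condition uses the project number.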

Use the Service Account for Impersonation

To use service account impersonation as the credential type, add Control Hub as a member of the service account that you created. This allows Control Hub to impersonate the service account to perform tasks in your Google Cloud project.

Note: Google Cloud provides multiple methods for allowing a member to impersonate a service account. These steps provide brief instructions to add Control Hub as a member of the service account you created using the gcloud command line tool. For instructions on using other methods, such as using the Google Cloud Console, see the Google Cloud IAM documentation.
Run the following gcloud command to add the Control Hub service account as a member of this service account:
gcloud iam service-accounts add-iam-policy-binding <SA_NAME>@<PROJECT_ID>.iam.gserviceaccount.com --member=serviceAccount:streamsets@streamsets-gcp-bridge.iam.gserviceaccount.com --role=roles/iam.serviceAccountTokenCreator

Replace the following parameters in the command:

  • <SA_NAME> - Service account name that you retrieved from Control Hub.
  • <PROJECT_ID> - ID of the project designated for the StreamSets GCP environment.

You will enter the email of this new service account, <SA_NAME>@<PROJECT_ID>.iam.gserviceaccount.com, when you create the GCP environment in Control Hub.
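
Because the email is assembled from two separate values, it can help to build and check it in one place before pasting it into Control Hub. The helpers below are hypothetical and only a sketch: the check assumes that both the service account name and the project ID use only lowercase letters, digits, and hyphens, so it does not cover domain-scoped project IDs.

```shell
# Hypothetical helpers, not part of StreamSets or gcloud.

# Builds the service account email.
# $1 = service account name, $2 = project ID
sa_email() {
  printf '%s@%s.iam.gserviceaccount.com\n' "$1" "$2"
}

# Succeeds when $1 matches <SA_NAME>@<PROJECT_ID>.iam.gserviceaccount.com.
is_sa_email() {
  printf '%s\n' "$1" |
    grep -Eq '^[a-z][a-z0-9-]*@[a-z][a-z0-9-]*\.iam\.gserviceaccount\.com$'
}

# Sample values are placeholders.
email="$(sa_email streamsets-sa my-project)"
if is_sa_email "$email"; then
  echo "Enter in Control Hub: $email"
fi
```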

Use the Service Account Key

To use a service account key as the credential type, create a key for the service account that you created. Control Hub uses the key to perform tasks in your Google Cloud project.

Note: Google Cloud provides multiple methods for creating service account keys. These steps provide brief instructions using the Google Cloud Console. For instructions on using other methods, such as using the gcloud command line tool, see the Google Cloud IAM documentation.
  1. In the Google Cloud Console, go to the Service Accounts page.
  2. Select the project designated for the StreamSets GCP environment.
  3. Click the email address of the service account that you want to create a key for.
  4. Click the Keys tab.
  5. Click Add key > Create new key.
  6. Select JSON as the Key type, and then click Create.

    The service account key file downloads. You will enter the full contents of this key file when you create the GCP environment in Control Hub.
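
Before pasting the key into Control Hub, a quick sanity check of the downloaded file can catch a truncated or wrong download. The following sketch uses a hypothetical looks_like_sa_key helper and assumes that a JSON service account key contains the type, client_email, and private_key fields. It checks only that the fields are present, not that the key is valid.

```shell
# Hypothetical helper: succeeds when the file at $1 is readable and contains
# the fields expected in a JSON service account key file.
looks_like_sa_key() {
  [ -r "$1" ] || return 1
  grep -Eq '"type"[[:space:]]*:[[:space:]]*"service_account"' "$1" &&
    grep -Eq '"client_email"[[:space:]]*:' "$1" &&
    grep -Eq '"private_key"[[:space:]]*:' "$1"
}

# Example usage (the path is a placeholder):
#   looks_like_sa_key "$HOME/Downloads/key.json" && echo "key file looks complete"
```

Treat the key file as a secret: store it securely and delete local copies that are no longer needed.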

Create Instance Service Accounts for VM Instances

Create instance service accounts for Google Compute Engine VM instances. When Control Hub provisions VM instances for a GCE deployment belonging to this environment, it associates these instance service accounts with the VM instances.

Instance service accounts require the Secret Manager Secret Accessor role with a condition that limits access to resources provisioned by Control Hub.

In addition, when GCE deployments managed by this environment are configured to use an external resource archive file stored in a private Google Cloud Storage bucket, the service account requires the Storage Legacy Object Reader role with a condition that limits access to the bucket containing the archive file.

You can configure the instance service account used by a deployment in the following ways:
  • Configure a default instance service account for the environment
    Configure a default instance service account for the parent GCP environment. When you create a GCE deployment for this environment, you can use the default instance service account configured for the environment.
  • Configure a unique instance service account for each deployment
    Do not configure a default instance service account for the parent GCP environment. When you create a GCE deployment for this environment, you must configure the instance service account to use for the deployment.
  • Configure a default instance service account and override as needed
    Configure a default instance service account for the parent GCP environment. When you create a GCE deployment for this environment, you can use the default instance service account configured for the environment, or you can override the default and configure a different instance service account for the deployment to use.
Note: Google Compute Engine provides multiple methods for creating instance service accounts for VM instances. These steps provide brief instructions to create a new instance service account and add the required roles to the account using the gcloud command line tool. For instructions on using other methods, such as using the Google Cloud Console, see the Google Cloud Compute Engine documentation.
  1. Run the following gcloud command to create a new instance service account with a display name of StreamSets Instance Service Account:
    gcloud iam service-accounts create <INSTANCE_SA_NAME> --display-name="StreamSets Instance Service Account"
    Replace the <INSTANCE_SA_NAME> parameter with an alphanumeric ID that meets the naming requirements for Google Cloud service accounts, as described in the Google Cloud IAM documentation. For example, you might enter streamsets-instance-sa.
  2. To grant the instance service account the Secret Manager Secret Accessor role with the required condition, run the following command:
    gcloud projects add-iam-policy-binding <PROJECT_ID> --member=serviceAccount:<INSTANCE_SA_NAME>@<PROJECT_ID>.iam.gserviceaccount.com --role=roles/secretmanager.secretAccessor --condition=expression=resource.name.startsWith\(\"projects/<PROJECT_NUMBER>/secrets/StreamSets-Deployment-Token-\"\),title="StreamSets Limited"

    Replace the following parameters in the command:

    • <PROJECT_ID> - ID of the project designated for the StreamSets GCP environment.
    • <INSTANCE_SA_NAME> - Name that you used to create the instance service account.
    • <PROJECT_NUMBER> - Number of the project designated for the StreamSets GCP environment.
  3. If GCE deployments managed by this environment use an external resource archive file stored in a private Google Cloud Storage bucket, grant the instance service account the Storage Legacy Object Reader role with the required condition by running the following command:
    gcloud projects add-iam-policy-binding <PROJECT_ID> --member=serviceAccount:<INSTANCE_SA_NAME>@<PROJECT_ID>.iam.gserviceaccount.com --role=roles/storage.legacyObjectReader --condition=expression=resource.name.startsWith\(\"projects/_/buckets/<BUCKET_NAME>/\"\),title="StreamSets Limited" 

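    Replace the <PROJECT_ID> and <INSTANCE_SA_NAME> parameters as described above. Replace the <BUCKET_NAME> parameter with the name of the private Google Cloud Storage bucket that contains the archive file.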

    For more information about configuring external resources for deployments, see External Resources.

  4. Grant the instance service account any additional roles required by the StreamSets engines running on the VM instances.

    For example, if your pipelines process Google Cloud Storage data, grant the appropriate Cloud Storage roles as well.

  5. To use a unique instance service account for each deployment, simply repeat these steps to create another instance service account.
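
When you create one instance service account per deployment, deriving each ID from the deployment name keeps the accounts easy to identify, but long names can break the naming requirements. The following sketch checks candidate IDs before you run the create command. The valid_sa_id helper, the streamsets- prefix, and the deployment names are hypothetical. The rule it encodes (6 to 30 characters, lowercase letters, digits, and hyphens, starting with a letter and not ending with a hyphen) is an assumption that you should confirm against the Google Cloud IAM documentation.

```shell
# Hypothetical helper: succeeds when $1 is 6 to 30 characters long, starts with
# a lowercase letter, ends with a lowercase letter or digit, and otherwise
# contains only lowercase letters, digits, and hyphens.
valid_sa_id() {
  printf '%s\n' "$1" | grep -Eq '^[a-z][a-z0-9-]{4,28}[a-z0-9]$'
}

# Derive one candidate ID per deployment name and report whether it is usable.
for deployment in dev staging production-emea-batch-ingest; do
  id="streamsets-$deployment"
  if valid_sa_id "$id"; then
    echo "ok: $id"
  else
    echo "invalid: $id"
  fi
done
```

An ID that is reported as invalid, such as one that exceeds 30 characters, needs a shorter deployment suffix before you use it as <INSTANCE_SA_NAME>.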

Configuring a GCP Environment

Configure a Google Cloud Platform (GCP) environment to define where to deploy StreamSets engines in your Google Cloud project.

Important: Before configuring an environment, your Google Cloud administrator must complete the required prerequisites.

To create a new environment, click Set Up > Environments in the Navigation panel, and then click the Create Environment icon. Or, if you saved an incomplete environment when you retrieved the information required by the prerequisites, edit that environment.

To edit an existing environment, click Set Up > Environments in the Navigation panel, click the environment name, and then click Edit.

Define the Environment

Define the environment essentials, including the environment name and type, and optional tags to identify similar environments.

  1. Configure the following properties:
    • Environment Name - Name of the environment.

      Use a brief name that informs your team of the environment use case.

    • Environment Type - Select Google Cloud Platform (GCP).

      Once saved, you cannot change the environment type.

    • Environment Tags - Optional tags that identify similar environments within Control Hub. Use environment tags to easily search and filter environments.

      Enter nested tags using the following format:

      <tag1>/<tag2>/<tag3>

    • Feature Version - Feature version to use for the environment and all deployments created for the environment.

      Each feature version typically requires different permissions.

      When creating a new environment, StreamSets recommends using the latest feature version. When a new feature version is available, StreamSets recommends changing your existing environments to use the new feature version as soon as possible.

  2. Optionally, click Show Advanced Options and configure the following advanced property:
    • Allow Nightly Builds - Allows deployments for this environment to use nightly engine builds in addition to released engine versions.

      Nightly builds are for testing features under development and should not be used in production systems.

      The version number of a nightly build includes a -SNAPSHOT suffix and the build number. For example, 5.2.0-SNAPSHOT (Build 1013).

  3. If creating the environment, click one of the following buttons:
    • Cancel - Cancels creating the environment and exits the wizard.
    • Save & Next - Saves the environment and continues.
    • Save & Exit - Saves the environment and exits the wizard, displaying the incomplete environment in the Environments view.

Configure GCP Credentials

Configure the credentials that Control Hub uses to access and provision resources in your Google Cloud project.
Important: Before you configure GCP credentials, your Google Cloud administrator must complete the required prerequisites.
  1. Configure the following properties:
    • Credential Type - Type of credentials to authenticate with Google Cloud:
      • Service Account Impersonation
      • Service Account Key

      Important: StreamSets recommends that you use service account impersonation for production.

    • Service Account Email - Email of the service account created as an environment prerequisite by your Google Cloud administrator. Enter using the following format:

      <SA_NAME>@<PROJECT_ID>.iam.gserviceaccount.com

      Required when using service account impersonation to authenticate with Google Cloud.

    • Account Key JSON - Full contents of the service account key file created and downloaded as an environment prerequisite by your Google Cloud administrator.

      Required when using a service account key to authenticate with Google Cloud.

    • Default Instance Service Account - Optional instance service account to associate with the VM instances provisioned for all deployments belonging to this environment. Select the instance service account created as an environment prerequisite by your Google Cloud administrator.

      If you do not define a default instance service account, then you must define an instance service account when you create a deployment for this environment.

  2. If creating the environment, click one of the following buttons:
    • Back - Returns to the previous step in the wizard.
    • Save & Next - Saves the environment and continues.
    • Save & Exit - Saves the environment and exits the wizard, displaying the incomplete environment in the Environments view.

Configure the GCP Project

Select the GCP project prepared as a prerequisite by your Google Cloud administrator.

  1. Select the GCP project designated for the StreamSets environment.
  2. If creating the environment, click one of the following buttons:
    • Back - Returns to the previous step in the wizard.
    • Save & Next - Saves the environment and continues.
    • Save & Exit - Saves the environment and exits the wizard, displaying the incomplete environment in the Environments view.

Select the GCP VPC

Select the VPC network created as a prerequisite by your Google Cloud administrator, and optionally define GCP labels to apply to provisioned GCP resources.

  1. Configure the following properties:
    • VPC ID - ID of the Google VPC network created as an environment prerequisite by your Google Cloud administrator.
    • GCP Labels - Labels to apply to all Google Cloud resources provisioned for this environment.

      Enter the labels as key-value pairs. For label naming requirements, see the Google Cloud Compute Engine documentation.

      You can define the labels using simple or bulk edit mode. In simple edit mode, click Add to define additional labels. In bulk edit mode, configure labels in JSON format.

      Important: These labels are applied to Google Cloud resources, not to Control Hub environments.
  2. If creating the environment, click one of the following buttons:
    • Back - Returns to the previous step in the wizard.
    • Save & Next - Saves the environment and continues.
    • Save & Exit - Saves the environment and exits the wizard, displaying the incomplete environment in the Environments view.

Share the Environment

By default, the environment can only be seen by you. Share the environment with other users and groups to grant them access to it.

  1. In the Select Users and Groups field, type a user email address or a group name.
  2. Select users or groups from the list, and then click Add.

    The added users and groups display in the User / Group table.

  3. Modify permissions as needed. By default, each added user or group is granted the following permissions:
    • Read - View the details of the environment. Create and edit a deployment for the environment.
    • Write - Edit, activate, deactivate, and delete the environment.

    For more information, see Environment Permissions.

  4. Click one of the following buttons:
    • Back - Returns to the previous step in the wizard.
    • Save & Next - Saves the environment and continues.
    • Save & Exit - Saves the environment and exits the wizard, displaying the incomplete environment in the Environments view.

Review and Activate the Environment

You've successfully finished creating the environment. Activate the environment so that you can create deployments for the environment.

Click one of the following buttons:
  • Exit - Saves the environment and exits the wizard, displaying the Deactivated environment in the Environments view.
  • Activate & Add Deployment - Activates the environment and opens the deployment wizard so that you can create a deployment for the environment.
  • Activate & Exit - Activates the environment and exits the wizard, displaying the Active environment in the Environments view.