GCE Deployments


You can create a Google Compute Engine (GCE) deployment for an active GCP environment.

When you create a GCE deployment, you define the engine type, version, and configuration to deploy to the Google Cloud project and VPC network specified in the environment. You also specify the number of engine instances to deploy. Each engine instance runs on a dedicated Google Compute Engine VM instance.

For more details, refer to the StreamSets Platform Documentation.

Creating a Deployment for Data Collector


The SDK is designed to mirror the UI workflow. This section shows, step by step, how to create a GCE deployment for Data Collector in the UI and how to achieve the same result with the StreamSets Platform SDK for Python.

Define the Deployment

In the UI, a deployment is defined as seen below:

../../../_images/creation_define_deployment_sdc1.png

The same effect can be achieved by using the SDK as seen below:

deployment_builder = sch.get_deployment_builder(deployment_type='GCE')

# sample_environment is an instance of streamsets.sdk.sch_models.GCPEnvironment
deployment = deployment_builder.build(deployment_name='Sample Deployment',
                                      environment=sample_environment,
                                      engine_type='DC',
                                      engine_version='4.1.0',
                                      deployment_tags=['gce-deployment-tag'])
sch.add_deployment(deployment)

Configure the Engine

In the UI, a deployment’s engines are configured as seen below:

../../../_images/creation_configure_engine.png

In the UI above, clicking 3 stage libraries selected opens the following dialog, where you can select stage libraries:

../../../_images/creation_configure_engine_stage_lib_selection_screen.png

After you select JDBC and click any of the ‘+’ signs, the dialog shows the following:

../../../_images/creation_configure_engine_stage_lib_selection_as_a_list.png

Selecting stage libraries for a deployment is also possible using the SDK. The stage_libs property of a deployment’s engine_configuration attribute, which is a streamsets.sdk.sch_models.DeploymentEngineConfiguration object, lets you specify additional stage libraries in the '<library_name>' format or, optionally, the '<library_name>:<library_version>' format.

Note

If you omit the version for a stage library, it defaults to the engine version configured for the deployment.
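The defaulting rule in the note above can be illustrated with a small, SDK-independent helper. resolve_stage_lib is a hypothetical function, not part of the StreamSets SDK; it is a minimal sketch that appends the deployment's engine version to any library name that does not already carry one, which can be handy when you want to see exactly which versions a list of stage libraries will resolve to.

```python
def resolve_stage_lib(spec: str, engine_version: str) -> str:
    """Return spec in '<library_name>:<library_version>' form.

    Hypothetical helper (not an SDK function): mirrors the documented rule that
    a stage library without a version defaults to the deployment's engine version.
    """
    name, sep, version = spec.partition(':')
    if not name:
        raise ValueError(f'invalid stage library spec: {spec!r}')
    # Keep an explicit version; otherwise fall back to the engine version.
    return f'{name}:{version}' if sep and version else f'{name}:{engine_version}'


libs = ['jdbc', 'aws:4.1.0', 'dev']
print([resolve_stage_lib(lib, '4.1.0') for lib in libs])
# ['jdbc:4.1.0', 'aws:4.1.0', 'dev:4.1.0']
```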

There are several ways to modify a deployment’s stage libraries. If you know the complete list of stage libraries you want to add to a deployment, specify them as a list and assign it to the stage_libs attribute directly, as seen below:

# Stage libraries can be supplied with or without a version. Any library without a version defaults
# to the version of the engine selected for the deployment
deployment.engine_configuration.stage_libs = ['jdbc', 'aws:4.1.0', 'cdp_7_1:4.1.0', 'basic:4.1.0', 'dev']

Warning

Adding multiple versions of the same stage library to a deployment’s engine configuration results in an error when you add or update the deployment on the StreamSets Platform.
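Because the version conflict described in the warning above is only reported when the deployment is added or updated, you may want to check the list locally first. The following is a minimal sketch of such a pre-flight check; find_version_conflicts is a hypothetical helper, not an SDK method, and it assumes the stage libraries are plain strings in the '<library_name>' or '<library_name>:<library_version>' format, treating unversioned entries as the engine version.

```python
from collections import defaultdict


def find_version_conflicts(stage_libs, engine_version):
    """Return {library_name: sorted versions} for libraries listed with more than one version.

    Hypothetical pre-flight check (not an SDK function). Unversioned entries are
    treated as the engine version, matching the documented default.
    """
    versions = defaultdict(set)
    for spec in stage_libs:
        name, _, version = spec.partition(':')
        versions[name].add(version or engine_version)
    # Only libraries that resolve to two or more distinct versions are conflicts.
    return {name: sorted(vs) for name, vs in versions.items() if len(vs) > 1}


libs = ['jdbc', 'aws:4.1.0', 'cdp_7_1:4.1.0', 'basic:4.1.0', 'dev', 'aws:4.0.0']
conflicts = find_version_conflicts(libs, engine_version='4.1.0')
if conflicts:
    print(f'Fix before calling add_deployment/update_deployment: {conflicts}')
```

Note that 'aws' and 'aws:4.1.0' in the same list are not reported here, because both resolve to the same version under the default rule.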

The stage_libs attribute behaves like a standard Python list, with append(), extend(), and remove() methods. To add a single stage library to a deployment’s engine configuration, use the streamsets.sdk.sch_models.DeploymentStageLibraries.append() method with the same library:version syntax described above:

# Adding a single additional library to the stage library configuration
deployment.engine_configuration.stage_libs.append('aws')

To add several stage libraries at once, use the streamsets.sdk.sch_models.DeploymentStageLibraries.extend() method, which takes a list in the same library:version syntax:

# Extending the list of stage libraries by adding two additional stages
deployment.engine_configuration.stage_libs.extend(['cassandra_3:4.1.0', 'elasticsearch_7'])

Finally, to remove a single stage library from a deployment’s engine configuration, use the streamsets.sdk.sch_models.DeploymentStageLibraries.remove() method. Removal intentionally requires a version, so that you do not accidentally remove the wrong stage library:

# Removing a single library from the stage library configuration by supplying the library name and version
deployment.engine_configuration.stage_libs.remove('aws:4.1.0')

After you set the desired stage libraries, update the deployment on Control Hub with the streamsets.sdk.ControlHub.update_deployment() method for the changes to take effect:

# Update a deployment's configuration/definition on Control Hub
sch.update_deployment(deployment)

Configure the GCE Region

In the UI, the GCE Region for a deployment is selected from a list as seen below:

../../../_images/creation_configure_gce_region.png

The equivalent configuration in the SDK uses the deployment object’s region property:

deployment.region = 'us-west2'

Configure the GCE Zone

In the UI, one or more GCE Zones for a deployment are selected from a list as seen below:

../../../_images/creation_configure_gce_zone.png

The same effect can be achieved by using the SDK as seen below:

deployment.zone = ['us-west2-c']
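Each zone you select should belong to the region configured earlier. GCE zone names are formed from the region name plus a suffix (for example, us-west2-c is a zone in us-west2), so a simple prefix check can catch mismatches before you update the deployment. zones_outside_region is a hypothetical helper, not part of the SDK; this is only a sketch based on that naming convention.

```python
def zones_outside_region(region, zones):
    """Return the zones whose names do not belong to region.

    Hypothetical helper (not an SDK function). Relies on the GCE naming
    convention '<region>-<zone suffix>', for example 'us-west2-c' in 'us-west2'.
    """
    prefix = f'{region}-'
    return [zone for zone in zones
            if not (zone.startswith(prefix) and len(zone) > len(prefix))]


region = 'us-west2'
zones = ['us-west2-c', 'us-east1-b']
print(zones_outside_region(region, zones))  # ['us-east1-b']
```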

Configure the GCE Autoscaling Group

In the UI, the GCE Autoscaling Group for a deployment is configured as seen below:

../../../_images/creation_configure_gce_autoscaling_group.png

The same effect can be achieved by using the SDK as seen below:

deployment.desired_instances = 1
deployment.machine_type = 'e2-standard-4'
deployment.instance_service_account = '<Instance Service Account>'  # placeholder: replace with your value
deployment.gcp_labels = {'name1': 'value1', 'name2': 'value2'}
deployment.network_tags = '<Tag 1>, <Tag 2>'
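Google Cloud rejects labels and network tags that break its naming rules, so it can help to validate them before updating the deployment. The sketch below uses ASCII-only approximations of those rules: label keys are 1 to 63 characters starting with a lowercase letter, label values are up to 63 lowercase letters, digits, underscores, or dashes, and network tags are 1 to 63 lowercase letters, digits, or hyphens that start with a letter and end with a letter or digit. check_gcp_labels and build_network_tags are hypothetical helpers, and joining tags with ', ' is an assumption based on the '<Tag 1>, <Tag 2>' format shown above; the tag names are illustrative only.

```python
import re

# ASCII-only approximations of the GCP naming rules for labels and network tags.
LABEL_KEY = re.compile(r'[a-z][a-z0-9_-]{0,62}')
LABEL_VALUE = re.compile(r'[a-z0-9_-]{0,63}')
NETWORK_TAG = re.compile(r'[a-z]([-a-z0-9]{0,61}[a-z0-9])?')


def check_gcp_labels(labels):
    """Return the label pairs that break the (ASCII) GCP label rules."""
    return {k: v for k, v in labels.items()
            if not (LABEL_KEY.fullmatch(k) and LABEL_VALUE.fullmatch(v))}


def build_network_tags(tags):
    """Validate tags and join them into a comma-separated string."""
    bad = [t for t in tags if not NETWORK_TAG.fullmatch(t)]
    if bad:
        raise ValueError(f'invalid network tags: {bad}')
    return ', '.join(tags)


print(check_gcp_labels({'name1': 'value1', 'Team': 'data'}))  # {'Team': 'data'}
print(build_network_tags(['sdc-engine', 'allow-ssh']))  # sdc-engine, allow-ssh
```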

Configure GCE SSH Access

In the UI, the GCE SSH Access for a deployment is configured as seen below:

../../../_images/creation_configure_gce_ssh_access.png

The same effect can be achieved by using the SDK as seen below:

deployment.block_project_ssh_keys = False
deployment.public_ssh_key = '<Public SSH key contents>'  # placeholder: replace with your public key
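The public_ssh_key property takes the contents of a public key, not a file path. A minimal sketch for reading it from disk follows; load_public_ssh_key is a hypothetical helper, and the key path in the usage comment is illustrative only. The prefix check is a basic sanity check that the file looks like an OpenSSH public key (for example, ssh-ed25519, ssh-rsa, or ecdsa-sha2-nistp256), which also guards against pasting a private key by mistake.

```python
from pathlib import Path

# Prefixes that OpenSSH public key lines commonly start with.
PUBLIC_KEY_PREFIXES = ('ssh-', 'ecdsa-', 'sk-')


def load_public_ssh_key(path):
    """Return the contents of an OpenSSH public key file with surrounding whitespace stripped.

    Hypothetical helper (not an SDK function). Raises ValueError if the file
    does not start like an OpenSSH public key.
    """
    key = Path(path).expanduser().read_text(encoding='utf-8').strip()
    if not key.startswith(PUBLIC_KEY_PREFIXES):
        raise ValueError(f'{path} does not look like an OpenSSH public key')
    return key


# Hypothetical key location; point it at your own public key file, then assign:
# deployment.public_ssh_key = load_public_ssh_key('~/.ssh/id_ed25519.pub')
```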

Review and Launch the Deployment

In the UI, a deployment can be reviewed and launched as seen below:

../../../_images/creation_review_and_launch2.png

The same effect can be achieved by using the SDK as seen below:

# Optional - equivalent to clicking on 'Launch Deployment'
sch.start_deployment(deployment)

Complete Example for Data Collector


To create a new streamsets.sdk.sch_models.GCEDeployment object and add it to Control Hub, use the streamsets.sdk.sch_models.DeploymentBuilder class. Use the streamsets.sdk.ControlHub.get_deployment_builder() method to instantiate the builder object:

deployment_builder = sch.get_deployment_builder(deployment_type='GCE')

Next, retrieve the streamsets.sdk.sch_models.GCPEnvironment object that represents the active GCP environment where engine instances will be deployed, and pass it to the streamsets.sdk.sch_models.DeploymentBuilder.build() method along with the other parameters. Finally, pass the resulting streamsets.sdk.sch_models.GCEDeployment object to the streamsets.sdk.ControlHub.add_deployment() method:

# sample_environment is an instance of streamsets.sdk.sch_models.GCPEnvironment
deployment = deployment_builder.build(deployment_name='Sample Deployment',
                                      environment=sample_environment,
                                      engine_type='DC',
                                      engine_version='4.1.0',
                                      deployment_tags=['gce-deployment-tag'])
sch.add_deployment(deployment)

# Optional - add sample stage libs
deployment.engine_configuration.stage_libs = ['jdbc', 'aws:4.1.0', 'cdp_7_1:4.1.0', 'basic:4.1.0', 'dev']
# deployment.engine_configuration.stage_libs.append('aws')
# deployment.engine_configuration.stage_libs.extend(['cassandra_3:4.1.0', 'elasticsearch_7'])

deployment.region = 'us-west2'
deployment.zone = ['us-west2-c']
deployment.desired_instances = 1
deployment.machine_type = 'e2-standard-4'
deployment.instance_service_account = '<Instance Service Account>'  # placeholder: replace with your value
deployment.gcp_labels = {'name1': 'value1', 'name2': 'value2'}
deployment.network_tags = '<Tag 1>, <Tag 2>'
deployment.block_project_ssh_keys = False
deployment.public_ssh_key = '<Public SSH key contents>'  # placeholder: replace with your public key

# Update a deployment's configuration/definition on Control Hub
sch.update_deployment(deployment)

# Optional - equivalent to clicking on 'Launch Deployment'
sch.start_deployment(deployment)
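If you manage several GCE deployments, you may prefer to keep the settings from the example above in one dictionary and apply them in a loop. The sketch below is one way to do that; apply_deployment_settings is a hypothetical helper that only calls setattr with the attribute names used in the example and does not contact Control Hub, so you still need to call sch.update_deployment(deployment) afterwards. A SimpleNamespace stands in for the GCEDeployment object so the sketch runs without a Control Hub connection; with a real deployment object, the getattr comparison assumes its properties can be read before they are set.

```python
from types import SimpleNamespace


def apply_deployment_settings(deployment, settings):
    """Set each attribute in settings on deployment and return the names that changed.

    Hypothetical helper: it does not contact Control Hub. Call
    sch.update_deployment(deployment) afterwards for the changes to take effect.
    """
    changed = []
    for name, value in settings.items():
        if getattr(deployment, name, None) != value:
            setattr(deployment, name, value)
            changed.append(name)
    return changed


gce_settings = {
    'region': 'us-west2',
    'zone': ['us-west2-c'],
    'desired_instances': 1,
    'machine_type': 'e2-standard-4',
    'gcp_labels': {'name1': 'value1', 'name2': 'value2'},
    'block_project_ssh_keys': False,
}

# The stand-in already has a region, so only the remaining settings are reported as changed.
stand_in = SimpleNamespace(region='us-west2', desired_instances=2)
print(apply_deployment_settings(stand_in, gce_settings))
# With a real deployment: apply_deployment_settings(deployment, gce_settings), then sch.update_deployment(deployment)
```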