Self-Managed Deployment samples


This section provides example scripts for common tasks involving Self-Managed Deployments.

These examples are intended solely as a starting point for developers new to the SDK. They give an idea of how common tasks might be written programmatically using the tools and resources available in the SDK.

For more details, refer to the StreamSets DataOps Platform Documentation.

To help visualize the environment and deployment that the first example builds, the following images show how they appear in the StreamSets DataOps Platform UI:

Environment

../../_images/sdk_sample_self_managed_environment.png

Deployment

../../_images/sdk_sample_self_managed_deployment.png

Deployment Details

../../_images/sdk_sample_self_managed_deployment_details.png

Create a Self-Managed Deployment


This example shows how to use the SDK to create and start a new Self-Managed Deployment on the StreamSets DataOps Platform.

# Import the ControlHub class from the SDK.
from streamsets.sdk import ControlHub

# Connect to the StreamSets DataOps Platform.
sch = ControlHub(credential_id='<credential id>', token='<token>')

# Instantiate an EnvironmentBuilder and use it to build the environment.
environment_builder = sch.get_environment_builder(environment_type='SELF')
environment = environment_builder.build(environment_name='Sample Environment',
                                        environment_type='SELF',
                                        environment_tags=['self-managed-tag'],
                                        allow_nightly_engine_builds=False)

# Add the environment to the StreamSets DataOps Platform and activate it.
sch.add_environment(environment)
sch.activate_environment(environment)

# Instantiate the DeploymentBuilder instance to build the deployment
deployment_builder = sch.get_deployment_builder(deployment_type='SELF')

# Build the deployment and specify the Sample Environment created previously.
deployment = deployment_builder.build(deployment_name='Sample Deployment DC-DOCKER',
                                      deployment_type='SELF',
                                      environment=environment,
                                      engine_type='DC',
                                      engine_version='4.1.0',
                                      deployment_tags=['self-managed-tag'])
deployment.install_type = 'DOCKER'
deployment.engine_instances = 1

# Add the deployment to the StreamSets DataOps Platform and start it.
sch.add_deployment(deployment)
sch.start_deployment(deployment)
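
The connection step above takes a credential ID and a token. As a minimal sketch, those values could be read from environment variables instead of being written into the script. The variable names `STREAMSETS_CRED_ID` and `STREAMSETS_CRED_TOKEN` and the helper `load_credentials` are hypothetical and not part of the SDK.

```python
import os


def load_credentials(env=os.environ):
    """Return (credential_id, token) read from environment variables.

    The variable names STREAMSETS_CRED_ID and STREAMSETS_CRED_TOKEN are
    hypothetical; use whatever names suit your environment.
    """
    try:
        return env['STREAMSETS_CRED_ID'], env['STREAMSETS_CRED_TOKEN']
    except KeyError as missing:
        # Report which variable is absent without echoing any secret values.
        raise RuntimeError(f'Missing environment variable: {missing.args[0]}') from None


# Usage with the SDK (commented out so the sketch runs without a connection):
# from streamsets.sdk import ControlHub
# credential_id, token = load_credentials()
# sch = ControlHub(credential_id=credential_id, token=token)
```

Passing a plain dictionary as `env` makes the helper easy to exercise without touching the real environment.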

Fetch Self-Managed Deployments


This example will show how to use the SDK to fetch Self-Managed Deployments.

# Import the ControlHub class from the SDK.
from streamsets.sdk import ControlHub

# Connect to the StreamSets DataOps Platform.
sch = ControlHub(credential_id='<credential id>', token='<token>')

# Fetch by deployment_name
fetched_by_name_deployment = sch.deployments.get(deployment_name='Sample Deployment DC-DOCKER')

# Fetch by id
deployment_id = fetched_by_name_deployment.deployment_id
fetched_by_id_deployment = sch.deployments.get(deployment_id=deployment_id)

# Fetch all the deployments
all_deployments = sch.deployments
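
Once all deployments are fetched, a common next step is to narrow them down. The sketch below assumes that the collection can be iterated and that each deployment exposes the `deployment_name` and `state` attributes used elsewhere in these samples. The helper `deployments_in_state` and the stand-in objects are hypothetical and exist only so the sketch runs without a connection.

```python
from types import SimpleNamespace


def deployments_in_state(deployments, state):
    """Return the names of deployments whose state matches `state`."""
    return [d.deployment_name for d in deployments if d.state == state]


# Stand-in objects so the sketch runs without a Control Hub connection;
# with the SDK you would pass sch.deployments instead.
sample = [
    SimpleNamespace(deployment_name='Sample Deployment DC-DOCKER', state='ACTIVE'),
    SimpleNamespace(deployment_name='Old Deployment', state='DEACTIVATED'),
]
active_names = deployments_in_state(sample, 'ACTIVE')
```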

Start/Stop Self-Managed Deployments


This example will show how to use the SDK to start and stop Self-Managed Deployments.

# Import the ControlHub class from the SDK.
from streamsets.sdk import ControlHub

# Connect to the StreamSets DataOps Platform.
sch = ControlHub(credential_id='<credential id>', token='<token>')

# Fetch the deployment to start and stop
sample_deployment = sch.deployments.get(deployment_name='Sample Deployment DC-DOCKER')

# Start
sch.start_deployment(sample_deployment)
assert sample_deployment.state == 'ACTIVE'

# Stop
sch.stop_deployment(sample_deployment)
assert sample_deployment.state == 'DEACTIVATED'
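
The assertions above expect the state to be current as soon as each call returns. If that does not hold in a given setup, one option is to poll until the expected state appears. The helper `wait_for_state` below is a hypothetical sketch, not an SDK function, and re-fetching the deployment on each poll is an assumption about how to observe the latest state.

```python
import time


def wait_for_state(get_state, expected, timeout=60.0, interval=2.0):
    """Poll get_state() until it returns `expected` or `timeout` seconds pass.

    Returns True if the expected state was seen, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_state() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical usage with the SDK (commented out so the sketch runs offline):
# wait_for_state(
#     lambda: sch.deployments.get(deployment_name='Sample Deployment DC-DOCKER').state,
#     'ACTIVE')
```

Taking a callable rather than a deployment object keeps the helper independent of how the state is refreshed.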

Update Self-Managed Deployment


This example will show how to use the SDK to update a Self-Managed Deployment. This includes how to update stage libraries, external resources, and a few other configurations of the deployment.

# Import the ControlHub class from the SDK.
from streamsets.sdk import ControlHub

# Connect to the StreamSets DataOps Platform.
sch = ControlHub(credential_id='<credential id>', token='<token>')

# Fetch a deployment
sample_deployment = sch.deployments.get(deployment_name='Sample Deployment DC-DOCKER')

# Update the deployment name and tags
sample_deployment.deployment_name = 'updated name'
sample_deployment.tags = sample_deployment.tags + ['updatedTag']

# Update stage libraries
stage_libraries = sample_deployment.engine_configuration.select_stage_libraries
current_engine_version = sample_deployment.engine_configuration.engine_version
if sample_deployment.engine_configuration.engine_type == 'DC':
    additional_stage_libs = [f'streamsets-datacollector-jython_2_7-lib:{current_engine_version}',
                             f'streamsets-datacollector-jdbc-lib:{current_engine_version}']
else:
    additional_stage_libs = [f'streamsets-spark-jdbc-lib:{current_engine_version}',
                             f'streamsets-spark-snowflake-with-no-dependency-lib:{current_engine_version}']

stage_libraries.extend(additional_stage_libs)

# Update install type
expected_install_type = 'DOCKER'
sample_deployment.install_type = expected_install_type

# Update external_resource_location
expected_external_resource_location = 'http://www.google.com'
sample_deployment.engine_configuration.external_resource_location = expected_external_resource_location

# Update java configurations
java_config = sample_deployment.engine_configuration.java_configuration
java_config.maximum_java_heap_size_in_mb = 4096
java_config.minimum_java_heap_size_in_mb = 2048
java_config.java_options = '-Xdebug'

# Update the deployment with all the above changes
sch.update_deployment(sample_deployment)
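
The stage library update above extends the list unconditionally, so running the script twice could add the same entry again. How the platform treats repeated entries is not stated here. As a minimal sketch, assuming the stage library list behaves like a list of `name:version` strings as the example suggests, a hypothetical helper `merge_stage_libs` can skip entries that are already present.

```python
def merge_stage_libs(existing, additions):
    """Return `existing` followed by any `additions` not already present."""
    merged = list(existing)   # copy, so the caller's list is left untouched
    seen = set(merged)
    for lib in additions:
        if lib not in seen:
            merged.append(lib)
            seen.add(lib)
    return merged


engine_version = '4.1.0'
current = [f'streamsets-datacollector-jdbc-lib:{engine_version}']
wanted = [f'streamsets-datacollector-jython_2_7-lib:{engine_version}',
          f'streamsets-datacollector-jdbc-lib:{engine_version}']
merged = merge_stage_libs(current, wanted)
```

With the SDK, the result of comparing against the existing list could replace the unconditional `extend` call, adding only the entries that are missing.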