Self-Managed Deployments#
You can create a self-managed deployment for an active self-managed environment. When using a self-managed deployment, you take full control of procuring the resources needed to run engine instances.
When you create a self-managed deployment, you define the engine type, version, and configuration to deploy. You also select the installation type to use: either a tarball or a Docker image.
For more details, refer to the StreamSets Platform Documentation.
Using the Data Collector Engine#
The SDK is designed to mirror the UI workflow. This section shows how to create a self-managed deployment for Data Collector using the Docker Image installation type in the UI, and how to achieve the same result step by step with the StreamSets Platform SDK for Python.
Define the Deployment#
In the UI, a deployment is defined as seen below:
The same effect can be achieved using the streamsets.sdk.sch_models.DeploymentBuilder class in the SDK. To instantiate a DeploymentBuilder object, use the streamsets.sdk.ControlHub.get_deployment_builder() method and then provide configuration options to the streamsets.sdk.sch_models.DeploymentBuilder.build() method of the builder. Once the deployment object has been built, it can be added to Control Hub using the streamsets.sdk.ControlHub.add_deployment() method:
deployment_builder = sch.get_deployment_builder(deployment_type='SELF')
# sample_environment is an instance of streamsets.sdk.sch_models.SelfManagedEnvironment
deployment = deployment_builder.build(deployment_name='Sample Deployment',
                                      environment=sample_environment,
                                      engine_type='DC',
                                      engine_version='4.1.0',
                                      deployment_tags=['self-managed-tag'])
sch.add_deployment(deployment)
Configure the Engine#
In the UI, a deployment’s engines are configured as seen below:
When you click on 3 stage libraries selected in the above UI, the following dialog opens and allows you to select stage libraries:
After you select JDBC in the above dialog and click any of the ‘+’ signs, the dialog shows the following:
Selecting stage libraries for a deployment is also possible using the SDK. The stage_libs property of the streamsets.sdk.sch_models.DeploymentEngineConfiguration attribute in a Deployment object allows specification of additional stage libraries in the '<library_name>' format, or optionally the '<library_name>:<library_version>' format.
Note
If a version is omitted for a stage library, it will default to the engine version that was configured for the deployment.
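The defaulting rule in the note can be sketched in plain Python. The helper below, `resolve_stage_lib`, is hypothetical and not part of the SDK; it only illustrates how a `'<library_name>'` entry would be read compared with a `'<library_name>:<library_version>'` entry.

```python
def resolve_stage_lib(spec, engine_version):
    """Return (name, version) for a stage library spec.

    Hypothetical helper for illustration only; the platform performs its own
    resolution when the deployment is added or updated.
    """
    name, sep, version = spec.partition(':')
    # An entry without ':<version>' falls back to the engine version.
    return name, (version if sep else engine_version)


resolved = [resolve_stage_lib(s, '4.1.0') for s in ['jdbc', 'aws:4.1.0', 'dev']]
```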
There are several methods available for modifying the stage libraries of a deployment.
If you know the complete list of stage libraries you want to add to a deployment, you can specify them as a list and set the stage_libs attribute directly as seen below:
Warning
Attempting to add multiple versions of the same stage library to a deployment’s engine configuration will result in an error when you attempt to add or update a deployment on the StreamSets Platform.
# Stage libraries can be supplied both with and without version specified. Any without a version will default
# to the version of the engine selected for the deployment
deployment.engine_configuration.stage_libs = ['jdbc', 'aws:4.1.0', 'cdp_7_1:4.1.0', 'basic:4.1.0', 'dev']
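Since listing more than one version of the same stage library is rejected when the deployment is added or updated, it can be convenient to check a list locally first. The following is a minimal sketch of such a check; `find_version_conflicts` is a hypothetical helper, not an SDK function, and it assumes that an entry without a version resolves to the engine version.

```python
from collections import defaultdict


def find_version_conflicts(stage_libs, engine_version):
    """Return {library_name: sorted versions} for libraries listed with more than one version.

    Hypothetical pre-flight check; not part of the SDK.
    """
    versions = defaultdict(set)
    for spec in stage_libs:
        name, sep, version = spec.partition(':')
        # Entries without a version are assumed to use the engine version.
        versions[name].add(version if sep else engine_version)
    return {name: sorted(found) for name, found in versions.items() if len(found) > 1}


conflicts = find_version_conflicts(['jdbc', 'aws:4.1.0', 'aws:4.0.0'], engine_version='4.1.0')
```

An empty dictionary means no library appears with two different versions.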
The stage_libs attribute operates like a traditional list object, with accompanying append(), extend(), and remove() methods.
To add a single stage library to a deployment’s engine configuration, use the streamsets.sdk.sch_models.DeploymentStageLibraries.append() method with the same library:version syntax as above:
# Adding a single additional library to the stage library configuration
deployment.engine_configuration.stage_libs.append('aws')
To add several stage libraries to a deployment’s engine configuration at once, use the streamsets.sdk.sch_models.DeploymentStageLibraries.extend() method, which follows the same library:version syntax as above:
# Extending the list of stage libraries by adding two additional stages
deployment.engine_configuration.stage_libs.extend(['cassandra_3:4.1.0', 'elasticsearch_7'])
Finally, to remove a single stage library from a deployment’s engine configuration, use the streamsets.sdk.sch_models.DeploymentStageLibraries.remove() method. Removing a stage library intentionally requires a version to be supplied, so that an unintended stage library is not removed by accident:
# Removing a single library from the stage library configuration by supplying the library name and version
deployment.engine_configuration.stage_libs.remove('aws:4.1.0')
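Because removal expects a versioned entry, it can help to build the `'<library_name>:<library_version>'` string explicitly. This is a minimal sketch that uses a plain Python list as a stand-in for `stage_libs`; `versioned_spec` is a hypothetical helper, not part of the SDK.

```python
def versioned_spec(name, version):
    """Build a '<library_name>:<library_version>' string (hypothetical helper)."""
    if ':' in name:
        raise ValueError(f'{name!r} already includes a version')
    return f'{name}:{version}'


# A plain list stands in for deployment.engine_configuration.stage_libs here.
stage_libs = ['jdbc:4.1.0', 'aws:4.1.0', 'dev:4.1.0']
stage_libs.remove(versioned_spec('aws', '4.1.0'))
```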
Once the desired stage libraries have been set for the deployment, the deployment must be updated on Control Hub using the streamsets.sdk.ControlHub.update_deployment() method in order for them to take effect:
# Update a deployment's configuration/definition on Control Hub
sch.update_deployment(deployment)
Configure the Install Type#
In the UI, the Install Type for a deployment is configured as seen below:
To set the Install Type for the engine in a deployment via the SDK, set the install_type property to either 'TARBALL' or 'DOCKER', depending on your needs. If no install_type is provided, the deployment defaults to 'TARBALL':
# Set a deployment's install type to use a tarball
deployment.install_type = 'TARBALL'
# Or, set a deployment's install type to use the docker image
deployment.install_type = 'DOCKER'
Review and Launch the Deployment#
In the UI, a deployment can be reviewed and launched as seen below:
The same effect can be achieved by using the SDK as seen below:
# Optional - equivalent to clicking on 'Start & Generate Install Script'
sch.start_deployment(deployment)
Complete example for the Data Collector Engine#
To create a new streamsets.sdk.sch_models.SelfManagedDeployment object and add it to Control Hub, use the streamsets.sdk.sch_models.DeploymentBuilder class. Use the streamsets.sdk.ControlHub.get_deployment_builder() method to instantiate the builder object:
deployment_builder = sch.get_deployment_builder(deployment_type='SELF')
Next, retrieve the streamsets.sdk.sch_models.SelfManagedEnvironment object, which represents an active self-managed environment where engine instances will be deployed. Pass it to the streamsets.sdk.sch_models.DeploymentBuilder.build() method along with the other parameters, and pass the resulting streamsets.sdk.sch_models.SelfManagedDeployment object to the streamsets.sdk.ControlHub.add_deployment() method:
# sample_environment is an instance of streamsets.sdk.sch_models.SelfManagedEnvironment
deployment = deployment_builder.build(deployment_name='Sample Deployment',
                                      environment=sample_environment,
                                      engine_type='DC',
                                      engine_version='4.1.0',
                                      deployment_tags=['self-managed-tag'])
sch.add_deployment(deployment)
deployment.install_type = 'TARBALL'
# deployment.install_type = 'DOCKER'
# Optional - add sample stage libs
deployment.engine_configuration.stage_libs = ['jdbc', 'aws:4.1.0', 'cdp_7_1:4.1.0', 'basic:4.1.0', 'dev']
# deployment.engine_configuration.stage_libs.append('aws')
# deployment.engine_configuration.stage_libs.extend(['cassandra_3:4.1.0', 'elasticsearch_7'])
# Update a deployment's configuration/definition on Control Hub
sch.update_deployment(deployment)
# Optional - equivalent to clicking on 'Start & Generate Install Script'
sch.start_deployment(deployment)
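The steps above can also be gathered into one reusable function. This is a sketch only: `create_self_managed_deployment` and its parameter names are hypothetical, and the function simply chains the SDK calls already shown in this section. The `sch` argument is expected to be a connected streamsets.sdk.ControlHub instance.

```python
def create_self_managed_deployment(sch, environment, name, engine_version, deployment_tags,
                                   stage_libs=None, install_type='TARBALL', start=False):
    """Create, configure, and optionally start a Data Collector deployment.

    Hypothetical convenience wrapper around the calls shown in this section.
    """
    builder = sch.get_deployment_builder(deployment_type='SELF')
    deployment = builder.build(deployment_name=name,
                               environment=environment,
                               engine_type='DC',
                               engine_version=engine_version,
                               deployment_tags=deployment_tags)
    sch.add_deployment(deployment)

    # Adjust the install type and stage libraries, then push the changes.
    deployment.install_type = install_type
    if stage_libs:
        deployment.engine_configuration.stage_libs = list(stage_libs)
    sch.update_deployment(deployment)

    # Starting is optional, as in the UI.
    if start:
        sch.start_deployment(deployment)
    return deployment
```

Keeping `start` off by default mirrors the optional nature of the 'Start & Generate Install Script' step.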
Using the Transformer Engine#
The SDK is designed to mirror the UI workflow. This section shows how to create a self-managed deployment for Transformer using the Docker Image installation type in the UI, and how to achieve the same result step by step with the StreamSets Platform SDK for Python.
Define the Deployment#
In the UI, a deployment is defined as seen below:
The same effect can be achieved using the streamsets.sdk.sch_models.DeploymentBuilder class in the SDK. To instantiate a DeploymentBuilder object, use the streamsets.sdk.ControlHub.get_deployment_builder() method and then provide configuration options to the streamsets.sdk.sch_models.DeploymentBuilder.build() method of the builder. Once the deployment object has been built, it can be added to Control Hub using the streamsets.sdk.ControlHub.add_deployment() method:
deployment_builder = sch.get_deployment_builder(deployment_type='SELF')
# sample_environment is an instance of streamsets.sdk.sch_models.SelfManagedEnvironment
deployment = deployment_builder.build(deployment_name='Sample Deployment',
                                      environment=sample_environment,
                                      engine_type='TF',
                                      engine_version='4.1.0',
                                      scala_binary_version='2.11',
                                      deployment_tags=['self-managed-tag'])
sch.add_deployment(deployment)
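Compared with the Data Collector example, the Transformer call differs only in `engine_type` and the additional `scala_binary_version` argument. The sketch below collects the engine-related arguments in one place. `build_arguments` is a hypothetical helper, and it assumes, based solely on the examples in this document, that `scala_binary_version` is supplied for 'TF' and omitted for 'DC'.

```python
def build_arguments(engine_type, engine_version, scala_binary_version=None):
    """Assemble engine-related keyword arguments for DeploymentBuilder.build().

    Hypothetical helper; the 'TF'-only handling of scala_binary_version is an
    assumption drawn from the examples in this section.
    """
    kwargs = {'engine_type': engine_type, 'engine_version': engine_version}
    if engine_type == 'TF':
        if scala_binary_version is None:
            raise ValueError("scala_binary_version is expected for engine_type 'TF'")
        kwargs['scala_binary_version'] = scala_binary_version
    return kwargs


tf_kwargs = build_arguments('TF', '4.1.0', scala_binary_version='2.11')
dc_kwargs = build_arguments('DC', '4.1.0')
```

The resulting dictionary can be unpacked into `build()` with `**tf_kwargs`, alongside `deployment_name`, `environment`, and `deployment_tags`.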
Configure the Engine#
In the UI, a deployment’s engines are configured as seen below:
When you click on 3 stage libraries selected in the above UI, the following dialog opens and allows you to select stage libraries:
After you select JDBC in the above dialog and click any of the ‘+’ signs, the dialog shows the following:
Selecting stage libraries for a deployment is also possible using the SDK. The stage_libs property of the streamsets.sdk.sch_models.DeploymentEngineConfiguration attribute in a Deployment object allows specification of additional stage libraries in the '<library_name>' format, or optionally the '<library_name>:<library_version>' format.
Note
If a version is omitted for a stage library, it will default to the engine version that was configured for the deployment.
There are several methods available for modifying the stage libraries of a deployment.
If you know the complete list of stage libraries you want to add to a deployment, you can specify them as a list and set the stage_libs attribute directly as seen below:
Warning
Attempting to add multiple versions of the same stage library to a deployment’s engine configuration will result in an error when you attempt to add or update a deployment on the StreamSets Platform.
# Stage libraries can be supplied both with and without version specified. Any without a version will default
# to the version of the engine selected for the deployment
deployment.engine_configuration.stage_libs = ['file', 'aws:4.1.0', 'jdbc', 'kafka:4.1.0']
The stage_libs attribute operates like a traditional list object, with accompanying append(), extend(), and remove() methods.
To add a single stage library to a deployment’s engine configuration, use the streamsets.sdk.sch_models.DeploymentStageLibraries.append() method with the same library:version syntax as above:
# Adding a single additional library to the stage library configuration
deployment.engine_configuration.stage_libs.append('hive:4.1.0')
To add several stage libraries to a deployment’s engine configuration at once, use the streamsets.sdk.sch_models.DeploymentStageLibraries.extend() method, which follows the same library:version syntax as above:
# Extending the list of stage libraries by adding two additional stages
deployment.engine_configuration.stage_libs.extend(['redshift-no-dependency:4.1.0', 'azure_3_2_0'])
Finally, to remove a single stage library from a deployment’s engine configuration, use the streamsets.sdk.sch_models.DeploymentStageLibraries.remove() method. Removing a stage library intentionally requires a version to be supplied, so that an unintended stage library is not removed by accident:
# Removing a single library from the stage library configuration by supplying the library name and version
deployment.engine_configuration.stage_libs.remove('kafka:4.1.0')
Once the desired stage libraries have been set for the deployment, the deployment must be updated on Control Hub using the streamsets.sdk.ControlHub.update_deployment() method in order for them to take effect:
# Update a deployment's configuration/definition on Control Hub
sch.update_deployment(deployment)
Configure the Install Type#
In the UI, the Install Type for a deployment is configured as seen below:
To set the Install Type for the engine in a deployment via the SDK, set the install_type property to either 'TARBALL' or 'DOCKER', depending on your needs. If no install_type is provided, the deployment defaults to 'TARBALL':
# Set a deployment's install type to use the docker image
deployment.install_type = 'DOCKER'
# Or, set a deployment's install type to use a tarball
deployment.install_type = 'TARBALL'
Review and Launch the Deployment#
In the UI, a deployment can be reviewed and launched as seen below:
The same effect can be achieved by using the SDK as seen below:
# Optional - equivalent to clicking on 'Start & Generate Install Script'
sch.start_deployment(deployment)
Complete example for the Transformer Engine#
To create a new streamsets.sdk.sch_models.SelfManagedDeployment object and add it to Control Hub, use the streamsets.sdk.sch_models.DeploymentBuilder class. Use the streamsets.sdk.ControlHub.get_deployment_builder() method to instantiate the builder object:
deployment_builder = sch.get_deployment_builder(deployment_type='SELF')
Next, retrieve the streamsets.sdk.sch_models.SelfManagedEnvironment object, which represents an active self-managed environment where engine instances will be deployed. Pass it to the streamsets.sdk.sch_models.DeploymentBuilder.build() method along with the other parameters, and pass the resulting streamsets.sdk.sch_models.SelfManagedDeployment object to the streamsets.sdk.ControlHub.add_deployment() method:
# sample_environment is an instance of streamsets.sdk.sch_models.SelfManagedEnvironment
deployment = deployment_builder.build(deployment_name='Sample Deployment',
                                      environment=sample_environment,
                                      engine_type='TF',
                                      engine_version='4.1.0',
                                      scala_binary_version='2.11',
                                      deployment_tags=['self-managed-tag'])
sch.add_deployment(deployment)
deployment.install_type = 'DOCKER'
# deployment.install_type = 'TARBALL'
# Optional - add sample stage libs
deployment.engine_configuration.stage_libs = ['file', 'aws_3_2_0:4.1.0', 'jdbc', 'kafka:4.1.0']
# deployment.engine_configuration.stage_libs.append('hive:4.1.0')
# deployment.engine_configuration.stage_libs.extend(['redshift-no-dependency:4.1.0', 'azure_3_2_0'])
# Update a deployment's configuration/definition on Control Hub
sch.update_deployment(deployment)
# Optional - equivalent to clicking on 'Start & Generate Install Script'
sch.start_deployment(deployment)
Retrieving the Install Script#
Once a self-managed deployment has been successfully created and started, the install script for the deployment’s engine(s) can be retrieved from the deployment’s details page in the UI:
To retrieve the install script for a deployment via the SDK, use the streamsets.sdk.sch_models.SelfManagedDeployment.install_script() method and execute the script according to your requirements:
install_script = deployment.install_script()
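One way to handle the retrieved script is to save it to a file for later execution. The sketch below assumes that `install_script()` returns the script as a string; `save_install_script` is a hypothetical helper, and the file name is up to you.

```python
import stat
from pathlib import Path


def save_install_script(script_text, path):
    """Write the install script to `path` and mark it executable for the owner.

    Hypothetical helper; assumes the script is available as a string.
    """
    target = Path(path)
    target.write_text(script_text)
    # Add the owner-execute bit while keeping the existing permissions.
    target.chmod(target.stat().st_mode | stat.S_IXUSR)
    return target
```

For example, `save_install_script(install_script, 'install_engine.sh')` would write the script retrieved above to a file named `install_engine.sh` in the current directory.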
Install scripts can run in the background or the foreground. To retrieve the desired install script, pass install_mechanism to the streamsets.sdk.sch_models.SelfManagedDeployment.install_script() method. Available install mechanisms are 'DEFAULT', 'BACKGROUND', and 'FOREGROUND'.
install_script = deployment.install_script(install_mechanism='BACKGROUND')
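Because the mechanism is passed as a plain string, a misspelled value is easy to miss. The following is a minimal sketch of a client-side check; `normalize_install_mechanism` is a hypothetical helper, and the SDK may perform its own validation.

```python
INSTALL_MECHANISMS = ('DEFAULT', 'BACKGROUND', 'FOREGROUND')


def normalize_install_mechanism(value):
    """Return the upper-cased mechanism, or raise ValueError if it is not a listed one.

    Hypothetical client-side check; not part of the SDK.
    """
    mechanism = value.strip().upper()
    if mechanism not in INSTALL_MECHANISMS:
        raise ValueError(f'install_mechanism must be one of {INSTALL_MECHANISMS}, got {value!r}')
    return mechanism
```

The returned value can then be passed as `install_mechanism` to `install_script()`.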
Install scripts can also be customized to specify which Java version to use. To set the Java version, pass java_version to the streamsets.sdk.sch_models.SelfManagedDeployment.install_script() method:
install_script = deployment.install_script(java_version='8')
Note
Supported Java versions vary by engine and by engine version. Providing an invalid java_version will result in an error.
For more information regarding engine Java versions, refer to the StreamSets Platform Documentation.