Kubernetes Deployments
Section Contents
Kubernetes Deployments#
You can create a Kubernetes deployment for an active Kubernetes environment.
If you have not yet created a Kubernetes environment or are unsure of the prerequisite requirements, please refer to the SDK documentation section on Kubernetes Environments.
When you create a Kubernetes deployment, you define the type of engine, the engine version, and Kubernetes configuration to use to deploy to the Kubernetes cluster specified in the environment you provide. You can also specify the desired number of engine instances, enable autoscaling, and set specific Kubernetes deployment configurations.
For more information on Kubernetes deployments, refer to the StreamSets Platform Documentation.
Creating a Deployment for Data Collector#
The SDK is designed to mirror the workflow seen in the Platform UI. This section shows you how to create a Kubernetes Data Collector deployment in the UI, and the step-by-step equivalent using the StreamSets Platform SDK for Python.
Define the Deployment#
In the Platform UI, a Kubernetes deployment can be defined using the wizard as seen below:
The equivalent steps to define and create a deployment using the SDK require that you have the streamsets.sdk.sch_models.KubernetesEnvironment
instance handy.
To create a deployment for your environment, start by retrieving an instance of streamsets.sdk.sch_models.DeploymentBuilder
.
This is done via the streamsets.sdk.ControlHub.get_deployment_builder()
method, specifying the deployment_type
as 'KUBERNETES'
.
Once the streamsets.sdk.sch_models.DeploymentBuilder
instance has been retrieved, a deployment can be created using the streamsets.sdk.sch_models.DeploymentBuilder.build()
method.
You’ll need to specify key details for the deployment like deployment_name
, engine_type
, engine_version
, and deployment_tags
.
Finally, pass the newly-built streamsets.sdk.sch_models.KubernetesDeployment
object to the streamsets.sdk.ControlHub.add_deployment()
method:
environment = sch.environments.get(environment_id='<environment_id>') # Retrieve an environment by id
# environment = sch.environments.get(environment_name='<environment_name>') Alternatively, retrieve an environment by name
deployment_builder = sch.get_deployment_builder(deployment_type='KUBERNETES')
deployment = deployment_builder.build(deployment_name='Sample Kubernetes Deployment',
environment=environment,
engine_type='DC',
engine_version='4.1.0',
deployment_tags=['k8s-sdc-4.1.0'])
sch.add_deployment(deployment)
Configure the Engine#
In the Platform UI, a Kubernetes deployment’s engine(s) can be configured using the wizard as seen below:
Clicking on the 3 stage libraries selected option allows you to select which stage libraries to install on the engines for the deployment. You can select categories for the stage libraries or search for individual stage libraries by name as seen in the example below:
Selecting the +
symbol to add a stage library allows you to select the version of the stage library you wish to add to the deployment.
For example, if you were to add the JDBC Lookup
stage to your deployment, you would see a selection similar to the following:
Selecting stage libraries for a deployment is also possible using the SDK. The stage_libs
property of the
streamsets.sdk.sch_models.DeploymentEngineConfiguration
attribute in a
Deployment object allows specification of additional stage libraries in the '<library_name>'
format, or optionally
the '<library_name>:<library_version>'
format.
Note
If a version is omitted for a stage library, it will default to the engine version that was configured for the deployment.
There are several methods available for modifying the stage libraries of a deployment.
If you know the complete list of stage libraries you want to add to a deployment, you can specify them as a list
and set the stage_libs
attribute directly as seen below:
Warning
Attempting to add multiple versions of the same stage library to a deployment’s engine configuration will result in an error when you attempt to add or update a deployment on the StreamSets Platform.
# Stage libraries can be supplied both with and without version specified. Any without a version will default
# to the version of the engine selected for the deployment
deployment.engine_configuration.stage_libs = ['jdbc', 'aws:4.1.0', 'cdp_7_1:4.1.0', 'basic:4.1.0', 'dev']
The stage_libs
attribute operates like a traditional list
object, with accompanying append()
,
extend()
, and remove()
methods.
If you are looking to add a single stage library to a deployment’s engine configuration, you can utilize the
streamsets.sdk.sch_models.DeploymentStageLibraries.append()
method, using the same library:version syntax from
above:
# Adding a single additional library to the stage library configuration
deployment.engine_configuration.stage_libs.append('aws')
If you would prefer to add a list of additional stage libraries to a deployment’s engine configuration, you can utilize
the streamsets.sdk.sch_models.DeploymentStageLibraries.extend()
method, which also follows the same
library:version syntax from above:
# Extending the list of stage libraries by adding two additional stages
deployment.engine_configuration.stage_libs.extend(['cassandra_3:4.1.0', 'elasticsearch_7'])
Finally, if you would like to remove a single stage library from a deployment’s engine configuration, you can utilize
the streamsets.sdk.sch_models.DeploymentStageLibraries.remove()
method. The removal of a stage library from
a deployment’s engine configuration intentionally requires a version to be supplied, so as to not accidentally remove
an unintended stage library:
# Removing a single library from the stage library configuration by supplying the library name and version
deployment.engine_configuration.stage_libs.remove('aws:4.1.0')
Once the desired stage libraries have been set for the deployment, the deployment must be updated on Control Hub using
the streamsets.sdk.ControlHub.update_deployment()
method in order for them to take effect:
# Update a deployment's configuration/definition on Control Hub
sch.update_deployment(deployment)
Configure the Kubernetes Deployment#
In the Platform UI, you can also set Kubernetes-specific configurations for the deployment including options like the desired number of instances, CPU and Memory limits, or Kubernetes labels:
The same configuration can be set via the SDK by simply referencing the configuration properties of the streamsets.sdk.sch_models.KubernetesDeployment
you generated in the previous step:
# Set Kubernetes configurations for the deployment
deployment.kubernetes_labels = {'environment': 'streamsets'}
deployment.desired_instances = 2
deployment.cpu_request = '1.0'
deployment.memory_request = '1Gi'
deployment.memory_limit = '4Gi'
Review and Launch the Deployment#
In the Platform UI, you can review and launch your Kubernetes deployment as seen below:
To launch your Kubernetes deployment using the SDK, use the streamsets.sdk.ControlHub.start_deployment()
method and pass in the streamsets.sdk.sch_models.KubernetesDeployment
instance you wish to start:
# Optional - equivalent to clicking on 'Launch Deployment'
sch.start_deployment(deployment)
Bringing It All Together#
The complete scripts from this section can be found below. Commands that only served to verify some output from the example have been removed.
environment = sch.environments.get(environment_id='<environment_id>') # Retrieve an environment by id
# environment = sch.environments.get(environment_name='<environment_name>') Alternatively, retrieve an environment by name
deployment_builder = sch.get_deployment_builder(deployment_type='KUBERNETES')
deployment = deployment_builder.build(deployment_name='Sample Kubernetes Deployment',
environment=environment,
engine_type='DC',
engine_version='4.1.0',
deployment_tags=['k8s-sdc-4.1.0'])
sch.add_deployment(deployment)
# Set Kubernetes configurations for the deployment
deployment.kubernetes_labels = {'environment': 'streamsets'}
deployment.desired_instances = 2
deployment.cpu_request = '1.0'
deployment.memory_request = '1Gi'
deployment.memory_limit = '4Gi'
# Optional - add sample stage libs
deployment.engine_configuration.stage_libs = ['jdbc', 'aws:4.1.0', 'cdp_7_1:4.1.0', 'basic:4.1.0', 'dev']
# deployment.engine_configuration.stage_libs.append('aws')
# deployment.engine_configuration.stage_libs.extend(['cassandra_3:4.1.0', 'elasticsearch_7'])
# Update the deployment's configuration/definition on Control Hub
sch.update_deployment(deployment)
# Optional - equivalent to clicking on 'Launch Deployment'
sch.start_deployment(deployment)
Creating a Deployment for Transformer#
The SDK is designed to mirror the workflow seen in the Platform UI. This section shows you how to create a Kubernetes Transformer deployment in the UI, and the step-by-step equivalent using the StreamSets Platform SDK for Python.
Define the Deployment#
In the Platform UI, a Kubernetes deployment can be defined using the wizard as seen below:
The equivalent steps to define and create a deployment using the SDK require that you have the streamsets.sdk.sch_models.KubernetesEnvironment
instance handy.
To create a deployment for your environment, start by retrieving an instance of streamsets.sdk.sch_models.DeploymentBuilder
.
This is done via the streamsets.sdk.ControlHub.get_deployment_builder()
method, specifying the deployment_type
as 'KUBERNETES'
.
Once the streamsets.sdk.sch_models.DeploymentBuilder
instance has been retrieved, a deployment can be created using the streamsets.sdk.sch_models.DeploymentBuilder.build()
method.
You’ll need to specify key details for the deployment like deployment_name
, engine_type
, engine_version
, scala_binary_version
, and deployment_tags
.
Finally, pass the newly-built streamsets.sdk.sch_models.KubernetesDeployment
object to the streamsets.sdk.ControlHub.add_deployment()
method:
environment = sch.environments.get(environment_id='<environment_id>') # Retrieve an environment by id
# environment = sch.environments.get(environment_name='<environment_name>') Alternatively, retrieve an environment by name
deployment_builder = sch.get_deployment_builder(deployment_type='KUBERNETES')
deployment = deployment_builder.build(deployment_name='Sample Kubernetes Deployment',
environment=environment,
engine_type='TF',
engine_version='4.1.0',
scala_binary_version='2.12',
deployment_tags=['k8s-transformer-4.1.0'])
sch.add_deployment(deployment)
Configure the Engine#
In the Platform UI, a Kubernetes deployment’s engine(s) can be configured using the wizard as seen below:
Clicking on the 2 stage libraries selected option allows you to select which stage libraries to install on the engines for the deployment. You can select categories for the stage libraries or search for individual stage libraries by name as seen in the example below:
Selecting the +
symbol to add a stage library allows you to select the version of the stage library you wish to add to the deployment.
For example, if you were to add the JDBC Lookup
stage to your deployment, you would see a selection similar to the following:
Selecting stage libraries for a deployment is also possible using the SDK. The stage_libs
property of the
streamsets.sdk.sch_models.DeploymentEngineConfiguration
attribute in a
Deployment object allows specification of additional stage libraries in the '<library_name>'
format, or optionally
the '<library_name>:<library_version>'
format.
Note
If a version is omitted for a stage library, it will default to the engine version that was configured for the deployment.
There are several methods available for modifying the stage libraries of a deployment.
If you know the complete list of stage libraries you want to add to a deployment, you can specify them as a list
and set the stage_libs
attribute directly as seen below:
Warning
Attempting to add multiple versions of the same stage library to a deployment’s engine configuration will result in an error when you attempt to add or update a deployment on the StreamSets Platform.
# Stage libraries can be supplied both with and without version specified. Any without a version will default
# to the version of the engine selected for the deployment
deployment.engine_configuration.stage_libs = ['file', 'aws:4.1.0', 'jdbc', 'kafka:4.1.0']
The stage_libs
attribute operates like a traditional list
object, with accompanying append()
,
extend()
, and remove()
methods.
If you are looking to add a single stage library to a deployment’s engine configuration, you can utilize the
streamsets.sdk.sch_models.DeploymentStageLibraries.append()
method, using the same library:version syntax from
above:
# Adding a single additional library to the stage library configuration
deployment.engine_configuration.stage_libs.append('hive:4.1.0')
If you wouldd prefer to add a list of additional stage libraries to a deployment’s engine configuration, you can utilize
the streamsets.sdk.sch_models.DeploymentStageLibraries.extend()
method, which also follows the same
library:version syntax from above:
# Extending the list of stage libraries by adding two additional stages
deployment.engine_configuration.stage_libs.extend(['redshift-no-dependency:4.1.0', 'azure_3_2_0'])
Finally, if you would like to remove a single stage library from a deployment’s engine configuration, you can utilize
the streamsets.sdk.sch_models.DeploymentStageLibraries.remove()
method. The removal of a stage library from
a deployment’s engine configuration intentionally requires a version to be supplied, so as to not accidentally remove
an unintended stage library:
# Removing a single library from the stage library configuration by supplying the library name and version
deployment.engine_configuration.stage_libs.remove('kafka:4.1.0')
Once the desired stage libraries have been set for the deployment, the deployment must be updated on Control Hub using
the streamsets.sdk.ControlHub.update_deployment()
method in order for them to take effect:
# Update a deployment's configuration/definition on Control Hub
sch.update_deployment(deployment)
Configure the Kubernetes Deployment#
In the Platform UI, you can also set Kubernetes-specific configurations for the deployment including options like the desired number of instances, CPU and Memory limits, or Kubernetes labels:
The same configuration can be set via the SDK by simply referencing the configuration properties of the streamsets.sdk.sch_models.KubernetesDeployment
you generated in the previous step:
# Set Kubernetes configurations for the deployment
deployment.kubernetes_labels = {'environment': 'streamsets'}
deployment.desired_instances = 2
deployment.cpu_request = '1.0'
deployment.memory_request = '1Gi'
deployment.memory_limit = '4Gi'
Review and Launch the Deployment#
In the Platform UI, you can review and launch your Kubernetes deployment as seen below:
To launch your Kubernetes deployment using the SDK, use the streamsets.sdk.ControlHub.start_deployment()
method and pass in the streamsets.sdk.sch_models.KubernetesDeployment
instance you wish to start:
# Optional - equivalent to clicking on 'Launch Deployment'
sch.start_deployment(deployment)
Bringing It All Together#
The complete scripts from this section can be found below. Commands that only served to verify some output from the example have been removed.
environment = sch.environments.get(environment_id='<environment_id>') # Retrieve an environment by id
# environment = sch.environments.get(environment_name='<environment_name>') Alternatively, retrieve an environment by name
deployment_builder = sch.get_deployment_builder(deployment_type='KUBERNETES')
deployment = deployment_builder.build(deployment_name='Sample Kubernetes Deployment',
environment=environment,
engine_type='TF',
engine_version='4.1.0',
scala_binary_version='2.12',
deployment_tags=['k8s-sdc-4.1.0'])
sch.add_deployment(deployment)
# Set Kubernetes configurations for the deployment
deployment.kubernetes_labels = {'environment': 'streamsets'}
deployment.desired_instances = 2
deployment.cpu_request = '1.0'
deployment.memory_request = '1Gi'
deployment.memory_limit = '4Gi'
# Optional - add sample stage libs
deployment.engine_configuration.stage_libs = ['file', 'aws_3_2_0:4.1.0', 'jdbc', 'kafka:4.1.0']
# deployment.engine_configuration.stage_libs.append('hive:4.1.0')
# deployment.engine_configuration.stage_libs.extend(['redshift-no-dependency:4.1.0', 'azure_3_2_0'])
# Update the deployment's configuration/definition on Control Hub
sch.update_deployment(deployment)
# Optional - equivalent to clicking on 'Launch Deployment'
sch.start_deployment(deployment)