Kubernetes Deployments

You can create a Control Hub Kubernetes deployment for an active Kubernetes environment.

When you create the deployment, you define the engine type, version, and configuration to deploy to the Kubernetes cluster. Each engine instance runs in a dedicated Kubernetes pod. You also configure details about the pods, including whether the number of pods is automatically scaled during times of peak performance or whether a fixed number of pods is created.

If you are an advanced Kubernetes user, you can use advanced mode to directly edit the deployment YAML file.

When you start a Control Hub Kubernetes deployment, the StreamSets Kubernetes agent that corresponds to the parent environment creates a YAML file describing the required resources. The YAML file creates a single Kubernetes deployment and secret in the Kubernetes namespace. The YAML file also creates a horizontal pod autoscaler if the Control Hub deployment is configured to allow autoscaling. The Kubernetes deployment then creates a replica set to ensure that enough pods are created, with each pod running a single engine instance.
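
The resource chain described above can be summarized in a small sketch. This is only an illustration of the paragraph, not the agent's actual logic: the function name, its parameters, and the grouping keys are hypothetical, and the resource kinds are the standard Kubernetes kind names for the objects mentioned.

```python
from typing import Dict, List


def provisioned_resources(autoscaling_enabled: bool, instances: int) -> Dict[str, List[str]]:
    """Group Kubernetes resource kinds by what creates them (illustrative only)."""
    # Resources described directly by the YAML file that the agent creates.
    from_yaml = ["Deployment", "Secret"]
    if autoscaling_enabled:
        from_yaml.append("HorizontalPodAutoscaler")

    # Resources that Kubernetes derives from the Deployment:
    # one ReplicaSet, plus one pod per engine instance.
    from_deployment = ["ReplicaSet"] + ["Pod"] * instances

    return {"from_yaml": from_yaml, "from_deployment": from_deployment}


if __name__ == "__main__":
    print(provisioned_resources(autoscaling_enabled=True, instances=2))
```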

Kubernetes manages the provisioning and monitoring of the pods. The agent only receives the status of the deployed engine instances and communicates that status to Control Hub.

When you stop a Control Hub Kubernetes deployment, all Kubernetes resources created for that deployment are deleted.

Important: StreamSets strongly advises against directly modifying the provisioned resources in your Kubernetes cluster. Doing so may cause unexpected errors.

Before you create a Control Hub Kubernetes deployment, you must complete several prerequisites.

Secrets

When you start a Control Hub Kubernetes deployment, the following information is stored as a Kubernetes secret:
  • Authentication token that the deployment uses to communicate with the StreamSets platform.
  • Proxy credentials, including the HTTP and HTTPS proxy user and password, when you configure engines to use a proxy server.

Prerequisites

Before you create a Control Hub Kubernetes deployment, complete the following prerequisites:
Create a Kubernetes environment
  Create and activate a Control Hub Kubernetes environment and launch a StreamSets Kubernetes agent for that environment, as described in Kubernetes Environments.
Optionally, create a Kubernetes service account
  By default, each Kubernetes deployment provisioned in the Kubernetes namespace uses the default service account configured for the namespace. If you require a specific service account for this deployment, ask your Kubernetes administrator to create a service account.
  You can skip this prerequisite when you want to use the default service account.
Optionally, set up an external resource archive
  When your pipelines require external resources and when you plan to deploy multiple engine instances, you must set up an external resource archive that all engine instances can access. When your pipelines do not require external resources or when using a single engine instance to get started with StreamSets, you do not need to complete this prerequisite.
  You typically configure a deployment to use an external resource archive when you are ready to move to production, after you have finished building your pipelines and have finalized the list of external resources that your pipelines require. For more information, see External Resources.

Configuring a Kubernetes Deployment

Configure a Control Hub Kubernetes deployment to define the group of engine instances to deploy to a Kubernetes environment.

Important: Before configuring a deployment, you must complete the required prerequisites.

To create a new deployment, click Set Up > Deployments in the Navigation panel, and then click the Create Deployment icon.

To edit an existing deployment, click Set Up > Deployments in the Navigation panel, click the deployment name, and then click Edit.

Define the Deployment

Define the deployment essentials, including the deployment name and type, the environment that the deployment belongs to, and the engine type and version to deploy.

Once saved, you cannot change the deployment type, the engine version, or the environment.

  1. Configure the following properties:
    Define Deployment Property Description
    Deployment Name

    Name of the deployment.

    Use a brief name that informs your team of the deployment use case.

    Deployment Type

    Select Kubernetes.

    Environment

    Active Kubernetes environment where engine instances will be deployed.

    Engine Type

    Type of engine to deploy:
    • Data Collector
    • Transformer

    Engine Version

    Engine version to deploy.

    Deployment Tags

    Optional tags that identify similar deployments within Control Hub. Use deployment tags to easily search and filter deployments.

    Enter nested tags using the following format:

    <tag1>/<tag2>/<tag3>

  2. If creating the deployment, click one of the following buttons:
    • Cancel - Cancels creating the deployment and exits the wizard.
    • Save & Next - Saves the deployment and continues.
    • Save & Exit - Saves the deployment and exits the wizard, displaying the incomplete deployment in the Deployments view.
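
As a small illustration of the nested tag format used for the Deployment Tags property, the sketch below joins tag levels with a forward slash and splits them again. The helper names and the example tag values are hypothetical, and the validation rule (no empty levels, no slash inside a level) is an assumption rather than a documented Control Hub rule.

```python
from typing import List


def nested_tag(*levels: str) -> str:
    """Join tag levels into the <tag1>/<tag2>/<tag3> format."""
    cleaned = [level.strip() for level in levels]
    # Assumption: a level must be non-empty and must not itself contain "/".
    if not cleaned or any(not level or "/" in level for level in cleaned):
        raise ValueError("each tag level must be non-empty and must not contain '/'")
    return "/".join(cleaned)


def split_nested_tag(tag: str) -> List[str]:
    """Split a nested tag back into its individual levels."""
    return tag.split("/")


if __name__ == "__main__":
    tag = nested_tag("west", "production", "orders")
    print(tag)
    print(split_nested_tag(tag))
```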

Configure the Engine

Define the configuration of the engine to deploy. You can use the defaults to get started.

  1. Configure the following properties:
    Engine Property Description
    Stage Libraries

    Stage libraries to install on the engine.

    The available stage libraries depend on the selected engine type and version.

    Advanced Configuration

    Access to advanced configuration properties to further customize the engine. As you get started with StreamSets, the default values should work in most cases.

    The available properties depend on the selected engine type.

    External Resource Source

    Source of the external files and libraries, such as JDBC drivers, required by the engine:
    • None - External resources are not defined in the deployment.

      Select when using a single engine instance to get started with StreamSets, or when your pipelines do not require external resources.

    • Archive File - External resources are included in an archive file defined in the deployment.

      Select when the deployment launches multiple engine instances and when your pipelines require external resources.

    External Resource Location

    Location of the archive file that contains the external resources used by the engine. The archive file must be in TGZ or ZIP format.

    Enter the location using one of the following formats:
    • File path. For example: /mnt/shared/externalResources.tgz
      Important: To use a file path, you must use advanced mode to edit the deployment YAML file to mount the file to the engine container.
    • URL. For example: https://<hostname>:<port>/shared/externalResources.tgz
    Tip: Click the download icon to download a sample externalResources.tgz file to view the required directory structure.

    Available when using an archive file as the source for external resources.

    Engine Labels

    Labels to assign to all engine instances launched for this deployment. Labels determine the group of engine instances that run a job.

    Default is the name of the deployment.

    Max CPU Load (%)

    Maximum percentage of CPU on the host machine that an engine instance can use. When the CPU load of an engine reaches or exceeds this threshold, Control Hub does not start new pipeline instances on the engine.

    All engine instances belonging to the deployment inherit these resource threshold values.

    Default is 80.

    Max Memory (%)

    Maximum percentage of the configured Java heap size that an engine instance can use. When the memory usage of an engine reaches or exceeds this threshold, Control Hub does not start new pipeline instances on the engine.

    Default is 100.

    Max Running Pipeline Count

    Maximum number of pipelines that can be running on each engine instance. When the number of running pipelines on an engine reaches this threshold, Control Hub does not start new pipeline instances on the engine.

    Default is 1,000,000.

  2. If creating the deployment, click one of the following buttons:
    • Back - Returns to the previous step in the wizard.
    • Save & Next - Saves the deployment and continues.
    • Save & Exit - Saves the deployment and exits the wizard, displaying the incomplete deployment in the Deployments view.

Configure the Kubernetes Deployment

Configure details about the pods provisioned in the Kubernetes cluster.

  1. If you are an advanced Kubernetes user and want to directly edit the deployment YAML, select Advanced Mode.
    Note: As you configure the deployment, Control Hub generates a valid YAML file used to provision the Kubernetes resources. The automatically generated YAML file is sufficient for most use cases.

    For details about using advanced mode, see Advanced Mode.

  2. Configure the following properties:
    Kubernetes Deployment Property Description
    Kubernetes Labels

    Kubernetes labels to apply to all Kubernetes resources provisioned for this deployment.
    Enter the labels as key-value pairs. For label naming requirements, see the Kubernetes documentation.
    Important: StreamSets reserves app as a label key for its own use. As a result, you cannot define app as a label key.

    You can define the labels using simple or bulk edit mode. In simple edit mode, click Add Another to define additional labels. In bulk edit mode, configure labels in JSON format.

    Note: These labels are applied to Kubernetes resources, not to Control Hub deployments.
    Enable Autoscaling

    Automatically scales the number of engine instances during times of peak performance. Enable only when your Kubernetes administrator has set up a metrics server as an environment prerequisite.

    For each instance, Kubernetes creates a replicated pod, and then deploys and launches a single engine instance to each pod.

    Important: If your pipelines require external resources, you must set up an external resource archive that all engine instances can access before increasing the number of instances.

    When enabled, configure the minimum and maximum number of instances to deploy. In addition, you must configure the CPU Requested or CPU Limit property, or both properties.

    Note: If you edit an active Kubernetes deployment to enable autoscaling, Kubernetes initially decreases the number of pods to one until it can create a horizontal pod autoscaler, which can take several minutes.
    Desired Instances

    Number of engine instances to deploy.
    For each instance, Kubernetes creates a replicated pod, and then deploys and launches a single engine instance to each pod.
    Important: If your pipelines require external resources, you must set up an external resource archive that all engine instances can access before increasing the number of instances.

    Default is 1. Minimum is 0. Set to 0 to temporarily prevent engine instances from running, as an alternative to stopping the deployment. A deployment set to 0 instances still incurs minimal costs from the cloud service provider.

    Available when autoscaling is disabled.

    Minimum Instances

    Minimum number of engine instances to deploy.

    Minimum value is 1.

    Available when autoscaling is enabled.

    Maximum Instances

    Maximum number of engine instances to deploy.

    Available when autoscaling is enabled.

    CPU Threshold Percentage

    Target average CPU utilization, represented as a percentage of the requested CPU, over all pods hosting an engine instance.
    For example, if CPU Requested is 100m and CPU Threshold Percentage is 50%, Kubernetes creates additional pods hosting engine instances when the average CPU usage across all existing pods exceeds 50% of the requested CPU, or 50m. Kubernetes removes pods when the average CPU usage falls below that value.
    Note: To maintain stability, Kubernetes does not immediately increase or decrease pods when the average CPU usage exceeds or falls below the threshold. Instead, it waits for some time before making changes.

    Available when autoscaling is enabled.

    CPU Requested

    Requested amount of CPU for each pod hosting an engine instance.

    A pod is guaranteed to have as much CPU as it requests.

    CPU Limit

    Maximum amount of CPU that each pod hosting an engine instance can use.

    Memory Requested

    Requested amount of memory for each pod hosting an engine instance.

    Include the units when you enter a value. For example, enter 1024Mi to specify 1024 mebibytes. For more information about Kubernetes memory resource units, see the Kubernetes documentation.

    A pod is guaranteed to have as much memory as it requests.

    Memory Limit

    Maximum amount of memory in megabytes that each pod hosting an engine instance can use.

    Service Account Name

    Name of the Kubernetes service account to associate with the Kubernetes deployment provisioned in the namespace. Specify when your Kubernetes administrator has created a service account as a prerequisite.

    When not specified, the default service account configured for the namespace is used.

  3. If creating the deployment, click one of the following buttons:
    • Back - Returns to the previous step in the wizard.
    • Save & Next - Saves the deployment and continues.
    • Save & Exit - Saves the deployment and exits the wizard, displaying the incomplete deployment in the Deployments view.

Share the Deployment

By default, only you can see the deployment. Share the deployment with other users and groups to grant them access to it.

  1. In the Select Users and Groups field, type a user email address or a group name.
  2. Select users or groups from the list, and then click Add.

    The added users and groups display in the User / Group table.

  3. Modify permissions as needed. By default, each added user or group is granted the following permissions:
    • Read - View the details of the deployment and of all engines managed by the deployment. Restart or shut down individual engines managed by the deployment in the Engines view.
    • Write - Edit, start, stop, and delete the deployment. Delete engines managed by the deployment. Also requires read access on the parent environment.
    • Execute - Start jobs on engines managed by the deployment. Starting jobs also requires execute access on the job and read access on the pipeline.

    For more information, see Deployment Permissions.

  4. Click one of the following buttons:
    • Back - Returns to the previous step in the wizard.
    • Save & Next - Saves the deployment and continues.
    • Save & Exit - Saves the deployment and exits the wizard, displaying the incomplete deployment in the Deployments view.

Review and Launch the Deployment

You've successfully finished creating the deployment.

  1. Click one of the following buttons:
    • Exit - Saves the deployment and exits the wizard, displaying the Deactivated deployment in the Deployments view. You can start the deployment at a later time.
    • Launch Deployment - Starts the deployment, as long as the StreamSets Kubernetes agent that corresponds to the parent environment is online. The agent communicates with Control Hub to provision the Kubernetes resources needed to run engines and to deploy engine instances to those resources.
      Note: In some cases, Kubernetes can take several minutes to create and run all resources. A Control Hub Kubernetes deployment transitions to an Active state only when all associated Kubernetes resources are running.
  2. If the deployment launches a Transformer engine that works with a Spark cluster, you must grant the Spark cluster access to Transformer.

    For instructions, see Granting the Spark Cluster Access to Transformer in the Transformer engine documentation.

Advanced Mode

As you configure a Kubernetes deployment, Control Hub generates a valid YAML file used to provision the Kubernetes resources. The automatically generated YAML file is sufficient for most use cases. However, if you are an advanced Kubernetes user, you can use advanced mode to directly edit the deployment YAML file. For example, you might want to edit the YAML to attach extra volumes or to use a custom image.

To access advanced mode, in the Configure Kubernetes Deployment step in the deployment wizard, select Advanced Mode.

The wizard displays the generated YAML. You can edit the YAML directly in the wizard. Alternatively, click the Download file icon to download the YAML, edit it in a text editor, and then upload the edited file.

Click the Reset icon to reset to the previously saved YAML.

Note: When you edit a cloned Kubernetes deployment, you can select Show Diff to display the YAML differences between the original and cloned deployment.

Use caution when editing the YAML. Control Hub validates that the YAML uses the correct syntax, but cannot validate that you have specified an existing volume or image. Control Hub does place some restrictions on the edits you can make to the file.

The maximum YAML size is 16 KB.
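
If you edit the YAML in a text editor, a quick local size check before uploading can save a round trip. The sketch below assumes that 16 KB means 16 × 1024 bytes of UTF-8 encoded text; the exact way Control Hub measures the limit is not stated here, and the function names are hypothetical.

```python
# Assumption: 16 KB is interpreted as 16,384 bytes of UTF-8 encoded text.
MAX_YAML_BYTES = 16 * 1024


def yaml_size_bytes(yaml_text: str) -> int:
    """Return the size of the YAML text in bytes when encoded as UTF-8."""
    return len(yaml_text.encode("utf-8"))


def fits_size_limit(yaml_text: str, limit: int = MAX_YAML_BYTES) -> bool:
    """Return True when the YAML text is within the assumed size limit."""
    return yaml_size_bytes(yaml_text) <= limit


if __name__ == "__main__":
    sample = "apiVersion: apps/v1\nkind: Deployment\n"
    print(yaml_size_bytes(sample), fits_size_limit(sample))
```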

Important: If you add custom objects in the advanced YAML, ensure that the StreamSets Kubernetes agent has sufficient permissions to apply the custom YAML.

Editing a Kubernetes Deployment

You can edit a Control Hub Kubernetes deployment while it is deactivated or active.

When you stop a Control Hub Kubernetes deployment, all Kubernetes resources created for that deployment are deleted. After you edit properties and then restart the deployment, the StreamSets Kubernetes agent communicates with Control Hub to provision the Kubernetes resources needed to run engines and to deploy engine instances to those resources.

When you edit a deployment while it is active, existing Kubernetes resources might be deleted, depending on the following types of edited properties:

General deployment or engine properties
  When you edit general deployment or engine properties while the deployment is active, the StreamSets Kubernetes agent continues running the existing pods. Changes are replicated to all StreamSets engine instances on the next restart of the engines.
  For example, let's say you edit the deployment to install additional stage libraries on the engine instances, and then you instruct Control Hub to restart all engine instances. The StreamSets Kubernetes agent restarts the StreamSets engine instances on the existing pods, which triggers the installation of the additional stage libraries and the engine property changes.
Kubernetes properties
  When you edit Kubernetes properties while the deployment is active, Kubernetes might replace all of the existing Kubernetes pods, depending on the change. If a replacement is needed, Kubernetes deletes the pods one by one to prevent engine downtime.
  For example, if you edit the deployment to increase the number of engine instances from 2 to 3, the StreamSets Kubernetes agent applies the changes and Kubernetes provisions a new pod. If you edit a deployment to enable autoscaling, the StreamSets Kubernetes agent creates a horizontal pod autoscaler, which might delete the existing Kubernetes pods or provision new ones.

To edit a deployment, locate the deployment in the Deployments view. In the Actions column, click the More icon and then click Edit.