Self-Managed Deployments
You can create a self-managed deployment for an active self-managed environment.
When using a self-managed deployment, you take full control of procuring the resources needed to run engine instances. The resources can be local on-premises machines or cloud computing machines. You must set up the machines and complete the installation prerequisites required by the StreamSets engine type. When the machines reside behind a firewall, you must allow the required inbound and outbound traffic to each machine, as described in Firewall Configuration.
When you create a self-managed deployment, you define the engine type, version, and configuration to deploy. You select the installation type to use: either a tarball or a Docker image. You can also use the Quick Start menu to create a self-managed deployment that deploys an engine instance using Docker.
After you create and start a self-managed deployment, Control Hub displays the engine installation script that you run to install and launch engine instances on the on-premises or cloud computing machines that you have set up. You can configure the installation script to run engine instances as a foreground or background process.
Using a self-managed deployment is the simplest way to get started with StreamSets. After getting started, you might consider using one of the cloud service provider integrations that StreamSets provides, such as the AWS and GCP environments and deployments. With these integrations, Control Hub automatically provisions the resources needed to run the engine type in your cloud service provider account, and then deploys engine instances to those resources.
Quick Start Deployments
To quickly create a self-managed deployment that deploys an engine instance using Docker, click Quick Start in the top toolbar.
After creating the deployment, copy the generated command and then launch an engine for the deployment.
Control Hub names quick start deployments as follows:
- Docker Data Collector <number> (Quick Start)
- Docker Transformer <number> (Quick Start)
For example, if you create two deployments for Data Collector from the Quick Start menu, Control Hub names the deployments Docker Data Collector 1 (Quick Start) and Docker Data Collector 2 (Quick Start).
You can rename quick start deployments or remove the default quick-start tag.
Creating a Self-Managed Deployment
Create a self-managed deployment to define the group of engine instances to deploy to a self-managed environment.
To create a new self-managed deployment, click the Create Deployment icon.
Define the Deployment
Define the deployment essentials, including the deployment name and type, the environment that the deployment belongs to, and the engine type and version to deploy.
Once saved, you cannot change the deployment type, the engine version, or the environment.
Configure the Engine
Define the configuration of the engine to deploy. You can use the defaults to get started.
Configure the Install Type
Select the type of engine installation to deploy to a local on-premises or cloud computing machine.
Share the Deployment
By default, only you can see the deployment. Share the deployment with other users and groups to grant them access to it.
Review and Launch the Deployment
You've successfully finished creating the deployment.
Foreground or Background Process
- Foreground: When the installation script runs an engine instance as a foreground process, you cannot run additional commands from that command prompt while the engine runs. The command prompt must remain open for the engine to continue to run. If you close the command prompt, the engine shuts down.
- Background: When the installation script runs an engine instance as a background process, you regain access to the command prompt after the engine starts. You can run additional commands from that command prompt as the engine runs. If you close the command prompt, the engine continues to run.
By default, a tarball installation script runs an engine instance as a foreground process. A Docker installation script runs an engine instance as a background process.
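The difference can be seen with any long-running command in a shell. The following is a generic illustration only, assuming a Bash shell and using `sleep` as a stand-in for an engine process. It does not show what the installation script does internally.

```shell
#!/usr/bin/env bash
# Assumption: 'sleep 30' stands in for a long-running engine process.
# It is not a StreamSets command.

# Background: the trailing '&' returns control to the prompt immediately.
sleep 30 &
bg_pid=$!
echo "Started background process with PID ${bg_pid}"

# The prompt is free for other commands while the process keeps running.
# 'kill -0' sends no signal; it only tests whether the process exists.
if kill -0 "${bg_pid}" 2>/dev/null; then
  bg_state="running"
else
  bg_state="stopped"
fi
echo "Background process is ${bg_state}"

# Clean up the stand-in process.
kill "${bg_pid}"
wait "${bg_pid}" 2>/dev/null || true

# Foreground: running 'sleep 30' without '&' would block this prompt
# until the process exits, so no further commands could be entered.
```

In a plain shell, a background job can still receive a hang-up signal when its terminal closes, which is why tools such as `nohup` exist. How the installation script keeps a background engine running is not covered by this sketch.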
Launching an Engine for a Deployment
After creating a self-managed deployment, you set up a machine that meets the engine requirements. The machine can be a local on-premises machine or a cloud computing machine. Then, you manually run the engine installation script to install and launch an engine instance on the machine.
When the machine resides behind a firewall, you also must allow the required inbound and outbound traffic to each machine, as described in Firewall Configuration.
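Before running the installation script, it can help to confirm that the machine has the tools the generated command relies on. The following is a minimal sketch, assuming a Bash shell. The function names `missing_tools` and `can_reach` are hypothetical helpers, not part of any StreamSets tooling, and the check does not replace the engine prerequisites or the rules in Firewall Configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each named tool that is not found on PATH,
# one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "${tool}" >/dev/null 2>&1 || echo "${tool}"
  done
}

# Hypothetical helper: succeed if an HTTPS request to the given URL
# completes within 10 seconds.
# Assumption: a completed request is only a rough signal of outbound access;
# it does not verify every rule listed in Firewall Configuration.
can_reach() {
  curl --silent --head --max-time 10 --output /dev/null "$1"
}

# The generated command uses bash and curl; add "docker" to the list
# when the deployment uses a Docker image.
missing="$(missing_tools bash curl)"
if [ -n "${missing}" ]; then
  echo "Missing tools: ${missing}"
else
  echo "All required tools found"
fi
```

To test outbound access, call `can_reach` with the Control Hub URL for your organization, for example `can_reach "https://na01.hub.streamsets.com"`.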
Launching a Data Collector Docker Image
Complete the following steps on the machine where you want to launch the Data Collector Docker image.
Launching a Data Collector Tarball
Complete the following steps on the machine where you want to install and launch the Data Collector tarball.
Launching Transformer when Spark Runs Locally
To get started with Transformer, you can use a local Spark installation that runs on the same machine as Transformer.
This allows you to easily develop and test local pipelines, which run on the local Spark installation.
Launching a Transformer Docker Image
To use a Transformer Docker image when Spark runs locally, complete the following steps on the machine where you want to launch Transformer. The Docker image includes a local Spark installation that matches the Scala version selected for the engine version.
Launching a Transformer Tarball
To use a Transformer tarball when Spark runs locally, complete the following steps on the machine where you want to install and launch Transformer.
Launching Transformer when Spark Runs on a Cluster
In a production environment, use a Spark installation that runs on a cluster to leverage the performance and scale that Spark offers.
Install Transformer on a machine that is configured to submit Spark jobs to the cluster. When you run Transformer pipelines, Spark distributes the processing across nodes in the cluster.
For information about each cluster type, see Cluster Types in the Transformer engine documentation.
Launching a Transformer Docker Image
To use a Transformer Docker image when Spark runs on a cluster, complete the following steps on the machine where you want to launch Transformer.
Launching a Transformer Tarball
To use a Transformer tarball when Spark runs on a cluster, complete the following steps on the machine where you want to launch Transformer.
Retrieving the Installation Script
You can retrieve the installation script generated for a self-managed deployment.
- In the Control Hub Navigation panel, open the list of deployments.
- Locate the self-managed deployment that you want to launch an engine instance for.
- In the Actions column, click the More icon and then click Get Install Script.
- Select whether the installation script runs an engine instance as a foreground or background process.
- Click the Copy to Clipboard icon to copy the generated command, and then click Close.
Running the Installation Script without Prompts
When you run the engine installation script for a tarball installation, you must respond to command prompts to enter download and installation directories. To skip the prompts, you can optionally define the directories as command arguments.
You might skip the command prompts if you set up an automation tool such as Ansible to install and launch engines. Or you might skip the prompts if you prefer to define the directories at the same time that you run the command.
To skip the prompts, include the following arguments in the installation script command:
Argument | Value |
---|---|
--no-prompt | None. Indicates that the script should run without prompts. |
--download-dir | Enter the full path to an existing download directory. |
--install-dir | Enter the full path to an existing installation directory. |
For example, the following command runs the installation script as a foreground process without prompts:
bash -c 'curl https://na01.hub.streamsets.com/streamsets-engine-install.sh | bash -s -- --deployment-id="<deployment_ID>" --deployment-token="<deployment_token>" --sch-url="https://na01.hub.streamsets.com" --foreground --no-prompt --download-dir=/tmp/streamsets --install-dir=/opt/streamsets-datacollector'
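Because both directory arguments must point to existing directories, an automated run might create them first and then assemble the command. The following is a sketch under stated assumptions: `prepare_dirs` and `build_install_command` are hypothetical helper names, and the script only prints the command (a dry run) instead of executing it. The arguments themselves are the ones listed in the table above.

```shell
#!/usr/bin/env bash
# Hypothetical helper names; only the script arguments come from this section.

# Create the download and installation directories if they do not exist yet,
# because the script expects both paths to exist.
prepare_dirs() {
  mkdir -p "$1" "$2"
}

# Print the non-interactive installation command without running it.
# Arguments: Control Hub URL, deployment ID, deployment token,
#            download directory, installation directory.
build_install_command() {
  printf 'curl %s/streamsets-engine-install.sh | bash -s -- ' "$1"
  printf -- '--deployment-id="%s" --deployment-token="%s" ' "$2" "$3"
  printf -- '--sch-url="%s" --foreground --no-prompt ' "$1"
  printf -- '--download-dir=%s --install-dir=%s\n' "$4" "$5"
}

# Dry run with the placeholder values from the example command.
build_install_command "https://na01.hub.streamsets.com" \
  "<deployment_ID>" "<deployment_token>" \
  /tmp/streamsets /opt/streamsets-datacollector
```

In an automation tool, the run would call `prepare_dirs` with the two paths and then pass the printed command to `bash -c`. The sketch does not handle directory paths that contain spaces.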