Managing Engines
You can manage engine instances in the following ways:
- Assign labels to engine instances.
- Override resource thresholds for engine instances.
- Restart engine instances.
- Balance jobs on Data Collector engine instances when the jobs are enabled for pipeline failover.
- Shut down engine instances. If you shut down an engine instance, you can later start the instance from the command line.
- Delete engine instances.
You cannot shut down or delete engine instances belonging to Control Hub-managed deployments, such as GCE and Kubernetes deployments. Instead, engine instances are automatically shut down or deleted as needed when you edit the deployment. For example, if you edit a GCE deployment to decrease the number of engine instances from 3 to 2, Google Deployment Manager shuts down one of the engine instances and then deletes the engine and the VM instance where the engine runs.
Restarting Engines
Restart an engine instance to apply updates made to the engine configuration in the deployment.
For example, if you edit a deployment to add additional stage libraries, you must restart all engine instances managed by the deployment. During the restart, the stage libraries are installed on the engine instances.
During the restart process, the engine instances shut down and then automatically restart.
- In the Navigation panel, click .
- Click an engine type tab.
- Select one or more engine instances that you want to restart.
- In the toolbar above the engines list, click the More icon and then click Restart Engines.
- Click Restart to confirm.
  It can take a few minutes for an engine instance to shut down and then restart.
- Click Close.
Balancing Jobs on Engines
From the Engines view, you can balance all jobs enabled for failover and running on specific Data Collector engine instances.
When balancing the jobs, Control Hub redistributes the pipeline load evenly across all available Data Collectors that have the necessary labels and that have not exceeded any resource thresholds.
For more information about balancing jobs on Data Collector engines, see Balancing Jobs Enabled for Failover.
Shutting Down Engines for Self-Managed Deployments
You can shut down an engine instance belonging to a self-managed deployment, and then manually start the engine instance at a later time. Alternatively, you can stop a self-managed deployment to shut down all engine instances belonging to the deployment.
After an engine instance shuts down, you cannot use the Control Hub UI to manage the engine instance. You must manually start the engine instance from the command line.
- In the Control Hub Navigation panel, click .
- Click an engine type tab.
- Select one or more engine instances that you want to shut down.
- In the toolbar above the engines list, click the More icon and then click Shut Down Engines.
- Click OK to confirm, and then click Close.
Starting Engines for Self-Managed Deployments
If an engine instance belonging to a self-managed deployment shuts down, you cannot use the Control Hub UI to manage the instance. You must first manually start the engine instance from the command line.
Starting from a Tarball Installation
You can manually start an engine instance that has been installed from a tarball.
- Open a command prompt on the engine host machine.
- Use the cd command to navigate to the engine installation directory.
  The installation directory is named streamsets-<engine type>-<version>. For example, streamsets-datacollector-4.0.0.
- Run the required command from the installation directory, based on the engine type.
  - For a Data Collector engine, run the following command:
    bin/streamsets dc
  - For a Transformer engine, run the following command:
    bin/streamsets transformer
  - For a deployed Transformer for Snowflake engine, run the following command:
    bin/streamsets streamflake
  It can take a few minutes for the engine instance to start.
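The tarball steps above could be scripted roughly as follows. This is only a sketch: the helper names `engine_type_from_dir` and `start_command` are hypothetical, and it assumes that the `<engine type>` part of the directory name is `datacollector`, `transformer`, or `streamflake`. Only `datacollector` appears in the example directory name above; the other two are assumptions. The script prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: extract <engine type> from a directory named
# streamsets-<engine type>-<version>. Assumes the version contains no hyphen.
engine_type_from_dir() {
  _name="$(basename "$1")"
  _name="${_name#streamsets-}"   # drop the "streamsets-" prefix
  echo "${_name%-*}"             # drop the trailing "-<version>"
}

# Hypothetical helper: map an engine type to the start command listed above.
# The keys "transformer" and "streamflake" are assumed directory-name values.
start_command() {
  case "$1" in
    datacollector) echo "bin/streamsets dc" ;;
    transformer)   echo "bin/streamsets transformer" ;;
    streamflake)   echo "bin/streamsets streamflake" ;;
    *) echo "unknown engine type: $1" >&2; return 1 ;;
  esac
}

# Print (not run) the commands for the example installation directory.
install_dir="streamsets-datacollector-4.0.0"
engine_type="$(engine_type_from_dir "$install_dir")"
echo "cd $install_dir && $(start_command "$engine_type")"
```

Printing the command first makes it easy to check the result before running it on the engine host machine.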
Starting When Using a Proxy Server
To manually start an engine instance of version 5.0.0 or later that has been installed from a tarball and configured to use a proxy server, you must define the proxy properties as environment variables when you start the engine. You can copy the required environment variables from the installation script generated for the self-managed deployment.
For more information, see the documentation for your engine type:
- Data Collector documentation
- Transformer documentation
- Transformer for Snowflake documentation - Applicable when your organization uses a deployed Transformer for Snowflake engine.
- In the Control Hub Navigation panel, click .
- Locate the self-managed deployment that the engine instance belongs to.
- In the Actions column, click the More icon and then click Get Install Script.
- Copy the environment variable definitions from the installation script.
  The environment variables are defined before the bash command.
- Click Close.
- Open a command prompt on the engine host machine.
- Use the cd command to navigate to the engine installation directory.
  The installation directory is named streamsets-<engine type>-<version>. For example, streamsets-datacollector-4.0.0.
- Define the environment variables and run the required command from the installation directory in one of the following ways:
  - Define the variables in-line when you run the start command. For example, to start a Data Collector engine, run the following command:
    http_proxy=http://111.222.33.44:3128 https_proxy=http://111.222.33.44:3128 no_proxy=localhost,metadata.google.internal,169.254.169.254,127.0.0.1 bin/streamsets dc
  - Export the variables in the command prompt, and then run the start command.
    To save the environment variables in the command prompt so that you do not need to define them each time you manually start the engine instance, export each environment variable as follows:
    export http_proxy=http://111.222.33.44:3128
    export https_proxy=http://111.222.33.44:3128
    export no_proxy=localhost,metadata.google.internal,169.254.169.254,127.0.0.1
    ...
    Then run the required command to start the engine.
    - For a Data Collector engine, run:
      bin/streamsets dc
    - For a Transformer engine, run:
      bin/streamsets transformer
    - For a deployed Transformer for Snowflake engine, run:
      bin/streamsets streamflake
  It can take a few minutes for the engine instance to start.
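The difference between the two approaches can be seen in a small shell experiment. The sketch below uses `sh -c` with an `echo` as a stand-in for `bin/streamsets dc` (an assumption made so that the example runs anywhere) and reuses the example proxy address from above. An in-line definition is visible only to the one command it precedes, while an exported variable is visible to every later command started from the same command prompt.

```shell
#!/bin/sh
# Stand-in for the engine start command: prints the proxy value the child process sees.
probe='echo "http_proxy=${http_proxy:-unset}"'

# Start from a clean state so the comparison is meaningful.
unset http_proxy

# In-line definition: the variable applies to this one command only.
inline_result="$(http_proxy=http://111.222.33.44:3128 sh -c "$probe")"
after_inline="$(sh -c "$probe")"

# Exported definition: the variable applies to all later commands in this shell.
export http_proxy=http://111.222.33.44:3128
after_export="$(sh -c "$probe")"

echo "in-line command sees:       $inline_result"
echo "next command after in-line: $after_inline"
echo "command after export:       $after_export"
```

Exported variables last only for the current command prompt session, so a new session needs them defined again.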
Starting from Docker
You can manually start an engine instance that has been installed from a Docker image.
- Open a command prompt on the engine host machine.
- Run the following Docker command:
  docker restart <containerID>
  It can take a few minutes for the engine instance to start.
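If the container ID is not at hand, it can be looked up first. The following is a minimal sketch that assumes the Docker CLI is installed on the engine host machine; the function names `list_containers` and `restart_engine_container` are hypothetical wrappers, and only the standard `docker ps -a` and `docker restart` commands are used.

```shell
#!/bin/sh
# List every container, including stopped ones, to find the engine's container ID.
list_containers() {
  docker ps -a --format '{{.ID}}  {{.Image}}  {{.Status}}'
}

# Restart the container with the given ID (hypothetical wrapper around docker restart).
restart_engine_container() {
  if [ -z "$1" ]; then
    echo "usage: restart_engine_container <containerID>" >&2
    return 1
  fi
  docker restart "$1"
}
```

A typical session would call `list_containers`, pick the ID of the stopped engine container from the output, and pass it to `restart_engine_container`.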
Deleting Engines from a Self-Managed Deployment
You can delete an engine instance from a self-managed deployment.
You cannot delete engine instances from Control Hub-managed deployments. Instead, engine instances are automatically deleted as needed when you edit the deployment.
- Stop all jobs running on the engine instance.
- In the Navigation panel, click .
- Click an engine type tab.
- Select one or more engine instances that you want to delete, and then click the Delete icon.
- In the Confirmation dialog box, click Delete and Unregister.
  Deleting the engine instance removes the instance from the self-managed deployment, but it does not delete the engine installation files on the machine.
- Locate and then delete the installation directory on the machine.
  The installation directory is named streamsets-<engine type>-<version>. For example, streamsets-datacollector-4.0.0.
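The final cleanup step can be done with a guarded `rm`. The sketch below is one cautious way to do it, not a documented procedure: the function name `remove_install_dir` is hypothetical, and the name check simply mirrors the streamsets-<engine type>-<version> pattern so that a mistyped path is not deleted by accident.

```shell
#!/bin/sh
# Hypothetical helper: delete an engine installation directory, but only if its
# name matches the streamsets-<engine type>-<version> pattern and it exists.
remove_install_dir() {
  _dir="$1"
  case "$(basename "$_dir")" in
    streamsets-*-*) ;;  # name looks like an engine installation directory
    *)
      echo "refusing to delete '$_dir': unexpected directory name" >&2
      return 1
      ;;
  esac
  if [ ! -d "$_dir" ]; then
    echo "no such directory: $_dir" >&2
    return 1
  fi
  rm -rf -- "$_dir"
}
```

For example, `remove_install_dir /opt/streamsets-datacollector-4.0.0` would remove that directory, where `/opt` is only an illustrative location.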