Managing Engines

In the Engines view, you can complete the following tasks on engine instances belonging to all deployment types:
  • Assign labels to engine instances.
  • Override resource thresholds for engine instances.
  • Restart engine instances.
  • Balance jobs on Data Collector engine instances when the jobs are enabled for pipeline failover.
You can complete the following tasks on engine instances belonging to a self-managed deployment:
  • Shut down engine instances. If you shut down an engine instance, you can later start it again from the command line.
  • Delete engine instances.

You cannot shut down or delete engine instances belonging to Control Hub-managed deployments, such as GCE and Kubernetes deployments. Instead, engine instances are automatically shut down or deleted as needed when you edit the deployment. For example, if you edit a GCE deployment to decrease the number of engine instances from 3 to 2, Google Deployment Manager shuts down one of the engine instances and then deletes the engine and the VM instance where the engine runs.

Restarting Engines

Restart an engine instance to apply updates made to the engine configuration in the deployment.

For example, if you edit a deployment to add stage libraries, you must restart all engine instances managed by the deployment. During the restart, the new stage libraries are installed on the engine instances.

When you edit an active deployment to update the engine configuration, Control Hub prompts you to restart all engine instances. If you choose not to restart the engine instances while editing the deployment, you can restart the engine instances at a later time from the Engines view.
Tip: You can also restart all engine instances for an active deployment from the Deployments view.

During the restart process, the engine instances shut down and then automatically restart.

  1. In the Navigation panel, click Set Up > Engines.
  2. Click an engine type tab.
  3. Select one or more engine instances that you want to restart.
  4. In the toolbar above the engines list, click the More icon and then click Restart Engines.
  5. Click Restart to confirm.

    It can take a few minutes for an engine instance to shut down and then restart.

  6. Click Close.

Balancing Jobs on Engines

From the Engines view, you can balance all jobs enabled for failover and running on specific Data Collector engine instances.

When balancing the jobs, Control Hub redistributes the pipeline load evenly across all available Data Collectors that have the necessary labels and that have not exceeded any resource thresholds.

For more information about balancing jobs on Data Collector engines, see Balancing Jobs Enabled for Failover.

Shutting Down Engines for Self-Managed Deployments

You can shut down an engine instance belonging to a self-managed deployment, and then manually start the engine instance at a later time. Alternatively, you can stop a self-managed deployment to shut down all engine instances belonging to the deployment.

After an engine instance shuts down, you cannot use the Control Hub UI to manage the engine instance. You must manually start the engine instance from the command line.

Note: You cannot shut down engine instances belonging to Control Hub-managed deployments.
  1. In the Control Hub Navigation panel, click Set Up > Engines.
  2. Click an engine type tab.
  3. Select one or more engine instances that you want to shut down.
  4. In the toolbar above the engines list, click the More icon and then click Shut Down Engines.
  5. Click OK to confirm, and then click Close.

Starting Engines for Self-Managed Deployments

If an engine instance belonging to a self-managed deployment shuts down, you cannot use the Control Hub UI to manage the instance. You must first manually start the engine instance from the command line.

Note: You do not need to manually start engine instances belonging to Control Hub-managed deployments. Engine instances are automatically started during the provisioning process.
The command that you use to manually start an engine instance depends on the installation type:

Starting from a Tarball Installation

You can manually start an engine instance that has been installed from a tarball.

Important: When starting an engine instance of version 5.0.0 or later that has been configured to use a proxy server, you also must define the proxy properties as environment variables when you start the engine. See Starting When Using a Proxy Server.
  1. Open a command prompt on the engine host machine.
  2. Use the cd command to navigate to the engine installation directory.

    The installation directory is named streamsets-<engine type>-<version>.

    For example, streamsets-datacollector-4.0.0.

  3. Run the required command from the installation directory, based on the engine type.
    • For a Data Collector engine, run the following command:
      bin/streamsets dc
    • For a Transformer engine, run the following command:
      bin/streamsets transformer

    It can take a few minutes for the engine instance to start.
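The steps above can be wrapped in a small shell helper. This is a minimal sketch, assuming the installation directory follows the streamsets-<engine type>-<version> naming convention and that the engine type part is either datacollector or transformer. The function names engine_type_from_dir, start_subcommand, and start_engine are hypothetical and are not part of the engine distribution; only bin/streamsets dc and bin/streamsets transformer come from the steps above.

```shell
#!/bin/sh
# Hypothetical helpers: start a tarball-installed engine from its
# installation directory, assuming the streamsets-<engine type>-<version>
# naming convention.

# Extract the engine type from a path such as /opt/streamsets-datacollector-4.0.0.
engine_type_from_dir() {
  local name
  name="$(basename "$1")"
  name="${name#streamsets-}"   # e.g. datacollector-4.0.0
  echo "${name%%-*}"           # e.g. datacollector
}

# Map the engine type to the bin/streamsets subcommand used in the steps above.
start_subcommand() {
  case "$1" in
    datacollector) echo "dc" ;;
    transformer)   echo "transformer" ;;
    *) echo "unsupported engine type: $1" >&2; return 1 ;;
  esac
}

# Run the start command from the installation directory in a subshell,
# so the caller's working directory does not change.
start_engine() {
  local dir sub
  dir="$1"
  sub="$(start_subcommand "$(engine_type_from_dir "$dir")")" || return 1
  (cd "$dir" && bin/streamsets "$sub")
}
```

For example, start_engine /opt/streamsets-datacollector-4.0.0 (a hypothetical path) runs bin/streamsets dc from that directory. Running the command in a subshell keeps the cd local to the helper.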

Starting When Using a Proxy Server

To manually start an engine instance of version 5.0.0 or later that has been installed from a tarball and configured to use a proxy server, you must define the proxy properties as environment variables when you start the engine. You can copy the required environment variables from the installation script generated for the self-managed deployment.

For more information about using a proxy server, see the Data Collector documentation or the Transformer documentation.

  1. In the Control Hub Navigation panel, click Set Up > Deployments.
  2. Locate the self-managed deployment that the engine instance belongs to.
  3. In the Actions column, click the More icon and then click Get Install Script.
  4. Copy the environment variable definitions from the installation script.

    In the installation script, the environment variable definitions appear before the bash command.

  5. Click Close.
  6. Open a command prompt on the engine host machine.
  7. Use the cd command to navigate to the engine installation directory.

    The installation directory is named streamsets-<engine type>-<version>.

    For example, streamsets-datacollector-5.0.0.

  8. Define the environment variables and run the required command from the installation directory in one of the following ways:
    • Define the variables in-line when you run the start command.
      For example, to start a Data Collector engine, run the following command:
      http_proxy=http://111.222.33.44:3128  https_proxy=http://111.222.33.44:3128  HTTP_PROXYHOST=111.222.33.44 HTTP_PROXYPORT=3128 HTTPS_PROXYHOST=111.222.33.44 HTTPS_PROXYPORT=3128 HTTP_NONPROXYHOSTS="localhost|222.44.55.*|metadata.google.internal|169.254.169.254|127.0.0.1"  no_proxy="localhost,222.44.55.*,metadata.google.internal,169.254.169.254,127.0.0.1"  HTTP_PROXY_AUTH_TUNNELING_DISABLED_SCHEMES="Basic" bin/streamsets dc
    • Export the variables in the command prompt, and then run the start command.

      To keep the environment variables defined for the current command prompt session, so that you do not need to define them each time you start the engine instance from that session, export each environment variable as follows:

      export http_proxy=http://111.222.33.44:3128 
      export https_proxy=http://111.222.33.44:3128 
      export HTTP_PROXYHOST=111.222.33.44
      ...
      Then run the required command to start the engine.
      • For a Data Collector engine, run:
        bin/streamsets dc
      • For a Transformer engine, run:
        bin/streamsets transformer

    It can take a few minutes for the engine instance to start.
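The export approach can be collected into one shell function. This is a minimal sketch, assuming that HTTP and HTTPS traffic use the same proxy host and port, as in the in-line example above; the values you actually need come from the installation script generated for your deployment, and the function name export_proxy_env is hypothetical. The variable names match the in-line example, including the separator difference: no_proxy is comma-separated, while HTTP_NONPROXYHOSTS uses a pipe character.

```shell
#!/bin/sh
# Hypothetical helper: export the proxy variables from the in-line example.
# Assumes one proxy host and port for both HTTP and HTTPS.
# Usage: export_proxy_env <host> <port> <comma-separated non-proxy hosts>
export_proxy_env() {
  local host port no_proxy_list
  host="$1"
  port="$2"
  no_proxy_list="$3"
  export http_proxy="http://${host}:${port}"
  export https_proxy="http://${host}:${port}"
  export HTTP_PROXYHOST="$host" HTTP_PROXYPORT="$port"
  export HTTPS_PROXYHOST="$host" HTTPS_PROXYPORT="$port"
  # no_proxy keeps commas; HTTP_NONPROXYHOSTS uses | as the separator.
  export no_proxy="$no_proxy_list"
  HTTP_NONPROXYHOSTS="$(printf '%s' "$no_proxy_list" | tr ',' '|')"
  export HTTP_NONPROXYHOSTS
  # Value copied from the in-line example; confirm it against your install script.
  export HTTP_PROXY_AUTH_TUNNELING_DISABLED_SCHEMES="Basic"
}
```

For example, after running export_proxy_env 111.222.33.44 3128 "localhost,127.0.0.1" in the command prompt, run bin/streamsets dc or bin/streamsets transformer from the installation directory. As with manual exports, the variables last only for the current session.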

Starting from Docker

You can manually start an engine instance that has been installed from a Docker image.

  1. Open a command prompt on the engine host machine.
  2. Run the following Docker command:
    docker restart <containerID>

    It can take a few minutes for the engine instance to start.
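If you do not know the container ID, you can look it up with docker ps before running docker restart. This is a minimal sketch with hypothetical function names; the image name you pass to find_engine_containers is an assumption that you should confirm against the output of docker ps -a on the engine host.

```shell
#!/bin/sh
# Hypothetical helpers for restarting a stopped, Docker-based engine.

# Print the IDs of stopped (exited) containers created from the given image.
find_engine_containers() {
  docker ps -a --filter "ancestor=$1" --filter "status=exited" --format '{{.ID}}'
}

# Restart one container by ID; fail early if no ID is given.
restart_engine_container() {
  if [ -z "$1" ]; then
    echo "usage: restart_engine_container <containerID>" >&2
    return 1
  fi
  docker restart "$1"
}
```

For example, find_engine_containers streamsets/datacollector:5.0.0 (an assumed image name and tag) lists matching stopped containers, and restart_engine_container <containerID> restarts the one you choose.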

Deleting Engines from a Self-Managed Deployment

You can delete an engine instance from a self-managed deployment.

For example, you might have a self-managed deployment that manages 5 engine instances, but you no longer need that many engine instances for your pipeline processing needs, so you delete 2 of them from the deployment.
Note: To delete a deployment and all engine instances in that deployment, see Deleting Deployments.

You cannot delete engine instances from Control Hub-managed deployments. Instead, engine instances are automatically deleted as needed when you edit the deployment.

  1. Stop all jobs running on the engine instance.
  2. In the Navigation panel, click Set Up > Engines.
  3. Click an engine type tab.
  4. Select one or more engine instances that you want to delete, and then click the Delete icon.
  5. In the Confirmation dialog box, click Delete and Unregister.

    Deleting the engine instance removes the instance from the self-managed deployment, but it does not delete the engine installation files on the machine.

  6. Locate and then delete the installation directory on the machine.

    The installation directory is named streamsets-<engine type>-<version>.

    For example, streamsets-datacollector-4.0.0.
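Step 6 can be scripted. This is a minimal sketch with hypothetical function names. The name check is a safety guard added for illustration, not a Control Hub requirement: the helper refuses to delete any directory whose name does not match the streamsets-<engine type>-<version> convention. Run it only after completing the previous steps.

```shell
#!/bin/sh
# Hypothetical helpers: remove an engine installation directory after the
# engine instance has been deleted and unregistered in Control Hub.

# Succeed only if the directory name looks like streamsets-<engine type>-<version>.
is_engine_install_dir() {
  case "$(basename "$1")" in
    streamsets-[a-z]*-[0-9]*) return 0 ;;
    *) return 1 ;;
  esac
}

# Delete the directory, refusing anything that fails the name check.
delete_engine_install_dir() {
  if ! is_engine_install_dir "$1"; then
    echo "refusing to delete '$1': not a streamsets-<engine type>-<version> directory" >&2
    return 1
  fi
  if [ ! -d "$1" ]; then
    echo "no such directory: $1" >&2
    return 1
  fi
  rm -rf -- "$1"
}
```

For example, delete_engine_install_dir /opt/streamsets-datacollector-4.0.0 (a hypothetical path) removes that directory, while a call with an unrelated path such as /opt returns an error without deleting anything.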