Authoring Engine

You select the authoring engine to use when you create a connection, or a Data Collector or Transformer pipeline or pipeline fragment. Transformer for Snowflake pipelines do not require an authoring engine.

For pipelines and fragments, the selected authoring engine determines the stages, stage libraries, and functionality that display in the pipeline canvas. For connections, the selected authoring Data Collector determines the connection types that you can create.

By default, Control Hub selects an accessible engine that you have read permission on and that has the most recent reported time. You can choose a different engine.
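The default selection rule above can be sketched as a small filter-and-max step. This is only an illustration of the rule as described, not Control Hub's implementation: the `Engine` record, its field names, and the `default_authoring_engine` function are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass
class Engine:
    # Hypothetical record; the field names are illustrative, not a Control Hub API.
    hostname: str
    can_read: bool           # the user has read permission on the engine
    accessible: bool         # the engine meets both accessibility conditions
    last_reported: datetime  # last time Control Hub received a heartbeat


def default_authoring_engine(engines: Iterable[Engine]) -> Optional[Engine]:
    """Return the readable, accessible engine with the most recent reported time."""
    candidates = [e for e in engines if e.can_read and e.accessible]
    # max() with default=None returns None when no engine qualifies.
    return max(candidates, key=lambda e: e.last_reported, default=None)
```

In this sketch, an engine that reported more recently but is inaccessible or unreadable is never chosen, which matches the wording of the rule.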

For example, the following image displays the New Pipeline wizard where Control Hub has selected a self-managed authoring engine belonging to the Tutorial deployment:

When you click the link next to the selected engine, Control Hub displays the following authoring engine selection window:

The window displays all engines that you have read permission on. You can filter the list of engines by deployment, version, or label. When a deployment manages multiple engine instances, use the hostname to select a specific engine instance as the authoring engine.

Accessible Engines

You can select any accessible engine as the authoring engine.

An engine is accessible when both of the following are true:
  • The engine is running, and the engine can communicate with Control Hub.

    This is indicated by a green Last Reported Time value in the engine selection window, which displays the last time that Control Hub received a heartbeat from the engine.

  • The web browser can reach the engine, using either WebSocket tunneling or direct engine REST APIs.

    This is indicated by a green Accessible check mark in the engine selection window.

The combination of the Last Reported Time and Accessible values indicates whether an engine is accessible. The two values can appear in the following combinations:

Both are green
When both the Last Reported Time and Accessible values are green, the engine is accessible:

Both are red
When both the Last Reported Time and Accessible values are red, the engine is either not running or cannot communicate with Control Hub due to network issues:

To troubleshoot this issue, verify that the engine is running. If the engine is installed on a system that resides behind a firewall or that limits outbound traffic, verify that the engine is allowed to make outbound connections to Control Hub over HTTPS on port 443.

Also check the current CPU and memory usage of the engine. When an engine uses an excessive amount of available resources, the machine running the engine might lose its connection to Control Hub.

Last Reported Time is green, but Accessible is red
When the Last Reported Time value is green, but the Accessible value is red, the engine is running, but the browser cannot reach the engine:

The steps that you take to troubleshoot this issue depend on the engine communication method used by the browser:
  • WebSocket tunneling - The WebSocket tunnel is inactive or reconnecting. Wait approximately one minute, and then click the Refresh button in the engine selection window. This issue most commonly occurs for a self-managed engine that runs on a local machine that has recently restarted or regained connectivity to the internet.
  • Direct Engine REST APIs - The browser cannot reach the engine URL using the HTTPS protocol. Verify that you have correctly configured the engine to use direct engine REST APIs. This includes verifying that the engine has a valid TLS certificate trusted by the browser and that the browser can access the engine.

If the problem persists for either communication method, try increasing the Authoring Engine Timeout property in the browser settings within the My Account window. Also check the current CPU and memory usage of the engine. When an engine uses an excessive amount of available resources, the machine running the engine might lose its connection to the browser.

For more information about how the browser communicates with engines, see Engine Communication.
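The indicator combinations described in this section can be summarized as a small lookup. The following is a minimal sketch for illustration only: the `diagnose` function and its return strings are hypothetical, and the case where Last Reported Time is red but Accessible is green is not covered by this section, so the sketch reports it as undocumented rather than guessing.

```python
def diagnose(last_reported_green: bool, accessible_green: bool) -> str:
    """Map the two indicator colors to the state described in this section."""
    if last_reported_green and accessible_green:
        return "accessible"
    if last_reported_green and not accessible_green:
        # The engine is running, but the browser cannot reach it.
        return "running, but the browser cannot reach the engine"
    if not last_reported_green and not accessible_green:
        return "not running, or cannot communicate with Control Hub"
    # Red Last Reported Time with green Accessible is not described here.
    return "undocumented combination"
```

For example, `diagnose(True, False)` corresponds to the case where you check the WebSocket tunnel or the direct engine REST API configuration.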

Changing the Authoring Engine

When you edit a pipeline or fragment in the pipeline canvas, the selected authoring engine determines the stages, stage libraries, and functionality that display in the pipeline canvas.

When the selected engine is not accessible, whether you can continue editing the pipeline or fragment depends on the engine type and version:

Data Collector version 5.4.0 or later
When using Data Collector version 5.4.0 or later, you can edit pipelines and fragments in the pipeline canvas when the engine is not accessible. The engine must be accessible before you can preview, run, validate, or check in the pipeline or fragment.
Note: When using any Data Collector version, if the pipeline or fragment includes stages from an Enterprise stage library or a custom stage library, the pipeline canvas displays in read-only mode when the engine is not accessible.

Earlier Data Collector version or any Transformer version
When using Data Collector version 5.3.x or earlier or when using any Transformer version, the pipeline canvas displays in read-only mode when the engine is not accessible. To edit the pipeline or fragment, you must select another accessible engine.
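The rules above for an inaccessible engine can be expressed as a short decision function. This is a sketch under stated assumptions, not product code: the `canvas_mode` function, the `"data_collector"` and `"transformer"` type strings, and the assumption that versions are plain dotted numbers such as `5.4.0` are all illustrative.

```python
from typing import Tuple


def _version_tuple(version: str) -> Tuple[int, ...]:
    # Assumes a purely numeric dotted version such as "5.4.0".
    return tuple(int(part) for part in version.split("."))


def canvas_mode(engine_type: str, version: str, accessible: bool,
                uses_enterprise_or_custom_stages: bool = False) -> str:
    """Return "editable" or "read-only" for the pipeline canvas."""
    if accessible:
        return "editable"
    # Enterprise or custom stage libraries force read-only mode
    # whenever the engine is not accessible.
    if uses_enterprise_or_custom_stages:
        return "read-only"
    if engine_type == "data_collector" and _version_tuple(version) >= (5, 4, 0):
        return "editable"
    # Data Collector 5.3.x or earlier, or any Transformer version.
    return "read-only"
```

Note that "editable" here covers editing only. Preview, run, validate, and check-in still require an accessible engine.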

As you edit a pipeline or fragment in the pipeline canvas, you can change the authoring engine as long as you select an engine of the same type and of the same or a later version. For example, when editing a pipeline created with Data Collector version 5.3.0, you can select another Data Collector of version 5.3.0 or later. You cannot select a Transformer engine or Data Collector version 5.1.0.
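The same-type, same-or-later-version rule can be checked with a tuple comparison. The sketch below is illustrative only: the `can_switch_engine` function and the engine type strings are hypothetical, and versions are assumed to be numeric dotted strings.

```python
from typing import Tuple


def _version_tuple(version: str) -> Tuple[int, ...]:
    # Assumes a purely numeric dotted version such as "5.3.0".
    return tuple(int(part) for part in version.split("."))


def can_switch_engine(current_type: str, current_version: str,
                      candidate_type: str, candidate_version: str) -> bool:
    """True when the candidate has the same type and the same or a later version."""
    if candidate_type != current_type:
        return False
    return _version_tuple(candidate_version) >= _version_tuple(current_version)
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as treating "5.10.0" as earlier than "5.9.0".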

If you select a later authoring engine version, Control Hub upgrades the pipeline. The upgraded pipeline can no longer run on the earlier engine version.

Tip: If you select a later authoring engine version for a draft pipeline, you can choose to publish the draft pipeline first, and then create a new draft pipeline that is upgraded to run on the later engine version. That way, you retain a pipeline version that can run on the earlier engine version.

In the top left corner of the pipeline canvas, Control Hub displays the name of the parent deployment that the currently selected authoring engine belongs to. Click the down arrow next to the deployment name to view which authoring engine is being used and to optionally change the engine.

For example, the following image shows the currently selected authoring Data Collector that belongs to the Tutorial deployment:

Stage Libraries

The selected authoring Data Collector or Transformer determines the stage libraries that are installed and available for use as you design pipelines and pipeline fragments.

The stage library panel in the pipeline canvas displays all stages. Stages that are not installed on the selected authoring engine appear disabled, or greyed out. For example, the stage library panel shown below indicates that the Elasticsearch and Google BigQuery origins are not installed:

You can install additional stage libraries, including Enterprise stage libraries, from the pipeline canvas. To install an additional stage library, click a disabled stage. Confirm that you want to install the library, and then restart the engine for the changes to take effect.

Important: Installing an additional stage library from the pipeline canvas installs the library only on the selected authoring engine. You must install the additional library on any other authoring engine used to design the pipeline and on all execution engines where you run the pipeline. For information about adding stage libraries to an existing deployment, see Updating Stage Libraries.
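Because a library installed from the canvas lands on only one engine, it can help to compare what is installed across the engines that a pipeline touches. The following sketch assumes you have already gathered the installed library names per engine by some other means. The `engines_missing_library` function, the engine names, and the library names are hypothetical placeholders.

```python
from typing import Dict, List, Set


def engines_missing_library(installed: Dict[str, Set[str]], library: str) -> List[str]:
    """Return the names of engines that do not have the given stage library."""
    return sorted(name for name, libs in installed.items() if library not in libs)


# Illustrative inventory: one authoring engine and two execution engines.
inventory = {
    "authoring-1": {"jdbc-lib", "basic-lib"},
    "execution-1": {"basic-lib"},
    "execution-2": {"jdbc-lib", "basic-lib"},
}
missing = engines_missing_library(inventory, "jdbc-lib")
```

In this example, `missing` lists the engines that still need the library before the pipeline can be designed or run on them.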

External Libraries

The selected authoring Data Collector or Transformer determines the external libraries available to stages as you design pipelines and pipeline fragments. For example, some stages, such as most JDBC stages, require installing a JDBC driver as an external library on Data Collector or Transformer.

As you design pipelines, each stage requiring an external library displays the currently installed libraries in the External Libraries tab in the stage properties panel.

For example, the following image shows that a MySQL JDBC driver is installed for the JDBC stage library on the selected authoring Data Collector. As a result, this external library is available to the JDBC Query Consumer origin during pipeline design:

When the parent deployment is not configured to use external resources, you can install an additional external library from the pipeline canvas by clicking Upload External Library from the External Libraries tab. When the parent deployment is configured to use an external resource archive, you update the archive file to install additional libraries. For more information, see External Resources.