Authoring Engine

You select the authoring engine to use for a pipeline when performing the following tasks:
  • Create a connection
  • Create a Data Collector or Transformer pipeline or pipeline fragment
  • Create a Transformer for Snowflake pipeline or fragment when your organization uses deployed Transformer for Snowflake engines

    When your organization uses the hosted Transformer for Snowflake engine, Control Hub uses the hosted engine as the authoring engine.

The selected authoring engine determines the stages and functionality you can use in the pipeline. For example, the pipeline canvas allows you to add Google stages to a Data Collector pipeline only when the selected Data Collector authoring engine has the Google stage library installed. Similarly, when a new feature is included in the latest Transformer for Snowflake release, the pipeline canvas displays that feature only when you use the latest version of the engine as the deployed authoring engine.

For connections, the selected authoring Data Collector determines the connection types that you can create.

By default, Control Hub selects an accessible engine that you have read permission on and that has the most recent reported time. You can choose a different engine.
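
The default selection rule described above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not Control Hub code: the `Engine` record, its field names, and the `default_authoring_engine` function are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Engine:
    """Hypothetical engine record; not a Control Hub type."""
    name: str
    readable: bool           # you have read permission on the engine
    accessible: bool         # the engine meets the accessibility conditions
    last_reported: datetime  # last heartbeat that Control Hub received


def default_authoring_engine(engines: List[Engine]) -> Optional[Engine]:
    """Pick the readable, accessible engine with the most recent reported time."""
    candidates = [e for e in engines if e.readable and e.accessible]
    # Returns None when no engine qualifies.
    return max(candidates, key=lambda e: e.last_reported, default=None)
```

In this sketch, an engine that fails either condition is never a candidate, no matter how recently it reported.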

For example, the following image displays the New Pipeline wizard where Control Hub has selected a self-managed authoring engine belonging to the Tutorial deployment:

Selected authoring engine in the New Pipeline wizard

When you click the link next to the selected engine, Control Hub displays the following authoring engine selection window:

Authoring engine selection window displaying two accessible engines

The window displays all engines that you have read permission on. You can filter the list of engines by deployment, version, or label. When a deployment manages multiple engine instances, use the hostname to select a specific engine instance as the authoring engine.

Accessible Engines

You can select any accessible engine as the authoring engine.

An engine is accessible when both of the following are true:
  • The engine is running, and the engine can communicate with Control Hub.

    This is indicated by a green Last Reported Time value in the engine selection window, which shows when Control Hub last received a heartbeat from the engine.

  • The web browser can reach the engine, using either WebSocket tunneling or direct engine REST APIs.

    This is indicated by a green Accessible check mark in the engine selection window.

The combination of the Last Reported Time and Accessible values indicates whether an engine is accessible. The two values can appear in the following combinations:

Both are green
When both the Last Reported Time and Accessible values are green, the engine is accessible:

Engine with green Last Reported Time and Accessible values

Both are red
When both the Last Reported Time and Accessible values are red, the engine is either not running or cannot communicate with Control Hub due to network issues:

Engine with red Last Reported Time and Accessible values

To troubleshoot this issue, verify that the engine is running. If the engine is installed in a system that resides behind a firewall or in a system that limits outbound traffic, verify that the engine is allowed to make outbound connections to Control Hub over HTTPS on port number 443.

Check the current CPU and memory usage of the engine. When an engine uses an excessive amount of available resources, the machine running the engine might lose connection to Control Hub.

Last Reported Time is green, but Accessible is red
When the Last Reported Time value is green, but the Accessible value is red, the engine is running, but the browser cannot reach the engine:

Engine with green Last Reported Time value but red Accessible value

The steps that you take to troubleshoot this issue depend on the engine communication method used by the browser:
  • WebSocket tunneling - The WebSocket tunnel is inactive or reconnecting. Try waiting approximately one minute and then clicking the Refresh button in the engine selection window. This issue most commonly occurs for a self-managed engine that runs on a local machine that has recently restarted or resumed connectivity to the internet.
  • Direct Engine REST APIs - The browser cannot reach the engine URL using the HTTPS protocol. Verify that you have correctly configured the engine to use direct engine REST APIs. This includes verifying that the engine has a valid TLS certificate trusted by the browser and that the browser can access the engine.

If the problem persists for either communication method, try increasing the Authoring Engine Timeout property in the browser settings within the My Account window. Alternatively, check the current CPU and memory usage of the engine. When an engine uses an excessive amount of available resources, the machine running the engine might lose its connection to the browser.

For more information about how the browser communicates with engines, see Engine Communication.
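
As a compact summary of the three combinations described in this section, the following Python sketch maps the two indicator colors to a diagnosis. The function name and the returned strings are hypothetical, and the fourth combination (red Last Reported Time with a green Accessible value) is not described in this section, so the sketch reports it as unknown.

```python
def engine_status(last_reported_green: bool, accessible_green: bool) -> str:
    """Map the two indicators in the engine selection window to a diagnosis.

    Hypothetical helper for illustration; not part of Control Hub.
    """
    if last_reported_green and accessible_green:
        return "accessible"
    if not last_reported_green and not accessible_green:
        return "engine is not running or cannot communicate with Control Hub"
    if last_reported_green and not accessible_green:
        return "engine is running, but the browser cannot reach it"
    # Red Last Reported Time with green Accessible is not covered here.
    return "unknown"
```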

Changing the Authoring Engine

When you edit a pipeline or fragment in the pipeline canvas, the selected authoring engine determines the stages, stage libraries, and functionality that display in the pipeline canvas.

When the selected engine is not accessible, you can continue editing the pipeline or fragment based on the following engine types and versions:

Data Collector version 5.4.0 or later
When using Data Collector version 5.4.0 or later, you can edit pipelines and fragments in the pipeline canvas when the engine is not accessible. The engine must be accessible before you can preview, run, validate, or check in the pipeline or fragment.
Note: For any Data Collector version, when the pipeline or fragment includes stages from an Enterprise stage library or a custom stage library, the pipeline canvas displays in read-only mode when the engine is not accessible.

Earlier Data Collector versions or any Transformer or Transformer for Snowflake version
When using Data Collector version 5.3.x or earlier or when using any Transformer or Transformer for Snowflake version, the pipeline canvas displays in read-only mode when the engine is not accessible. To edit the pipeline or fragment, you must select another accessible engine.
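
The rules above for an inaccessible engine can be sketched as a small decision function. This is an illustration only: the function name, the engine type strings, and the `uses_enterprise_or_custom_stages` flag are hypothetical, and the sketch assumes dot-separated numeric versions such as 5.4.0.

```python
def canvas_mode_when_inaccessible(
    engine_type: str,
    version: str,
    uses_enterprise_or_custom_stages: bool,
) -> str:
    """Return "editable" or "read-only" for a pipeline whose engine is not accessible.

    Hypothetical helper for illustration; not part of Control Hub.
    """
    if engine_type == "Data Collector":
        # Compare only the major and minor parts, for example "5.4.0" -> (5, 4).
        major_minor = tuple(int(part) for part in version.split(".")[:2])
        if major_minor >= (5, 4) and not uses_enterprise_or_custom_stages:
            # Editing is allowed, but preview, run, validate, and check in
            # still require an accessible engine.
            return "editable"
    # Earlier Data Collector versions, any Transformer or Transformer for
    # Snowflake version, or pipelines with Enterprise or custom stages.
    return "read-only"
```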

As you edit a pipeline or fragment in the pipeline canvas, you can change the authoring engine as long as you select the same engine type of the same or later version. For example, when editing a pipeline created with Data Collector version 5.3.0, you can select another Data Collector version 5.3.0 or later. Similarly, when editing a pipeline created with Transformer 5.7.0, you can select another Transformer version 5.7.0 or later.

If you select a later authoring engine version, Control Hub upgrades the pipeline so that it can no longer run on the earlier engine version.

Tip: If you select a later authoring engine version for a draft pipeline, you can choose to publish the draft pipeline first, and then create a new draft pipeline that is upgraded to run on the later engine version. That way, you retain a pipeline version that can run on the earlier engine version.
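
The rule for changing the authoring engine (same engine type, same or later version) can be sketched as follows. The function names are hypothetical, and the sketch assumes that both versions are three-part numeric strings such as 5.3.0. Versions are compared as integer tuples so that, for example, 5.10.0 sorts after 5.9.0.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Convert a version such as "5.3.0" to (5, 3, 0) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def can_switch_engine(
    current_type: str, current_version: str, new_type: str, new_version: str
) -> bool:
    """True when the new engine has the same type and the same or a later version."""
    return (
        current_type == new_type
        and parse_version(new_version) >= parse_version(current_version)
    )


def triggers_upgrade(current_version: str, new_version: str) -> bool:
    """True when selecting the new engine would upgrade the pipeline."""
    return parse_version(new_version) > parse_version(current_version)
```

When `triggers_upgrade` returns true in this sketch, the tip above applies: publishing the draft first keeps a pipeline version that can still run on the earlier engine version.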

In the top left corner of the pipeline canvas, Control Hub displays the name of the parent deployment that the currently selected authoring engine belongs to. Click the down arrow next to the deployment name to view which authoring engine is being used and to optionally change the engine.

For example, the following image shows the currently selected authoring Data Collector that belongs to the Tutorial deployment:

Currently selected authoring Data Collector that belongs to the Tutorial deployment

Stage Libraries

The selected authoring engine determines the stage libraries that are installed and available for use as you design pipelines and pipeline fragments.

Transformer for Snowflake engines include all available stages and credential stores. You cannot configure the stage libraries for Transformer for Snowflake.

The stage library panel in the pipeline canvas displays all stages. Stages that are not installed on the selected authoring engine appear disabled, or greyed out. For example, the following stage library panel indicates that the Elasticsearch and Google BigQuery origins are not installed:

Stage library panel indicating that the Elasticsearch and Google BigQuery origins are not installed

You can install additional stage libraries, including Enterprise stage libraries, from the pipeline canvas. To install an additional stage library, click a disabled stage. Confirm that you want to install the library, and then restart the engine for the changes to take effect.

Important: Installing an additional stage library from the pipeline canvas installs the library only on the selected authoring engine. You must install the additional library on any other authoring engine used to design the pipeline and on all execution engines where you run the pipeline. For information about adding stage libraries to an existing deployment, see Updating Stage Libraries.
Note: If using the user-provided stage library mode, you cannot install stage libraries from the pipeline canvas.
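
Because a library installed from the pipeline canvas reaches only the selected authoring engine, it can help to track which engines still lack a library that the pipeline needs. The following Python sketch shows one way to compute that with set differences. The function, the engine names, and the library names are hypothetical placeholders; the sketch does not query Control Hub.

```python
from typing import Dict, List, Set


def missing_libraries(
    required: Set[str], installed_by_engine: Dict[str, Set[str]]
) -> Dict[str, List[str]]:
    """Return, per engine, the required stage libraries that are not installed.

    Engines that already have every required library are omitted.
    """
    report = {}
    for engine, installed in installed_by_engine.items():
        missing = required - installed
        if missing:
            report[engine] = sorted(missing)
    return report


# Placeholder names, used only for illustration.
required = {"lib-a", "lib-b"}
engines = {
    "authoring-1": {"lib-a", "lib-b"},
    "execution-1": {"lib-a"},
    "execution-2": set(),
}
report = missing_libraries(required, engines)
```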

External Libraries

The selected authoring Data Collector or Transformer determines the external libraries available to stages as you design pipelines and pipeline fragments. For example, some stages, such as most JDBC stages, require installing a JDBC driver as an external library on Data Collector or Transformer.

Transformer for Snowflake engines do not require access to external libraries.

As you design Data Collector or Transformer pipelines, each stage requiring an external library displays the currently installed libraries in the External Libraries tab in the stage properties panel.

For example, the following image shows that a MySQL JDBC driver is installed for the JDBC stage library on the selected authoring Data Collector. As a result, this external library is available to the JDBC Query Consumer origin during pipeline design:

External library available to the JDBC Query Consumer origin during pipeline design

When the parent deployment is not configured to use external resources, you can install an additional external library from the pipeline canvas by clicking Upload External Library from the External Libraries tab. When the parent deployment is configured to use an external resource archive, you update the archive file to install additional libraries. For more information, see External Resources.