External Libraries

You can install a driver or other library as an external library to make it available to a Transformer stage.

Transformer includes the libraries needed to use most Transformer stages out of the box. However, you might install an external library in the following cases:
  • Some stages, such as the Oracle JDBC Table origin and the MySQL JDBC Table origin, require installing a driver as an external library.
  • Some stages, such as the JDBC origins, lookup, and destination, include several drivers, but require installing an additional driver to access certain databases.
  • Some stages provide the required libraries, but you can install custom libraries to access custom functionality. For example, you might install a custom Java or Scala library for the Scala processor.

You install an external library into the stage library that includes the stage that uses it. For example, to use a custom Scala library with Scala processors, you install the Scala library as an external library for the Basic stage library.

To use an external library with multiple stage libraries, install the external library into each stage library associated with the stages. For example, if you want to use an Oracle JDBC driver with the Scala processor and the Oracle JDBC Table origin, you install the driver as an external library for the Basic stage library and for the JDBC stage library.

To install an external library, add the external library to an external resource archive file for the deployment.
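
As a rough sketch of the multi-library case described above, the following Python snippet places one driver file under a separate folder for each stage library and zips the result. The folder layout in `LIB_DIR_TEMPLATE`, the zip format, the stage library names, and the file names are all assumptions made for illustration. Check the Control Hub documentation for the layout that your deployment expects.

```python
import zipfile
from pathlib import Path

# Assumed archive layout (not confirmed by this page); verify it against
# the Control Hub documentation before using it.
LIB_DIR_TEMPLATE = "externalResources/streamsets-libs-extras/{stage_lib}/lib/{file_name}"


def archive_entries(file_name: str, stage_libs: list[str]) -> list[str]:
    """Return one archive entry per stage library that should receive the file."""
    return [
        LIB_DIR_TEMPLATE.format(stage_lib=lib, file_name=file_name)
        for lib in stage_libs
    ]


def write_archive(driver: Path, stage_libs: list[str], archive: Path) -> None:
    """Zip the same driver file under every stage library folder."""
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for entry in archive_entries(driver.name, stage_libs):
            zf.write(driver, arcname=entry)
```

For example, `write_archive(Path("ojdbc8.jar"), ["basic-lib", "jdbc-lib"], Path("external-resources.zip"))` would store the driver once per stage library. The names `basic-lib` and `jdbc-lib` are placeholders, not actual stage library names.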

When needed, you can update or remove an existing external library. For more information, see Managing External Libraries.

Managing External Libraries

When you run a pipeline that uses a stage library with related external libraries, Transformer uploads those libraries to the cluster as needed.

To update or remove an existing external library, the steps that you perform depend on the type of cluster that the pipeline runs on:
EMR, EMR Serverless, Databricks, and Dataproc clusters
Transformer automatically updates the staging directories for these clusters. You do not need to manually manage external libraries in these cluster staging directories.
For example, if you remove an obsolete external library from Transformer, the next time that a related pipeline runs, Transformer removes that external library from the cluster staging directory.
Similarly, if you update an external library by installing the new version on Transformer and removing the old version, the next time that a related pipeline runs, Transformer automatically updates the external library in the cluster staging directory.
Use the following steps to manage external libraries for these clusters:
  1. Install or update an external library on Transformer as an external resource, as needed.
    Important: Transformer uses file names to determine if external libraries have changed. When updating an external library, ensure that the new file name differs from the previous file name.
  2. Remove obsolete external libraries from Transformer, as needed.

For information about working with external resources, see the Control Hub documentation.
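
The note about file names in step 1 can be illustrated with a small sketch. It assumes, for illustration only, that change detection amounts to comparing the set of file names installed on Transformer with the set already in the cluster staging directory. This page does not describe how Transformer actually implements the check, and the function and file names below are hypothetical.

```python
def diff_by_name(staged: set[str], installed: set[str]) -> tuple[set[str], set[str]]:
    """Compare file names only and return (to_upload, to_remove)."""
    to_upload = installed - staged   # present on Transformer, missing in staging
    to_remove = staged - installed   # present in staging, removed from Transformer
    return to_upload, to_remove


# Same file name with new contents: a name-based check sees no change.
print(diff_by_name({"driver.jar"}, {"driver.jar"}))
# prints: (set(), set())

# Versioned file names: the update is visible as one upload and one removal.
print(diff_by_name({"driver-1.0.jar"}, {"driver-1.1.jar"}))
# prints: ({'driver-1.1.jar'}, {'driver-1.0.jar'})
```

Under this assumption, including a version number in each library file name is a simple way to make sure that an update is detected.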

Other supported clusters
For all other cluster types, you must manually manage external library updates for both Transformer and cluster staging directories.
Use the following steps to manage external libraries for these clusters:
  1. When adding a new external library or updating an existing external library, install the external library on Transformer as an external resource.

    Transformer uploads the library to the cluster when you run a related pipeline.

  2. When updating an existing external library, remove the existing version of the external library from Transformer external resources. Also remove any other obsolete external libraries.
  3. To prevent clusters from using obsolete external libraries, also remove the obsolete external libraries from cluster staging directories.
    External libraries are uploaded to the following location in the cluster:
    /<staging directory>/<Transformer version>

For information about working with external resources, see the Control Hub documentation.
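
For the manual cleanup in step 3, a short sketch such as the following might help identify which files to remove. It builds the `/<staging directory>/<Transformer version>` location given above and reports staged files that are no longer installed on Transformer. The staging directory, version number, and file names are hypothetical, and obtaining the list of staged files depends on the cluster file system, so that list is passed in as plain strings here.

```python
from pathlib import PurePosixPath


def staging_path(staging_dir: str, transformer_version: str) -> PurePosixPath:
    """Build /<staging directory>/<Transformer version> as a POSIX path."""
    return PurePosixPath("/") / staging_dir.strip("/") / transformer_version


def obsolete_files(staged_names: list[str], installed_names: list[str]) -> list[str]:
    """Return staged file names that are no longer installed on Transformer."""
    return sorted(set(staged_names) - set(installed_names))


# Hypothetical values for illustration only.
location = staging_path("/streamsets", "5.7.0")
for name in obsolete_files(["driver-1.0.jar", "driver-1.1.jar"], ["driver-1.1.jar"]):
    print(location / name)
# prints: /streamsets/5.7.0/driver-1.0.jar
```

The sketch only lists candidates for removal. Deleting them is left to the tooling for your cluster file system.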