External Libraries

You can install a driver or other library as an external library to make it available to a Transformer stage.

Transformer includes the libraries needed to use most Transformer stages out of the box. However, you might install an external library in the following cases:
  • Some stages, such as the Oracle JDBC Table origin and the MySQL JDBC Table origin, require installing a driver as an external library.
  • Some stages, such as the JDBC origins, lookup, and destination, include several drivers but require you to install an additional driver to access certain databases.
  • Some stages provide the required libraries, but you can install custom libraries to access custom functionality. For example, you might install a custom Java or Scala library for the Scala processor.

You install an external library into the stage library that includes the stage that uses it. For example, to use a custom Scala library with Scala processors, you install the Scala library as an external library for the Basic stage library.

To use an external library with multiple stage libraries, install the external library into each stage library associated with the stages. For example, if you want to use an Oracle JDBC driver with the Scala processor and the Oracle JDBC Table origin, you install the driver as an external library for the Basic stage library and for the JDBC stage library.

To install an external library, add the external library to an external resource archive file for the deployment.

When needed, you can update an existing external library. For more information, see Managing External Libraries.

Managing External Libraries

When you run a pipeline that uses a stage library with related external libraries, Transformer uploads those libraries to the cluster as needed.

The procedure for updating an existing external library depends on the type of cluster that the pipeline runs on:
EMR, EMR Serverless, Databricks, and Dataproc clusters
Transformer can automatically update external libraries in the staging directories for these clusters.
If you update the external library by installing the new version on Transformer and removing the old version, the next time that a related pipeline runs, Transformer automatically updates the external library in the cluster staging directory.
As a result, for these clusters, you do not need to manually remove an old version of an external library from the cluster staging directory, as long as you remove the old version from Transformer.
Use the following steps to update an external library for these clusters:
  1. Install the new version of the external library on Transformer as an external resource.
    Important: Transformer determines that a file is a new version of an existing external library based on file names. Ensure that each new version of an external library uses the same base file name as the existing external library.
  2. Remove the existing version of the external library from Transformer.

    If you do not remove the existing version from Transformer, Transformer assumes that both versions are required and uploads both to the cluster staging directory instead of replacing the existing version.

For information about working with external resources, see the Control Hub documentation.

Other supported clusters
For all other clusters, you must manually manage external library updates for both Transformer and cluster staging directories.
Use the following steps to update an external library for these clusters:
  1. Install the new version of the external library on Transformer as an external resource.
  2. Remove the existing version of the external library from Transformer.
  3. To prevent clusters from using the wrong version of the external library, also remove the existing version from cluster staging directories.
    External libraries are uploaded to the following location in the cluster:
    /<staging directory>/<Transformer version>
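
The location above can be sketched as a small helper that builds the directory to inspect when removing an old library version. This is an illustrative sketch only: the function name staged_library_dir and the sample values "/streamsets" and "5.7.0" are hypothetical, and the source does not say how files are arranged below this directory.

```python
from pathlib import PurePosixPath


def staged_library_dir(staging_directory: str, transformer_version: str) -> PurePosixPath:
    """Build /<staging directory>/<Transformer version> as a POSIX path.

    Both arguments are placeholders supplied by the caller; nothing here
    contacts a cluster or deletes any file.
    """
    # Strip surrounding slashes so "/streamsets" and "streamsets/" behave the same.
    return PurePosixPath("/") / staging_directory.strip("/") / transformer_version


# Hypothetical values, for illustration only.
print(staged_library_dir("/streamsets", "5.7.0"))
```

Look for the old library file under the resulting directory with the file tools of your cluster storage, and remove it there.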

For information about working with external resources, see the Control Hub documentation.

Name Requirement for Automatic Updates

For EMR, EMR Serverless, Databricks, and Dataproc clusters, Transformer can automatically update an external library in the cluster staging directory, so you do not need to manually remove older versions of the library. Transformer recognizes that a new external library is related to an existing external library based on their file names.

When evaluating file names, Transformer finds the first number in the name and treats the characters before that number as the base file name.

For example, suppose you have an external library named ERfile-5.jar installed on Transformer, and Transformer has uploaded it to the staging directory of your Dataproc cluster. Transformer treats libraries named ERfile-<first number><additional characters> as different versions of the same library. Transformer does not require any numeric progression in the file names.

So if you installed external libraries with any of the following names, Transformer would treat them as related to the existing ERfile-5.jar file:
  • ERfile-1.jar
  • ERfile-3_2023.jar
  • ERfile-A2023.jar

However, Transformer would interpret ERfileA6.jar as a different file name because of the missing dash and the A before the first number.
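
To check candidate file names before installing them, the stated rule can be approximated in a few lines of Python. This is a minimal sketch of a literal reading of the rule (everything before the first digit is the base file name), not Transformer's actual implementation. The function names base_file_name and same_library are hypothetical, and the handling of names without any digit is an assumption.

```python
import re


def base_file_name(file_name: str) -> str:
    """Return the characters that precede the first digit in file_name.

    Assumption: a name with no digit is returned unchanged; the source
    does not describe that case.
    """
    match = re.search(r"\d", file_name)
    return file_name if match is None else file_name[:match.start()]


def same_library(existing: str, candidate: str) -> bool:
    """True when both names share the same base file name under this reading."""
    return base_file_name(existing) == base_file_name(candidate)


print(same_library("ERfile-5.jar", "ERfile-1.jar"))       # True
print(same_library("ERfile-5.jar", "ERfile-3_2023.jar"))  # True
print(same_library("ERfile-5.jar", "ERfileA6.jar"))       # False
```

Under this literal reading, ERfile-A2023.jar yields the base name ERfile-A, so the sketch would not match it to ERfile-5.jar even though the list above treats it as related. Transformer's actual matching may be more lenient than this sketch, so use it only as a rough pre-check.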