External Libraries
You can install a driver or other library as an external library to make it available to a Transformer stage.
- Some stages, such as the Oracle JDBC Table origin and the MySQL JDBC Table origin, require installing a driver as an external library.
- Some stages, such as the JDBC origins, lookup, and destination, include several drivers, but require installing a driver to access certain databases.
- Some stages provide the required libraries, but you can install custom libraries to access custom functionality. For example, you might install a custom Java or Scala library for the Scala processor.
When installing an external library, you install it into the stage library that includes the stage. For example, to use a custom Scala library with Scala processors, you install the Scala library as an external library for the Basic stage library.
To use an external library with multiple stage libraries, install the external library into each stage library associated with the stages. For example, if you want to use an Oracle JDBC driver with the Scala processor and the Oracle JDBC Table origin, you install the driver as an external library for the Basic stage library and for the JDBC stage library.
To install an external library, add the external library to an external resource archive file for the deployment.
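As a rough illustration of installing one library into several stage libraries, the sketch below packages a driver into a zip archive once per stage library. The folder layout (`externalResources/streamsets-libs-extras/<stage library>/lib/`), the zip format, the helper name `build_archive`, and the stage library names in the usage note are all assumptions made for this example. Confirm the actual archive structure and stage library names for your deployment before relying on it.

```python
import zipfile
from pathlib import Path

# Assumed archive layout for external libraries; verify against your deployment.
EXTRAS_ROOT = "externalResources/streamsets-libs-extras"


def build_archive(archive_path, jar_paths, stage_libraries):
    """Write every JAR into the lib folder of every listed stage library.

    stage_libraries holds the folder names of the stage libraries that
    include the stages that need the external library.
    """
    with zipfile.ZipFile(archive_path, "w") as archive:
        for stage_library in stage_libraries:
            for jar_path in jar_paths:
                jar_path = Path(jar_path)
                entry = f"{EXTRAS_ROOT}/{stage_library}/lib/{jar_path.name}"
                archive.write(jar_path, entry)
```

For instance, calling `build_archive("externalResources.zip", ["ojdbc.jar"], ["basic-lib", "jdbc-lib"])` with placeholder stage library names would store the same driver under both stage library folders, mirroring the Oracle JDBC driver example above.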
When needed, you can update an existing external library. For more information, see Managing External Libraries.
Managing External Libraries
When you run a pipeline that uses a stage library with related external libraries, Transformer uploads those libraries to the cluster as needed.
- EMR, EMR Serverless, Databricks, and Dataproc clusters: Transformer can automatically update external libraries in the staging directories for these clusters.
- Other supported clusters: For all other clusters, you must manually manage external library updates for both Transformer and cluster staging directories.
Name Requirement for Automatic Updates
Transformer can automatically update an external library in a cluster staging directory for certain cluster types so you do not need to manually remove older versions of the libraries. Transformer recognizes that a new external library is related to an existing external library based on their file names.
When evaluating file names, Transformer notes the first number in the name and treats the characters before that number as the file name.
For example, you have an external library named ERfile-5.jar installed on Transformer, and Transformer has uploaded it to the staging directory of your Dataproc cluster. Transformer treats libraries named ERfile-<first number><additional characters> as different versions of the same library. Transformer does not require any numeric progression in the file names.
Transformer treats each of the following files as a different version of the ERfile-5.jar file:
- ERfile-1.jar
- ERfile-3_2023.jar
- ERfile-A2023.jar
However, Transformer would interpret ERfileA6.jar as a different file name because of the missing dash and the A before the first number.
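The naming rule can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not Transformer's actual implementation. The helper names `library_key` and `same_library` are hypothetical, and the handling of names that contain no number at all is an assumption.

```python
import re


def library_key(file_name: str) -> str:
    """Return the characters that precede the first digit in the file name.

    Assumption: a name with no digit is treated as its own key.
    """
    match = re.search(r"\d", file_name)
    return file_name if match is None else file_name[:match.start()]


def same_library(name_a: str, name_b: str) -> bool:
    """Report whether two file names share the same characters before the first digit."""
    return library_key(name_a) == library_key(name_b)
```

Under this sketch, `library_key("ERfile-5.jar")` returns `"ERfile-"`, so `same_library("ERfile-5.jar", "ERfile-3_2023.jar")` is true, while `same_library("ERfile-5.jar", "ERfileA6.jar")` is false because the key of the second name is `"ERfileA"`.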