External Libraries

You can install a driver or other library as an external library to make it available to a Transformer stage.

Transformer includes the libraries needed to use most Transformer stages out of the box. However, you might install an external library in the following cases:
  • Some stages, such as the Oracle JDBC Table origin and the MySQL JDBC Table origin, require installing a driver as an external library.
  • Some stages, such as the JDBC origins, lookup, and destination, include several drivers, but require installing a driver to access certain databases.
  • Some stages provide the required libraries, but you can install custom libraries to access custom functionality. For example, you might install a custom Java or Scala library for the Scala processor.

When installing an external library, you install it into the stage library that includes the stage. For example, to use a custom Scala library with Scala processors, you install the Scala library as an external library for the Basic stage library.

To use an external library with multiple stage libraries, install the external library into each stage library associated with the stages. For example, if you want to use an Oracle JDBC driver with the Scala processor and the Oracle JDBC Table origin, you install the driver as an external library for the Basic stage library and for the JDBC stage library.

By default, external libraries are installed to the $TRANSFORMER_EXTERNAL_RESOURCES/streamsets-libs-extras directory. StreamSets recommends configuring Transformer to use an external directory to enable use of the libraries after Transformer upgrades.

You can install external libraries in any of the following ways:
  • From Package Manager
  • From the stage properties panel
  • Manually

When needed, you can update or remove an existing external library. For more information, see Managing External Libraries.

Setting Up an External Directory

By default, external libraries are installed to the $TRANSFORMER_EXTERNAL_RESOURCES/streamsets-libs-extras directory.

For an RPM installation, you must configure Transformer to use an external directory before you can install external libraries from Package Manager or from the stage properties panel.

For a tarball installation, you can use the default directory as you get started with Transformer. However, StreamSets recommends configuring Transformer to use an external directory to enable use of the libraries after Transformer upgrades.

Use the procedure for your Transformer installation type.

Setting Up for Tarball and RPM Installations

Before you install external libraries for a tarball or RPM installation, set up an external directory to store the libraries.

  1. Create a local directory external to the Transformer installation directory.
    For example, if you installed Transformer in the following directory:
    /opt/transformer/
    you might create the external directory at:
    /opt/transformer-extras
  2. Grant the user who starts Transformer ownership of the external directory.
    For example, if you use the default system user and group named transformer to run Transformer as a service, use the following command to change the owner of the external directory and all files in the directory to transformer:transformer:
    chown -R transformer:transformer /opt/transformer-extras
  3. Add the STREAMSETS_LIBRARIES_EXTRA_DIR environment variable to the appropriate file and point it to the external directory.

    Modify environment variables using the method required by your installation type.

    Set the environment variable as follows:

    export STREAMSETS_LIBRARIES_EXTRA_DIR="<external directory>"

    For example:

    export STREAMSETS_LIBRARIES_EXTRA_DIR="/opt/transformer-extras/"
  4. When using the Java Security Manager, which is enabled by default, update the Transformer security policy to include the external directory as follows:
    1. In the Transformer configuration directory, open the security policy file, $TRANSFORMER_CONF/transformer-security.policy.
    2. Add the following lines to the file:
      // user-defined external directory
      grant codebase "file://<external directory>/-" {
        permission java.security.AllPermission;
      };
      For example:
      // user-defined external directory
      grant codebase "file:///opt/transformer-extras/-" {
        permission java.security.AllPermission;
      };
  5. Restart Transformer.
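
Steps 1 through 4 can also be scripted. The following is a minimal sketch under stated assumptions, not a StreamSets-provided tool. The variable names SCRATCH, EXT_DIR, ENV_FILE, POLICY_FILE, and RUN_AS are hypothetical, and EXT_DIR is assumed to be an absolute path. The defaults write to a scratch directory so that the script can be tried safely; for a real installation, you would point them at your external directory, at the environment file used by your installation type, and at $TRANSFORMER_CONF/transformer-security.policy.

```shell
#!/bin/sh
# Sketch only: automates steps 1-4 above. All variable names are hypothetical.
set -eu

# Defaults write to a scratch directory; override them for a real installation.
SCRATCH="${SCRATCH:-$(mktemp -d)}"
EXT_DIR="${EXT_DIR:-$SCRATCH/transformer-extras}"      # absolute path, no trailing slash
ENV_FILE="${ENV_FILE:-$SCRATCH/transformer-env.sh}"    # assumption: file that holds environment variables
POLICY_FILE="${POLICY_FILE:-$SCRATCH/transformer-security.policy}"
RUN_AS="${RUN_AS:-}"                                   # for example transformer:transformer

# Step 1: create the external directory.
mkdir -p "$EXT_DIR"

# Step 2: change ownership only when an owner is given (usually requires root).
if [ -n "$RUN_AS" ]; then
  chown -R "$RUN_AS" "$EXT_DIR"
fi

# Step 3: add the environment variable unless it is already present.
touch "$ENV_FILE"
if ! grep -q '^export STREAMSETS_LIBRARIES_EXTRA_DIR=' "$ENV_FILE"; then
  printf 'export STREAMSETS_LIBRARIES_EXTRA_DIR="%s/"\n' "$EXT_DIR" >> "$ENV_FILE"
fi

# Step 4: add the grant for the external directory unless it is already present.
touch "$POLICY_FILE"
if ! grep -qF "file://$EXT_DIR/-" "$POLICY_FILE"; then
  cat >> "$POLICY_FILE" <<EOF
// user-defined external directory
grant codebase "file://$EXT_DIR/-" {
  permission java.security.AllPermission;
};
EOF
fi

echo "Configured $EXT_DIR; restart Transformer to apply."
```

The checks before each append make the script safe to run more than once. Restarting Transformer (step 5) is left as a manual action.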

Installing from Package Manager

You can use the Transformer Package Manager to install external libraries for all stage libraries.
Important: For an RPM installation, you must configure Transformer to use an external directory before you can install external libraries from Package Manager.
  1. In Transformer, in the top right toolbar, click the Package Manager icon.

  2. In Package Manager, determine the stage library where you want to install the external library.

    Each stage library lists the stages included in the stage library. For example, the File stage library includes the File and Whole Directory origins, and the File destination.

  3. At the bottom of the navigation panel, click External Libraries.

    Package Manager lists any currently installed external libraries.

  4. Click the Install External Libraries icon, located under the top right toolbar.

  5. In the Install External Libraries dialog box, select the stage library that needs to access the external library.

    For example, to install a custom Java library for the Scala processor, select the Basic stage library.

  6. Browse to select the external library to install and click Open.
  7. To install the external library into the specified stage library, click Upload.

    Package Manager installs the external library and displays a message offering to restart Transformer.

  8. To install additional external libraries, click Cancel, then repeat steps 4 through 7 for each stage library that needs access to an external library.
  9. After installing all of the external libraries that you want, restart Transformer in one of the following ways:
    • If you started Transformer manually, click Restart Transformer in the Install External Libraries dialog box.
    • If you started Transformer as a service, you must use the command line for restart. Click Cancel in the Install External Libraries dialog box, and then run the required command for your operating system:
      • For CentOS 6, Oracle Linux 6, or Red Hat Enterprise Linux 6, use:
        service transformer restart
      • For CentOS 7, Oracle Linux 7, or Red Hat Enterprise Linux 7, use:
        systemctl restart transformer
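
If you script the restart for a service installation, the choice between the two commands above can be sketched as follows. The restart_command function is a hypothetical helper, and it assumes that the presence of systemctl on the PATH indicates one of the version 7 systems listed above; verify that assumption for your environment. The sketch prints the command rather than running it.

```shell
#!/bin/sh
# Sketch only: restart_command is a hypothetical helper, not a Transformer command.
# Assumption: finding systemctl on the PATH means a systemd-based (version 7) system.
set -eu

restart_command() {
  if command -v systemctl >/dev/null 2>&1; then
    echo "systemctl restart transformer"
  else
    echo "service transformer restart"
  fi
}

# Print the command instead of running it, so the choice can be reviewed first.
echo "Suggested restart command: $(restart_command)"
```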

Installing from Stage Properties

When configuring a pipeline, you can use the stage properties panel to install external libraries for the stage library that includes the stage.
Important: For an RPM installation, you must configure Transformer to use an external directory before you can install external libraries from the stage properties panel.
  1. While configuring a pipeline, select a stage that requires an external library in the pipeline canvas.
  2. In the stage properties panel, click the External Libraries tab.

  3. Click the Install External Libraries icon.
  4. In the Install External Libraries dialog box, select the stage library that needs to access the external library.

    For example, to install a custom Java library for the Scala processor, select the Basic stage library.

  5. Browse to select the external library to install and click Open.
  6. To install the external library into the specified stage library, click Upload.

    Transformer installs the external library. All stages included in the specified stage library can use this external library. For example, if you installed a JDBC driver for the JDBC stage library, then every JDBC origin, lookup, and destination can also access the driver.

    To use the external library with other stage libraries, you must install the library into the additional stage libraries. For example, if you want to use the same JDBC driver with the Scala processor, you must also install the driver as an external library for the Basic stage library.

  7. Restart Transformer in one of the following ways:
    • If you started Transformer manually, click Restart Transformer in the Install External Libraries dialog box.
    • If you started Transformer as a service, you must use the command line for restart. Click Cancel in the Install External Libraries dialog box, and then run the required command for your operating system:
      • For CentOS 6, Oracle Linux 6, or Red Hat Enterprise Linux 6, use:
        service transformer restart
      • For CentOS 7, Oracle Linux 7, or Red Hat Enterprise Linux 7, use:
        systemctl restart transformer

Installing Manually

To install external libraries manually, use the procedure required for your installation type.

Tarball and RPM Installations

To manually install external libraries for a tarball or RPM Transformer installation, perform the following steps:
  1. In the directory where Transformer installs external libraries, create subdirectories for each set of external libraries based on the stage library name.

    For example, if you set up an external directory to store the libraries at /opt/transformer-extras, then create the subdirectories as follows:

    /opt/transformer-extras/<stage library name>/lib/
    To install drivers for stages included with the JDBC stage library, create the following subdirectory:
    /opt/transformer-extras/streamsets-spark-jdbc-lib/lib/

    To also install drivers for stages in the Basic stage library, create the following subdirectory:

    /opt/transformer-extras/streamsets-spark-basic-lib/lib/
    Note: You can use Package Manager to easily view the stages in each stage library. You can find stage library names in the $TRANSFORMER_DIST/streamsets-libs directory.
  2. Copy the external libraries to the appropriate subdirectories.
  3. Restart Transformer.
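
Steps 1 and 2 can be sketched as a small shell helper. The function install_external_lib is hypothetical and is not a Transformer command. The demo runs against a scratch directory and uses an empty placeholder file named example-driver.jar in place of a real driver; in practice you would pass your external directory, such as /opt/transformer-extras, and the actual library files.

```shell
#!/bin/sh
# Sketch only: copies library files into <external dir>/<stage library name>/lib/.
# install_external_lib is a hypothetical helper, not a Transformer command.
set -eu

install_external_lib() {
  ext_dir=$1      # for example /opt/transformer-extras
  stage_lib=$2    # for example streamsets-spark-jdbc-lib
  shift 2
  target="$ext_dir/$stage_lib/lib"
  mkdir -p "$target"          # step 1: create the subdirectory
  for lib in "$@"; do
    cp "$lib" "$target/"      # step 2: copy each library
  done
  echo "Installed $# file(s) into $target"
}

# Demo in a scratch directory; the empty file stands in for a real driver.
demo_dir=$(mktemp -d)
touch "$demo_dir/example-driver.jar"
for stage_lib in streamsets-spark-jdbc-lib streamsets-spark-basic-lib; do
  install_external_lib "$demo_dir/transformer-extras" "$stage_lib" "$demo_dir/example-driver.jar"
done
```

The loop installs the same file into both the JDBC and Basic stage libraries, matching the case where several stage libraries need the same driver. Transformer still needs a restart afterward, as in step 3.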

Managing External Libraries

When you run a pipeline that uses a stage library with related external libraries, Transformer uploads those libraries to the cluster as needed.

When you want to update or remove an existing external library, you perform the task differently depending on the cluster that the pipeline runs on:
EMR, EMR Serverless, Databricks, and Dataproc clusters
Transformer automatically updates the staging directories for these clusters. You do not need to manually manage external libraries in these cluster staging directories.
For example, if you remove an obsolete external library from Transformer, the next time that a related pipeline runs, Transformer removes that external library from the cluster staging directory.
Similarly, if you update an external library by installing the new version on Transformer and removing the old version, the next time that a related pipeline runs, Transformer automatically updates the external library in the cluster staging directory.
Use the following steps to manage external libraries for these clusters:
  1. Install or update an external library on Transformer, as needed.
    Important: Transformer uses file names to determine if external libraries have changed. When updating an external library, ensure that the new file name differs from the previous file name.
  2. Remove obsolete external libraries from Transformer, as needed.
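
For manually installed libraries, the update on the Transformer side can be sketched as a helper that refuses to proceed when the new file name matches the old one, because a change under the same name would not be detected. The function replace_external_lib and the driver file names are hypothetical; the demo uses empty placeholder files in a scratch directory.

```shell
#!/bin/sh
# Sketch only: replace_external_lib is a hypothetical helper, not a Transformer command.
set -eu

replace_external_lib() {
  lib_dir=$1    # <external dir>/<stage library name>/lib
  old_name=$2   # file name of the installed version
  new_file=$3   # path to the new version
  new_name=$(basename "$new_file")
  # Transformer compares file names, so an identical name would not register as a change.
  if [ "$new_name" = "$old_name" ]; then
    echo "Refusing: new file name must differ from $old_name" >&2
    return 1
  fi
  cp "$new_file" "$lib_dir/"      # install the new version
  rm -f "$lib_dir/$old_name"      # remove the obsolete version
}

# Demo with placeholder files in a scratch directory.
work=$(mktemp -d)
mkdir -p "$work/lib"
touch "$work/lib/driver-1.0.jar" "$work/driver-1.1.jar"
replace_external_lib "$work/lib" driver-1.0.jar "$work/driver-1.1.jar"
```
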
Other supported clusters
For all other cluster types, you must manually manage external library updates for both Transformer and cluster staging directories.
Use the following steps to manage external libraries for these clusters:
  1. When adding a new external library or updating an existing external library, install the external library on Transformer.

    Transformer uploads those libraries to the cluster when you run a related pipeline.

  2. When updating an existing external library, remove the existing version of the external library from Transformer. Also remove any other obsolete external libraries.
  3. To prevent clusters from using obsolete external libraries, also remove the obsolete external libraries from cluster staging directories.
    External libraries are uploaded to the following location in the cluster:
    /<staging directory>/<Transformer version>