External Libraries
You can install a driver or other library as an external library to make it available to a Transformer stage.
- Some stages, such as the Oracle JDBC Table origin and the MySQL JDBC Table origin, require installing a driver as an external library.
- Some stages, such as the JDBC origins, lookup, and destination, include several drivers, but require installing a driver to access certain databases.
- Some stages provide the required libraries, but you can install custom libraries to access custom functionality. For example, you might install a custom Java or Scala library for the Scala processor.
When installing an external library, you install it into the stage library that includes the stage. For example, to use a custom Scala library with Scala processors, you install the Scala library as an external library for the Basic stage library.
To use an external library with multiple stage libraries, install the external library into each stage library associated with the stages. For example, if you want to use an Oracle JDBC driver with the Scala processor and the Oracle JDBC Table origin, you install the driver as an external library for the Basic stage library and for the JDBC stage library.
By default, external libraries are installed to the $TRANSFORMER_EXTERNAL_RESOURCES/streamsets-libs-extras directory. StreamSets recommends configuring Transformer to use an external directory to enable use of the libraries after Transformer upgrades.
When needed, you can update or remove an existing external library. For more information, see Managing External Libraries.
Setting Up an External Directory
By default, external libraries are installed to the $TRANSFORMER_EXTERNAL_RESOURCES/streamsets-libs-extras directory.
For an RPM installation, you must configure Transformer to use an external directory before you can install external libraries from Package Manager or from the stage properties panel.
For a tarball installation, you can use the default directory as you get started with Transformer. However, StreamSets recommends configuring Transformer to use an external directory to enable use of the libraries after Transformer upgrades.
Use the procedure for your Transformer installation type.
Setting Up for Tarball and RPM Installations
Before you install external libraries for a tarball or RPM installation, set up an external directory to store the libraries.
- Create a local directory external to the Transformer installation directory. For example, if you installed Transformer in the following directory:
/opt/transformer/
you might create the external directory at:
/opt/transformer-extras
- Grant the user who starts Transformer ownership of the external directory. For example, if you use the default system user and group named transformer to run Transformer as a service, use the following command to change the owner of the external directory and all files in the directory to transformer:transformer:
chown -R transformer:transformer /opt/transformer-extras
- Add the STREAMSETS_LIBRARIES_EXTRA_DIR environment variable to the appropriate file and point it to the external directory.
Modify environment variables using the method required by your installation type.
Set the environment variable as follows:
export STREAMSETS_LIBRARIES_EXTRA_DIR="<external directory>"
For example:
export STREAMSETS_LIBRARIES_EXTRA_DIR="/opt/transformer-extras/"
- When using the Java Security Manager, which is enabled by default, update the Transformer security policy to include the external directory as follows:
- In the Transformer configuration directory, open the security policy file, $TRANSFORMER_CONF/transformer-security.policy.
- Add the following lines to the file:
// user-defined external directory
grant codebase "file://<external directory>-" {
  permission java.security.AllPermission;
};
For example:
// user-defined external directory
grant codebase "file:///opt/transformer-extras/-" {
  permission java.security.AllPermission;
};
- Restart Transformer.
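The setup steps above can be sketched as a few small shell helpers. This is an illustrative sketch, not part of Transformer: the function names are hypothetical, and the directory and owner are whatever you pass in. The helpers only create the directory, change its owner, and print the text that the steps above tell you to add. Editing the environment file and the security policy file is left to you.

```shell
# Sketch only. Function names are hypothetical helpers, not Transformer commands.
# Run as a user that is allowed to create the directory and change its owner.

# prepare_extras_dir DIR OWNER
# Creates DIR and gives OWNER (for example, transformer:transformer) ownership.
prepare_extras_dir() {
  mkdir -p "$1" && chown -R "$2" "$1"
}

# print_env_line DIR
# Prints the export line to add to the environment file for your installation type.
print_env_line() {
  printf 'export STREAMSETS_LIBRARIES_EXTRA_DIR="%s/"\n' "${1%/}"
}

# print_policy_grant DIR
# Prints the grant block to add to the Transformer security policy file.
print_policy_grant() {
  printf '// user-defined external directory\n'
  printf 'grant codebase "file://%s/-" {\n' "${1%/}"
  printf '  permission java.security.AllPermission;\n'
  printf '};\n'
}

# Example usage, commented out so that nothing runs by accident:
# prepare_extras_dir /opt/transformer-extras transformer:transformer
# print_env_line /opt/transformer-extras
# print_policy_grant /opt/transformer-extras
```

Printing the lines instead of appending them keeps the sketch safe to run more than once. Review the output before you paste it into the relevant files, and then restart Transformer.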
Installing from Package Manager
- In Transformer, in the top right toolbar, click the Package Manager icon.
- In Package Manager, determine the stage library where you want to install the external library.
Each stage library lists the stages included in the stage library. For example, the File stage library includes the File and Whole Directory origins, and the File destination.
- At the bottom of the navigation panel, click External Libraries.
Package Manager lists any currently installed external libraries.
- Click the Install External Libraries icon, located under the top right toolbar.
- In the Install External Libraries dialog box, select the stage library that needs to access the external library.
For example, to install a custom Java library for the Scala processor, select the Basic stage library.
- Browse to select the external library to install and click Open.
- To install the external library into the specified stage library, click Upload.
Package Manager installs the external library and displays a message offering to restart Transformer.
- To install additional external libraries, click Cancel, then repeat steps 4 - 7 for every stage library that needs access to the external library.
- After installing all of the external libraries that you want, restart Transformer in one of the following ways:
- If you started Transformer manually, click Restart Transformer in the Install External Libraries dialog box.
- If you started Transformer as a service, you must use the command line for restart. Click Cancel in the Install External Libraries dialog box, and then run the required command for your operating system:
- For CentOS 6, Oracle Linux 6, or Red Hat Enterprise Linux 6, use:
service transformer restart
- For CentOS 7, Oracle Linux 7, or Red Hat Enterprise Linux 7, use:
systemctl restart transformer
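If you script the service restart, the choice between the two commands above can be captured in a small helper. This is a sketch under assumptions: the function name is hypothetical, and the caller supplies the major release number. It covers only the two release families listed above and reports anything else as unsupported rather than guessing.

```shell
# Sketch only. restart_command is a hypothetical helper, not a Transformer command.

# restart_command MAJOR_VERSION
# Prints the restart command for CentOS, Oracle Linux, or Red Hat Enterprise Linux
# release 6 or 7, as listed in this procedure.
restart_command() {
  case "$1" in
    6) echo "service transformer restart" ;;
    7) echo "systemctl restart transformer" ;;
    *) echo "release $1 is not covered by this procedure" >&2; return 1 ;;
  esac
}

# Example usage, commented out so that nothing runs by accident:
# restart_command 7
```

The helper prints the command instead of running it, so you can review it or run it with the privileges that your system requires.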
Installing from Stage Properties
- While configuring a pipeline, select a stage that requires an external library in the pipeline canvas.
- In the stage properties panel, click the External Libraries tab.
- Click the Install External Libraries icon.
- In the Install External Libraries dialog box, select the stage library that needs to access the external library.
For example, to install a custom Java library for the Scala processor, select the Basic stage library.
- Browse to select the external library to install and click Open.
- To install the external library into the specified stage library, click Upload.
Transformer installs the external library. All stages included in the specified stage library can use this external library. For example, if you installed a JDBC driver for the JDBC stage library, then every JDBC origin, lookup, and destination can also access the driver.
To use the external library with other stage libraries, you must install the library into the additional stage libraries. For example, if you want to use the same JDBC driver with the Scala processor, you must also install the driver as an external library for the Basic stage library.
- Restart Transformer in one of the following ways:
- If you started Transformer manually, click Restart Transformer in the Install External Libraries dialog box.
- If you started Transformer as a service, you must use the command line for restart. Click Cancel in the Install External Libraries dialog box, and then run the required command for your operating system:
- For CentOS 6, Oracle Linux 6, or Red Hat Enterprise Linux 6, use:
service transformer restart
- For CentOS 7, Oracle Linux 7, or Red Hat Enterprise Linux 7, use:
systemctl restart transformer
Installing Manually
To install external libraries manually, use the procedure required for your installation type.
Tarball and RPM Installations
- In the directory where Transformer installs external libraries, create subdirectories for each set of external libraries based on the stage library name.
For example, if you set up an external directory to store the libraries at /opt/transformer-extras, then create the subdirectories as follows:
/opt/transformer-extras/<stage library name>/lib/
To install drivers for stages included with the JDBC stage library, create the following subdirectory:
/opt/transformer-extras/streamsets-spark-jdbc-lib/lib/
To also install drivers for stages in the Basic stage library, create the following subdirectory:
/opt/transformer-extras/streamsets-spark-basic-lib/lib/
Note: You can use Package Manager to easily view the stages in each stage library. You can find stage library names in the $TRANSFORMER_DIST/streamsets-libs directory.
- Copy the external libraries to the appropriate subdirectories.
- Restart Transformer.
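The manual procedure above can be sketched as one shell helper that creates the lib subdirectory for a stage library and copies a file into it. The function name and the driver path in the usage comment are hypothetical. The stage library names are the ones used in the examples above.

```shell
# Sketch only. install_external_lib is a hypothetical helper, not a Transformer command.

# install_external_lib EXTERNAL_DIR STAGE_LIBRARY FILE
# Creates EXTERNAL_DIR/STAGE_LIBRARY/lib/ if needed and copies FILE into it.
install_external_lib() {
  target="${1%/}/$2/lib"
  mkdir -p "$target" && cp "$3" "$target/"
}

# Example usage, commented out so that nothing runs by accident.
# /path/to/driver.jar is a placeholder for the library that you want to install.
# for stage_lib in streamsets-spark-jdbc-lib streamsets-spark-basic-lib; do
#   install_external_lib /opt/transformer-extras "$stage_lib" /path/to/driver.jar
# done
```

The loop in the usage comment reflects the rule that a library used by stages in several stage libraries must be installed into each of those stage libraries. Restart Transformer after copying the files.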
Managing External Libraries
When you run a pipeline that uses a stage library with related external libraries, Transformer uploads those libraries to the cluster as needed.
- EMR, EMR Serverless, Databricks, and Dataproc clusters: Transformer automatically updates the staging directories for these clusters. You do not need to manually manage external libraries in these cluster staging directories.
- Other supported clusters: For all other cluster types, you must manually manage external library updates for both Transformer and cluster staging directories.
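When you manage updates manually, it can help to see which external libraries are installed for each stage library on the Transformer machine. The following sketch assumes the directory layout described in Installing Manually, that is, <external directory>/<stage library name>/lib/. The function name is hypothetical, and the sketch does not inspect cluster staging directories.

```shell
# Sketch only. list_external_libs is a hypothetical helper, not a Transformer command.

# list_external_libs EXTERNAL_DIR
# Prints one line per installed file: stage library name, a tab, then the file name.
list_external_libs() {
  for f in "${1%/}"/*/lib/*; do
    [ -f "$f" ] || continue          # skip unmatched globs and subdirectories
    lib_dir=${f%/lib/*}              # strip the trailing /lib/<file>
    printf '%s\t%s\n' "${lib_dir##*/}" "${f##*/}"
  done
}

# Example usage, commented out so that nothing runs by accident:
# list_external_libs /opt/transformer-extras
```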