Install External Libraries

Install external libraries to make them available to Data Collector stages.

You can install external libraries for the following stages:
  • Before you use the following stages, install JDBC drivers for the implementation that you want to use:
    • JDBC Multitable Consumer origin
    • JDBC Query Consumer origin
    • MySQL Binary Log origin
    • Oracle Bulkload origin
    • Oracle CDC origin
    • Oracle CDC Client origin
    • SAP HANA Query Consumer origin
    • Teradata Consumer origin
    • JDBC Lookup processor
    • JDBC Tee processor
    • SQL Parser processor, when using the database to resolve the schema
    • JDBC Producer destination
    • MemSQL Fast Loader destination
    • JDBC Query executor

    For example, to use the JDBC Query Consumer origin or the JDBC Producer destination with Oracle, install the Oracle JDBC drivers.

  • Before you use the Hadoop FS origin to read from non-HDFS systems, install all required file system application JAR files. See the file system documentation for details about the files to install.
  • Before you use the Spark Evaluator processor, install the Spark application JAR file and any dependencies other than the streamsets-datacollector-api, streamsets-datacollector-spark-api, and spark-core libraries.
  • You can install external Java libraries to call external Java code from the scripting processors: Groovy, JavaScript, and Jython Evaluator.
  • You can call external Python modules from the Jython Evaluator processor.
  • You can install the DataStax Enterprise (DSE) Java driver to configure the Cassandra destination to use DSE username and password authentication or Kerberos authentication.
  • Before you use the Google Bigtable destination, install the BoringSSL library.
  • Before you use the JMS Consumer origin or the JMS Producer destination, install the JMS drivers for the implementation that you are using.
  • You can install the Impala JDBC driver for use with the Hive Query executor. For more information, see Installing the Impala Driver.

When installing an external library, you install it into the stage library that includes the stage. For example, to use an external Java library with the Groovy Evaluator processor, you install the Java library as an external library for the Groovy stage library, streamsets-datacollector-groovy_2_4-lib.

To use an external library with multiple stage libraries, install the external library into each stage library associated with the stages. For example, if you want to use a MySQL JDBC driver with the JDBC Lookup processor and with the MySQL Binary Log origin, you install the driver as an external library for the JDBC stage library, streamsets-datacollector-jdbc-lib, and for the MySQL Binary Log stage library, streamsets-datacollector-mysql-binlog-lib.

By default, external libraries are installed to the $SDC_EXTERNAL_RESOURCES/streamsets-libs-extras directory. StreamSets recommends configuring Data Collector to use an external directory to enable use of the libraries after Data Collector upgrades.

You can install external libraries in any of the following ways:
  • From Package Manager
  • From the stage properties panel
  • Manually

Setting Up an External Directory

By default, Data Collector expects external libraries to be installed to the $SDC_EXTERNAL_RESOURCES/streamsets-libs-extras directory.

For a tarball or Cloudera Manager installation, you can use the default directory as you get started with Data Collector. However, StreamSets recommends configuring Data Collector to use an external directory to enable use of the libraries after Data Collector upgrades.

For an RPM installation, you must configure Data Collector to use an external directory before you can install external libraries from Package Manager or from the stage properties panel.

Use the required procedure for your installation type.

Setting Up for Tarball and RPM

Before you install external libraries for a tarball or RPM installation, set up an external directory to store the libraries.

  1. Create a local directory external to the Data Collector installation directory.
    For example, if you installed Data Collector in the following directory:
    /opt/sdc/
    you might create the external directory at:
    /opt/sdc-extras
  2. Grant the user who starts Data Collector ownership of the external directory.
    For example, if you use the default system user and group named sdc to run Data Collector as a service, use the following command to change the owner of the external directory and all files in the directory to sdc:sdc:
    chown -R sdc:sdc /opt/sdc-extras
  3. Add the STREAMSETS_LIBRARIES_EXTRA_DIR environment variable to the appropriate file and point it to the external directory.

    Modify environment variables using the method required by your installation type.

    Set the environment variable as follows:

    export STREAMSETS_LIBRARIES_EXTRA_DIR="<external directory>"

    For example:

    export STREAMSETS_LIBRARIES_EXTRA_DIR="/opt/sdc-extras/"
  4. When using the Java Security Manager, which is enabled by default, update the Data Collector security policy to include the external directory as follows:
    1. In the Data Collector configuration directory, open the security policy file, $SDC_CONF/sdc-security.policy.
    2. Add the following lines to the file:
      // user-defined external directory
      grant codebase "file://<external directory>-" {
        permission java.security.AllPermission;
      };
      For example:
      // user-defined external directory
      grant codebase "file:///opt/sdc-extras/-" {
        permission java.security.AllPermission;
      };
  5. Restart Data Collector.
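
The codebase in step 4 must end in /- so that the grant covers every file under the external directory, including files in subdirectories. As a minimal sketch, the following shell function prints the grant block for a given directory and adds the trailing slash when it is missing. The function name policy_grant is hypothetical and not part of Data Collector. Review the output before adding it to the security policy file.

```shell
# Print a security-policy grant block for an external directory.
# Assumption: policy_grant is a hypothetical helper name, not a Data Collector command.
policy_grant() (
  dir="${1%/}"   # drop one trailing slash, if present, so the slash is never doubled
  printf '// user-defined external directory\n'
  printf 'grant codebase "file://%s/-" {\n' "$dir"
  printf '  permission java.security.AllPermission;\n'
  printf '};\n'
)

# Example (prints to standard output only):
# policy_grant /opt/sdc-extras
```

Called with /opt/sdc-extras or /opt/sdc-extras/, the function prints the same block as the example in step 4.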

Setting Up for Cloudera Manager

Before you install external libraries for a Cloudera Manager installation, set up an external directory to store the libraries.

  1. In Cloudera Manager, select the StreamSets service and then click Configuration.
  2. On the Configuration page, in the Data Collector Advanced Configuration Snippet (Safety Valve) for sdc-env.sh field, add the STREAMSETS_LIBRARIES_EXTRA_DIR environment variable and point it to the external directory, as follows:
    export STREAMSETS_LIBRARIES_EXTRA_DIR="<external directory>"

    For example:

    export STREAMSETS_LIBRARIES_EXTRA_DIR="/opt/sdc-extras/"
    By default, the path is /var/lib/sdc.
  3. Create the external directory, /opt/sdc-extras/ in this example, on every node that runs Data Collector.
  4. On every node, grant the user who starts Data Collector ownership of the external directory.
    For example, if you use the default system user and group named sdc to run Data Collector as a service, use the following command to change the owner of the external directory and all files in the directory to sdc:sdc:
    chown -R sdc:sdc /opt/sdc-extras
  5. When using the Java Security Manager, which is enabled by default, update the Data Collector Advanced Configuration Snippet (Safety Valve) for sdc-security.policy property to include the external directory as follows:
    // user-defined external directory
    grant codebase "file://<external directory>-" {
      permission java.security.AllPermission;
    };
    For example:
    // user-defined external directory
    grant codebase "file:///opt/sdc-extras/-" {
      permission java.security.AllPermission;
    };
  6. Restart Data Collector.
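
Because the directory and ownership steps must be repeated on every node, a quick check on each node can catch one that was missed. The sketch below lists anything under a directory that is not owned by the expected user and group. The function name list_wrong_owner is hypothetical, and the sdc user and group in the usage comment are the defaults mentioned above, which might differ in your installation.

```shell
# List files and directories under $1 that are NOT owned by user $2 and group $3.
# Assumption: list_wrong_owner is a hypothetical helper name.
# Empty output means that the directory and everything in it has the expected owner.
list_wrong_owner() {
  find "$1" \( ! -user "$2" -o ! -group "$3" \) -print
}

# Example, run on each node:
# list_wrong_owner /opt/sdc-extras sdc sdc
```

If the function prints any paths, rerun the chown command from step 4 on that node.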

Installing from Package Manager

You can use the Package Manager within Data Collector to install external libraries for all stage libraries.

Important: For an RPM installation, you must configure Data Collector to use an external directory before you can install external libraries from Package Manager.
  1. In Data Collector, in the top right toolbar, click the Package Manager icon.
  2. In the navigation panel, click External Libraries.
    Data Collector lists any currently installed external libraries.
  3. Immediately under the top right toolbar, click the Install External Libraries icon.
  4. In the Install External Libraries dialog box, select the stage library that needs to access the external library.
    For example, if you are installing a JDBC driver for the JDBC Multitable Consumer origin, select the JDBC stage library. If you are installing an external Java library for the Groovy Evaluator processor, select the Groovy stage library.
  5. Browse to select the external library to install and click Open.
  6. To install the external library to the specified stage library, click Upload.
    Data Collector installs the external library and displays a message offering to restart Data Collector.
  7. To install additional external libraries, click Cancel, then repeat steps 3 through 6 for every stage library that needs access to the external library.
    For example, suppose you want to use an external library with an origin, but you use two versions of the origin, each from a different stage library. To make the external library available to both origin versions, you must upload the external library to both stage libraries.
  8. After installing all of the external libraries that you want, restart Data Collector in one of the following ways:
    • If you started Data Collector manually from the command line, click Restart Data Collector in the Install External Libraries dialog box.
    • If you started Data Collector as a service, you must restart it from the command line. Click Cancel in the Install External Libraries dialog box, and then run the required command for your operating system:
      • For CentOS 6, Oracle Linux 6, Red Hat Enterprise Linux 6, or Ubuntu 14.04 LTS, use:
        service sdc restart
      • For CentOS 7, Oracle Linux 7, Red Hat Enterprise Linux 7, or Ubuntu 16.04 LTS, use:
        systemctl restart sdc
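
When the same restart step runs on hosts with different operating systems, the choice between the two commands can be scripted. The sketch below assumes that the presence of systemctl on the PATH is a reasonable stand-in for the systemd-based versions listed above. That heuristic and the function name sdc_restart_command are assumptions, not part of Data Collector. The function only prints the command so that you can review it before running it.

```shell
# Print the restart command that appears to match the init system on this host.
# Assumption: finding systemctl on the PATH is treated as "this host uses systemd".
sdc_restart_command() {
  if command -v systemctl >/dev/null 2>&1; then
    echo "systemctl restart sdc"
  else
    echo "service sdc restart"
  fi
}

# Example (prints the command, does not restart anything):
# sdc_restart_command
```

The check can be wrong on hosts where systemctl is installed but systemd is not managing services, so confirm the result against the operating system list above.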

Installing from Stage Properties

When configuring a pipeline, you can use the stage properties panel to install external libraries for the stage library that includes the stage.
Important: For an RPM installation, you must configure Data Collector to use an external directory before you can install external libraries from the stage properties panel.
  1. While configuring a pipeline, select a stage that requires an external library in the pipeline canvas.
  2. In the stage properties panel, click the External Libraries tab.

  3. Click the Install External Libraries icon.
  4. In the Install External Libraries dialog box, select the stage library that needs to access the external library.

    For example, to install a JDBC driver for the JDBC Multitable Consumer origin, select the JDBC stage library. To install an external Java library for the Groovy Evaluator processor, select the Groovy stage library.

  5. Browse to select the external library to install and click Open.
  6. To install the external library into the specified stage library, click Upload.

    Data Collector installs the external library. All stages included in the specified stage library can use this external library. For example, if you installed a JDBC driver for the JDBC stage library, then every stage included in the JDBC stage library can also access the driver.

    To use the external library with other stage libraries, you must install the library into the additional stage libraries. For example, if you want to use the same JDBC driver with the MySQL Binary Log origin, you must also install the driver as an external library for the MySQL Binary Log stage library.

  7. Restart Data Collector in one of the following ways:
    • If you started Data Collector manually from the command line, click Restart Data Collector in the Install External Libraries dialog box.
    • If you started Data Collector as a service, you must restart it from the command line. Click Cancel in the Install External Libraries dialog box, and then run the required command for your operating system:
      • For CentOS 6, Oracle Linux 6, Red Hat Enterprise Linux 6, or Ubuntu 14.04 LTS, use:
        service sdc restart
      • For CentOS 7, Oracle Linux 7, Red Hat Enterprise Linux 7, or Ubuntu 16.04 LTS, use:
        systemctl restart sdc

Installing Manually

To manually install external libraries, use the required procedure for your installation type.

Installing Manually for Tarball and RPM

To manually install external libraries for a tarball or RPM installation, perform the following steps:

  1. In the directory where Data Collector installs external libraries, create subdirectories for each set of external libraries based on the stage library name.

    For example, if you set up an external directory to store the libraries at /opt/sdc-extras, then create the subdirectories as follows:

    /opt/sdc-extras/<stage library name>/lib/
    To install drivers for stages included with the JDBC stage library, create the following subdirectory:
    /opt/sdc-extras/streamsets-datacollector-jdbc-lib/lib/

    To also install drivers for stages included with the JMS stage library, create the following subdirectory:

    /opt/sdc-extras/streamsets-datacollector-jms-lib/lib/
    Note: If you use multiple stage libraries for a particular stage, and you want to use an external library with all stage libraries, you must install the external library for each stage library.

    For example, suppose you want to use an external library with an origin, but you use two versions of the origin, each from a different stage library. To make the external library available to both origin versions, you must copy the external library into the subdirectory for each of the two stage libraries.

    Tip: For a list of stage library names, see Available Stage Libraries.
  2. Copy the external libraries to the appropriate subdirectories.
  3. Restart Data Collector.
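
Steps 1 and 2 can be sketched as a small shell function that creates the <stage library name>/lib/ subdirectory for each stage library and copies one external library into it. The function name install_external_lib is hypothetical and not part of Data Collector.

```shell
# Copy one external library into the lib/ subdirectory of each named stage library.
# Assumption: install_external_lib is a hypothetical helper name.
# Usage: install_external_lib <external directory> <library file> <stage library name>...
install_external_lib() (
  [ "$#" -ge 3 ] || { echo "usage: install_external_lib <external directory> <library file> <stage library name>..." >&2; exit 2; }
  extras_dir="${1%/}"   # tolerate a trailing slash on the external directory
  library="$2"
  shift 2
  for stage_lib in "$@"; do
    target="$extras_dir/$stage_lib/lib"
    mkdir -p "$target" || exit 1      # step 1: create the subdirectory
    cp "$library" "$target/" || exit 1  # step 2: copy the library into it
  done
)
```

For example, install_external_lib /opt/sdc-extras <driver JAR file> streamsets-datacollector-jdbc-lib streamsets-datacollector-mysql-binlog-lib makes one driver available to both stage libraries. If you copy the files as a user other than the one that starts Data Collector, you might need to rerun the chown command from the external directory setup before restarting.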

Installing Manually for Cloudera Manager

To manually install external libraries for an installation with Cloudera Manager, perform the following steps:

  1. On every node that runs Data Collector, create subdirectories in the directory where Data Collector installs external libraries.

    Create a subdirectory for each set of external libraries based on the stage library name. For example, if you set up an external directory to store the libraries at /opt/sdc-extras, then create the subdirectories as follows on every node:

    /opt/sdc-extras/<stage library name>/lib/
    To install drivers for JDBC, create the following subdirectory on every node:
    /opt/sdc-extras/streamsets-datacollector-jdbc-lib/lib/
    To also install drivers for JMS, create the following subdirectory on every node:
    /opt/sdc-extras/streamsets-datacollector-jms-lib/lib/
    Note: If you use multiple stage libraries for a particular stage, and you want to use an external library with all stage libraries, you must install the external library for each stage library.

    For example, suppose you want to use an external library with an origin, but you use two versions of the origin, each from a different stage library. To make the external library available to both origin versions, you must copy the external library into the subdirectory for each of the two stage libraries on every node.

    Tip: For a list of stage library names, see Available Stage Libraries.
  2. Copy the external libraries to the appropriate subdirectories on every node.
  3. Restart Data Collector.