Install External Libraries
Install external libraries to make them available to Data Collector stages.
- Before you use the following stages, install JDBC drivers for the implementation that you want to use:
  - JDBC Multitable Consumer origin
  - JDBC Query Consumer origin
  - MySQL Binary Log origin
  - Oracle Bulkload origin
  - Oracle CDC origin
  - Oracle CDC Client origin
  - SAP HANA Query Consumer origin
  - Teradata Consumer origin
  - JDBC Lookup processor
  - JDBC Tee processor
  - SQL Parser processor, when using the database to resolve the schema
  - JDBC Producer destination
  - MemSQL Fast Loader destination
  - JDBC Query executor
  For example, to use the JDBC Query Consumer origin or the JDBC Producer destination with Oracle, install the Oracle JDBC drivers.
- Before you use the Hadoop FS origin to read from non-HDFS systems, install all required file system application JAR files. See the file system documentation for details about the files to install.
- Before you use the Spark Evaluator processor, install the Spark application JAR file and any dependencies other than the streamsets-datacollector-api, streamsets-datacollector-spark-api, and spark-core libraries.
- You can install external Java libraries to call external Java code from the scripting processors: the Groovy Evaluator, JavaScript Evaluator, and Jython Evaluator.
- You can call external Python modules from the Jython Evaluator processor.
- You can install the DataStax Enterprise (DSE) Java driver to configure the Cassandra destination to use DSE username and password authentication or Kerberos authentication.
- Before you use the Google Bigtable destination, install the BoringSSL library.
- Before you use the JMS Consumer origin or the JMS Producer destination, install the JMS drivers for the implementation that you are using.
- You can install the Impala JDBC driver for use with the Hive Query executor. For more information, see Installing the Impala Driver.
When installing an external library, you install it into the stage library that includes the stage. For example, to use an external Java library with the Groovy Evaluator processor, you install the Java library as an external library for the Groovy stage library, streamsets-datacollector-groovy_4_0-lib.
To use an external library with multiple stage libraries, install the external library into each stage library associated with the stages. For example, if you want to use a MySQL JDBC driver with the JDBC Lookup processor and with the MySQL Binary Log origin, you install the driver as an external library for the JDBC stage library, streamsets-datacollector-jdbc-lib, and for the MySQL Binary Log stage library, streamsets-datacollector-mysql-binlog-lib.
By default, external libraries are installed to the $SDC_EXTERNAL_RESOURCES/streamsets-libs-extras directory. StreamSets recommends configuring Data Collector to use an external directory to enable use of the libraries after Data Collector upgrades.
Setting Up an External Directory
By default, Data Collector expects external libraries to be installed to the $SDC_EXTERNAL_RESOURCES/streamsets-libs-extras directory.
For a tarball or Cloudera Manager installation, you can use the default directory as you get started with Data Collector. However, StreamSets recommends configuring Data Collector to use an external directory to enable use of the libraries after Data Collector upgrades.
For an RPM installation, you must configure Data Collector to use an external directory before you can install external libraries from Package Manager or from the stage properties panel.
Use the required procedure for your installation type.
Setting Up for Tarball and RPM
Before you install external libraries for a tarball or RPM installation, set up an external directory to store the libraries.
- Create a local directory external to the Data Collector installation directory.
  For example, if you installed Data Collector in the following directory:
  /opt/sdc/
  you might create the external directory at:
  /opt/sdc-extras
- Grant the user who starts Data Collector ownership of the external directory.
  For example, if you use the default system user and group named sdc to run Data Collector as a service, use the following command to change the owner of the external directory and all files in the directory to sdc:sdc:
  chown -R sdc:sdc /opt/sdc-extras
- Add the STREAMSETS_LIBRARIES_EXTRA_DIR environment variable to the appropriate file and point it to the external directory.
  Modify environment variables using the method required by your installation type.
  Set the environment variable as follows:
  export STREAMSETS_LIBRARIES_EXTRA_DIR="<external directory>"
  For example:
  export STREAMSETS_LIBRARIES_EXTRA_DIR="/opt/sdc-extras/"
- When using the Java Security Manager, which is enabled by default, update the Data Collector security policy to include the external directory as follows:
  - In the Data Collector configuration directory, open the security policy file, $SDC_CONF/sdc-security.policy.
  - Add the following lines to the file:
    // user-defined external directory
    grant codebase "file://<external directory>-" {
      permission java.security.AllPermission;
    };
    For example:
    // user-defined external directory
    grant codebase "file:///opt/sdc-extras/-" {
      permission java.security.AllPermission;
    };
- Restart Data Collector.
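The directory and policy steps above can be sketched as a small shell script. This is an illustrative sketch, not part of Data Collector: the function names setup_external_dir and policy_grant are hypothetical, and the example paths and the sdc:sdc owner are taken from the examples in this section. Note that the policy entry needs the trailing slash before the "-" so that the grant covers everything under the directory.

```shell
#!/bin/sh
# Sketch of the tarball/RPM setup steps. Function names are hypothetical.

# Create the external directory and give it to the user that starts Data Collector.
setup_external_dir() {
  dir="$1"      # for example /opt/sdc-extras
  owner="$2"    # for example sdc:sdc
  mkdir -p "$dir" && chown -R "$owner" "$dir"
}

# Print the grant block to add to $SDC_CONF/sdc-security.policy.
policy_grant() {
  dir="${1%/}/"   # make sure the path ends with a slash before the "-"
  printf '// user-defined external directory\n'
  printf 'grant codebase "file://%s-" {\n' "$dir"
  printf '  permission java.security.AllPermission;\n'
  printf '};\n'
}

# Example (run with sufficient privileges, then restart Data Collector):
#   setup_external_dir /opt/sdc-extras sdc:sdc
#   policy_grant /opt/sdc-extras >> "$SDC_CONF/sdc-security.policy"
```

The script only defines the helpers. Review the generated grant block before appending it to the policy file, and set STREAMSETS_LIBRARIES_EXTRA_DIR separately using the method required by your installation type.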
Setting Up for Cloudera Manager
Before you install external libraries for a Cloudera Manager installation, set up an external directory to store the libraries.
- In Cloudera Manager, select the StreamSets service and then click Configuration.
- On the Configuration page, in the Data Collector Advanced Configuration Snippet (Safety Valve) for sdc-env.sh field, add the STREAMSETS_LIBRARIES_EXTRA_DIR environment variable and point it to the external directory, as follows:
  export STREAMSETS_LIBRARIES_EXTRA_DIR="<external directory>"
  For example:
  export STREAMSETS_LIBRARIES_EXTRA_DIR="/opt/sdc-extras/"
  By default, the path is /var/lib/sdc.
- Create the /opt/sdc-extras/ directory on every node that runs Data Collector.
- Grant the user who starts Data Collector ownership of the external directory added to every node.
  For example, if you use the default system user and group named sdc to run Data Collector as a service, use the following command to change the owner of the external directory and all files in the directory to sdc:sdc:
  chown -R sdc:sdc /opt/sdc-extras
- When using the Java Security Manager, which is enabled by default, update the Data Collector Advanced Configuration Snippet (Safety Valve) for sdc-security.policy property to include the external directory as follows:
  // user-defined external directory
  grant codebase "file://<external directory>-" {
    permission java.security.AllPermission;
  };
  For example:
  // user-defined external directory
  grant codebase "file:///opt/sdc-extras/-" {
    permission java.security.AllPermission;
  };
- Restart Data Collector.
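Because the directory must exist on every node, it can help to script the per-node steps. The following is a minimal sketch under stated assumptions: the host names in NODES are placeholders, ssh access with sufficient privileges on each node is assumed, and the sdc:sdc owner comes from the example above. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical: NODES lists the hosts that run Data Collector.
NODES="${NODES:-node1.example.com node2.example.com}"
EXTRAS_DIR="${EXTRAS_DIR:-/opt/sdc-extras}"
# RUN defaults to "echo", so the commands are printed and not executed (dry run).
# Set RUN to an empty value to run the commands over ssh.
RUN="${RUN-echo}"

# Create the external directory on each node and set its owner.
prepare_nodes() {
  for node in $NODES; do
    $RUN ssh "$node" "mkdir -p '$EXTRAS_DIR' && chown -R sdc:sdc '$EXTRAS_DIR'"
  done
}

prepare_nodes
```

Running the dry run first makes it easy to confirm the host list and path before anything changes on the cluster.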
Installing from Package Manager
You can use the Package Manager within Data Collector to install external libraries for all stage libraries.
- In Data Collector, in the top right toolbar, click the Package Manager icon.
- In the navigation panel, click External Libraries.
  Data Collector lists any currently installed external libraries.
- Immediately under the top right toolbar, click the Install External Libraries icon.
- In the Install External Libraries dialog box, select the stage library that needs to access the external library.
  For example, if you are installing a JDBC driver for the JDBC Multitable Consumer origin, select the JDBC stage library. If you are installing an external Java library for the Groovy Evaluator processor, select the Groovy stage library.
- Browse to select the external library to install and click Open.
- To install the external library to the specified stage library, click Upload.
  Data Collector installs the external library and displays a message offering to restart Data Collector.
- To install additional external libraries, click Cancel, then repeat steps 3 through 6 for every stage library that needs access to the external library.
  For example, say you want to use an external library with an origin, but you use two versions of the origin, each from a different stage library. To make the external library available to both origin versions, you must upload the external library to both stage libraries.
- After installing all of the external libraries that you want, restart Data Collector in one of the following ways:
  - If you started Data Collector manually from the command line, click Restart Data Collector in the Install External Libraries dialog box.
  - If you started Data Collector as a service, you must restart it from the command line. Click Cancel in the Install External Libraries dialog box, and then run the required command for your operating system:
    - For CentOS 6, Oracle Linux 6, Red Hat Enterprise Linux 6, or Ubuntu 14.04 LTS, use:
      service sdc restart
    - For CentOS 7, Oracle Linux 7, Red Hat Enterprise Linux 7, or Ubuntu 16.04 LTS, use:
      systemctl restart sdc
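If you script the service restart, one way to pick between the two commands is to check whether systemctl is available. This is only a heuristic sketch: the presence of the systemctl binary is an assumption standing in for the operating system versions listed above, and restart_command is a hypothetical helper that prints the command without running it.

```shell
#!/bin/sh
# Print the restart command for a Data Collector service named "sdc".
# Assumption: a system that provides systemctl manages the service with systemd.
restart_command() {
  if command -v systemctl >/dev/null 2>&1; then
    echo "systemctl restart sdc"
  else
    echo "service sdc restart"
  fi
}

restart_command
```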
Installing from Stage Properties
- While configuring a pipeline, select a stage that requires an external library in the pipeline canvas.
- In the stage properties panel, click the External Libraries tab.
- Click the Install External Libraries icon.
- In the Install External Libraries dialog box, select the stage library that needs to access the external library.
  For example, to install a JDBC driver for the JDBC Multitable Consumer origin, select the JDBC stage library. To install an external Java library for the Groovy Evaluator processor, select the Groovy stage library.
- Browse to select the external library to install and click Open.
- To install the external library into the specified stage library, click Upload.
Data Collector installs the external library. All stages included in the specified stage library can use this external library. For example, if you installed a JDBC driver for the JDBC stage library, then every stage included in the JDBC stage library can also access the driver.
To use the external library with other stage libraries, you must install the library into the additional stage libraries. For example, if you want to use the same JDBC driver with the MySQL Binary Log origin, you must also install the driver as an external library for the MySQL Binary Log stage library.
- Restart Data Collector in one of the following ways:
  - If you started Data Collector manually from the command line, click Restart Data Collector in the Install External Libraries dialog box.
  - If you started Data Collector as a service, you must restart it from the command line. Click Cancel in the Install External Libraries dialog box, and then run the required command for your operating system:
    - For CentOS 6, Oracle Linux 6, Red Hat Enterprise Linux 6, or Ubuntu 14.04 LTS, use:
      service sdc restart
    - For CentOS 7, Oracle Linux 7, Red Hat Enterprise Linux 7, or Ubuntu 16.04 LTS, use:
      systemctl restart sdc
Install Manually
To manually install external libraries, use the required procedure for your installation type.
Installing Manually for Tarball and RPM
To manually install external libraries for a tarball or RPM installation, perform the following steps:
- In the directory where Data Collector installs external libraries, create subdirectories for each set of external libraries based on the stage library name.
  For example, if you set up an external directory to store the libraries at /opt/sdc-extras, then create the subdirectories as follows:
  /opt/sdc-extras/<stage library name>/lib/
  To install drivers for stages included with the JDBC stage library, create the following subdirectory:
  /opt/sdc-extras/streamsets-datacollector-jdbc-lib/lib/
  To also install drivers for stages included with the JMS stage library, create the following subdirectory:
  /opt/sdc-extras/streamsets-datacollector-jms-lib/lib/
  Note: If you use multiple stage libraries for a particular stage, and you want to use an external library with all stage libraries, you must install the external library for each stage library. For example, say you want to use an external library with an origin, but you use two versions of the origin, each from a different stage library. To make the external library available to both origin versions, you must install the external library for both stage libraries.
  Tip: For a list of stage library names, see Available Stage Libraries.
- Copy the external libraries to the appropriate subdirectories.
- Restart Data Collector.
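The create-and-copy steps above can be sketched as a small shell helper. This is an illustrative sketch: install_external_lib is a hypothetical function name, and the JAR file name in the usage comment is a placeholder. The directory layout, <external directory>/<stage library name>/lib/, is the one described in this procedure.

```shell
#!/bin/sh
# Hypothetical helper: copy one library file into one or more stage libraries.
# Usage: install_external_lib <external directory> <file> <stage library name>...
install_external_lib() {
  extras_dir="$1"
  lib_file="$2"
  shift 2
  for stage_lib in "$@"; do
    target="$extras_dir/$stage_lib/lib"
    # Create <external directory>/<stage library name>/lib/ and copy the file in.
    mkdir -p "$target" && cp "$lib_file" "$target/" || return 1
  done
}

# Example (the JAR file name is a placeholder):
#   install_external_lib /opt/sdc-extras ./mysql-connector.jar \
#     streamsets-datacollector-jdbc-lib streamsets-datacollector-mysql-binlog-lib
```

Passing several stage library names in one call covers the case where the same external library must be installed for more than one stage library. Restart Data Collector afterwards.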
Installing Manually for Cloudera Manager
To manually install external libraries for an installation with Cloudera Manager, perform the following steps:
- On every node that runs Data Collector, create subdirectories in the directory where Data Collector installs external libraries.
  Create a subdirectory for each set of external libraries based on the stage library name. For example, if you set up an external directory to store the libraries at /opt/sdc-extras, then create the subdirectories as follows on every node:
  /opt/sdc-extras/<stage library name>/lib/
  To install drivers for JDBC, create the following subdirectory on every node:
  /opt/sdc-extras/streamsets-datacollector-jdbc-lib/lib/
  To also install drivers for JMS, create the following subdirectory on every node:
  /opt/sdc-extras/streamsets-datacollector-jms-lib/lib/
  Note: If you use multiple stage libraries for a particular stage, and you want to use an external library with all stage libraries, you must install the external library for each stage library. For example, say you want to use an external library with an origin, but you use two versions of the origin, each from a different stage library. To make the external library available to both origin versions, you must install the external library for both stage libraries.
  Tip: For a list of stage library names, see Available Stage Libraries.
- Copy the external libraries to the appropriate subdirectories on every node.
- Restart Data Collector.
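Before restarting, it can be useful to confirm on each node that every file landed under the expected <stage library name>/lib/ subdirectory. The following is a minimal sketch: list_external_libs is a hypothetical helper, and it assumes the external directory is either the value of STREAMSETS_LIBRARIES_EXTRA_DIR or the example path /opt/sdc-extras.

```shell
#!/bin/sh
# Hypothetical helper: list each installed file next to its stage library name.
list_external_libs() {
  for f in "$1"/*/lib/*; do
    [ -f "$f" ] || continue   # skip unmatched patterns and subdirectories
    stage_lib="$(basename "$(dirname "$(dirname "$f")")")"
    printf '%s\t%s\n' "$stage_lib" "$(basename "$f")"
  done
}

list_external_libs "${STREAMSETS_LIBRARIES_EXTRA_DIR:-/opt/sdc-extras}"
```

Comparing the output across nodes is a quick way to spot a node where a library was missed.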