External Resources
StreamSets engines can require access to external files and libraries, depending on how you design pipelines.
For example, JDBC stages require a JDBC driver to access the database. When you use a JDBC stage, you must make the driver available as an external resource.
A deployment can use one of the following sources for external resources:
- None
- Use no configured source when using a single engine instance to get started with StreamSets, or when your pipelines do not require external resources.
- Archive file
- Use an archive file that includes the external resources when the deployment launches multiple engine instances and your pipelines require external resources.
External Resource Types
External Resource Type | Description |
---|---|
Runtime resource files | Files that define pipeline property values that are called from within a pipeline. |
External libraries | External libraries required by pipeline stages. External libraries can include JDBC or JMS drivers or external Java libraries. |
Custom stage libraries | Stage libraries for custom stages. For example, you might develop a custom processor to perform custom processing in a pipeline. For more information, see Custom Stage Libraries in the Data Collector engine documentation. Important: To use custom stage libraries, you must configure the deployment to use an external resource archive. |
No Source
Configure the deployment to use no source for external resources when using a single engine instance to get started with StreamSets, or when your pipelines do not require external resources.
When using no configured source for the deployment, you can upload external resources directly to the engine instance:
- Runtime resource files - Use the engine details page.
- External libraries - Use the engine details page or the pipeline canvas.
Uploading Resources from the Engine Details
When using no configured source for the deployment, you can upload runtime resource files and external libraries from the engine details page.
Uploading Resources from the Pipeline Canvas
When using no configured source for the deployment, you can upload external libraries, such as JDBC drivers, from the pipeline canvas.
Archive File as the Source
Configure the deployment to use an external resource archive when a deployment launches multiple engine instances and when your pipelines require external resources.
You typically configure a deployment to use an external resource archive when you are ready to move to production, after you have finished building your pipelines and have finalized the list of external resources that your pipelines require.
You generate an archive file in the TGZ or ZIP format, using the required folder names and directory structure. You store the file in a location that is accessible to all machines running an engine instance for the deployment. Then, you edit the deployment to define the location of the archive file.
After you configure the external resource archive and restart all engine instances in the deployment, the archive file contents are extracted and copied into each engine instance.
When your pipelines require additional external resources, you extract the archive file, add the additional resources, and then compress the archive file again.
Archive Structure
An external resource archive file must use the required folder names and directory structure.
- resources
- The resources directory must include text files created for runtime resources.
- streamsets-libs-extras
- The streamsets-libs-extras directory must include a subdirectory for each set of required external libraries based on the stage library name, as follows: <stage library name>/lib/
- user-libs
- The user-libs directory must include a subdirectory for each custom stage.
If your pipelines do not use one of the external resource types, you can omit that directory. For example, if you have not developed custom stage libraries, you do not need to include the user-libs directory.
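The structure rules above can be checked before you compress anything. The following is a minimal Python sketch, not a StreamSets tool: the function name `check_layout` and the idea of a local staging directory are assumptions made for illustration. It only verifies the three folder names and the `<stage library name>/lib/` convention described above.

```python
from pathlib import Path

# Directory names taken from the list above.
ALLOWED_DIRS = {"resources", "streamsets-libs-extras", "user-libs"}


def check_layout(root: Path) -> list[str]:
    """Return a list of problems found under the staging directory `root`.

    `root` is a hypothetical local folder holding the directories that
    will later be compressed into the archive.
    """
    problems = []
    for entry in sorted(root.iterdir()):
        if entry.name not in ALLOWED_DIRS:
            problems.append(f"unexpected top-level entry: {entry.name}")
        elif not entry.is_dir():
            problems.append(f"not a directory: {entry.name}")
    extras = root / "streamsets-libs-extras"
    if extras.is_dir():
        for stage_lib in sorted(extras.iterdir()):
            # Each stage library folder is expected to contain a lib/ subdirectory.
            if not (stage_lib / "lib").is_dir():
                problems.append(f"missing lib/ under: {stage_lib.name}")
    return problems
```

An empty result means the staging directory follows the required names. Missing directories are not reported, because unused resource types can be omitted.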
Sample
Let's look at the contents of a sample external resource archive file created for Data Collector.
This sample archive file includes a runtime resource file named JDBC.txt, the MySQL JDBC driver for stages included in the JDBC common stage library, and the Oracle JDBC driver for the Oracle Bulkload origin included in the Oracle Enterprise stage library. It does not include any custom stage libraries:
externalResources
    resources
        JDBC.txt
    streamsets-libs-extras
        streamsets-datacollector-jdbc-lib
            lib
                mysql-connector-java-8.0.12.jar
        streamsets-datacollector-oracle-lib
            lib
                ojdbc8-19.3.0.0.jar
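As a rough illustration, the sample layout can be reproduced and packed in the TGZ format with Python's standard `tarfile` module. This sketch rests on assumptions: the function name `build_sample_archive`, the output name `externalResources.tgz`, and the choice to keep `externalResources` as the root folder inside the archive are illustrative, and the files created are empty placeholders rather than real drivers.

```python
import tarfile
from pathlib import Path

# Relative paths copied from the sample listing above.
SAMPLE_FILES = [
    "resources/JDBC.txt",
    "streamsets-libs-extras/streamsets-datacollector-jdbc-lib/lib/mysql-connector-java-8.0.12.jar",
    "streamsets-libs-extras/streamsets-datacollector-oracle-lib/lib/ojdbc8-19.3.0.0.jar",
]


def build_sample_archive(work_dir: Path) -> Path:
    """Create placeholder files in the sample layout and pack them as TGZ."""
    root = work_dir / "externalResources"
    for rel in SAMPLE_FILES:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()  # placeholder; a real archive holds the actual files
    archive = work_dir / "externalResources.tgz"
    with tarfile.open(archive, "w:gz") as tar:
        # Assumption: the externalResources folder is kept as the archive root.
        tar.add(root, arcname=root.name)
    return archive
```

Calling `build_sample_archive(Path("."))` writes the archive into the current directory. In practice you would copy the real runtime resource file and driver JARs into place instead of creating empty files.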
Setting Up an Archive
Set up an external resource archive after you have finalized the list of external resources that your pipelines require.
Updating an Archive
When a deployment uses an external resource archive and your pipelines require additional resources, you manually update the archive file to include new external resources and then restart all engine instances in the deployment.
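The manual update described here (extract, add, compress again) could be scripted along the following lines. This is a sketch under stated assumptions: the archive is a TGZ file that you created yourself, `add_to_archive` is a hypothetical helper name, and `dest_dir` is a path relative to the top of the archive. Restarting the engine instances is still a separate step and is not covered.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def add_to_archive(archive: Path, new_file: Path, dest_dir: str) -> None:
    """Extract `archive`, copy `new_file` into `dest_dir`, and recompress in place."""
    with tempfile.TemporaryDirectory() as tmp:
        staging = Path(tmp)
        with tarfile.open(archive, "r:gz") as tar:
            # Only extract archives from a trusted source, such as your own.
            tar.extractall(staging)
        target = staging / dest_dir
        target.mkdir(parents=True, exist_ok=True)
        shutil.copy2(new_file, target / new_file.name)
        # Rewrite the archive with the same top-level entries plus the new file.
        with tarfile.open(archive, "w:gz") as tar:
            for entry in sorted(staging.iterdir()):
                tar.add(entry, arcname=entry.name)
```

For example, `add_to_archive(Path("externalResources.tgz"), Path("driver.jar"), "streamsets-libs-extras/<stage library name>/lib")` would add a driver under the given stage library folder, with the placeholder replaced by the actual stage library name.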