User-Provided Stage Library Mode
For advanced use cases, you can configure a Data Collector or Transformer deployment to use the user-provided stage library mode, in which you provide the stage library files yourself when you install the engine. For example, you might use the user-provided stage library mode if your organization requires that the stage library files be scanned for security purposes before they are installed on the engine machines.
Before you can use the user-provided stage library mode, you must complete the prerequisite tasks.
Prerequisites
Display the Stage Library Mode Property
By default, deployments use the managed stage library mode.
Before you can use the user-provided stage library mode, a user with the Organization Administrator role must modify the organization configuration properties to display the Stage Library Mode property for deployments.
- As an organization administrator, click in the Navigation panel.
- Click Advanced.
- Select the Show the Stage Library Mode UI field for CSP Deployments property.
- Click Save to save changes made to advanced properties.
It can take a few minutes for the change to take effect.
Download the Stage Libraries
Download the stage library files that you want to install on engines.
- Enter the full URL to the stage library files located on https://archives.streamsets.com. Ensure that the stage library file version matches the engine version. The download URL depends on the engine type:
- Data Collector
- To download all stage library files, enter the following URL:
https://archives.streamsets.com/datacollector/<version>/tarball/streamsets-datacollector-all-<version>.tgz
For example, to download all stage library files for Data Collector 5.10.0, enter:
https://archives.streamsets.com/datacollector/5.10.0/tarball/streamsets-datacollector-all-5.10.0.tgz
Downloading all stage library files can take some time. Alternatively, you can individually download stage library files by entering the following URL:
https://archives.streamsets.com/datacollector/<version>/tarball/streamsets-datacollector-<stagelib_name>-lib-<version>.tgz
For example, to download the Amazon Web Services stage library for Data Collector 5.10.0, enter:
https://archives.streamsets.com/datacollector/5.10.0/tarball/streamsets-datacollector-aws-lib-5.10.0.tgz
You can find the stage library name in the Data Collector documentation. Or, you can configure a deployment for the managed stage library mode, select the stage libraries from the Control Hub UI, and then view the Summary tab.
- Transformer
- To download all stage library files, enter the following URL:
https://archives.streamsets.com/transformer/<version>/<scala_version>/tarball/streamsets-transformer-all_<scala_version>-<version>.tgz
For example, to download all stage library files for Transformer 5.7.0 using Scala 2.12, enter:
https://archives.streamsets.com/transformer/5.7.0/2.12/tarball/streamsets-transformer-all_2.12-5.7.0.tgz
Downloading all stage library files can take some time. Alternatively, you can individually download stage library files by entering the following URL:
https://archives.streamsets.com/transformer/<version>/<scala_version>/tarball/streamsets-spark-<stagelib_name>-lib_<scala_version>-<version>.tgz
For example, to download the JDBC stage library for Transformer 5.7.0 using Scala 2.12, enter:
https://archives.streamsets.com/transformer/5.7.0/2.12/tarball/streamsets-spark-jdbc-lib_2.12-5.7.0.tgz
To locate a stage library name, configure a deployment for the managed stage library mode, select the stage libraries from the Control Hub UI, and then view the Summary tab.
- Locate the downloaded TGZ file in your default downloads directory.
- Extract the TGZ file and locate the stage library folder under the streamsets-libs folder.
For example, if you downloaded the Amazon Web Services stage library for Data Collector 5.10.0, the extracted TGZ file contains the following folders:
streamsets-datacollector-5.10.0
  streamsets-libs
    streamsets-datacollector-aws-lib
The streamsets-datacollector-aws-lib folder includes the Amazon Web Services stage library files.
- Copy the downloaded TGZ file or the extracted stage library folders to another location as needed.
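The URL patterns in the steps above can also be assembled in a script, which helps keep the stage library version in sync with the engine version. The following is a minimal bash sketch; sdc_stagelib_url and transformer_stagelib_url are hypothetical helper names, not part of any StreamSets tool, and they assume the archive URL patterns listed above. Pass all as the stage library name to build the URL of the archive that contains all stage libraries.

```bash
#!/bin/bash
# Hypothetical helpers (not a StreamSets tool) that build download URLs from
# the documented https://archives.streamsets.com URL patterns.
# Pass "all" as the stage library name to get the all-libraries archive.

ARCHIVE_BASE="https://archives.streamsets.com"

# Usage: sdc_stagelib_url <version> <stagelib_name|all>
sdc_stagelib_url() {
  local version="$1" lib="$2"
  local prefix="${ARCHIVE_BASE}/datacollector/${version}/tarball"
  if [ "$lib" = "all" ]; then
    echo "${prefix}/streamsets-datacollector-all-${version}.tgz"
  else
    echo "${prefix}/streamsets-datacollector-${lib}-lib-${version}.tgz"
  fi
}

# Usage: transformer_stagelib_url <version> <scala_version> <stagelib_name|all>
transformer_stagelib_url() {
  local version="$1" scala="$2" lib="$3"
  local prefix="${ARCHIVE_BASE}/transformer/${version}/${scala}/tarball"
  if [ "$lib" = "all" ]; then
    echo "${prefix}/streamsets-transformer-all_${scala}-${version}.tgz"
  else
    echo "${prefix}/streamsets-spark-${lib}-lib_${scala}-${version}.tgz"
  fi
}
```

For example, curl -fLO "$(sdc_stagelib_url 5.10.0 aws)" downloads the Amazon Web Services stage library for Data Collector 5.10.0 into the current directory, and tar -zxf streamsets-datacollector-aws-lib-5.10.0.tgz extracts it.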
Provide Files for Self-Managed Deployments
To provide stage library files for a self-managed deployment, in the Configure Engine step of the deployment wizard, select User-Provided for the Stage Library Mode property.
When you launch engines for the deployment, the streamsets-libs directory in the engine installation contains a few default stage libraries. Copy the downloaded stage library files into the directory, and then restart the engine.
For example, for a Data Collector 5.10.0 tarball installation, copy the downloaded stage library files into the /streamsets-datacollector-5.10.0/streamsets-libs directory, and then restart the engine.
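Copying a stage library into a tarball installation can be scripted. This minimal bash sketch defines a hypothetical install_stagelib_dir helper that copies one extracted stage library folder into the streamsets-libs directory of an engine installation. It does not restart the engine; you still restart the engine afterward as usual.

```bash
#!/bin/bash
# Hypothetical helper: copies one extracted stage library folder (for example,
# streamsets-datacollector-aws-lib) into <engine_home>/streamsets-libs.
# It does not restart the engine.

# Usage: install_stagelib_dir <stagelib_dir> <engine_home>
install_stagelib_dir() {
  # Strip a trailing slash so cp copies the folder itself, not only its contents.
  local src="${1%/}" engine_home="$2"
  local libs_dir="${engine_home}/streamsets-libs"
  if [ ! -d "$src" ]; then
    echo "Stage library folder not found: $src" >&2
    return 1
  fi
  if [ ! -d "$libs_dir" ]; then
    echo "No streamsets-libs directory in: $engine_home" >&2
    return 1
  fi
  # The folder lands as streamsets-libs/<folder name>.
  cp -R "$src" "$libs_dir/"
}
```

For example, install_stagelib_dir streamsets-datacollector-5.10.0/streamsets-libs/streamsets-datacollector-aws-lib <engine_home> copies the extracted Amazon Web Services stage library, where <engine_home> is the path to your Data Collector 5.10.0 installation.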
For a Docker image installation, you can provide the files to the engine by editing the running container. For example, for a Data Collector 5.10.0 Docker image, you can start a Bash shell in the running Docker container, copy the downloaded stage library files into the /opt/streamsets-datacollector-5.10.0/streamsets-libs directory, and then restart the engine.
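For a running container, the copy can also be done from the host with docker cp instead of a shell inside the container. The following bash sketch uses a hypothetical helper name, docker_install_stagelib, assumes a Data Collector container, and assumes that restarting the container is an acceptable way to restart the engine in your environment. Files copied this way exist only in that container's filesystem and are lost if the container is removed.

```bash
#!/bin/bash
# Hypothetical helper for a Data Collector container: copies an extracted stage
# library folder into the container's streamsets-libs directory, then restarts
# the container (assumed acceptable as the engine restart in your environment).

# Usage: docker_install_stagelib <container> <engine_version> <stagelib_dir>
docker_install_stagelib() {
  local container="$1" version="$2" src="${3%/}"
  local libs_dir="/opt/streamsets-datacollector-${version}/streamsets-libs"
  # Copies <stagelib_dir> into the existing streamsets-libs directory.
  docker cp "$src" "${container}:${libs_dir}/" || return 1
  docker restart "$container"
}
```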
Alternatively, you can configure the Docker image to mount an external directory containing the downloaded stage library files, or you can create a custom Docker image derived from an IBM StreamSets engine image that includes the downloaded stage library files.
Provide Files for Cloud Service Provider Deployments
To provide stage library files for cloud service provider deployments, such as Amazon EC2, Azure VM, or GCE deployments, in the Configure Engine step of the deployment wizard, select User-Provided for the Stage Library Mode property.
Then in the Configure Autoscaling Group step of the deployment wizard, define the Init Script property to include commands that copy the downloaded stage library files into the streamsets-libs folder in the engine installation.
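For example, the following init script downloads the Amazon Web Services stage library for Data Collector 5.10.0 from a web server, represented by <web_server>, that hosts the downloaded files, and extracts it into the streamsets-libs folder:
#!/bin/bash
# Download the stage library TGZ file from the web server that hosts it.
wget -q https://<web_server>.com/streamsets-datacollector-aws-lib-5.10.0.tgz -P /tmp/
# Extract it, stripping the streamsets-datacollector-5.10.0/streamsets-libs/ prefix so only the library folder remains.
tar -zxf /tmp/streamsets-datacollector-aws-lib-5.10.0.tgz -C /opt/streamsets-datacollector/streamsets-libs/ --strip-components=2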
When you start the deployment, the initialization script copies the files into the engine installation on each provisioned instance in your cloud account.
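When a deployment needs several stage libraries, the init script can loop over them instead of repeating the commands. This minimal bash sketch uses a hypothetical install_stagelibs function and assumes that each TGZ file on your web server keeps the folder layout of the archive file, so that --strip-components=2 leaves only the stage library folder.

```bash
#!/bin/bash
# Hypothetical init script helper: downloads and extracts each named Data
# Collector stage library into <libs_dir>. Assumes every TGZ file on the web
# server keeps the <engine>/streamsets-libs/<library> layout of the archive.

# Usage: install_stagelibs <base_url> <version> <libs_dir> <stagelib_name>...
install_stagelibs() {
  local base_url="$1" version="$2" libs_dir="$3"
  shift 3
  local lib tgz
  for lib in "$@"; do
    tgz="streamsets-datacollector-${lib}-lib-${version}.tgz"
    wget -q "${base_url}/${tgz}" -P /tmp/ || return 1
    # Strip the <engine>/streamsets-libs/ prefix so only the library folder remains.
    tar -zxf "/tmp/${tgz}" -C "$libs_dir" --strip-components=2 || return 1
    rm -f "/tmp/${tgz}"
  done
}
```

To use it in the Init Script property, add a call after the function definition, for example: install_stagelibs "https://<web_server>.com" 5.10.0 /opt/streamsets-datacollector/streamsets-libs aws <stagelib_name>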
Provide Files for Kubernetes Deployments
To provide stage library files for a Kubernetes deployment, in the Configure Engine step of the deployment wizard, select User-Provided for the Stage Library Mode property.
Then in the Configure Kubernetes Deployment step of the deployment wizard, use advanced mode to directly edit the deployment YAML file so that the downloaded stage library files are available in the streamsets-libs folder in the engine installation.
For example, you might create a static persistent volume in Kubernetes that contains the downloaded stage library directories, along with a persistent volume claim that binds to it. For details on Kubernetes persistent volumes, see the Kubernetes documentation.
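Then add the following to the spec/template/spec/containers[0] section of the deployment YAML file to mount the streamsets-libs directory of the volume:
volumeMounts:
  - mountPath: /opt/streamsets-datacollector-<version>/streamsets-libs
    name: stagelibs
    readOnly: true
    subPath: streamsets-libs
And add the following to the spec/template/spec section to reference the persistent volume claim:
volumes:
  - name: stagelibs
    persistentVolumeClaim:
      claimName: stage-libs-claim
      readOnly: true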
When you start the deployment, the stage library files are mounted into the engine installation on each Kubernetes pod.
Alternatively, you can create a custom Docker image derived from an IBM StreamSets engine image that includes the downloaded stage library files, and then use advanced mode to configure the deployment YAML file to use the custom image.