Install Additional Stage Libraries

Install additional stage libraries to use stages that are not included in the core or common installation of Data Collector. This is an optional step, but core installations typically require installing additional stage libraries.

For a complete list of the stages installed with each stage library, see Available Stage Libraries.

Important: You must perform additional steps to install the MapR stage libraries, as described in MapR Prerequisites.

You can install additional RPM stage libraries using the Data Collector command line program.

You can install additional tarball stage libraries using the Package Manager within Data Collector, using the stage library panel in the pipeline canvas, or using the Data Collector command line program.

An installation with Cloudera Manager is a full installation that includes all available stage libraries. As a result, you cannot install or uninstall additional stage libraries in a Cloudera Manager installation.

When necessary, you can also install legacy stage libraries for all installation types.

Installing for RPM

Use the following commands to install additional stage libraries for a core RPM installation:

To install one or more stage libraries:
Use the following command to install the stage libraries downloaded to the current directory:
yum localinstall <libraryID>-<version>-1.noarch.rpm <libraryID>-<version>-1.noarch.rpm ...
Use the full file name of each library package that you want to install, separating the file names with spaces.
For example, to install the Amazon S3 origin and destination, as well as the Kudu destination for Data Collector version 5.10.0, use the following command:
yum localinstall streamsets-datacollector-aws-lib-5.10.0-1.noarch.rpm streamsets-datacollector-apache-kudu_1_0-lib-5.10.0-1.noarch.rpm 
To list the stage libraries installed on the current Data Collector:
Use the following command:
yum list installed | grep streamsets
To uninstall libraries when necessary:
Use the following command:
yum remove <libraryID> <libraryID> ...
Use the full name of each library that you want to uninstall, separating the names with spaces.
For example, to uninstall the Amazon S3 origin and destination, use the following command:
yum remove streamsets-datacollector-aws-lib
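
The package file names in the RPM commands above follow the <libraryID>-<version>-1.noarch.rpm pattern, so the install command can be assembled from a version and a list of library IDs. The following is a minimal sketch, not part of Data Collector: the function names rpm_package_name and yum_localinstall_command are hypothetical, and the version and library IDs are example values. The script only prints the command so that you can review it before running it.

```shell
#!/bin/sh
# Sketch: build RPM package file names for "yum localinstall".
# The version and library IDs used below are example values only.

# Print the package file name for one stage library ID ($1) and version ($2),
# following the <libraryID>-<version>-1.noarch.rpm pattern.
rpm_package_name() {
    printf '%s-%s-1.noarch.rpm\n' "$1" "$2"
}

# Print a yum localinstall command for a version ($1) and one or more
# library IDs (remaining arguments). Package files are separated by spaces.
yum_localinstall_command() (
    version=$1
    shift
    command_line="yum localinstall"
    for library_id in "$@"; do
        command_line="$command_line $(rpm_package_name "$library_id" "$version")"
    done
    printf '%s\n' "$command_line"
)

# Example: print (do not run) the command for two libraries.
yum_localinstall_command 5.10.0 \
    streamsets-datacollector-aws-lib \
    streamsets-datacollector-jdbc-lib
```

The printed command assumes that the package files have already been downloaded to the current directory.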

Installing for Tarball Using Package Manager

You can use Package Manager within Data Collector to install additional stage libraries for a core or common tarball installation.

Complete one of the following steps to display Package Manager:

  • Click the Package Manager icon.

  • Click Add/Remove Stages in the Stage Library panel when viewing a pipeline in the pipeline canvas.

Package Manager lists all available stage libraries and the stages within each stage library. Origins display in blue, processors in orange, destinations in light green, and executors in dark green. Installed stage libraries display a check mark in the Installed column. You can filter the stage libraries by type or you can search for a stage library in the list.

To install an additional stage library, click the More icon for the library, and then click Install. To install multiple stage libraries, select the libraries in the list and then click the Install icon. Confirm that you want to install the libraries, and then restart Data Collector for the changes to take effect.

To uninstall a stage library, click the More icon for the library, and then click Uninstall. To uninstall multiple stage libraries, select the libraries in the list and then click the Uninstall icon. Confirm that you want to uninstall the libraries, and then restart Data Collector for the changes to take effect.
Note: If Data Collector does not have internet connectivity, you can view the installed stage libraries and can uninstall a stage library. However, you cannot view all stage libraries or install an additional stage library.

For information about the stages installed with each stage library, see Available Stage Libraries.

Installing for Tarball Using the Stage Library Panel

You can use the stage library panel in the pipeline canvas to install additional stage libraries for a core or common tarball installation.

By default, the stage library panel in the pipeline canvas displays all Data Collector stages, instead of only the installed stages. Stages that are not installed appear disabled, or greyed out. For example, when the Azure stage library is not installed, the Azure origins appear greyed out in the stage library panel.

To install an additional stage library, click on a disabled stage. Confirm that you want to install the library, and then restart Data Collector for the changes to take effect.

Note: If Data Collector does not have internet connectivity, you cannot view all stage libraries or install an additional stage library from the stage library panel.

When needed, you can configure Data Collector to hide the stages that are not installed in the stage library panel, as described in Configuring the Display.

For information about the stages installed with each stage library, see Available Stage Libraries.

Installing for Tarball Using the Command Line

You can use the stagelibs command to install additional stage libraries for a core or common tarball installation.

The stagelibs command requires that the curl utility, version 7.18.1 or later, and the sha1sum utility are installed on the machine. Verify that both utilities are installed before running the command.
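
One way to verify these prerequisites is a short shell check. The following is a minimal sketch under stated assumptions rather than a supported tool: the function names version_at_least and check_stagelibs_prereqs are hypothetical, and the version comparison assumes a sort utility that supports the -V option, as GNU coreutils does.

```shell
#!/bin/sh
# Sketch: check the stagelibs prerequisites (curl 7.18.1 or later, sha1sum).
# Assumes a version-aware "sort -V", as provided by GNU coreutils.

# Succeed when version $1 is greater than or equal to version $2.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Succeed when both utilities are present and curl is recent enough.
check_stagelibs_prereqs() (
    command -v curl >/dev/null 2>&1 || { echo "curl not found" >&2; exit 1; }
    command -v sha1sum >/dev/null 2>&1 || { echo "sha1sum not found" >&2; exit 1; }
    # The first line of "curl --version" starts with: curl <version> ...
    curl_version=$(curl --version | head -n 1 | awk '{print $2}')
    version_at_least "$curl_version" 7.18.1 || {
        echo "curl $curl_version is older than 7.18.1" >&2
        exit 1
    }
)

# Usage (run manually before using the stagelibs command):
#   check_stagelibs_prereqs && echo "prerequisites met"
```

The check function is defined but not called automatically, so you can source the script and run the check when it suits your installation procedure.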

Use the following commands to install additional tarball libraries:
To view the list of available libraries:
Run the following command from the $SDC_DIST directory:
bin/streamsets stagelibs -list
This provides a list of all available stage libraries and whether they are already installed. For more information about the stages installed with each stage library, see Available Stage Libraries.
To install one or more stage libraries:
Run the following command from the $SDC_DIST directory:
bin/streamsets stagelibs -install=<libraryID>,<libraryID>,...
Use the full name of the libraries that you want to install, separating them with commas. Do not include spaces in the command.
For example, to install the Amazon S3 origin and destination, as well as the Cassandra destination, use the following command:
bin/streamsets stagelibs -install\
=streamsets-datacollector-aws-lib,streamsets-datacollector-cassandra_3-lib
When successful, the command line reports each download and then confirms each installed stage library. The following sample output shows the format for three libraries:
Downloading: https://archives.streamsets.com/datacollector/<version>/tarball\
/streamsets-datacollector-aws-lib-<version>-SNAPSHOT.tgz
######################################################################## 100.0%
Downloading: https://archives.streamsets.com/datacollector/<version>/tarball\
/streamsets-datacollector-jdbc-lib-<version>-SNAPSHOT.tgz
######################################################################## 100.0%
Downloading: https://archives.streamsets.com/datacollector/<version>/tarball\
/streamsets-datacollector-rabbitmq-lib-<version>-SNAPSHOT.tgz
######################################################################## 100.0%

Stage library streamsets-datacollector-aws-lib installed
Stage library streamsets-datacollector-jdbc-lib installed
Stage library streamsets-datacollector-rabbitmq-lib installed
To generate the command required to perform the current installation (optional):
You can use the stagelibs command to generate the command to install the libraries that are installed on the current Data Collector. This allows you to easily replicate the installation elsewhere.
For example, say you installed three libraries above, and then installed another two. You can generate the command required to install all five libraries on additional machines.
To generate an installation script based on the current Data Collector installation, run the following command from the $SDC_DIST directory:
bin/streamsets stagelibs -installScript
The command returns an install command, such as the following:
=================================================================================
streamsets stagelibs -install=streamsets-datacollector-apache-kafka_0_8_1-lib,\
streamsets-datacollector-aws-lib,streamsets-datacollector-basic-lib,\
streamsets-datacollector-cdh_kafka_1_3-lib,streamsets-datacollector-jdbc-lib,\
streamsets-datacollector-jython_2_7-lib,streamsets-datacollector-rabbitmq-lib
=================================================================================
To uninstall libraries when necessary:
To uninstall a library, run the following command from the $SDC_DIST directory:
bin/streamsets stagelibs -uninstall=<libraryID>,<libraryID>,...
Use the full name of the libraries that you want to uninstall, separating them with commas. Do not include spaces in the command.
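
Because the -install and -uninstall arguments take a comma-separated list with no spaces, it can help to build the list from individual library IDs. The following is a minimal sketch, not part of the stagelibs command: the function name join_library_ids is hypothetical, and the library IDs are example values. The script only prints the resulting command.

```shell
#!/bin/sh
# Sketch: join stage library IDs with commas for -install= or -uninstall=.

# Print the given IDs joined by commas. Fail if any ID is empty or
# contains a space or comma, since the argument must not include spaces.
join_library_ids() (
    joined=""
    for library_id in "$@"; do
        case $library_id in
            ""|*" "*|*,*)
                echo "invalid library ID: '$library_id'" >&2
                exit 1 ;;
        esac
        # Add a comma only between IDs, never before the first one.
        joined="${joined:+$joined,}$library_id"
    done
    printf '%s\n' "$joined"
)

# Example: print (do not run) an install command for two libraries.
ids=$(join_library_ids \
    streamsets-datacollector-aws-lib \
    streamsets-datacollector-jdbc-lib) &&
    echo "bin/streamsets stagelibs -install=$ids"
```

The same joined list can be used with -uninstall=. Run the printed command from the $SDC_DIST directory.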

Available Stage Libraries

A full Data Collector installation includes all of the following stage libraries. A core installation includes only some of the following stage libraries and typically requires you to install additional stage libraries. A common installation includes commonly used stage libraries.

You can install additional stage libraries into either a core or common installation.

The following table describes the stages installed with each stage library:
Stage Library Name Included Stages
streamsets-datacollector-aerospike-lib For Aerospike version 3.15.x.

Includes the Aerospike destination.

streamsets-datacollector-aerospike-client-lib For Aerospike version 6.x.

Includes the Aerospike Client destination.

streamsets-datacollector-apache-kafka_1_0-lib For Kafka version 1.0.x.
Includes:
  • Kafka Consumer origin
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_1_1-lib For Kafka version 1.1.x.
Includes:
  • Kafka Consumer origin
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_2_0-lib For Kafka version 2.0.x.
Includes:
  • Kafka Consumer origin
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_2_1-lib For Kafka version 2.1.x.
Includes:
  • Kafka Consumer origin
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_2_2-lib For Kafka version 2.2.x.
Includes:
  • Kafka Consumer origin
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_2_3-lib For Kafka version 2.3.x.
Includes:
  • Kafka Consumer origin
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_2_4-lib For Kafka version 2.4.x.
Includes:
  • Kafka Consumer origin
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_2_5-lib For Kafka version 2.5.x.
Includes:
  • Kafka Consumer origin
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_2_6-lib For Kafka version 2.6.x.
Includes:
  • Kafka Consumer origin
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_2_7-lib For Kafka version 2.7.x.
Includes:
  • Kafka Consumer origin
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_2_8-lib For Kafka version 2.8.x.
Includes:
  • Kafka Consumer origin
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_3_0-lib For Kafka version 3.0.x.
Includes:
  • Kafka Consumer origin
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_3_1-lib For Kafka version 3.1.x.
Includes:
  • Kafka Consumer origin
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_3_2-lib For Kafka version 3.2.x.
Includes:
  • Kafka Consumer origin
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_3_3-lib For Kafka version 3.3.x.
Includes:
  • Kafka Consumer origin
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_3_4-lib For Kafka version 3.4.x.
Includes:
  • Kafka Consumer origin
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_3_5-lib For Kafka version 3.5.x.
Includes:
  • Kafka Consumer origin
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_3_6-lib For Kafka version 3.6.x.
Includes:
  • Kafka Consumer origin
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-kudu_1_3-lib For Kudu version 1.3.x.

Includes the Kudu Lookup processor and Kudu destination.

streamsets-datacollector-apache-kudu_1_4-lib For Kudu version 1.4.x.

Includes the Kudu Lookup processor and Kudu destination.

streamsets-datacollector-apache-kudu_1_5-lib For Kudu version 1.5.x.

Includes the Kudu Lookup processor and Kudu destination.

streamsets-datacollector-apache-kudu_1_6-lib For Kudu version 1.6.x.

Includes the Kudu Lookup processor and Kudu destination.

streamsets-datacollector-apache-kudu_1_7-lib For Kudu version 1.7.x.

Includes the Kudu Lookup processor and Kudu destination.

streamsets-datacollector-apache-pulsar_2-lib For Apache Pulsar version 2.x.
Includes:
  • Pulsar Consumer origin
  • Pulsar Consumer (Legacy) origin
  • Pulsar Producer destination
streamsets-datacollector-apache-solr_6_1_0-lib For Apache Solr version 6.1.

Includes the Solr destination.

streamsets-datacollector-aws-lib For Amazon Web Services 1.11.x.
Includes:
  • Amazon S3 origin
  • Amazon SQS Consumer origin
  • Amazon S3 destination
  • Amazon S3 executor
streamsets-datacollector-aws-secrets-manager-credentialstore-lib For the AWS Secrets Manager credential store.
streamsets-datacollector-azure-keyvault-credentialstore-lib For the Microsoft Azure Key Vault credential store.
streamsets-datacollector-azure-lib For Microsoft Azure.
Includes:
  • Azure Blob Storage origin
  • Azure Data Lake Storage Gen1 origin
  • Azure Data Lake Storage Gen2 origin
  • Azure Data Lake Storage Gen2 (Legacy) origin
  • Azure IoT/Event Hub Consumer origin
  • Azure Blob Storage destination
  • Azure Data Lake Storage (Legacy) destination
  • Azure Data Lake Storage Gen1 destination
  • Azure Data Lake Storage Gen2 destination
  • Azure Event Hub Producer destination
  • Azure IoT Hub Producer destination
  • Azure Synapse SQL destination
  • ADLS Gen1 File Metadata executor
  • ADLS Gen2 File Metadata executor
streamsets-datacollector-basic-lib
Includes the following origins:
  • CoAP Server
  • Directory
  • File Tail
  • gRPC Client
  • HTTP Client
  • HTTP Server
  • JavaScript Scripting
  • MQTT Subscriber
  • NiFi HTTP Server
  • OPC UA Client
  • REST Service
  • SDC RPC
  • SFTP/FTP/FTPS Client
  • System Metrics
  • TCP Server
  • UDP Multithreaded Source
  • UDP Source
  • WebSocket Client
  • WebSocket Server
Includes the following processors:
  • Base64 Field Decoder
  • Base64 Field Encoder
  • Data Generator
  • Data Parser
  • Delay
  • Expression Evaluator
  • Field Flattener
  • Field Hasher
  • Field Mapper
  • Field Masker
  • Field Merger
  • Field Order
  • Field Pivoter
  • Field Remover
  • Field Renamer
  • Field Replacer
  • Field Splitter
  • Field Type Converter
  • Field Zip
  • Geo IP
  • HTTP Client
  • HTTP Router
  • JavaScript Evaluator
  • JSON Generator
  • JSON Parser
  • Log Parser
  • Record Deduplicator
  • Schema Generator
  • Static Lookup
  • Stream Selector
  • Value Replacer
  • Windowing Aggregator
  • XML Flattener
  • XML Parser
Includes the following destinations:
  • CoAP Client
  • HTTP Client
  • Local FS
  • MQTT Publisher
  • Named Pipe
  • SDC RPC
  • Send Response to Origin
  • SFTP/FTP/FTPS Client
  • Splunk
  • Syslog
  • To Error
  • Trash
  • WebSocket Client
Includes the following executors:
  • Databricks Job Launcher
  • Email
  • Pipeline Finisher
  • Shell
streamsets-datacollector-bigtable-lib For Google Cloud Bigtable.

Includes the Google Bigtable destination.

streamsets-datacollector-cassandra_3-lib For Cassandra 1.2, 2.x, and 3.x.

Includes the Cassandra destination.

streamsets-datacollector-cdh_5_14-lib For the Cloudera CDH version 5.14.x distribution of Apache Hadoop.
Includes:
  • Hadoop FS origin for cluster mode pipelines
  • Hadoop FS Standalone origin
  • HBase Lookup processor
  • Hive Metadata processor
  • Kudu Lookup processor
  • Spark Evaluator processor
  • Flume destination
  • Hadoop FS destination
  • HBase destination
  • Hive Metastore destination
  • Hive Streaming destination
  • Kudu destination
  • Solr destination
  • HDFS File Metadata executor
  • Hive Query executor
  • MapReduce executor
  • Spark executor
streamsets-datacollector-cdh_5_15-lib For the Cloudera CDH version 5.15.x distribution of Apache Hadoop.
Includes:
  • Hadoop FS origin for cluster mode pipelines
  • Hadoop FS Standalone origin
  • HBase Lookup processor
  • Hive Metadata processor
  • Kudu Lookup processor
  • Spark Evaluator processor
  • Flume destination
  • Hadoop FS destination
  • HBase destination
  • Hive Metastore destination
  • Hive Streaming destination
  • Kudu destination
  • Solr destination
  • HDFS File Metadata executor
  • Hive Query executor
  • MapReduce executor
  • Spark executor
streamsets-datacollector-cdh_5_16-lib For the Cloudera CDH version 5.16.x distribution of Apache Hadoop.
Includes:
  • Hadoop FS origin for cluster mode pipelines
  • Hadoop FS Standalone origin
  • HBase Lookup processor
  • Hive Metadata processor
  • Kudu Lookup processor
  • Spark Evaluator processor
  • Flume destination
  • Hadoop FS destination
  • HBase destination
  • Hive Metastore destination
  • Hive Streaming destination
  • Kudu destination
  • Solr destination
  • HDFS File Metadata executor
  • Hive Query executor
  • MapReduce executor
  • Spark executor
streamsets-datacollector-cdh_6_0-lib For the Cloudera CDH version 6.0.x distribution of Apache Hadoop.
Includes:
  • Hadoop FS origin for cluster mode pipelines
  • Hadoop FS Standalone origin
  • Kafka Consumer origin
  • Kafka Multitopic Consumer origin
  • HBase Lookup processor
  • Hive Metadata processor
  • Kudu Lookup processor
  • Spark Evaluator processor
  • Flume destination
  • Hadoop FS destination
  • HBase destination
  • Hive Metastore destination
  • Hive Streaming destination
  • Kafka Producer destination
  • Kudu destination
  • Solr destination
  • HDFS File Metadata executor
  • Hive Query executor
  • MapReduce executor
  • Spark executor
streamsets-datacollector-cdh_6_1-lib For the Cloudera CDH version 6.1.x distribution of Apache Hadoop.
Includes:
  • Hadoop FS origin for cluster mode pipelines
  • Hadoop FS Standalone origin
  • Kafka Consumer origin
  • Kafka Multitopic Consumer origin
  • HBase Lookup processor
  • Hive Metadata processor
  • Kudu Lookup processor
  • Spark Evaluator processor
  • Flume destination
  • Hadoop FS destination
  • HBase destination
  • Hive Metastore destination
  • Hive Streaming destination
  • Kafka Producer destination
  • Kudu destination
  • Solr destination
  • HDFS File Metadata executor
  • Hive Query executor
  • MapReduce executor
  • Spark executor
streamsets-datacollector-cdh_6_2-lib For the Cloudera CDH version 6.2.x distribution of Apache Hadoop.
Includes:
  • Hadoop FS origin for cluster mode pipelines
  • Hadoop FS Standalone origin
  • Kafka Consumer origin
  • Kafka Multitopic Consumer origin
  • HBase Lookup processor
  • Hive Metadata processor
  • Kudu Lookup processor
  • Spark Evaluator processor
  • Flume destination
  • Hadoop FS destination
  • HBase destination
  • Hive Metastore destination
  • Hive Streaming destination
  • Kafka Producer destination
  • Kudu destination
  • Solr destination
  • HDFS File Metadata executor
  • Hive Query executor
  • MapReduce executor
  • Spark executor
streamsets-datacollector-cdh_6_3-lib For the Cloudera CDH version 6.3.x distribution of Apache Hadoop.
Includes:
  • Hadoop FS origin for cluster mode pipelines
  • Hadoop FS Standalone origin
  • Kafka Consumer origin
  • Kafka Multitopic Consumer origin
  • HBase Lookup processor
  • Hive Metadata processor
  • Kudu Lookup processor
  • Spark Evaluator processor
  • Flume destination
  • Hadoop FS destination
  • HBase destination
  • Hive Metastore destination
  • Hive Streaming destination
  • Kafka Producer destination
  • Kudu destination
  • Solr destination
  • HDFS File Metadata executor
  • Hive Query executor
  • MapReduce executor
  • Spark executor
streamsets-datacollector-cdh_kafka_3_1-lib For the Cloudera distribution of Apache Kafka - CDK 3.1.0 (based on Apache Kafka version 1.0.1).
Includes:
  • Kafka Consumer origin
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-cdh_kafka_4_1-lib For the Cloudera distribution of Apache Kafka - CDK 4.1.0 (based on Apache Kafka version 2.2.1).
Includes:
  • Kafka Consumer origin
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-cdh_spark_2_1_r1-lib For the Cloudera distribution of Spark 2.1 release 1.
Includes:
  • Spark Evaluator processor
  • Spark executor
streamsets-datacollector-cdh_spark_2_2-lib For the Cloudera CDH cluster Kafka with CDS powered by Spark 2.2 release 1.

Includes the Kafka Consumer origin for cluster mode pipelines.

streamsets-datacollector-cdh_spark_2_3-lib For the Cloudera CDH cluster Kafka with CDS powered by Spark 2.3 release 2.

Includes the Kafka Consumer origin for cluster mode pipelines.

streamsets-datacollector-cdh_spark_2_3_r3-lib For the Cloudera CDH cluster Kafka with CDS powered by Spark 2.3 release 3.

Includes the Kafka Consumer origin for cluster mode pipelines.

streamsets-datacollector-cdh_spark_2_3_r4-lib For the Cloudera CDH cluster Kafka with CDS powered by Spark 2.3 release 4.

Includes the Kafka Consumer origin for cluster mode pipelines.

streamsets-datacollector-cdp_7_1-lib For Cloudera CDP 7.1.1 through 7.1.7.
Includes:
  • Hadoop FS origin for cluster mode pipelines
  • Hadoop FS Standalone origin
  • Kafka Consumer origin
  • Kafka Multitopic Consumer origin
  • HBase Lookup processor
  • Hive Metadata processor
  • Kudu Lookup processor
  • Spark Evaluator processor
  • Hadoop FS destination
  • HBase destination
  • Hive Metastore destination
  • Hive Streaming destination
  • Kafka Producer destination
  • Kudu destination
  • Solr destination
  • HDFS File Metadata executor
  • Hive Query executor
  • MapReduce executor
  • Spark executor
streamsets-datacollector-cdp_7_1_8-lib For Cloudera CDP 7.1.8.
Includes:
  • Hadoop FS origin for cluster mode pipelines
  • Hadoop FS Standalone origin
  • Kafka Consumer origin
  • Kafka Multitopic Consumer origin
  • HBase Lookup processor
  • Hive Metadata processor
  • Kudu Lookup processor
  • Spark Evaluator processor
  • Hadoop FS destination
  • HBase destination
  • Hive Metastore destination
  • Hive Streaming destination
  • Kafka Producer destination
  • Kudu destination
  • Solr destination
  • HDFS File Metadata executor
  • Hive Query executor
  • MapReduce executor
  • Spark executor
streamsets-datacollector-couchbase_2-lib For Couchbase SDK 2.x.
Includes:
  • Couchbase Lookup processor
  • Couchbase destination
streamsets-datacollector-couchbase_3-lib For Couchbase SDK 3.x.
Includes:
  • Couchbase origin
  • Couchbase destination
streamsets-datacollector-crypto-lib For cryptography stages.

Includes the Encrypt and Decrypt Fields processor.

streamsets-datacollector-cyberark-credentialstore-lib For the CyberArk credential store.
streamsets-datacollector-databricks-ml_2-lib For Databricks ML.

Includes the Databricks ML Evaluator processor.

streamsets-datacollector-dataformats-lib

Contains parsers and generators for the data formats supported by Data Collector.

streamsets-datacollector-dev-lib For developing and testing pipelines.
Includes:
  • Dev Data Generator origin
  • Dev Random Record origin
  • Dev Raw Data Source origin
  • Dev SDC RPC with Buffering origin
  • Dev Snapshot origin
  • Sensor Reader origin
  • Dev Identity processor
  • Dev Random Error processor
  • Dev Record Creator processor
  • To Event destination
Note: Do not use these stages in production pipelines.
streamsets-datacollector-elasticsearch_5-lib For Elasticsearch 1.x, 2.x, and 5.x.

Includes the Elasticsearch origin and destination.

streamsets-datacollector-elasticsearch_6-lib For Elasticsearch 6.x.

Includes the Elasticsearch origin and destination.

streamsets-datacollector-elasticsearch_7-lib For Elasticsearch 7.x.

Includes the Elasticsearch origin and destination.

streamsets-datacollector-elasticsearch_8-lib For Elasticsearch 8.x.

Includes the Elasticsearch origin and destination.

streamsets-datacollector-emr_hadoop_2_8_3-lib For Amazon EMR 5.14.x with Hadoop 2.8.3.

Includes the Hadoop FS origin for cluster mode pipelines.

streamsets-datacollector-google-cloud-lib For Google Cloud.
Includes:
  • Google BigQuery origin
  • Google Cloud Storage origin
  • Google Pub/Sub Subscriber origin
  • Google BigQuery destination
  • Google BigQuery (Legacy) destination
  • Google Cloud Storage destination
  • Google Pub/Sub Publisher destination
  • Google BigQuery executor
  • Google Cloud Storage executor
streamsets-datacollector-google-secret-manager-credentialstore-lib For the Google Secret Manager credential store.
streamsets-datacollector-groovy_2_4-lib For Groovy version 2.4.
Includes:
  • Groovy Scripting origin
  • Groovy Evaluator processor
streamsets-datacollector-groovy_4_0-lib For Groovy version 4.0.
Includes:
  • Groovy Scripting origin
  • Groovy Evaluator processor
streamsets-datacollector-hdp_3_1-lib For Hortonworks version 3.1.
Includes:
  • Hadoop FS origin for cluster mode pipelines
  • Hadoop FS Standalone origin
  • Kafka Consumer origin for standalone and cluster mode pipelines
  • Kafka Multitopic Consumer origin
  • HBase Lookup processor
  • Hive Metadata processor
  • Spark Evaluator processor
  • Hadoop FS destination
  • HBase destination
  • Hive Metastore destination
  • Kafka Producer destination
  • HDFS File Metadata executor
  • Hive Query executor
  • MapReduce executor
  • Spark executor
streamsets-datacollector-influxdb_0_9-lib For InfluxDB version 0.9 - 1.x.

Includes the InfluxDB destination.

streamsets-datacollector-influxdb_2_0-lib For InfluxDB version 2.x.

Includes the InfluxDB 2.x destination.

streamsets-datacollector-jdbc-branded-oracle-lib For Oracle.

Includes the Oracle destination.

streamsets-datacollector-jdbc-lib For JDBC access to databases.
Includes:
  • JDBC Multitable Consumer origin
  • JDBC Query Consumer origin
  • PostgreSQL CDC Client origin
  • Oracle CDC Client origin
  • SQL Server CDC Client origin
  • SQL Server Change Tracking origin
  • JDBC Lookup processor
  • JDBC Tee processor
  • PostgreSQL Metadata processor
  • SQL Parser processor
  • JDBC Producer destination
  • JDBC Query executor
streamsets-datacollector-jdbc-oracle-lib For Oracle.
Includes:
  • Oracle Bulkload origin
  • Oracle CDC origin
streamsets-datacollector-jdbc-sap-hana-lib For JDBC access to SAP HANA databases.

Includes the SAP HANA Query Consumer origin.

streamsets-datacollector-jks-credentialstore-lib For the Java keystore credential store.
streamsets-datacollector-jms-lib For Java Message Service (JMS).

Includes the JMS Consumer origin and JMS Producer destination.

streamsets-datacollector-jython_2_7-lib For Jython version 2.7.x.
Includes:
  • Jython Scripting origin
  • Jython Evaluator processor
streamsets-datacollector-kaitai-lib For Kaitai Struct.

Includes the Kaitai Struct Parser processor.

streamsets-datacollector-kinesis-lib For Amazon Kinesis.
Includes:
  • Kinesis Consumer origin
  • Kinesis Firehose destination
  • Kinesis Producer destination
streamsets-datacollector-kinetica_6_0-lib For Kinetica 6.0.

Includes the KineticaDB destination.

streamsets-datacollector-kinetica_6_1-lib For Kinetica 6.1.

Includes the KineticaDB destination.

streamsets-datacollector-kinetica_6_2-lib For Kinetica 6.2.

Includes the KineticaDB destination.

streamsets-datacollector-kinetica_7_0-lib For Kinetica 7.0.

Includes the KineticaDB destination.

streamsets-datacollector-mapr_6_0-lib For MapR version 6.0.0 and 6.0.1.
Includes:
  • MapR DB CDC origin
  • MapR DB JSON origin
  • MapR FS origin for cluster mode pipelines
  • MapR FS Standalone origin
  • MapR Multitopic Streams Consumer origin
  • MapR DB destination
  • MapR DB JSON destination
  • MapR FS destination
  • MapR FS File Metadata executor
streamsets-datacollector-mapr_6_1-lib For MapR version 6.1.0.
Includes:
  • MapR DB CDC origin
  • MapR DB JSON origin
  • MapR FS origin for cluster mode pipelines
  • MapR FS Standalone origin
  • MapR Multitopic Streams Consumer origin
  • MapR DB destination
  • MapR DB JSON destination
  • MapR FS destination
  • MapR FS File Metadata executor
streamsets-datacollector-mapr_6_0-mep4-lib For MapR 6.0.0 with EEP 4.x.
Includes:
  • MapR Streams Consumer origin
  • Hive Metadata processor
  • Spark Evaluator processor
  • Hive Metastore destination
  • Hive Streaming destination using the MapR library
streamsets-datacollector-mapr_6_0-mep5-lib For MapR 6.0.1 with EEP 5.x.
Includes:
  • MapR Streams Consumer origin
  • Hive Metadata processor
  • Spark Evaluator processor
  • Hive Metastore destination
  • Hive Streaming destination using the MapR library
streamsets-datacollector-mapr_6_1-mep6-lib For MapR 6.1.0 with EEP 6.x.
Includes:
  • MapR Streams Consumer origin
  • Hive Metadata processor
  • Spark Evaluator processor
  • Hive Metastore destination
  • Hive Streaming destination using the MapR library
streamsets-datacollector-mapr_7_0-lib For HPE Ezmeral Data Fabric 7.0.x.
Includes:
  • MapR DB CDC origin
  • MapR DB JSON origin
  • MapR FS origin for cluster mode pipelines
  • MapR FS Standalone origin
  • MapR Multitopic Streams Consumer origin
  • MapR DB destination
  • MapR DB JSON destination
  • MapR FS destination
  • MapR FS File Metadata executor
streamsets-datacollector-mapr_7_0-mep8-lib For HPE Ezmeral Data Fabric 7.0.x with EEP 8.x.
Includes:
  • MapR Streams Consumer origin
  • Hive Metadata processor
  • Spark Evaluator processor
  • Hive Metastore destination
  • Hive Streaming destination using the MapR library
streamsets-datacollector-mleap-lib For MLeap.

Includes the MLeap Evaluator processor.

streamsets-datacollector-mongodb_3-lib For MongoDB 3.0 with Java driver 3.5.0.
Includes:
  • MongoDB origin
  • MongoDB Oplog origin
  • MongoDB Lookup processor
  • MongoDB destination
streamsets-datacollector-mongodb_4-lib For MongoDB 4.0 with Java driver 3.12.0.
Includes:
  • MongoDB origin
  • MongoDB Oplog origin
  • MongoDB Lookup processor
  • MongoDB destination
streamsets-datacollector-mongodb-atlas-lib For MongoDB Atlas and MongoDB Enterprise Server.
Includes:
  • MongoDB Atlas origin
  • MongoDB Atlas CDC origin
  • MongoDB Atlas destination
streamsets-datacollector-mysql-binlog-lib For MySQL binary logs.

Includes the MySQL Binary Log origin.

streamsets-datacollector-omniture-lib For Omniture.

Includes the Omniture origin.

streamsets-datacollector-orchestrator-lib For the orchestration stages.
Includes:
  • Cron Scheduler origin
  • Start Jobs origin
  • Start Pipelines origin
  • Control Hub API processor
  • Start Jobs processor
  • Start Pipelines processor
  • Wait for Jobs processor
  • Wait for Pipelines processor
streamsets-datacollector-postgres-aurora-lib For Amazon Aurora PostgreSQL versions 1 through 4.

Includes the Aurora PostgreSQL CDC Client origin.

streamsets-datacollector-rabbitmq-lib For RabbitMQ version 3.5.6.

Includes the RabbitMQ Consumer origin and RabbitMQ Producer destination.

streamsets-datacollector-redis-lib For Redis versions 2.8 and 3.0.
Includes:
  • Redis Consumer origin
  • Redis Lookup processor
  • Redis destination
streamsets-datacollector-salesforce-lib For Salesforce.
Includes:
  • Salesforce origin
  • Salesforce Bulk API 2.0 origin
  • Salesforce Lookup processor
  • Salesforce Bulk API 2.0 Lookup processor
  • Salesforce destination
  • Salesforce Bulk API 2.0 destination
  • Tableau CRM destination
streamsets-datacollector-sdc-databricks-lib For Databricks.
Includes:
  • Databricks Delta Lake destination
  • Databricks Query executor
streamsets-datacollector-sdc-snowflake-lib For Snowflake.
Includes:
  • Snowflake Bulk origin
  • Snowflake destination
  • Snowflake File Uploader destination
  • Snowflake executor
streamsets-datacollector-singlestore-lib For SingleStore.

Includes the SingleStore destination.

streamsets-datacollector-stats-lib

StreamSets Control Hub requires that the statistics stage library be installed on each registered Data Collector.

streamsets-datacollector-tensorflow-lib For TensorFlow.

Includes the TensorFlow Evaluator processor.

streamsets-datacollector-teradata-lib For Teradata.

Includes the Teradata destination.

streamsets-datacollector-thycotic-credentialstore-lib For the Thycotic Secret Server credential store.
streamsets-datacollector-vault-credentialstore-lib For the HashiCorp Vault credential store.
streamsets-datacollector-webclient-impl-okhttp For OkHttp.
Includes:
  • Web Client origin
  • Web Client processor
  • Web Client destination
streamsets-datacollector-wholefile-transformer-lib Includes the Whole File Transformer processor.
streamsets-datacollector-windows-lib For Windows.

Includes the Windows Event Log origin.
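
The library names in the table above are the IDs that appear in the RPM package names. As a minimal sketch, assuming the <libraryID>-<version>-1.noarch.rpm naming pattern shown in Installing for RPM, the following shell helper builds the package file name for each library ID. The helper name rpm_package_name, the version value, and the chosen library IDs are illustrative only and are not part of Data Collector.

```shell
# Build the RPM file name for a stage library ID and a Data Collector version,
# following the <libraryID>-<version>-1.noarch.rpm pattern.
rpm_package_name() {
  printf '%s-%s-1.noarch.rpm\n' "$1" "$2"
}

# Example values only: substitute your Data Collector version and library IDs.
version="5.10.0"
for lib in streamsets-datacollector-mongodb_4-lib streamsets-datacollector-redis-lib; do
  rpm_package_name "$lib" "$version"
done
```

The resulting file names can then be passed to yum localinstall, provided the matching packages have been downloaded to the current directory.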

Legacy Stage Libraries

Legacy stage libraries are older stage libraries that have been removed from Data Collector. We strongly advise using the stage libraries provided with Data Collector and upgrading related systems, but you can use legacy libraries when necessary.

For the steps to upgrade pipelines that use legacy libraries to current stage libraries, see Update Pipelines using Legacy Stage Libraries.

To use a legacy library, you must first install it. The installation method depends on how you installed Data Collector:
Tarball installations
Install legacy stage libraries with Package Manager. Follow the instructions in Installing for Tarball Using Package Manager. You can click Legacy Stage Libraries to filter the list of stage libraries, showing only legacy libraries.
RPM package or Cloudera Manager installations
Install legacy stage libraries manually:
  1. Download the legacy libraries:
    1. Go to the StreamSets archives page and navigate to the release that you are using.
    2. Click the "Legacy" link and download the legacy libraries that you want to use.
  2. Install and manage the legacy libraries as you would custom stage libraries. For more information, see Custom Stage Libraries.
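The manual steps above can be sketched as a small shell helper. This is a sketch under stated assumptions, not the documented procedure: it assumes the downloaded legacy library is a gzip-compressed tar archive and that the target is the custom stage library directory configured as described in Custom Stage Libraries. The function name install_legacy_lib and the paths in the usage note are hypothetical.

```shell
# Sketch: unpack a downloaded legacy stage library archive into a custom
# stage library directory. Both arguments are hypothetical examples.
install_legacy_lib() {
  archive="$1"     # path to the downloaded legacy library archive (assumed .tgz)
  user_libs="$2"   # directory configured for custom stage libraries

  # Fail early if the archive is missing.
  [ -f "$archive" ] || { echo "archive not found: $archive" >&2; return 1; }

  # Create the target directory if needed and extract the archive into it.
  mkdir -p "$user_libs" || return 1
  tar -xzf "$archive" -C "$user_libs" || return 1

  # List the target directory so the extracted layout can be checked by eye.
  ls "$user_libs"
}
```

For example, install_legacy_lib ./legacy-lib.tgz /opt/sdc-user-libs, where both names are placeholders. Check that the extracted layout matches what Custom Stage Libraries expects before restarting Data Collector.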
The following table lists the legacy stage libraries:
Legacy Stage Library Included Stages
streamsets-datacollector-apache-kafka_0_8_1-lib For Kafka version 0.8.1.
Includes:
  • Kafka Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_0_8_2-lib For Kafka version 0.8.2.
Includes:
  • Kafka Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_0_9-lib For Kafka version 0.9.x.
Includes:
  • Kafka Consumer origin
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_0_10-lib For Kafka version 0.10.x.
Includes:
  • Kafka Consumer origin
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_0_11-lib For Kafka version 0.11.x.
Includes:
  • Kafka Consumer origin
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-kudu_1_0-lib For Kudu version 1.0.x.

Includes the Kudu Lookup processor and Kudu destination.

streamsets-datacollector-apache-kudu_1_1-lib For Kudu version 1.1.x.

Includes the Kudu Lookup processor and Kudu destination.

streamsets-datacollector-apache-kudu_1_2-lib For Kudu version 1.2.x.

Includes the Kudu Lookup processor and Kudu destination.

streamsets-datacollector-cdh_5_2-lib For the Cloudera CDH version 5.2 distribution of Apache Hadoop.
Includes:
  • Hadoop FS origin for cluster mode pipelines
  • Hadoop FS Standalone origin
  • HBase Lookup processor
  • Flume destination
  • Hadoop FS destination
  • HBase destination
  • Solr destination
  • HDFS File Metadata executor
  • MapReduce executor
streamsets-datacollector-cdh_5_3-lib For the Cloudera CDH version 5.3 distribution of Apache Hadoop.
Includes:
  • Hadoop FS origin for cluster mode pipelines
  • Hadoop FS Standalone origin
  • HBase Lookup processor
  • Flume destination
  • Hadoop FS destination
  • HBase destination
  • Solr destination
  • HDFS File Metadata executor
  • MapReduce executor
streamsets-datacollector-cdh_5_4-lib For the Cloudera CDH version 5.4 distribution of Apache Hadoop.
Includes:
  • Hadoop FS origin for cluster mode pipelines
  • Hadoop FS Standalone origin
  • HBase Lookup processor
  • Hive Metadata processor
  • Flume destination
  • Hadoop FS destination
  • HBase destination
  • Hive Metastore destination
  • Hive Streaming destination
  • Solr destination
  • HDFS File Metadata executor
  • Hive Query executor
  • MapReduce executor
streamsets-datacollector-cdh_5_5-lib For the Cloudera CDH version 5.5 distribution of Apache Hadoop.
Includes:
  • Hadoop FS origin for cluster mode pipelines
  • Hadoop FS Standalone origin
  • HBase Lookup processor
  • Hive Metadata processor
  • Flume destination
  • Hadoop FS destination
  • HBase destination
  • Hive Metastore destination
  • Hive Streaming destination
  • Solr destination
  • HDFS File Metadata executor
  • Hive Query executor
  • MapReduce executor
streamsets-datacollector-cdh_5_7-lib For the Cloudera CDH version 5.7 distribution of Apache Hadoop.
Includes:
  • Hadoop FS origin for cluster mode pipelines
  • Hadoop FS Standalone origin
  • HBase Lookup processor
  • Hive Metadata processor
  • Spark Evaluator processor
  • Flume destination
  • Hadoop FS destination
  • HBase destination
  • Hive Metastore destination
  • Hive Streaming destination
  • Solr destination
  • HDFS File Metadata executor
  • Hive Query executor
  • MapReduce executor
  • Spark executor
streamsets-datacollector-cdh_5_8-lib For the Cloudera CDH version 5.8 distribution of Apache Hadoop.
Includes:
  • Hadoop FS origin for cluster mode pipelines
  • Hadoop FS Standalone origin
  • HBase Lookup processor
  • Hive Metadata processor
  • Spark Evaluator processor
  • Flume destination
  • Hadoop FS destination
  • HBase destination
  • Hive Metastore destination
  • Hive Streaming destination
  • Solr destination
  • HDFS File Metadata executor
  • Hive Query executor
  • MapReduce executor
  • Spark executor
streamsets-datacollector-cdh_5_9-lib For the Cloudera CDH version 5.9 distribution of Apache Hadoop.
Includes:
  • Hadoop FS origin for cluster mode pipelines
  • Hadoop FS Standalone origin
  • HBase Lookup processor
  • Hive Metadata processor
  • Spark Evaluator processor
  • Flume destination
  • Hadoop FS destination
  • HBase destination
  • Hive Metastore destination
  • Hive Streaming destination
  • Solr destination
  • HDFS File Metadata executor
  • Hive Query executor
  • MapReduce executor
  • Spark executor
streamsets-datacollector-cdh_5_10-lib For the Cloudera CDH version 5.10 distribution of Apache Hadoop.
Includes:
  • Hadoop FS origin for cluster mode pipelines
  • Hadoop FS Standalone origin
  • HBase Lookup processor
  • Hive Metadata processor
  • Spark Evaluator processor
  • Flume destination
  • Hadoop FS destination
  • HBase destination
  • Hive Metastore destination
  • Hive Streaming destination
  • Solr destination
  • HDFS File Metadata executor
  • Hive Query executor
  • MapReduce executor
  • Spark executor
streamsets-datacollector-cdh_5_11-lib For the Cloudera CDH version 5.11 distribution of Apache Hadoop.
Includes:
  • Hadoop FS origin for cluster mode pipelines
  • Hadoop FS Standalone origin
  • HBase Lookup processor
  • Hive Metadata processor
  • Spark Evaluator processor
  • Flume destination
  • Hadoop FS destination
  • HBase destination
  • Hive Metastore destination
  • Hive Streaming destination
  • Solr destination
  • HDFS File Metadata executor
  • Hive Query executor
  • MapReduce executor
  • Spark executor
streamsets-datacollector-cdh_5_12-lib For the Cloudera CDH version 5.12 distribution of Apache Hadoop.
Includes:
  • Hadoop FS origin for cluster mode pipelines
  • Hadoop FS Standalone origin
  • HBase Lookup processor
  • Hive Metadata processor
  • Spark Evaluator processor
  • Flume destination
  • Hadoop FS destination
  • HBase destination
  • Hive Metastore destination
  • Hive Streaming destination
  • Solr destination
  • HDFS File Metadata executor
  • Hive Query executor
  • MapReduce executor
  • Spark executor
streamsets-datacollector-cdh_5_13-lib For the Cloudera CDH version 5.13 distribution of Apache Hadoop.
Includes:
  • Hadoop FS origin for cluster mode pipelines
  • Hadoop FS Standalone origin
  • HBase Lookup processor
  • Hive Metadata processor
  • Kudu Lookup processor
  • Spark Evaluator processor
  • Flume destination
  • Hadoop FS destination
  • HBase destination
  • Hive Metastore destination
  • Hive Streaming destination
  • Kudu destination
  • Solr destination
  • HDFS File Metadata executor
  • Hive Query executor
  • MapReduce executor
  • Spark executor
streamsets-datacollector-cdh_kafka_1_2-lib For the Cloudera distribution of Apache Kafka 1.2 (0.8.2.0).
Includes:
  • Kafka Consumer origin
  • Kafka Producer destination
streamsets-datacollector-cdh_kafka_1_3-lib For the Cloudera distribution of Apache Kafka 1.3 (0.8.2.0).
Includes:
  • Kafka Consumer origin
  • Kafka Producer destination
streamsets-datacollector-cdh_kafka_2_0-lib For the Cloudera distribution of Apache Kafka 2.0.x (0.9.0).
Includes:
  • Kafka Consumer origin
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-cdh_kafka_2_1-lib For the Cloudera distribution of Apache Kafka 2.1.x (0.9.0).
Includes:
  • Kafka Consumer origin
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-cdh_kafka_3_0-lib For the Cloudera distribution of Apache Kafka 3.0.0 (0.11.0).
Includes:
  • Kafka Consumer origin
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-cdh_spark_2_1-lib For Kafka on a Cloudera CDH cluster with CDS powered by Spark 2.1.

Includes the Kafka Consumer origin for cluster mode pipelines.

streamsets-datacollector-hdp_2_2-lib For the Hortonworks version 2.2 distribution of Apache Hadoop.
Includes:
  • Hadoop FS origin for cluster mode pipelines
  • Hadoop FS Standalone origin
  • Kafka Consumer origin
  • HBase Lookup processor
  • Hive Metadata processor
  • Flume destination
  • Hadoop FS destination
  • HBase destination
  • Hive Metastore destination
  • Hive Streaming destination
  • Kafka Producer destination
  • HDFS File Metadata executor
  • Hive Query executor
  • MapReduce executor
streamsets-datacollector-hdp_2_3-lib For the Hortonworks version 2.3 distribution of Apache Hadoop.
Includes:
  • Hadoop FS origin for cluster mode pipelines
  • Hadoop FS Standalone origin
  • Kafka Consumer origin
  • HBase Lookup processor
  • Flume destination
  • Hadoop FS destination
  • HBase destination
  • Kafka Producer destination
  • HDFS File Metadata executor
  • MapReduce executor
streamsets-datacollector-hdp_2_3-hive1-lib For the Hortonworks version 2.3.x distribution of Apache Hive version 1.x.
Includes:
  • Hive Metadata processor
  • Hive Metastore destination
  • Hive Streaming destination
  • Hive Query executor
streamsets-datacollector-hdp_2_4-lib For the Hortonworks version 2.4 distribution of Apache Hadoop.
Includes:
  • Hadoop FS origin for cluster mode pipelines
  • Hadoop FS Standalone origin
  • Kafka Consumer origin for standalone pipelines
  • HBase Lookup processor
  • Flume destination
  • Hadoop FS destination
  • HBase destination
  • Kafka Producer destination
  • HDFS File Metadata executor
  • MapReduce executor
streamsets-datacollector-hdp_2_4-hive1-lib For the Hortonworks version 2.4.x distribution of Apache Hive version 1.x.
Includes:
  • Hive Metadata processor
  • Hive Metastore destination
  • Hive Streaming destination
  • Hive Query executor
streamsets-datacollector-hdp_2_5-lib For the Hortonworks version 2.5.x distribution of Apache Hadoop.
Includes:
  • Hadoop FS origin for cluster mode pipelines
  • Hadoop FS Standalone origin
  • Kafka Consumer origin for standalone and cluster mode pipelines
  • Kafka Multitopic Consumer origin
  • HBase Lookup processor
  • Hive Metadata processor
  • Hadoop FS destination
  • HBase destination
  • Hive Metastore destination
  • Hive Streaming destination
  • Kafka Producer destination
  • HDFS File Metadata executor
  • Hive Query executor
  • MapReduce executor
streamsets-datacollector-hdp_2_5-flume-lib For the Hortonworks version 2.5.x distribution of Apache Flume.

Includes the Flume destination.

streamsets-datacollector-hdp_2_6-lib For the Hortonworks version 2.6.x distribution of Apache Hadoop.
Includes:
  • Hadoop FS origin for cluster mode pipelines
  • Hadoop FS Standalone origin
  • Kafka Consumer origin for standalone and cluster mode pipelines
  • Kafka Multitopic Consumer origin
  • HBase Lookup processor
  • Flume destination
  • Hadoop FS destination
  • HBase destination
  • Kafka Producer destination
  • HDFS File Metadata executor
  • MapReduce executor
streamsets-datacollector-hdp_2_6-flume-lib For the Hortonworks version 2.6.x distribution of Apache Flume.

Includes the Flume destination.

streamsets-datacollector-hdp_2_6-hive2-lib For the Hortonworks version 2.6.x distribution of Apache Hive version 2.1.
Includes:
  • Hive Metadata processor
  • Hive Metastore destination
  • Hive Streaming destination
  • Hive Query executor
streamsets-datacollector-hdp_2_6_1-hive1-lib For the Hortonworks version 2.6.1 distribution of Apache Hive version 1.x.
Includes:
  • Hive Metadata processor
  • Hive Metastore destination
  • Hive Streaming destination
  • Hive Query executor
streamsets-datacollector-hdp_2_6_2-hive1-lib For the Hortonworks version 2.6.2 distribution of Apache Hive version 1.x.
Includes:
  • Hive Metadata processor
  • Hive Metastore destination
  • Hive Streaming destination
  • Hive Query executor