Available Stage Libraries

A full Data Collector installation includes all of the stage libraries listed below. A core installation includes only a subset of them, so you typically need to install additional stage libraries. A common installation includes the commonly used stage libraries.

You can install additional stage libraries into either a core or common installation.
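
As a rough illustration of scripting such an installation, the following Python sketch builds a command line from library names taken from the table below. The `bin/streamsets stagelibs -install=<lib1>,<lib2>` form, the `build_install_command` helper, and the name pattern are assumptions made for this example; confirm the exact command against the installation instructions for your Data Collector version before running anything.

```python
import re
import shlex

# Naming pattern observed in the table below. This is an assumption for
# illustration, not a documented rule.
LIB_NAME = re.compile(r"^streamsets-datacollector-[A-Za-z0-9_-]+$")


def build_install_command(libraries, launcher="bin/streamsets"):
    """Return an argument list that would install the given stage libraries.

    The launcher path and the `stagelibs -install=` syntax are assumed here.
    """
    if not libraries:
        raise ValueError("at least one stage library is required")
    bad = [name for name in libraries if not LIB_NAME.match(name)]
    if bad:
        raise ValueError(f"unexpected stage library names: {bad}")
    # Libraries are passed as one comma-separated value.
    return [launcher, "stagelibs", "-install=" + ",".join(libraries)]


if __name__ == "__main__":
    cmd = build_install_command([
        "streamsets-datacollector-aws-lib",
        "streamsets-datacollector-jdbc-lib",
    ])
    # Print the command for review instead of executing it.
    print(shlex.join(cmd))
```

The sketch only prints the command so that it can be reviewed first; passing the list to `subprocess.run` would be the natural next step once the syntax is confirmed.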

The following table describes the stages installed with each stage library:
Stage Library Name Included Stages
streamsets-datacollector-apache-kafka-lib For Kafka.
Includes:
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-pulsar-lib For Apache Pulsar versions 2.x.
Includes:
  • Pulsar Consumer origin
  • Pulsar Consumer (Legacy) origin
  • Pulsar Producer destination
streamsets-datacollector-aws-lib For Amazon Web Services 1.11.x.
Includes:
  • Amazon S3 origin
  • Amazon SQS Consumer origin
  • Amazon S3 destination
  • Amazon S3 executor
streamsets-datacollector-aws-secrets-manager-credentialstore-lib For the AWS Secrets Manager credential store.
streamsets-datacollector-azure-keyvault-credentialstore-lib For the Microsoft Azure Key Vault credential store.
streamsets-datacollector-azure-lib For Microsoft Azure.
Includes:
  • Azure Blob Storage origin
  • Azure Data Lake Storage Gen2 origin
  • Azure Data Lake Storage Gen2 (Legacy) origin
  • Azure IoT/Event Hub Consumer origin
  • Azure Blob Storage destination
  • Azure Data Lake Storage Gen2 destination
  • Azure Event Hub Producer destination
  • Azure IoT Hub Producer destination
  • Azure Synapse SQL destination
  • ADLS Gen2 File Metadata executor
streamsets-datacollector-basic-lib
Includes the following origins:
  • CoAP Server
  • Directory
  • File Tail
  • JavaScript Scripting
  • MQTT Subscriber
  • OPC UA Client
  • REST Service
  • TCP Server
  • UDP Multithreaded Source
  • UDP Source
  • WebSocket Client
  • WebSocket Server
Includes the following processors:
  • Base64 Field Decoder
  • Base64 Field Encoder
  • Data Generator
  • Data Parser
  • Delay
  • Expression Evaluator
  • Field Flattener
  • Field Hasher
  • Field Mapper
  • Field Masker
  • Field Merger
  • Field Order
  • Field Pivoter
  • Field Remover
  • Field Renamer
  • Field Replacer
  • Field Splitter
  • Field Type Converter
  • Field Zip
  • Geo IP
  • JavaScript Evaluator
  • JSON Generator
  • JSON Parser
  • Log Parser
  • Record Deduplicator
  • Schema Generator
  • Static Lookup
  • Stream Selector
  • Windowing Aggregator
  • XML Flattener
  • XML Parser
Includes the following destinations:
  • CoAP Client
  • Local FS
  • MQTT Publisher
  • Named Pipe
  • Send Response to Origin
  • Splunk
  • Syslog
  • To Error
  • Trash
  • WebSocket Client
Includes the following executors:
  • Databricks Job Launcher
  • Email
  • Pipeline Finisher
  • Shell
streamsets-datacollector-bigtable-lib For Google Cloud Bigtable.

Includes the Google Bigtable destination.

streamsets-datacollector-cassandra_3-lib For Cassandra 1.2, 2.x, and 3.x.

Includes the Cassandra destination.

streamsets-datacollector-cdp_7_1_8-lib For Cloudera CDP 7.1.8.
Includes:
  • Hadoop FS Standalone origin
  • HBase Lookup processor
  • Hive Metadata processor
  • Kudu Lookup processor
  • Hadoop FS destination
  • HBase destination
  • Hive Metastore destination
  • Kudu destination
  • Solr destination
  • HDFS File Metadata executor
  • Hive Query executor
  • MapReduce executor
  • Spark executor
streamsets-datacollector-cdp_7_1_9-lib For Cloudera CDP 7.1.9.
Includes:
  • Hadoop FS Standalone origin
  • HBase Lookup processor
  • Hive Metadata processor
  • Kudu Lookup processor
  • Hadoop FS destination
  • HBase destination
  • Hive Metastore destination
  • Kudu destination
  • Solr destination
  • HDFS File Metadata executor
  • Hive Query executor
  • MapReduce executor
  • Spark executor
streamsets-datacollector-connx-lib For CONNX.
Includes:
  • CONNX origin
  • CONNX CDC origin
streamsets-datacollector-couchbase_3-lib For Couchbase SDK 3.x.
Includes:
  • Couchbase origin
  • Couchbase destination
streamsets-datacollector-crypto-lib For cryptography stages.

Includes the Encrypt and Decrypt Fields processor.

streamsets-datacollector-cyberark-credentialstore-lib For the CyberArk credential store.
streamsets-datacollector-dataformats-lib

Contains parsers and generators for the data formats supported by Data Collector.

streamsets-datacollector-dev-lib For developing and testing pipelines.
Includes:
  • Dev Data Generator origin
  • Dev Random Record origin
  • Dev Raw Data Source origin
  • Dev Snapshot origin
  • Dev Identity processor
  • Dev Random Error processor
  • Dev Record Creator processor
  • To Event destination
Note: Do not use these stages in production pipelines.
streamsets-datacollector-elasticsearch_7-lib For Elasticsearch 7.x.

Includes the Elasticsearch origin and destination.

streamsets-datacollector-elasticsearch_8-lib For Elasticsearch 8.x.

Includes the Elasticsearch origin and destination.

streamsets-datacollector-file-transfer-lib For SFTP/FTP/FTPS.

Includes the SFTP/FTP/FTPS Client origin, destination, and executor.

streamsets-datacollector-google-cloud-lib For Google Cloud.
Includes:
  • Google BigQuery origin
  • Google Cloud Storage origin
  • Google Pub/Sub Subscriber origin
  • Google BigQuery destination
  • Google Cloud Storage destination
  • Google Pub/Sub Publisher destination
  • Google BigQuery executor
  • Google Cloud Storage executor
streamsets-datacollector-google-secret-manager-credentialstore-lib For the Google Secret Manager credential store.
streamsets-datacollector-groovy_2_4-lib For Groovy version 2.4.
Includes:
  • Groovy Scripting origin
  • Groovy Evaluator processor
streamsets-datacollector-groovy_4_0-lib For Groovy version 4.0.
Includes:
  • Groovy Scripting origin
  • Groovy Evaluator processor
streamsets-datacollector-hpe_edf_7_2-eep_9_2-lib For HPE Ezmeral Data Fabric 7.2.x with EEP 9.2.
Includes:
  • MapR DB CDC Consumer origin
  • MapR DB JSON origin
  • MapR FS Standalone origin
  • MapR Multitopic Streams Consumer origin
  • MapR Streams Consumer origin
  • MapR DB destination
  • MapR DB JSON destination
  • MapR FS destination
  • MapR Streams Producer destination
  • MapR FS File Metadata executor
streamsets-datacollector-http-lib For HTTP.
Includes:
  • HTTP Client origin, processor, and destination
  • HTTP Router processor
  • HTTP Server origin
streamsets-datacollector-influxdb_2_0-lib For InfluxDB version 2.x.

Includes the InfluxDB 2.x destination.

streamsets-datacollector-jdbc-branded-oracle-lib For Oracle.
Includes:
  • Oracle Multitable Consumer origin
  • Oracle destination
streamsets-datacollector-jdbc-lib For JDBC access to databases.
Includes:
  • JDBC Multitable Consumer origin
  • JDBC Query Consumer origin
  • PostgreSQL CDC Client origin
  • Oracle CDC Client origin
  • SQL Server CDC Client origin
  • SQL Server Change Tracking origin
  • JDBC Lookup processor
  • JDBC Tee processor
  • PostgreSQL Metadata processor
  • SQL Parser processor
  • JDBC Producer destination
  • JDBC Query executor
streamsets-datacollector-jdbc-oracle-lib For Oracle.
Includes:
  • Oracle Bulkload origin
  • Oracle CDC origin
  • Oracle Multitable Consumer origin
streamsets-datacollector-jdbc-sap-hana-lib For JDBC access to SAP HANA databases.

Includes the SAP HANA Query Consumer origin.

streamsets-datacollector-jks-credentialstore-lib For the Java keystore credential store.
streamsets-datacollector-jms-lib For Java Message Service (JMS).

Includes the JMS Consumer origin and JMS Producer destination.

streamsets-datacollector-jython_2_7-lib For Jython version 2.7.x.
Includes:
  • Jython Scripting origin
  • Jython Evaluator processor
streamsets-datacollector-kaitai-lib For Kaitai Struct.

Includes the Kaitai Struct Parser processor.

streamsets-datacollector-kinesis-lib For Amazon Kinesis.
Includes:
  • Kinesis Consumer origin
  • Kinesis Firehose destination
  • Kinesis Producer destination
streamsets-datacollector-mapr_7_0-lib For HPE Ezmeral Data Fabric 7.0.x.
Includes:
  • MapR DB CDC origin
  • MapR DB JSON origin
  • MapR FS Standalone origin
  • MapR Multitopic Streams Consumer origin
  • MapR DB destination
  • MapR DB JSON destination
  • MapR FS destination
  • MapR FS File Metadata executor
streamsets-datacollector-mapr_7_0-mep8-lib For HPE Ezmeral Data Fabric 7.0.x with EEP 8.x.
Includes:
  • MapR Streams Consumer origin
  • Hive Metadata processor
  • Hive Metastore destination
streamsets-datacollector-mleap-lib For MLeap.

Includes the MLeap Evaluator processor.

streamsets-datacollector-mongodb_3-lib For MongoDB 3.0 with Java driver 3.5.0.
Includes:
  • MongoDB origin
  • MongoDB Oplog origin
  • MongoDB Lookup processor
  • MongoDB destination
streamsets-datacollector-mongodb_4-lib For MongoDB 4.0 with Java driver 3.12.0.
Includes:
  • MongoDB origin
  • MongoDB Oplog origin
  • MongoDB Lookup processor
  • MongoDB destination
streamsets-datacollector-mongodb-atlas-lib For MongoDB Atlas and MongoDB Enterprise Server.
Includes:
  • MongoDB Atlas origin
  • MongoDB Atlas CDC origin
  • MongoDB Atlas destination
streamsets-datacollector-mysql-binlog-lib For MySQL binary logs.

Includes the MySQL Binary Log origin.

streamsets-datacollector-orchestrator-lib For the orchestration stages.
Includes:
  • Cron Scheduler origin
  • Start Jobs origin
  • Control Hub API processor
  • Start Jobs processor
  • Wait for Jobs processor
streamsets-datacollector-postgres-aurora-lib For Amazon Aurora PostgreSQL versions 1 through 4.

Includes the Aurora PostgreSQL CDC Client origin.

streamsets-datacollector-rabbitmq-lib For RabbitMQ version 3.5.6.

Includes the RabbitMQ Consumer origin and RabbitMQ Producer destination.

streamsets-datacollector-redis-lib For Redis versions 2.8 and 3.0.
Includes:
  • Redis Consumer origin
  • Redis Lookup processor
  • Redis destination
streamsets-datacollector-salesforce-lib For Salesforce.
Includes:
  • Salesforce origin
  • Salesforce Bulk API 2.0 origin
  • Salesforce Lookup processor
  • Salesforce Bulk API 2.0 Lookup processor
  • Salesforce destination
  • Salesforce Bulk API 2.0 destination
  • Tableau CRM destination
streamsets-datacollector-sdc-databricks-lib For Databricks.
Includes:
  • Databricks Delta Lake destination
  • Databricks Query executor
streamsets-datacollector-sdc-snowflake-lib For Snowflake.
Includes:
  • Snowflake Bulk origin
  • Snowflake destination
  • Snowflake File Uploader destination
  • Snowflake executor
streamsets-datacollector-singlestore-lib For SingleStore.

Includes the SingleStore destination.

streamsets-datacollector-tensorflow-lib For TensorFlow.

Includes the TensorFlow Evaluator processor.

streamsets-datacollector-teradata-lib For Teradata.

Includes the Teradata destination.

streamsets-datacollector-thycotic-credentialstore-lib For the Thycotic Secret Server credential store.
streamsets-datacollector-vault-credentialstore-lib For the HashiCorp Vault credential store.
streamsets-datacollector-webclient-impl-okhttp For OkHttp.
Includes:
  • Jira origin
  • Jira destination
  • Web Client origin
  • Web Client processor
  • Web Client destination
streamsets-datacollector-wholefile-transformer-lib Includes the Whole File Transformer processor.
streamsets-datacollector-windows-lib For Windows.

Includes the Windows Event Log origin.