Stage Libraries

A Control Hub deployment defines the stage libraries that are installed on all engine instances managed by the deployment. When you configure any deployment type, you select the stage libraries to install on the engine.

Important: You must perform additional steps to install the MapR stage libraries, as described in MapR Prerequisites.

Common Stage Libraries

Common stage libraries contain the most commonly used stages.

The following table describes the stages installed with each common stage library:
Stage Library Name Included Stages
streamsets-datacollector-apache-kafka_1_0-lib For Kafka version 1.0.x.
Includes:
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_1_1-lib For Kafka version 1.1.x.
Includes:
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_2_0-lib For Kafka version 2.0.x.
Includes:
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_2_1-lib For Kafka version 2.1.x.
Includes:
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_2_2-lib For Kafka version 2.2.x.
Includes:
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_2_3-lib For Kafka version 2.3.x.
Includes:
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_2_4-lib For Kafka version 2.4.x.
Includes:
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_2_5-lib For Kafka version 2.5.x.
Includes:
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_2_6-lib For Kafka version 2.6.x.
Includes:
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_2_7-lib For Kafka version 2.7.x.
Includes:
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_2_8-lib For Kafka version 2.8.x.
Includes:
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_3_0-lib For Kafka version 3.0.x.
Includes:
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_3_1-lib For Kafka version 3.1.x.
Includes:
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_3_2-lib For Kafka version 3.2.x.
Includes:
  • Kafka Multitopic Consumer origin
  • Kafka Producer destination
streamsets-datacollector-apache-pulsar_2-lib For Apache Pulsar version 2.x.
Includes:
  • Pulsar Consumer origin
  • Pulsar Consumer (Legacy) origin
  • Pulsar Producer destination
streamsets-datacollector-apache-solr_6_1_0-lib For Apache Solr version 6.1.

Includes the Solr destination.

streamsets-datacollector-aws-lib For Amazon Web Services 1.11.x.
Includes:
  • Amazon S3 origin
  • Amazon SQS Consumer origin
  • Amazon S3 destination
  • Amazon S3 executor
streamsets-datacollector-aws-secrets-manager-credentialstore-lib For the AWS Secrets Manager credential store.
streamsets-datacollector-azure-keyvault-credentialstore-lib For the Microsoft Azure Key Vault credential store.
streamsets-datacollector-azure-lib For Microsoft Azure.
Includes:
  • Azure Data Lake Storage Gen2 origin
  • Azure IoT/Event Hub Consumer origin
  • Azure Data Lake Storage Gen2 destination
  • Azure Event Hub Producer destination
  • Azure IoT Hub Producer destination
  • ADLS Gen2 File Metadata executor
streamsets-datacollector-basic-lib
Includes the following origins:
  • CoAP Server
  • Directory
  • File Tail
  • gRPC Client
  • HTTP Client
  • HTTP Server
  • JavaScript Scripting
  • MQTT Subscriber
  • OPC UA Client
  • REST Service
  • SFTP/FTP/FTPS Client
  • System Metrics
  • TCP Server
  • UDP Multithreaded Source
  • UDP Source
  • WebSocket Client
  • WebSocket Server
Includes the following processors:
  • Base64 Field Decoder
  • Base64 Field Encoder
  • Data Generator
  • Data Parser
  • Delay
  • Expression Evaluator
  • Field Flattener
  • Field Hasher
  • Field Mapper
  • Field Masker
  • Field Merger
  • Field Order
  • Field Pivoter
  • Field Remover
  • Field Renamer
  • Field Replacer
  • Field Splitter
  • Field Type Converter
  • Field Zip
  • Geo IP
  • HTTP Client
  • HTTP Router
  • JavaScript Evaluator
  • JSON Generator
  • JSON Parser
  • Log Parser
  • Record Deduplicator
  • Schema Generator
  • Static Lookup
  • Stream Selector
  • Windowing Aggregator
  • XML Flattener
  • XML Parser
Includes the following destinations:
  • CoAP Client
  • HTTP Client
  • Local FS
  • MQTT Publisher
  • Named Pipe
  • Send Response to Origin
  • SFTP/FTP/FTPS Client
  • Splunk
  • Syslog
  • To Error
  • Trash
  • WebSocket Client
Includes the following executors:
  • Databricks Job Launcher
  • Email
  • Pipeline Finisher
  • Shell
streamsets-datacollector-bigtable-lib For Google Cloud Bigtable.

Includes the Google Bigtable destination.

streamsets-datacollector-cassandra_3-lib For Cassandra 1.2, 2.x, and 3.x.

Includes the Cassandra destination.

streamsets-datacollector-cdp_7_1-lib For Cloudera CDP 7.1.x.
Includes:
  • Hadoop FS Standalone origin
  • Kafka Multitopic Consumer origin
  • HBase Lookup processor
  • Hive Metadata processor
  • Kudu Lookup processor
  • Hadoop FS destination
  • HBase destination
  • Hive Metastore destination
  • Kafka Producer destination
  • Kudu destination
  • Solr destination
  • HDFS File Metadata executor
  • Hive Query executor
  • MapReduce executor
  • Spark executor
streamsets-datacollector-couchbase_5-lib For Couchbase.
Includes:
  • Couchbase Lookup processor
  • Couchbase destination
streamsets-datacollector-crypto-lib For cryptography stages.

Includes the Encrypt and Decrypt Fields processor.

streamsets-datacollector-cyberark-credentialstore-lib For the CyberArk credential store.
streamsets-datacollector-dataformats-lib

Contains parsers and generators for the data formats supported by Data Collector.

streamsets-datacollector-dev-lib For developing and testing pipelines.
Includes:
  • Dev Data Generator origin
  • Dev Random Record origin
  • Dev Raw Data Source origin
  • Dev SDC RPC with Buffering origin
  • Dev Snapshot origin
  • Sensor Reader origin
  • Dev Identity processor
  • Dev Random Error processor
  • Dev Record Creator processor
  • To Event destination
Note: Do not use these stages in production pipelines.
streamsets-datacollector-elasticsearch_5-lib For Elasticsearch 1.x, 2.x, and 5.x.

Includes the Elasticsearch origin and destination.

streamsets-datacollector-elasticsearch_6-lib For Elasticsearch 6.x.

Includes the Elasticsearch origin and destination.

streamsets-datacollector-elasticsearch_7-lib For Elasticsearch 7.x.

Includes the Elasticsearch origin and destination.

streamsets-datacollector-elasticsearch_8-lib For Elasticsearch 8.x.

Includes the Elasticsearch origin and destination.

streamsets-datacollector-google-cloud-lib For Google Cloud.
Includes:
  • Google BigQuery origin
  • Google Cloud Storage origin
  • Google Pub/Sub Subscriber origin
  • Google BigQuery destination
  • Google Cloud Storage destination
  • Google Pub/Sub Publisher destination
  • Google BigQuery executor
  • Google Cloud Storage executor
streamsets-datacollector-google-secret-manager-credentialstore-lib For the Google Secret Manager credential store.
streamsets-datacollector-groovy_2_4-lib For Groovy version 2.4.
Includes:
  • Groovy Scripting origin
  • Groovy Evaluator processor
streamsets-datacollector-groovy_4_0-lib For Groovy version 4.0.
Includes:
  • Groovy Scripting origin
  • Groovy Evaluator processor
streamsets-datacollector-influxdb_0_9-lib For InfluxDB versions 0.9 through 1.x.

Includes the InfluxDB destination.

streamsets-datacollector-influxdb_2_0-lib For InfluxDB version 2.x.

Includes the InfluxDB 2.x destination.

streamsets-datacollector-jdbc-lib For JDBC access to databases.
Includes:
  • JDBC Multitable Consumer origin
  • JDBC Query Consumer origin
  • PostgreSQL CDC Client origin
  • Oracle CDC Client origin
  • SQL Server CDC Client origin
  • SQL Server Change Tracking origin
  • JDBC Lookup processor
  • JDBC Tee processor
  • PostgreSQL Metadata processor
  • SQL Parser processor
  • JDBC Producer destination
  • JDBC Query executor
streamsets-datacollector-jdbc-sap-hana-lib For JDBC access to SAP HANA databases.

Includes the SAP HANA Query Consumer origin.

streamsets-datacollector-jks-credentialstore-lib For the Java keystore credential store.
streamsets-datacollector-jms-lib For Java Message Service (JMS).

Includes the JMS Consumer origin and JMS Producer destination.

streamsets-datacollector-jython_2_7-lib For Jython version 2.7.x.
Includes:
  • Jython Scripting origin
  • Jython Evaluator processor
streamsets-datacollector-kinesis-lib For Amazon Kinesis.
Includes:
  • Kinesis Consumer origin
  • Kinesis Firehose destination
  • Kinesis Producer destination
streamsets-datacollector-mapr_6_1-lib For MapR version 6.1.0.
Includes:
  • MapR DB CDC origin
  • MapR DB JSON origin
  • MapR FS Standalone origin
  • MapR Multitopic Streams Consumer origin
  • MapR DB destination
  • MapR DB JSON destination
  • MapR FS destination
  • MapR FS File Metadata executor
streamsets-datacollector-mapr_6_1-mep6-lib For MapR 6.1.0 with MEP 6.x.
Includes:
  • MapR Streams Consumer origin
  • Hive Metadata processor
  • Spark Evaluator processor
  • Hive Metastore destination
streamsets-datacollector-mapr_7_0-lib For MapR 7.0.x.
Includes:
  • MapR DB CDC origin
  • MapR DB JSON origin
  • MapR FS Standalone origin
  • MapR Multitopic Streams Consumer origin
  • MapR DB destination
  • MapR DB JSON destination
  • MapR FS destination
  • MapR FS File Metadata executor
streamsets-datacollector-mapr_7_0-mep8-lib For MapR 7.0.x with MEP 8.x.
Includes:
  • MapR Streams Consumer origin
  • Hive Metadata processor
  • Spark Evaluator processor
  • Hive Metastore destination
streamsets-datacollector-mleap-lib For MLeap.

Includes the MLeap Evaluator processor.

streamsets-datacollector-mongodb_3-lib For MongoDB 3.0 with Java driver 3.5.0.
Includes:
  • MongoDB origin
  • MongoDB Oplog origin
  • MongoDB Lookup processor
  • MongoDB destination
streamsets-datacollector-mongodb_4-lib For MongoDB 4.0 with Java driver 3.12.0.
Includes:
  • MongoDB origin
  • MongoDB Oplog origin
  • MongoDB Lookup processor
  • MongoDB destination
streamsets-datacollector-mongodb-atlas-lib For MongoDB Atlas and MongoDB Enterprise Server.
Includes:
  • MongoDB Atlas origin
  • MongoDB Atlas destination
streamsets-datacollector-mysql-binlog-lib For MySQL binary logs.

Includes the MySQL Binary Log origin.

streamsets-datacollector-orchestrator-lib For the orchestration stages.
Includes:
  • Cron Scheduler origin
  • Start Jobs origin
  • Control Hub API processor
  • Start Jobs processor
  • Wait for Jobs processor
streamsets-datacollector-postgres-aurora-lib For Amazon Aurora PostgreSQL versions 1 through 4.

Includes the Aurora PostgreSQL CDC Client origin.

streamsets-datacollector-rabbitmq-lib For RabbitMQ version 3.5.6.

Includes the RabbitMQ Consumer origin and RabbitMQ Producer destination.

streamsets-datacollector-redis-lib For Redis versions 2.8 and 3.0.
Includes:
  • Redis Consumer origin
  • Redis Lookup processor
  • Redis destination
streamsets-datacollector-salesforce-lib For Salesforce.
Includes:
  • Salesforce origin
  • Salesforce Bulk API 2.0 origin
  • Salesforce Lookup processor
  • Salesforce Bulk API 2.0 Lookup processor
  • Salesforce destination
  • Salesforce Bulk API 2.0 destination
  • Tableau CRM destination
streamsets-datacollector-stats-lib

StreamSets Control Hub requires that the statistics stage library be installed on each Data Collector.

streamsets-datacollector-tensorflow-lib For TensorFlow.

Includes the TensorFlow Evaluator processor.

streamsets-datacollector-thycotic-credentialstore-lib For the Thycotic Secret Server credential store.
streamsets-datacollector-vault-credentialstore-lib For the HashiCorp Vault credential store.
streamsets-datacollector-wholefile-transformer-lib Includes the Whole File Transformer processor.
streamsets-datacollector-windows-lib For Windows.

Includes the Windows Event Log origin.
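
Several library names in the table follow a version-based pattern, so the name for a given Kafka or Elasticsearch version can be derived instead of looked up by hand. The Python sketch below illustrates this. The function names and the version sets are assumptions that only encode the rows of the table above; they are not part of any StreamSets API.

```python
# Hypothetical helpers (not a StreamSets API): they only encode the naming
# pattern and version coverage listed in the common stage library table.

# Kafka major.minor versions that have a dedicated stage library.
KAFKA_VERSIONS = (
    {(1, 0), (1, 1)}
    | {(2, minor) for minor in range(0, 9)}  # 2.0 through 2.8
    | {(3, minor) for minor in range(0, 3)}  # 3.0 through 3.2
)

# Elasticsearch major version -> library suffix.
# The elasticsearch_5 library also covers 1.x and 2.x.
ELASTICSEARCH_LIBS = {1: 5, 2: 5, 5: 5, 6: 6, 7: 7, 8: 8}


def _numeric_parts(version):
    """Return the leading numeric components, e.g. '2.8.1' -> [2, 8, 1], '7.x' -> [7]."""
    parts = []
    for piece in version.strip().split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return parts


def kafka_stage_library(version):
    """Return the stage library name for a Kafka version such as '2.8.1'."""
    parts = _numeric_parts(version)
    if len(parts) < 2 or tuple(parts[:2]) not in KAFKA_VERSIONS:
        raise ValueError(f"No Kafka stage library listed for version {version!r}")
    major, minor = parts[:2]
    return f"streamsets-datacollector-apache-kafka_{major}_{minor}-lib"


def elasticsearch_stage_library(version):
    """Return the stage library name for an Elasticsearch version such as '7.x'."""
    parts = _numeric_parts(version)
    if not parts or parts[0] not in ELASTICSEARCH_LIBS:
        raise ValueError(f"No Elasticsearch stage library listed for version {version!r}")
    return f"streamsets-datacollector-elasticsearch_{ELASTICSEARCH_LIBS[parts[0]]}-lib"


print(kafka_stage_library("2.8.1"))
print(elasticsearch_stage_library("2.4"))
```

A version that has no row in the table, such as Kafka 3.3, raises a ValueError rather than guessing at a library name.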

Enterprise Stage Libraries

Enterprise stage libraries provide stages that connect to advanced external systems. They are released separately from Data Collector.

The release notes for Enterprise stage libraries are available on the StreamSets Documentation page.

StreamSets provides the following Enterprise stage libraries:
Stage Library | Stage Library Name | Description
Azure Synapse | streamsets-datacollector-azure-synapse-lib | For Azure Synapse. Includes the Azure Synapse SQL destination.
Databricks | streamsets-datacollector-databricks-lib | For Databricks. Includes the Databricks Delta Lake destination and the Databricks Query executor.
Oracle | streamsets-datacollector-oracle-lib | For bulk loading from Oracle tables. Includes the Oracle Bulkload origin.
Protector | streamsets-datacollector-protector-lib | For protecting sensitive data. Includes a set of Protector stages. For a full list, see the Protector release notes.
Snowflake | streamsets-datacollector-snowflake-lib | For Snowflake. Includes the Snowflake destination, the Snowflake File Uploader destination, and the Snowflake executor.
Microsoft SQL Server 2019 Big Data Cluster | streamsets-datacollector-sql-server-bdc-lib | For SQL Server 2019 Big Data Cluster. Includes the SQL Server 2019 BDC Multitable Consumer origin and the SQL Server 2019 BDC Bulk Loader destination.
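
A few notes in this section apply to any combination of libraries a deployment selects: the statistics library is required by Control Hub, the development library is not meant for production pipelines, and the MapR libraries need additional installation steps. The Python sketch below shows one way to check a selection against those notes before configuring a deployment. The function name, the `production` flag, and the message wording are hypothetical and not part of Control Hub.

```python
# Hypothetical pre-deployment check (not a Control Hub API). It only encodes
# the notes stated in this section about the stats, dev, and MapR libraries.

STATS_LIB = "streamsets-datacollector-stats-lib"
DEV_LIB = "streamsets-datacollector-dev-lib"
MAPR_PREFIX = "streamsets-datacollector-mapr_"


def review_stage_libraries(selected, production=True):
    """Return a list of reminder strings for the selected stage library names."""
    names = set(selected)
    notes = []

    # Control Hub requires the statistics stage library on each Data Collector.
    if STATS_LIB not in names:
        notes.append(f"{STATS_LIB} is missing; Control Hub requires it.")

    # The development stages must not be used in production pipelines.
    if production and DEV_LIB in names:
        notes.append(f"{DEV_LIB} is selected; do not use its stages in production pipelines.")

    # MapR libraries need the additional steps described in MapR Prerequisites.
    mapr = sorted(name for name in names if name.startswith(MAPR_PREFIX))
    if mapr:
        notes.append("Complete the MapR Prerequisites steps for: " + ", ".join(mapr))

    return notes


selection = [
    "streamsets-datacollector-basic-lib",
    "streamsets-datacollector-dev-lib",
    "streamsets-datacollector-mapr_7_0-lib",
]
for note in review_stage_libraries(selection):
    print(note)
```

The check returns reminders instead of raising errors because none of these conditions necessarily prevents a deployment from being created; they only flag items to review.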