• A
    • additional drivers
      • installing through Cloudera Manager[1]
    • additional properties
      • Kafka Consumer[1]
      • MapR Streams Producer[1]
    • ADLS Gen1 destination
    • ADLS Gen1 File Metadata executor
      • event generation[1]
      • overview[1]
    • ADLS Gen1 origin
    • ADLS Gen2 destination
    • ADLS Gen2 File Metadata executor
      • event generation[1]
      • overview[1]
    • ADLS Gen2 origin
      • overview[1]
      • retrieve configuration details[1]
    • ADLS stages
      • local pipeline prerequisites[1]
    • advanced options
      • pipelines and stages[1]
    • Aerospike destination
    • aggregated statistics
    • Aggregate processor
      • shuffling of data[1]
    • alerts and rules
    • alert webhook
    • Amazon EMR, see EMR[1]
    • Amazon Redshift destination
      • AWS credentials and write requirements[1]
      • installing the JDBC driver[1]
    • Amazon S3 destination
      • bucket[1]
      • event generation[1]
      • object names[1]
      • overview[1][2]
      • overwrite partition prerequisite[1]
      • partition prefix[1]
      • server-side encryption[1]
    • Amazon S3 destinations
    • Amazon S3 executor
      • event generation[1]
      • overview[1]
    • Amazon S3 origin
      • common prefix and prefix pattern[1]
      • data formats[1]
      • event generation[1]
      • including metadata[1]
      • overview[1][2]
    • Amazon S3 stages
      • authentication method[1]
      • enabling security[1]
      • local pipeline prerequisites[1]
    • Amazon SQS Consumer origin
    • Amazon stages
      • authentication method[1]
      • enabling security[1]
    • Amazon Web Services
      • StreamSets for Databricks[1]
    • authentication
    • authentication method
    • authentication tokens
      • unregistered[1]
    • authoring
      • Data Collectors[1]
    • available features
      • Spark versions[1]
    • AWS Fargate with EKS
      • provisioned Data Collectors[1]
    • AWS Secrets Manager
    • AWS Secrets Manager access
    • Azure
      • StreamSets for Databricks[1]
    • Azure Data Lake Storage (Legacy) destination
      • event generation[1]
      • overview[1]
    • Azure Data Lake Storage Gen1 destination
      • event generation[1]
      • overview[1]
    • Azure Data Lake Storage Gen1 origin
    • Azure Data Lake Storage Gen2 destination
      • event generation[1]
      • overview[1]
    • Azure Data Lake Storage Gen2 origin
    • Azure Event Hub Producer destination
    • Azure Event Hubs destination
      • prerequisites[1]
    • Azure Event Hubs origin
    • Azure IoT/Event Hub Consumer origin
      • overview[1]
      • resetting the origin in Event Hub[1]
    • Azure IoT Hub Producer destination
    • Azure Key Vault
    • Azure Key Vault access
    • Azure Synapse SQL destination
      • Azure Synapse connection[1]
      • copy statement connection[1]
      • creating new tables[1]
      • data drift handling[1]
      • enable container access[1]
      • install the stage library[1]
      • multiple tables[1]
      • overview[1]
      • prepare the Azure Synapse instance[1]
      • prepare the staging area[1]
      • prerequisites[1]
      • staging connection[1]
      • supported versions[1]
  • B
    • Base64 Field Decoder processor
    • Base64 Field Encoder processor
    • Base64 functions
    • batch pipelines
    • batch size and wait time
    • batch strategy
      • JDBC Multitable Consumer origin[1]
    • branching
      • streams in a pipeline[1]
    • bucket
      • Amazon S3 destination[1]
    • bulk edit mode
  • C
    • caching
      • for origins and processors[1]
    • case study
      • batch pipelines[1]
      • streaming pipelines[1]
    • Cassandra destination
    • category functions
      • credit card numbers[1]
      • description[1]
      • email address[1]
      • phone numbers[1]
      • social security numbers[1]
      • zip codes[1]
    • CDC processing
      • CRUD-enabled destinations[1]
      • overview[1]
      • stages enabled for CDC[1]
      • use cases[1]
    • cipher suites
      • defaults and configuration[1]
    • Cloudera Manager
      • installing additional drivers[1]
      • installing external libraries[1]
    • cloud service provider
      • Amazon Web Services[1]
      • Azure[1]
      • Azure HDInsight[1]
      • Google Cloud Platform[1]
      • installation[1]
    • cluster
    • cluster batch mode
    • cluster compatibility matrix
      • installation requirements[1]
    • cluster EMR batch mode
    • cluster mode
      • batch[1]
      • configuration for Kafka[1]
      • configuration for Kafka on YARN[1]
      • description[1]
      • EMR batch[1]
      • streaming[1]
    • cluster streaming mode
    • cluster YARN streaming mode
      • configuration requirements[1]
    • CoAP Client destination
    • CoAP Server origin
    • command line interface
      • jks-credentialstore command[1]
      • jks-cs command, deprecated[1]
      • stagelib-cli command[1]
    • common tarball install
    • compression formats
      • read by origins and processors[1]
    • conditions
      • Delta Lake destination[1]
    • connections
    • control characters
      • removing from data[1]
    • Control Hub
      • aggregated statistics[1]
      • overview[1]
    • Control Hub API processor
    • Control Hub configuration files
      • storing passwords and other sensitive values[1]
    • core RPM install
      • installing additional libraries[1]
    • core tarball install
    • Couchbase destination
    • Couchbase Lookup processor
    • credential functions
    • credentials
    • credential stores
    • Cron Scheduler origin
    • cross join
      • Join processor[1]
    • CRUD operation
      • Databricks Delta Lake destination[1]
      • Google BigQuery (Enterprise) destination[1]
      • JDBC Producer[1]
      • Snowflake destination[1]
    • CSV parser
      • delimited data format[1]
    • custom delimiters
      • text data format[1]
    • custom drivers, see external libraries[1]
    • custom schemas
      • application to JSON and delimited data[1]
      • DDL schema format[1]
      • error handling[1]
      • JSON schema format[1]
      • origins[1]
    • custom stages
    • CyberArk
      • credential store[1]
    • CyberArk access
  • D
    • database versions tested
      • Teradata Consumer origin[1]
    • Databricks
    • Databricks Delta Lake destination
      • CRUD operation[1]
      • install the stage library[1]
      • overview[1]
      • prerequisites[1]
      • solution[1]
      • solution for change capture data[1]
      • supported versions[1]
    • Databricks init scripts
      • access keys for ABFSS[1]
    • Databricks Job Launcher executor
      • event generation[1]
      • overview[1]
    • Databricks ML Evaluator processor
    • Databricks pipelines
      • job details[1]
      • provisioned cluster[1]
      • staging directory[1]
    • Databricks Query executor
      • event generation[1]
      • install the stage library[1]
      • overview[1]
      • prerequisites[1]
    • Data Collector
      • data types[1]
      • delete unregistered tokens[1]
      • disconnected mode[1]
      • environment variables[1]
      • expression language[1]
      • Java Security Manager[1]
      • Monitor mode[1]
      • resource thresholds[1][2]
      • Security Manager[1]
      • supported systems[1]
      • viewing and downloading log data[1]
    • Data Collector configuration
      • for sending email[1]
      • overview[1]
    • Data Collector configuration file
      • configuring[1]
      • enabling Kerberos authentication[1]
    • Data Collector configuration properties
      • storing passwords and other sensitive values[1]
    • Data Collector Edge
      • configuration file[1]
      • customizing[1]
      • description[1]
    • Data Collector environment
    • Data Collector pipelines
      • failing over[1]
    • Data Collectors
    • Data Collector UI
      • Edit mode[1]
      • overview[1]
      • pipelines view on the Home page[1]
      • Preview mode[1]
    • data drift functions
    • dataflow triggers
      • overview[1]
      • TensorFlow Evaluator processor event generation[1]
      • Windowing Aggregator processor event generation[1]
    • dataflow trigger solution
      • Apache Sqoop replacement (batch loading to Hadoop)[1]
      • Drift Synchronization Solution for Hive with Impala[1]
      • event storage[1]
      • HDFS avro to parquet[1]
      • output file management[1]
      • sending email[1]
      • stop the pipeline[1]
    • data formats
      • Amazon S3[1]
      • Excel[1]
      • Kafka Consumer[1]
      • Kafka Producer destinations[1]
    • data generation functions
    • Data Generator processor
    • Data Parser processor
    • data preview
    • Dataproc
    • data rules and alerts
    • datetime variables
      • in the expression language[1]
    • Delay processor
    • delimited data
    • delimited data format
    • delimited data functions
    • delimiter element
      • using with XML namespaces[1]
    • delivery guarantee
      • pipeline property[1]
    • Delta Lake
    • Delta Lake destination
      • overwrite condition[1]
      • Overwrite Data write mode[1]
    • Delta Lake Lookup processor
    • Delta Lake origin
    • deployments
      • expose as service[1]
      • Horizontal Pod Autoscaler[1]
      • Ingress[1]
      • labels[1]
      • YAML specification[1]
    • deprecated functionality
    • destinations
      • ADLS G1[1]
      • ADLS G2[1]
      • Aerospike[1]
      • Amazon S3[1][2]
      • Azure Data Lake Storage (Legacy)[1]
      • Azure Data Lake Storage Gen1[1]
      • Azure Data Lake Storage Gen2[1]
      • Azure Event Hub Producer[1]
      • Azure IoT Hub Producer[1]
      • Azure Synapse SQL[1]
      • Cassandra[1]
      • CoAP Client[1]
      • Couchbase[1]
      • CRUD-enabled[1]
      • Databricks Delta Lake[1]
      • Elasticsearch[1]
      • File[1]
      • Google BigQuery[1]
      • Google BigQuery (Enterprise)[1]
      • Google Bigtable[1]
      • Google Cloud Storage[1]
      • Google Pub/Sub Publisher[1]
      • GPSS Producer[1]
      • Hadoop FS[1]
      • HBase[1]
      • Hive[1]
      • Hive Metastore[1]
      • Hive Streaming[1]
      • HTTP Client[1]
      • InfluxDB[1]
      • InfluxDB 2.x[1]
      • JDBC[1]
      • JDBC Producer[1]
      • JMS Producer[1]
      • Kafka[1]
      • Kafka Producer[1]
      • Kinesis Firehose[1]
      • Kinesis Producer[1]
      • KineticaDB[1]
      • Kudu[1]
      • Local FS[1]
      • MapR DB[1]
      • MapR DB JSON[1]
      • MapR FS[1]
      • MapR Streams Producer[1]
      • MemSQL Fast Loader[1]
      • microservice[1]
      • MongoDB[1]
      • MQTT Publisher[1]
      • Named Pipe[1]
      • overview[1]
      • Pulsar Producer[1]
      • RabbitMQ Producer[1]
      • record based writes[1]
      • Redis[1]
      • Salesforce[1]
      • SDC RPC[1]
      • Send Response to Origin[1]
      • SFTP/FTP/FTPS Client[1]
      • Snowflake[1]
      • Snowflake File Uploader[1]
      • Solr[1]
      • Splunk[1]
      • SQL Server 2019 BDC Bulk Loader[1]
      • supported data formats[1]
      • Syslog[1]
      • Tableau CRM[1]
      • To Error[1]
      • Trash[1]
      • WebSocket Client[1]
    • Dev Data Generator origin
    • Dev Random Error processor
    • Dev Random Source origin
    • Dev Raw Data Source origin
    • Dev Record Creator processor
    • directories
    • Directory origin
      • batch size and wait time[1]
      • event generation[1]
      • multithreaded processing[1]
      • overview[1]
      • read order[1]
    • directory path
      • File destination[1]
      • File origin[1]
    • directory templates
    • disconnected mode
    • display settings
    • Drift Synchronization Solution for Hive
      • overview[1]
      • Parquet case study[1]
    • Drift Synchronization Solution for PostgreSQL
    • drivers
      • installing additional for stages[1]
      • JDBC destination[1]
      • JDBC Lookup processor[1]
      • JDBC origin[1]
      • JDBC Table origin[1]
      • MySQL JDBC Table origin[1]
      • Oracle JDBC Table origin[1]
    • drivers, see also external libraries[1]
    • driver versions tested
      • Hive Query executor[1]
      • Teradata Consumer origin[1]
  • E
    • Edge Data Collectors
    • edge pipelines
    • Elasticsearch destination
    • Elasticsearch origin
    • email
      • Data Collector configuration[1]
    • email addresses
      • configuring for alerts[1]
    • Email executor
    • EMR
      • base URI and staging directory[1]
      • cluster[1]
      • Kerberos stage limitation[1]
    • enabling TLS
      • in SDC RPC pipelines[1]
    • Encrypt and Decrypt Fields processor
    • engines
      • labels[1]
      • resource thresholds[1]
    • Enterprise stage libraries
    • environment variable
      • STREAMSETS_LIBRARIES_EXTRA_DIR[1][2]
    • environment variables
    • error handling
      • error record description[1]
    • error record
      • description and version[1]
    • error records
    • event framework
      • Amazon S3 destination event generation[1]
      • Azure Data Lake Storage destination event generation[1]
      • Azure Data Lake Storage Gen1 destination event generation[1]
      • Azure Data Lake Storage Gen2 destination event generation[1]
      • Google Cloud Storage destination event generation[1]
      • Hadoop FS destination event generation[1]
      • overview[1]
      • pipeline event generation[1]
      • stage event generation[1]
    • event generating stages
    • event generation
      • ADLS Gen1 File Metadata executor[1]
      • ADLS Gen2 File Metadata executor[1]
      • Amazon S3 executor[1]
      • Databricks Job Launcher executor[1]
      • Databricks Query executor[1]
      • Google Cloud Storage executor[1]
      • Groovy Evaluator processor[1]
      • Groovy Scripting origin[1]
      • HDFS File Metadata executor[1]
      • Hive Metastore destination[1]
      • Hive Query executor[1]
      • JavaScript Evaluator[1]
      • JavaScript Scripting origin[1]
      • JDBC Query executor[1]
      • Jython Evaluator[1]
      • Jython Scripting origin[1]
      • Local FS destination[1]
      • MapReduce executor[1]
      • MapR FS destination[1]
      • MapR FS File Metadata executor[1]
      • SFTP/FTP/FTPS Client destination[1]
      • Snowflake executor[1]
      • Snowflake File Uploader destination[1]
      • Spark executor[1]
      • SQL Server CDC Client origin[1]
      • SQL Server Change Tracking[1]
    • event records
      • JDBC Query Consumer origin[1]
    • events
    • event types
      • subscriptions[1]
    • Excel data format
    • execution engines
    • execution mode
      • pipelines[1]
      • standalone and cluster modes[1]
    • executors
      • ADLS Gen1 File Metadata[1]
      • ADLS Gen2 File Metadata[1]
      • Amazon S3[1]
      • Databricks Job Launcher[1]
      • Databricks Query[1]
      • Email[1]
      • Google Cloud Storage[1]
      • HDFS File Metadata[1]
      • Hive Query[1]
      • JDBC Query[1]
      • MapReduce[1]
      • MapR FS File Metadata[1]
      • overview[1]
      • Pipeline Finisher[1]
      • SFTP/FTP/FTPS Client[1]
      • Shell[1]
      • Snowflake[1]
      • Spark[1][2]
    • expression completion
    • Expression Evaluator processor
    • expression language
      • datetime variables[1]
      • field path expressions[1]
      • functions[1]
      • overview[1]
    • external libraries
      • installing additional for stages[1]
      • installing for stages[1]
      • installing through Cloudera Manager[1]
      • manual installation[1]
      • Package Manager installation[1]
      • set up external directory[1]
      • stage properties installation[1][2]
  • F
    • failover
      • Data Collector pipeline[1]
      • Transformer pipeline[1]
    • failover retries
      • Data Collector jobs[1]
      • Transformer jobs[1]
    • field attributes
    • Field Flattener processor
    • field functions
    • Field Hasher processor
      • overview[1]
      • using a field separator[1]
    • Field Mapper
    • Field Masker processor
    • Field Merger processor
    • Field Order
    • field path expressions
      • overview[1]
      • supported stages[1]
    • Field Pivoter
    • Field Remover processor
    • Field Renamer processor
    • Field Replacer processor
    • fields
    • field separators
      • Field Hasher processor[1]
    • Field Splitter processor
    • Field Type Converter processor
    • field XPaths and namespaces
    • Field Zip processor
    • FIFO
      • Named Pipe destination[1]
    • File destination
      • directory path[1]
      • overview[1]
      • overwrite partition prerequisite[1]
    • file functions
    • file name expression
      • writing whole files[1]
    • File origin
    • file processing
      • for File Tail origin[1]
    • File Tail origin
      • event generation[1]
      • file processing[1]
      • overview[1]
    • Filter processor
    • first file to process
      • File Tail origin[1]
    • Flume destination
    • fragments
      • pipeline fragments[1]
    • full outer join
      • Join processor[1]
    • functions
      • Base64 functions[1]
      • category functions[1]
      • credential[1]
      • credential functions[1]
      • data drift functions[1]
      • data generation[1]
      • delimited data[1]
      • error record functions[1]
      • field functions[1]
      • file functions[1]
      • in the expression language[1]
      • job functions[1]
      • math functions[1]
      • miscellaneous functions[1][2]
      • pipeline functions[1]
      • record functions[1]
      • string functions[1]
      • time functions[1]
  • G
    • generated record
      • PostgreSQL CDC Client[1]
    • generated records
    • generators
      • support bundles[1]
    • Geo IP processor
      • overview[1]
      • supported databases[1]
    • Google BigQuery (Enterprise) destination
      • CRUD operation[1]
      • overview[1]
      • supported versions[1]
    • Google BigQuery destination
    • Google BigQuery origin
      • event generation[1]
      • overview[1]
    • Google Big Query origin
    • Google Bigtable destination
    • Google Cloud stages
      • credentials[1]
      • credentials in a property[1]
      • credentials in file[1]
      • enabling security[1]
    • Google Cloud Storage destination
      • event generation[1]
      • object names[1]
      • overview[1]
      • partition prefix[1]
      • time basis and partition prefixes[1]
    • Google Cloud Storage executor
      • event generation[1]
      • overview[1]
    • Google Cloud Storage origin
      • event generation[1]
      • overview[1]
    • Google Pub/Sub Publisher destination
    • Google Pub/Sub Subscriber origin
    • Google Secret Manager
    • GPSS Producer destination
      • CRUD operation[1]
      • overview[1]
      • prerequisites[1]
      • supported versions[1]
    • grok patterns
    • Groovy Evaluator processor
      • generating events[1]
      • overview[1]
    • Groovy Scripting origin
      • event generation[1]
      • overview[1]
    • gRPC Client origin
  • H
    • Hadoop FS destination
      • directory templates[1]
      • event generation[1]
      • late record handling[1]
      • overview[1]
      • time basis[1]
    • Hadoop FS origin
      • overview[1]
      • reading from Amazon S3[1]
    • Hadoop FS Standalone origin
    • Hadoop impersonation mode
    • Hadoop YARN
      • cluster[1]
      • directory requirements[1]
      • driver requirement[1]
      • impersonation[1]
      • Kerberos authentication[1]
    • Hashicorp Vault
      • credential store[1]
    • HBase destination
    • HBase Lookup processor
    • HDFS File Metadata executor
      • event generation[1]
      • overview[1]
    • heap size
    • help
      • local or hosted[1]
    • Hive destination
    • Hive Drift Solution, see Drift Synchronization Solution for Hive[1]
    • Hive Metadata executor
    • Hive Metadata processor
    • Hive Metastore destination
      • event generation[1]
      • overview[1]
    • Hive origin
    • Hive Query executor
      • event generation[1]
      • installing the Impala JDBC driver[1]
      • overview[1]
      • solution[1]
      • tested drivers[1]
    • Hive Streaming destination
    • Home page
      • Data Collector UI[1]
    • Horizontal Pod Autoscaler
      • associating with deployment[1]
    • HTTP Client destination
    • HTTP Client origin
    • HTTP Client processor
    • HTTP origins
    • HTTP Router processor
    • HTTP Server origin
    • HTTPS protocol
  • I
    • Impala JDBC driver
      • installing for the Hive Query executor[1]
    • impersonation mode
      • enabling for the Shell executor[1]
      • for Hadoop stages[1]
      • Hadoop[1]
    • including metadata
      • Amazon S3 origin[1]
    • InfluxDB 2.x destination
    • InfluxDB destination
    • Ingress
      • associating with deployment[1]
    • initial table order strategy
      • JDBC Multitable Consumer origin[1]
    • inner join
      • Join processor[1]
    • input
    • installation
      • Amazon Web Services[1]
      • Azure[1]
      • Azure HDInsight[1]
      • cloud service provider[1]
      • cluster[1]
      • common installation[1]
      • common tarball[1]
      • core tarball[1]
      • core with additional libraries[1]
      • Google Cloud Platform[1]
      • local[1]
      • manual start[1]
      • requirements[1][2]
      • Scala, Spark, and Java JDK requirements[1]
      • service start[1][2]
    • installation package
      • choosing Scala version[1]
    • installation requirements
      • cluster compatibility matrix[1]
  • J
    • Java configuration options
      • heap size[1]
      • memory strategy[1]
    • Java keystore
    • JavaScript Evaluator
      • scripts for delimited data[1]
    • JavaScript Evaluator processor
      • generating events[1]
      • overview[1]
    • JavaScript Scripting origin
      • event generation[1]
      • overview[1]
    • Java Security Manager
      • Data Collector[1]
    • JDBC destination
      • driver installation[1]
      • overview[1]
    • JDBC Lookup processor
    • JDBC Multitable Consumer origin
      • batch strategy[1]
      • event generation[1]
      • initial table order strategy[1]
      • JDBC record header attributes[1]
      • multiple offset values[1]
      • multithreaded processing for partitions[1]
      • multithreaded processing for tables[1]
      • multithreaded processing types[1]
      • non-incremental processing[1]
      • offset column and value[1]
      • overview[1]
      • partition processing requirements[1]
      • schema, table name, and exclusion pattern[1]
      • table configuration[1]
      • understanding the processing queue[1]
    • JDBC Producer destination
    • JDBC Query Consumer origin
      • event generation[1]
      • event records[1]
      • overview[1]
    • JDBC Query executor
      • event generation[1]
      • overview[1]
    • JDBC Query origin
      • driver installation[1]
      • overview[1]
    • JDBC record header attributes
      • JDBC Multitable Consumer[1]
    • JDBC Table origin
      • driver installation[1]
      • overview[1]
    • JDBC Tee processor
    • JMS Consumer origin
    • JMS Producer destination
    • job functions
    • job instances
    • jobs
      • balancing[1]
      • Data Collector failover retries[1]
      • Data Collector pipeline failover[1]
      • editing[1]
      • error handling[1]
      • filtering[1]
      • labels[1][2]
      • offsets[1]
      • pipeline instances[1]
      • resetting the origin[1]
      • runtime parameters[1]
      • scaling out[1]
      • scaling out automatically[1]
      • searching[1]
      • status[1]
      • synchronizing[1]
      • tags[1]
      • templates[1]
      • time series analysis[1]
      • Transformer failover retries[1]
      • Transformer pipeline failover[1]
    • job templates
      • attached job instances[1]
      • detached job instances[1]
      • editing[1]
      • filtering[1]
      • searching[1]
      • tags[1]
    • Join processor
      • cross join[1]
      • full outer join[1]
      • inner join[1]
      • left anti join[1]
      • left outer join[1]
      • left semi join[1]
      • overview[1]
      • right anti join[1]
      • right outer join[1]
      • shuffling of data[1]
    • JSON Generator processor
    • JSON Parser processor
    • JVM memory strategy
    • Jython Evaluator
      • scripts for delimited data[1]
    • Jython Evaluator processor
      • generating events[1]
      • overview[1]
    • Jython Scripting origin
      • event generation[1]
      • overview[1]
  • K
    • Kafka Consumer origin
      • additional properties[1]
      • data formats[1]
      • message keys[1]
      • overview[1]
      • storing message keys[1]
    • Kafka destination
    • Kafka message keys
    • Kafka Multitopic Consumer origin
      • message keys[1]
      • multithreaded processing[1]
      • storing message keys[1]
    • Kafka origin
    • Kafka Producer
      • message keys[1]
      • passing message keys to Kafka[1]
    • Kafka Producer destination
      • data formats[1]
      • overview[1]
      • partition expression[1]
      • partition strategy[1]
      • runtime topic resolution[1]
    • Kafka stages
      • enabling SASL[1]
      • enabling SASL on SSL/TLS[1]
      • enabling security[1]
      • enabling SSL/TLS security[1]
      • providing Kerberos credentials[1]
      • security prerequisite tasks[1]
      • using keytabs in a credential store[1]
    • Kerberos
      • credentials for Kafka stages[1]
      • enabling[1]
    • Kerberos authentication
      • enabling for the Data Collector[1]
      • Hadoop YARN cluster[1]
    • keystore
      • properties and defaults[1]
      • remote[1]
    • Kinesis Consumer origin
      • overview[1]
      • resetting the origin[1]
    • Kinesis Firehose destination
    • Kinesis Producer destination
    • KineticaDB destination
    • Kudu destination
    • Kudu Lookup processor
    • Kudu origin
  • L
    • labels
    • late record handling
    • launch Data Collector
    • LDAP
      • authentication[1]
    • LDAP authentication
    • left anti join
      • Join processor[1]
    • left outer join
      • Join processor[1]
    • left semi join
      • Join processor[1]
    • list-map root field type
      • delimited data[1]
    • list root field type
      • delimited data[1]
    • load methods
      • Snowflake destination[1]
    • Local FS destination
      • event generation[1]
      • overview[1]
    • local pipelines
    • log files
    • log formats
    • log level
    • Log Parser processor
    • logs
      • modifying log level[1]
      • pipelines[1]
      • Spark driver[1]
      • Transformer[1]
    • lookups
    • ludicrous mode
      • optimizing pipeline performance[1]
  • M
    • MapR
      • prerequisites[1]
    • MapR cluster
      • dynamic allocation requirement[1]
    • MapR clusters
      • Hadoop impersonation prerequisite[1]
      • pipeline start prerequisite[1]
      • prerequisite tasks[1]
    • MapR DB CDC origin
      • overview[1]
      • record header attributes[1]
    • MapR DB destination
    • MapR DB JSON destination
    • MapR DB JSON origin
    • MapReduce executor
    • MapR FS destination
      • event generation[1]
      • overview[1]
      • record header attributes for record-based writes[1]
    • MapR FS File Metadata executor
      • event generation[1]
      • overview[1]
    • MapR FS origin
    • MapR FS Standalone origin
      • event generation[1]
      • overview[1]
    • MapR origins
    • MapR Streams Consumer origin
      • overview[1]
      • processing all unread data[1]
    • MapR Streams Producer destination
      • additional properties[1]
      • overview[1]
    • math functions
    • maximum record size properties
    • MemSQL Fast Loader destination
      • driver installation[1]
      • install the stage library[1]
      • overview[1]
      • prerequisites[1]
      • supported versions[1]
    • merging data[1]
    • messages
      • processing NetFlow messages[1]
    • microservice pipelines
    • miscellaneous functions
    • MLeap Evaluator processor
    • MongoDB destination
    • MongoDB Lookup processor
    • MongoDB Oplog origin
      • generated records[1]
      • overview[1]
    • MongoDB origin
      • event generation[1]
      • offset field[1]
      • overview[1]
      • read preference[1]
    • monitoring
      • data rules and alerts[1]
      • multithreaded pipelines[1]
      • overview[1]
      • snapshots of data[1]
      • Spark web UI[1]
    • MQTT Publisher destination
    • MQTT Subscriber origin
    • multithreaded origins
      • HTTP Server[1]
      • JDBC Multitable Consumer[1]
      • Teradata Consumer[1]
      • WebSocket Server[1]
    • multithreaded pipeline
    • multithreaded pipelines
      • origins[1]
      • overview[1]
      • tuning threads and pipeline runners[1]
    • MySQL Binary Log origin
      • generated records[1]
      • overview[1]
    • MySQL JDBC Table origin
      • driver installation[1]
      • overview[1]
  • N
    • Named Pipe destination
    • namespaces
      • using with delimiter elements[1]
      • using with XPath expressions[1]
    • NetFlow 9
      • configuring template cache limitations[1]
      • generated records[1]
    • NetFlow messages
    • NiFi HTTP Server
    • non-incremental processing
      • JDBC Multitable Consumer[1]
    • notifications
      • pipeline state changes[1]
    • Number of Threads
      • Directory origin[1]
      • JDBC Multitable Consumer[1]
      • Kafka Multitopic Consumer origin[1]
  • O
    • offset
      • resetting for Kinesis Consumer[1]
    • offset column and value
      • JDBC Multitable Consumer[1]
    • offsets
      • jobs[1]
      • resetting for the pipeline[1]
      • skipping tracking[1]
    • Omniture origin
    • OPC UA Client origin
    • Oracle Bulkload origin
      • driver installation[1]
      • event generation[1]
      • install the stage library[1]
      • prerequisites[1]
      • supported versions[1]
    • Oracle CDC Client origin
      • CDC header attributes[1]
      • CRUD header attributes[1]
      • event generation[1]
      • generated records and Parse SQL Query[1]
      • overview[1]
    • Oracle JDBC Table origin
      • driver installation[1]
      • overview[1]
    • orchestration pipelines
    • orchestration record
    • organizations
      • admin[1]
      • global configurations[1]
      • system[1]
      • system administrator configuration[1]
    • origins
      • ADLS Gen1[1]
      • ADLS Gen2[1]
      • Amazon S3[1][2]
      • Amazon SQS Consumer origin[1]
      • Azure Data Lake Storage Gen1[1]
      • Azure Data Lake Storage Gen2[1]
      • Azure Event Hubs[1]
      • Azure IoT/Event Hub Consumer[1]
      • batch size and wait time[1]
      • caching[1]
      • CDC-enabled origins[1]
      • CoAP Server[1]
      • Cron Scheduler[1]
      • Delta Lake[1]
      • development origins[1]
      • Directory[1]
      • Elasticsearch[1]
      • File[1]
      • File Tail[1]
      • for microservice pipelines[1]
      • for multithreaded pipelines[1]
      • Google BigQuery[1]
      • Google Big Query[1]
      • Google Cloud Storage[1]
      • Google Pub/Sub Subscriber[1]
      • Groovy Scripting[1]
      • gRPC Client[1]
      • Hadoop FS[1]
      • Hadoop FS Standalone origin[1]
      • Hive[1]
      • HTTP Client[1]
      • HTTP Server[1]
      • JavaScript Scripting[1]
      • JDBC Multitable Consumer[1]
      • JDBC Query[1]
      • JDBC Query Consumer[1]
      • JDBC Table[1]
      • JMS Consumer[1]
      • Jython Scripting[1]
      • Kafka[1]
      • Kafka Consumer[1]
      • Kafka Multitopic Consumer[1]
      • Kinesis Consumer[1]
      • Kudu origin[1]
      • MapR DB CDC[1]
      • MapR DB JSON[1]
      • MapR FS[1]
      • MapR FS Standalone origin[1]
      • MapR Multitopic Streams Consumer[1]
      • MapR Streams Consumer[1]
      • maximum record size[1]
      • MongoDB Oplog[1]
      • MongoDB origin[1]
      • MQTT Subscriber[1]
      • multiple[1]
      • MySQL Binary Log[1]
      • MySQL JDBC Table[1]
      • NiFi HTTP Server[1]
      • Omniture[1]
      • OPC UA Client[1]
      • Oracle CDC Client[1]
      • Oracle JDBC Table[1]
      • overview[1][2]
      • PostgreSQL CDC Client[1]
      • PostgreSQL JDBC Table[1]
      • Pulsar Consumer[1]
      • RabbitMQ Consumer[1]
      • reading and processing XML data[1]
      • Redis Consumer[1]
      • resetting the origin[1]
      • REST Service[1]
      • Salesforce[1]
      • SAP HANA Query Consumer[1]
      • schema inference[1]
      • SDC RPC[1]
      • SFTP/FTP/FTPS Client[1]
      • Snowflake[1]
      • SQL Server CDC Client[1]
      • SQL Server Change Tracking[1]
      • SQL Server JDBC Table[1]
      • Start Jobs[1]
      • Start Pipelines[1]
      • supported data formats[1]
      • System Metrics[1]
      • TCP Server[1]
      • Teradata Consumer[1]
      • test origin[1]
      • UDP Multithreaded Source[1]
      • UDP Source[1]
      • WebSocket Client[1]
      • WebSocket Server[1]
      • Whole Directory[1]
      • Windows Event Log[1]
    • output
    • Overwrite Data write mode
      • Delta Lake destination[1]
  • P
    • Package Manager
      • installing additional libraries[1]
    • parameters
    • partitioning
    • partition prefix
      • Amazon S3 destination[1]
      • Google Cloud Storage destination[1]
    • partition processing requirements
      • JDBC Multitable Consumer[1]
    • partitions
    • partition strategy
      • Kafka Producer[1]
    • pass records
      • HTTP Client processor per-status actions or timeouts[1]
    • passwords
    • performing lookups
    • permissions
      • disabling enforcement[1]
      • enabling enforcement[1]
      • transferring[1]
      • transferring overview[1]
    • pipeline canvas
      • installing additional libraries[1]
    • pipeline design
      • control character removal[1]
      • delimited data root field type[1]
      • development stages[1]
      • preconditions[1]
      • replicating streams[1]
      • required fields[1]
      • SDC Record data format[1]
    • Pipeline Designer
      • authoring Data Collectors[1]
      • creating pipelines and pipeline fragments[1]
      • previewing pipelines[1]
      • validating pipelines[1]
    • pipeline events
    • Pipeline Finisher
    • Pipeline Finisher executor
      • overview[1]
      • related event generating stages[1]
    • pipeline fragments
    • pipeline functions
    • pipeline labels
      • for pipelines and fragments[1]
    • pipeline permissions
    • pipeline properties
      • delivery guarantee[1]
      • rate limit[1]
      • runtime parameters[1]
    • pipelines
      • advanced options[1]
      • aggregated statistics for Control Hub[1]
      • configuring[1]
      • edge devices[1]
      • error record handling[1]
      • event generation[1]
      • expression completion[1]
      • labels[1]
      • logs[1]
      • merging streams[1]
      • microservice[1]
      • monitoring[1]
      • number of instances[1]
      • offsets[1]
      • orchestration[1]
      • pipeline labels[1]
      • redistributing[1]
      • resetting the origin[1]
      • retry attempts upon error[1]
      • runtime parameters[1]
      • sample[1]
      • scaling out[1]
      • scaling out automatically[1]
      • SDC RPC pipelines[1]
      • sharing[1]
      • sharing and permissions[1]
      • Spark configuration[1]
      • Spark executors[1]
      • stage library match requirement[1]
      • status[1]
      • using webhooks[1]
    • pipeline state
    • pipeline state notifications
    • PK Chunking
      • configuring for the Salesforce origin[1]
    • PMML Evaluator processor
    • PostgreSQL CDC Client origin
      • CDC record header attributes[1]
      • generated record[1]
      • overview[1]
    • PostgreSQL Drift Solution, see Drift Synchronization Solution for PostgreSQL[1]
    • PostgreSQL JDBC Table origin
    • PostgreSQL Metadata processor
    • preconditions
    • preprocessing script
    • prerequisites
      • ADLS and Amazon S3 stages[1]
      • Azure Event Hubs destination[1]
      • Azure Event Hubs origin[1]
      • for the Scala processor and preprocessing script[1]
      • PySpark processor[1]
      • Snowflake destination[1]
      • Snowflake executor[1]
      • Snowflake File Uploader destination[1]
    • preview
    • previewing data, see data preview[1]
    • processing mode
      • HTTP Client[1]
      • ludicrous mode versus standard[1]
    • processing queue
      • JDBC Multitable Consumer[1]
    • processors
      • Base64 Field Decoder[1]
      • Base64 Field Encoder[1]
      • caching[1]
      • Control Hub API[1]
      • Couchbase Lookup[1]
      • Databricks ML Evaluator[1]
      • Data Generator[1]
      • Data Parser[1]
      • Delay processor[1]
      • Delta Lake Lookup[1]
      • development processors[1]
      • Encrypt and Decrypt Fields[1]
      • Expression Evaluator[1]
      • Field Flattener[1]
      • Field Hasher[1]
      • Field Mapper[1]
      • Field Masker[1]
      • Field Merger[1]
      • Field Order[1]
      • Field Pivoter[1]
      • Field Remover[1]
      • Field Renamer[1]
      • Field Replacer[1]
      • Field Splitter[1]
      • Field Type Converter[1]
      • Field Zip[1]
      • Filter[1]
      • Geo IP[1]
      • Groovy Evaluator[1]
      • HBase Lookup[1]
      • Hive Metadata[1]
      • HTTP Client[1]
      • HTTP Router[1]
      • JavaScript Evaluator[1]
      • JDBC Lookup[1][2]
      • JDBC Tee[1]
      • Join[1]
      • JSON Generator[1]
      • JSON Parser[1]
      • Jython Evaluator[1]
      • Kudu Lookup[1]
      • Log Parser[1]
      • MLeap Evaluator[1]
      • MongoDB Lookup[1]
      • overview[1]
      • PMML Evaluator[1]
      • PostgreSQL Metadata[1]
      • PySpark[1]
      • Record Deduplicator[1]
      • Redis Lookup[1]
      • referencing fields[1]
      • Repartition[1]
      • Salesforce Lookup[1]
      • Scala[1]
      • Schema Generator[1]
      • shuffling of data[1]
      • Slowly Changing Dimensions[1]
      • Snowflake Lookup[1]
      • Spark Evaluator[1]
      • Spark SQL Expression[1]
      • Spark SQL Query[1]
      • SQL Parser[1]
      • Start Jobs[1]
      • Start Pipelines[1]
      • Static Lookup[1]
      • Stream Selector[1][2]
      • TensorFlow Evaluator[1]
      • Value Replacer[1]
      • Wait for Jobs[1]
      • Wait for Pipelines[1]
      • Whole File Transformer[1]
      • Window[1]
      • Windowing Aggregator[1]
      • XML Flattener[1]
      • XML Parser[1]
    • protobuf data format
      • processing prerequisites[1]
    • proxy users
    • Pulsar Consumer origin
    • Pulsar Producer destination
    • PySpark processor
      • Databricks prerequisites[1]
      • EMR prerequisites[1]
      • other cluster and local pipeline prerequisites[1]
      • overview[1]
      • prerequisites[1]
    • PySpark processor requirements
      • for provisioned Databricks clusters[1]
  • R
    • RabbitMQ Consumer origin
    • RabbitMQ Producer destinations
    • Rank processor
      • shuffling of data[1]
    • rate limit
    • read order
      • Directory origin[1]
    • Record Deduplicator processor
    • record functions
    • record header attributes
      • Amazon S3 origin[1]
      • configuring[1]
      • overview[1]
      • PostgreSQL CDC Client CDC[1]
      • record-based writes[1]
      • working with[1]
    • Redis Consumer origin
    • Redis destination
    • Redis Lookup processor
    • regular expressions
    • Repartition processor
      • coalesce by number repartition method[1]
      • overview[1]
      • repartition by field range repartition method[1]
      • repartition by number repartition method[1]
      • shuffling of data[1]
    • required fields
    • resetting the origin
      • for the Azure IoT/Event Hub Consumer origin[1]
    • resource thresholds[1][2][3][4]
    • REST Service origin
    • right anti join
      • Join processor[1]
    • right outer join
      • Join processor[1]
    • roles
      • System Administrator[1]
    • roles and permissions
    • root element
      • preserving in XML data[1]
    • rules and alerts
    • runtime parameters
    • runtime properties
    • runtime resources
  • S
    • Salesforce destination
    • Salesforce Lookup processor
    • Salesforce origin
      • aggregate functions in SOQL queries[1]
      • Bulk API with PK Chunking[1]
      • CRUD operation header attribute[1]
      • event generation[1]
      • overview[1]
      • using the SOAP and Bulk API without PK chunking[1]
    • SAML
    • sample pipelines
      • creating system samples[1]
      • system[1]
      • user-defined[1]
    • samples
      • pipeline, system[1]
    • SAP HANA Query Consumer origin
      • event generation[1]
      • overview[1]
    • Scala
      • choosing a Transformer installation package engine version[1]
    • Scala, Spark, and Java JDK requirements
      • installation[1]
    • Scala processor
    • schema
    • Schema Generator processor
    • scripts
      • preprocessing[1]
    • SDC_CONF
      • environment variable[1]
    • SDC_DATA
      • environment variable[1]
    • SDC_DIST
      • environment variable[1]
    • SDC_GROUP
      • environment variable[1]
    • SDC_LOG
      • environment variable[1]
    • SDC_RESOURCES
      • environment variable[1]
    • SDC_USER
      • environment variable[1]
    • sdc.properties
      • Data Collector configuration file[1]
    • sdc.properties file
    • sdcd-env.sh file
    • SDC Edge
    • sdc-env.sh file
    • SDC Records
    • SDC RPC destination
    • SDC RPC origins
    • SDC RPC pipelines
    • Security Manager
      • Data Collector[1]
    • sending email
      • Data Collector configuration[1]
    • Send Response to Origin destination
    • server-side encryption
      • Amazon S3 destination[1]
    • service
      • associating with deployment[1]
    • SFTP/FTP/FTPS Client destination
      • event generation[1]
      • overview[1]
    • SFTP/FTP/FTPS Client executor
    • SFTP/FTP/FTPS Client origin
      • event generation[1]
      • overview[1]
    • Shell executor
      • enabling shell impersonation mode[1]
      • overview[1]
    • shuffling
    • simple edit mode
    • single sign on
    • Slowly Changing Dimension processor
      • configuring a file dimension pipeline[1]
      • dimension types[1]
      • overview[1]
      • partitioned file dimension prerequisite[1]
      • pipeline processing[1]
      • tracking fields[1]
    • Slowly Changing Dimensions processor
    • snapshots
    • Snowflake destination
      • command load optimization[1]
      • COPY command prerequisites[1]
      • CRUD operation[1]
      • defining a role[1]
      • enabling data drift handling[1]
      • installation by Package Manager[1]
      • load methods[1]
      • MERGE command prerequisites[1]
      • overview[1]
      • prerequisites[1]
      • required privileges[1]
      • role[1]
      • Snowpipe prerequisites[1]
      • supported versions[1]
    • Snowflake executor
      • event generation[1]
      • overview[1]
      • prerequisites[1]
      • using with the Snowflake File Uploader[1]
    • Snowflake File Uploader destination
      • event generation[1]
      • overview[1]
      • prerequisites[1]
    • Snowflake Lookup processor
    • Snowflake origin
    • Snowpipe load method
      • Snowflake destination[1]
    • Solr destination
    • solutions
      • CDC to Databricks Delta Lake[1]
      • load to Databricks Delta Lake[1]
    • Spark
      • available features[1]
    • Spark cluster
      • callback URL[1]
      • Transformer URL[1]
    • Spark configuration
    • Spark Evaluator processor
    • Spark executor
      • event generation[1]
      • overview[1]
    • Spark executors
    • Spark processing
    • Spark SQL Expression processor
    • Spark SQL Query processor
    • Spark web UI
    • Splunk destination
    • SQL Parser
    • SQL Server 2019 BDC
      • cluster[1]
      • JDBC connection information[1]
      • quick start deployment script[1]
    • SQL Server 2019 BDC Bulk Loader destination
    • SQL Server 2019 BDC Multitable Consumer origin
      • event generation[1]
      • overview[1]
      • supported versions[1]
    • SQL Server CDC Client origin
      • event generation[1]
      • overview[1]
      • record header attributes[1]
    • SQL Server Change Tracking origin
      • event generation[1]
      • overview[1]
      • record header attributes[1]
    • SQL Server JDBC Table origin
    • SSL/TLS
      • configuring in stages[1]
      • Syslog destination[1]
    • stage library match requirement
      • in a pipeline[1]
    • stage library panel
      • installing additional libraries[1]
    • stages
      • advanced options[1]
      • error record handling[1]
    • staging directory
      • Databricks pipelines[1]
      • EMR pipelines[1]
    • standalone mode
    • Start Jobs origin
    • Start Jobs processor
    • Start Pipelines origin
    • Start Pipelines processor
    • Static Lookup processor
    • streaming pipelines
    • Stream Selector processor
    • STREAMSETS_LIBRARIES_EXTRA_DIR
    • StreamSets Control Hub
      • disconnected mode[1]
    • StreamSets for Databricks
      • installation on AWS[1]
      • installation on Azure[1]
    • string functions
    • subscriptions
    • support bundles
    • supported systems
    • supported versions
      • Azure Synapse Enterprise stage library[1]
      • Databricks Enterprise stage library[1]
      • Google Enterprise stage library[1]
      • GPSS Enterprise stage library[1]
      • MemSQL Enterprise stage library[1]
      • Oracle Enterprise stage library[1]
      • Snowflake Enterprise stage library[1]
      • SQL Server 2019 Big Data Cluster Enterprise Library[1]
      • Teradata Enterprise stage library[1]
    • Syslog destination
      • enabling SSL/TLS[1]
      • overview[1]
    • system
      • Data Collector[1]
      • Data Collectors[1]
    • system administrator
    • system Data Collector
      • requirements[1]
    • System Metrics origin
    • system organization
  • T
    • Tableau CRM destination
    • table configuration
      • JDBC Multitable Consumer origin[1]
    • tags
    • TCP Server origin
    • Technology Preview functionality
    • templates
    • TensorFlow Evaluator processor
      • evaluating each record[1]
      • evaluating entire batch[1]
      • event generation[1]
      • overview[1]
    • Teradata Consumer origin
      • driver installation[1]
      • event generation[1]
      • install the stage library[1]
      • overview[1]
      • prerequisites[1]
      • tested databases and drivers[1]
    • Teradata origin
      • supported versions[1]
    • test origin
    • text data format
      • custom delimiters[1]
      • processing XML with custom delimiters[1]
    • the event framework
      • Amazon S3 origin event generation[1]
      • Directory event generation[1]
      • File Tail event generation[1]
      • Google BigQuery event generation[1]
      • Google Cloud Storage origin event generation[1]
      • JDBC Multitable Consumer origin event generation[1]
      • JDBC Query Consumer origin event generation[1]
      • MapR FS Standalone event generation[1]
      • MongoDB origin event generation[1]
      • Oracle Bulkload event generation[1]
      • Oracle CDC Client event generation[1]
      • Salesforce origin event generation[1]
      • SAP HANA Query Consumer origin event generation[1]
      • SFTP/FTP/FTPS Client origin event generation[1]
      • SQL Server 2019 BDC Multitable Consumer origin event generation[1]
      • Teradata Consumer origin event generation[1]
    • third party libraries
      • installing[1]
      • installing additional for stages[1]
    • time basis
    • time basis, buckets, and partition prefixes
      • for Amazon S3 destination[1]
    • time basis and partition prefixes
      • Google Cloud Storage destination[1]
    • time functions
    • time series
    • TLS
      • configuring in stages[1]
    • To Error destination
    • tokens
      • unregistered[1]
    • topics
      • MQTT Publisher destination[1]
    • tracking fields
      • Slowly Changing Dimension processor[1]
    • Transformer
      • architecture[1]
      • description[1]
      • directories[1]
      • environment variables[1]
      • execution engine[1]
      • proxy users[1]
      • resource thresholds[1]
      • spark-submit[1]
      • starting manually[1]
      • viewing and downloading log data[1][2]
    • TRANSFORMER_GROUP
      • environment variable[1]
    • TRANSFORMER_USER
      • environment variable[1]
    • Transformer configuration files
      • protecting passwords and other sensitive values[1]
    • Transformer pipelines
      • failing over[1]
    • Transformers
    • transport protocol
      • default and configuration[1]
    • Trash destination
    • truststore
      • properties and defaults[1]
      • remote[1]
  • U
    • UDP Multithreaded Source origin
    • UDP Source origin
    • UDP Source origins
    • unregistered tokens
    • URL
      • cluster callback[1]
    • USER_LIBRARIES_DIR
      • environment variable[1]
    • user libraries
    • users
    • using SOAP and Bulk APIs
      • Salesforce origin[1]
  • V
    • Value Replacer processor
    • Vault access
  • W
    • Wait for Jobs processor
    • Wait for Pipelines processor
    • Wave Analytics destination, see Tableau CRM destination[1]
    • webhooks
      • configuring an alert webhook[1]
      • overview[1]
      • payload and parameters[1]
    • WebSocket Client destination
    • WebSocket Client origin
    • WebSocket Server origin
    • Whole Directory origin
    • whole file
      • including checksums in events[1]
    • whole file data format
      • defining transfer rate[1]
      • file access permissions[1]
      • overview[1]
    • whole files
      • file name expression[1]
    • Whole File Transformer processors
    • Windowing Aggregator processor
      • event generation[1]
      • overview[1]
    • Window processor
    • Windows Event Log origin
  • X
    • XML data
      • including field XPaths and namespaces[1]
      • predicates in XPath expressions[1]
      • preserving root element[1]
      • processing in origins and the XML Parser processor[1]
      • processing with the simplified XPath syntax[1]
      • processing with the text data format[1]
      • root element[1]
    • XML data format
      • requirement for writing XML[1]
    • XML Flattener processor
    • XML Parser processor
      • overview[1]
      • processing XML data[1]
    • XPath expression
      • using with namespaces[1]
    • XPath syntax
      • for processing XML data[1]
      • using node predicates[1]
  • Y
    • YAML specification