• A
    • activation code
    • ADLS Gen2 destination
      • data formats[1]
      • prerequisites[1]
      • retrieve configuration details[1]
      • write mode[1]
    • ADLS Gen2 origin
      • data formats[1]
      • partitions[1]
      • prerequisites[1]
      • retrieve configuration details[1]
      • schema requirement[1]
    • Aggregate processor
      • aggregate functions[1]
      • configuring[1]
      • default output fields[1]
      • example[1]
      • overview[1]
      • shuffling of data[1]
    • Amazon Redshift destination
      • AWS credentials and write requirements[1]
      • configuring[1]
      • installing the JDBC driver[1]
      • partitions[1]
      • server-side encryption[1]
      • write mode[1]
    • Amazon S3 destination
      • authentication method[1]
      • AWS credentials[1]
      • data formats[1]
      • overview[1]
      • server-side encryption[1]
      • write mode[1]
    • Amazon S3 origin
      • authentication method[1]
      • AWS credentials[1]
      • data formats[1]
      • overview[1]
      • partitions[1]
    • Append Data write mode
      • Delta Lake destination[1]
    • authentication
    • authentication method
    • authentication properties
    • AWS credentials
    • AWS Secrets Manager
      • credential store[1]
      • properties file[1]
    • AWS Secrets Manager access
    • Azure
      • StreamSets for Databricks[1]
    • Azure Event Hubs destination
    • Azure Event Hubs origin
      • configuring[1]
      • default and specific offsets[1]
      • overview[1]
      • prerequisites[1]
    • Azure Key Vault
      • credential store[1]
      • credential store, prerequisites[1]
      • properties file[1]
    • Azure Key Vault access
    • Azure SQL destination
  • B
    • Base64 functions
    • basic syntax
      • for expressions[1]
    • batch pipelines
    • browser
      • requirements[1]
    • bulk edit mode
  • C
    • case study
      • batch pipelines[1]
      • streaming pipelines[1]
    • CDC writes
      • Delta Lake destination[1]
    • classloader
    • client deployment mode
      • Hadoop YARN cluster[1]
    • cloud service provider
    • cluster
      • Dataproc[1]
      • Hadoop YARN[1]
      • running pipelines[1]
      • SQL Server 2019 BDC[1]
    • cluster configuration
      • Databricks instance pool[1]
      • Databricks pipelines[1]
    • cluster deployment mode
      • Hadoop YARN cluster[1]
    • command line interface
      • jks-credentialstore command[1]
      • stagelib-cli command[1]
    • conditions
      • Delta Lake destination[1]
      • Filter processor[1]
      • Join processor[1]
      • Stream Selector processor[1]
      • Window processor[1]
    • constants
      • in the StreamSets expression language[1]
    • Control Hub
      • HTTP or HTTPS proxy[1]
    • credential stores
      • AWS Secrets Manager[1]
      • Azure Key Vault[1]
      • CyberArk[1]
      • enabling[1]
      • functions to access[1]
      • Java keystore[1]
      • overview[1]
    • cross join
      • Join processor[1]
    • custom schemas
      • application to JSON and delimited data[1]
      • DDL schema format[1][2]
      • error handling[1]
      • JSON schema format[1][2]
      • origins[1]
    • CyberArk
      • credential store[1]
      • properties file[1]
    • CyberArk access
  • D
    • Databricks
      • init scripts for provisioned clusters[1]
      • provisioned cluster configuration[1]
      • provisioned cluster with instance pool[1]
      • uninstalling old Transformer libraries[1]
    • Databricks init scripts
      • access keys for ABFSS[1]
    • Databricks pipelines
    • Data Collectors
    • data formats
      • ADLS Gen2 destination[1]
      • ADLS Gen2 origin[1]
      • Amazon S3 destination[1]
      • Amazon S3 origin[1]
      • Azure Event Hubs destination[1]
      • File destination[1]
      • File origin[1]
      • Whole Directory origin[1]
    • data preview
      • data type display[1]
      • overview[1]
    • Dataproc
      • cluster[1]
      • credentials[1]
      • credentials in a file[1]
      • credentials in a property[1]
      • default credentials[1]
    • Dataproc pipelines
      • existing cluster[1]
    • data types
    • datetime variables
      • in the StreamSets expression language[1]
    • Deduplicate processor
    • default output fields
      • Aggregate processor[1]
    • default stream
      • Stream Selector[1]
    • Delete from Table write mode
      • Delta Lake destination[1]
    • Delta Lake destination
      • ADLS Gen2 prerequisites[1]
      • Amazon S3 credential mode[1]
      • Append Data write mode[1]
      • CDC example[1]
      • creating a managed table[1]
      • creating a table[1]
      • creating a table or managed table[1]
      • Delete from Table write mode[1]
      • overview[1]
      • overwrite condition[1]
      • Overwrite Data write mode[1]
      • partitions[1]
      • retrieve ADLS Gen2 authentication information[1]
      • Update Table write mode[1]
      • Upsert Using Merge write mode[1]
      • write mode[1]
      • writing to a local file system[1]
    • Delta Lake Lookup processor
      • ADLS Gen2 prerequisites[1]
      • Amazon S3 credential mode[1]
      • retrieve ADLS Gen2 authentication information[1]
      • using from a local file system[1]
    • Delta Lake origin
      • ADLS Gen2 prerequisites[1]
      • Amazon S3 credential mode[1]
      • reading from a local file system[1]
      • retrieve ADLS Gen2 authentication information[1]
    • deployment mode
      • Hadoop YARN cluster[1]
    • destinations
    • directories
    • directory path
      • File destination[1]
      • File origin[1]
    • disconnected mode
    • drivers
      • JDBC destination[1]
      • JDBC Lookup processor[1]
      • JDBC origin[1]
      • JDBC Table origin[1]
      • MySQL JDBC Table origin[1]
      • Oracle JDBC Table origin[1]
  • E
    • EMR
      • authentication method[1]
      • Kerberos stage limitation[1]
      • server-side encryption[1]
      • SSE Key Management Service (KMS) requirement[1]
      • Transformer installation location[1]
    • EMR pipelines
    • encryption zones
      • using KMS to access HDFS encryption zones[1]
    • environment variables
    • execution engines
    • execution mode
    • expression language
      • constants[1]
      • datetime variables[1]
      • functions[1]
      • literals[1]
      • operator precedence[1]
      • operators[1]
      • reserved words[1]
    • external libraries
      • manual installation[1][2]
      • Package Manager installation[1]
      • stage properties installation[1]
  • F
    • Field Flattener processor
    • Field Order processor
    • Field Remover processor
    • Field Renamer processor
    • fields
    • file descriptors
    • File destination
    • file functions
    • File origin
      • configuring[1]
      • custom schema[1]
      • data formats[1]
      • directory path[1]
      • overview[1]
      • partitions[1]
      • schema requirement[1]
    • Filter processor
    • force stop
      • EMR pipelines[1]
    • full outer join
      • Join processor[1]
    • full read
      • Snowflake origin[1]
    • functions
      • Base64 functions[1]
      • credential[1]
      • file functions[1]
      • in the StreamSets expression language[1]
      • job functions[1]
      • math functions[1]
      • miscellaneous functions[1]
      • pipeline functions[1]
      • string functions[1]
      • time functions[1]
  • G
    • garbage collection
    • Google Big Query destination
      • merge properties[1]
      • prerequisite[1]
      • write mode[1]
    • Google Big Query origin
      • incremental and full query mode[1]
      • offset column and supported types[1]
      • supported data types[1]
  • H
    • Hadoop clusters
      • post-upgrade task[1]
    • Hadoop impersonation mode
      • configuring KMS for encryption zones[1]
      • lowercasing user names[1]
      • overview[1]
    • Hadoop YARN
      • cluster[1]
      • deployment mode[1]
      • directory requirements[1]
      • driver requirement[1]
      • impersonation[1]
      • Kerberos authentication[1]
    • heap dump creation
    • heap size
    • history
      • pipeline run[1]
    • Hive destination
      • additional Hive configuration properties[1]
      • configuring[1]
      • data drift column order[1]
    • Hive origin
      • reading Delta Lake managed tables[1]
    • Hortonworks clusters
      • post-upgrade task[1]
    • HTTP or HTTPS proxy
      • for Control Hub[1]
  • I
    • impersonation mode
    • incremental read
      • Snowflake origin[1]
    • init scripts
      • Databricks provisioned clusters[1]
    • inner join
      • Join processor[1]
    • inputs variable
    • installation
      • Azure[1]
      • cloud[1]
      • local[1]
      • overview[1]
      • requirements[1]
      • Scala, Spark, and Java JDK requirements[1]
      • Spark shuffle service requirement[1]
      • Transformer[1]
    • installation package
      • choosing Scala version[1]
    • installation requirements
    • install from RPM
    • install from tarball
  • J
    • Java
      • garbage collection[1]
    • Java configuration options
      • heap size[1]
      • Transformer environment configuration[1]
    • Java keystore
      • credential store[1]
      • properties file[1]
    • JDBC destination
      • configuring[1]
      • driver installation[1]
      • overview[1]
      • partitions[1]
      • tested versions and drivers[1]
    • JDBC Lookup processor
      • driver installation[1]
      • overview[1]
      • tested versions and drivers[1]
    • JDBC Query origin
      • configuring[1]
      • driver installation[1]
      • overview[1]
      • tested versions and drivers[1]
    • JDBC Table origin
      • configuring[1]
      • driver installation[1]
      • offset column[1]
      • overview[1]
      • partitions[1]
      • supported offset data types[1]
      • tested versions and drivers[1]
    • job functions
    • Join processor
      • condition[1]
      • configuring[1]
      • criteria[1]
      • cross join[1]
      • full outer join[1]
      • inner join[1]
      • join types[1]
      • left anti join[1]
      • left outer join[1]
      • left semi join[1]
      • matching fields[1]
      • overview[1]
      • right anti join[1]
      • right outer join[1]
      • shuffling of data[1]
    • join types
      • Join processor[1]
    • JSON Parser processor
      • configuring[1]
      • custom schema[1]
      • error handling[1]
      • overview[1]
      • schema inference[1]
  • K
    • Kafka destination
      • Kerberos authentication[1]
      • security[1]
      • SSL/TLS encryption[1]
    • Kafka origin
      • custom schemas[1]
      • Kerberos authentication[1]
      • overview[1]
      • security[1]
      • SSL/TLS encryption[1]
    • Kafka stages
      • enabling SASL[1]
      • enabling SASL on SSL/TLS[1]
      • enabling security[1]
      • enabling SSL/TLS security[1]
      • providing Kerberos credentials[1]
      • security prerequisite tasks[1]
    • Kerberos
      • credentials for Kafka stages[1]
      • enabling[1]
    • Kerberos authentication
      • Hadoop YARN cluster[1]
      • Kafka destination[1]
      • Kafka origin[1]
    • Kerberos keytab
      • configuring in pipelines[1]
    • Kudu origin
  • L
    • LDAP authentication
    • left anti join
      • Join processor[1]
    • left outer join
      • Join processor[1]
    • left semi join
      • Join processor[1]
    • literals
      • in the StreamSets expression language[1]
    • log files
    • log level
    • logs
      • modifying log level[1]
      • pipelines[1]
      • Spark driver[1]
      • Transformer[1]
    • lookups
      • streaming example[1]
  • M
    • MapR clusters
      • dynamic allocation requirement[1]
      • Hadoop impersonation prerequisite[1]
      • pipeline start prerequisite[1]
    • master instance
      • retrieving details[1]
    • math functions
    • miscellaneous functions
    • monitoring
    • MySQL JDBC Table origin
      • custom offset queries[1]
      • default offset queries[1]
      • driver installation[1]
      • MySQL data types[1]
      • null offset value handling[1]
      • supported offset data types[1]
  • O
    • offset column
      • Google Big Query origin[1]
      • JDBC Table[1]
    • offsets
      • resetting for the pipeline[1]
      • skipping tracking[1]
    • open file limit
    • operators
      • in the StreamSets expression language[1]
      • precedence[1]
    • Oracle JDBC Table origin
      • custom offset queries[1]
      • default offset queries[1]
      • driver installation[1]
      • null offset value handling[1]
      • Oracle data types[1]
      • supported offset data types[1]
    • origins
    • output order
    • output variable
    • Overwrite Data write mode
      • Delta Lake destination[1]
  • P
    • parameters
      • starting pipelines with[1]
    • partitions
      • ADLS Gen2 origin[1]
      • Amazon Redshift destination[1]
      • Amazon S3 origin[1]
      • Azure SQL destination[1]
      • based on origins[1]
      • Delta Lake destination[1]
      • File origin[1]
      • initial[1]
      • JDBC destination[1]
      • JDBC Table origin[1]
      • Rank processor[1]
    • pipeline functions
    • pipeline run
    • pipelines
      • comparison with Data Collector[1]
      • logs[1]
      • monitoring[1]
      • pause monitoring[1]
      • previewing[1]
      • run history[1]
      • Spark configuration[1]
      • starting with parameters[1]
    • ports
    • PostgreSQL JDBC Table origin
      • custom offset queries[1]
      • default offset queries[1]
      • null offset value handling[1]
      • PostgreSQL JDBC driver[1]
      • supported data types[1]
      • supported offset data types[1]
    • post-upgrade tasks
      • access Databricks job details[1]
      • enable the Spark shuffle service on clusters[1]
      • update ADLS stages in HDInsight pipelines[1]
      • update drivers on older Hadoop clusters[1]
      • update keystore and truststore location[1]
    • preprocessing script
      • pipeline[1]
      • prerequisites[1]
      • requirements[1]
      • Spark-Scala prerequisites[1]
    • prerequisites
      • Azure Event Hubs destination[1]
      • Azure Event Hubs origin[1]
      • for the Scala processor and preprocessing script[1]
      • PySpark processor[1]
    • preview
      • availability[1]
      • color codes[1]
      • configured cluster[1]
      • editing properties[1]
      • embedded Spark[1]
      • output order[1]
      • overview[1]
      • pipeline[1]
      • writing to destinations[1]
    • processor
      • output order[1]
    • processors
    • Profile processor
    • proxy users
    • PySpark processor
      • configuring[1]
      • custom code[1]
      • Databricks prerequisites[1]
      • EMR prerequisites[1]
      • examples[1]
      • input and output variables[1]
      • other cluster and local pipeline prerequisites[1]
      • overview[1]
      • prerequisites[1][2]
      • referencing fields[1]
    • PySpark processor requirements for provisioned Databricks clusters[1]
  • Q
    • query mode
      • Google Big Query origin[1]
  • R
    • Rank processor
    • read mode
      • Snowflake origin[1]
    • register
    • release notes 4.0.x[1]
    • release notes 4.1.x[1]
    • remote debugging
    • repartitioning
    • Repartition processor
      • coalesce by number repartition method[1]
      • configuring[1]
      • methods[1]
      • overview[1]
      • repartition by field range repartition method[1]
      • repartition by number repartition method[1]
      • shuffling of data[1]
      • use cases[1]
    • reserved words
      • in the StreamSets expression language[1]
    • reverse proxy
      • configuring for Transformer[1]
    • right anti join
      • Join processor[1]
    • right outer join
      • Join processor[1]
    • roles
      • for users with file-based authentication[1]
    • RPM package
      • uninstallation[1]
    • runtime parameters
      • calling from a pipeline[1]
      • calling from checkboxes and drop-down menus[1]
      • calling from scripting processors[1]
      • calling from text boxes[1]
      • defining[1]
      • monitoring[1]
      • viewing[1]
    • runtime properties
    • runtime resources
      • calling from a pipeline[1]
      • defining[1]
    • runtime values
  • S
    • Scala
      • choosing a Transformer installation package[1]
    • Scala, Spark, and Java JDK requirements
      • installation[1]
    • Scala processor
      • configuring[1]
      • custom code[1]
      • examples[1]
      • input and output variables[1]
      • inputs variable[1]
      • output variable[1]
      • overview[1]
      • prerequisites[1]
      • requirements[1]
      • Spark-Scala prerequisite[1]
      • Spark SQL queries[1]
    • scripting processors
      • calling runtime values[1]
    • scripts
      • preprocessing[1]
    • security
      • Kafka destination[1]
      • Kafka origin[1]
    • server-side encryption
      • Amazon Redshift destination[1]
      • Amazon S3 destination[1]
      • EMR clusters[1]
    • shuffling
    • simple edit mode
    • Slowly Changing Dimension processor
      • configuring[1]
      • pipeline processing[1]
    • Snowflake destination
    • Snowflake Lookup processor
    • Snowflake origin
      • full query guidelines[1]
      • incremental or full read[1]
      • incremental query guidelines[1]
      • overview[1]
      • read mode[1]
      • SQL query guidelines[1]
    • sorting
      • multiple fields[1]
    • Sort processor
    • Spark configuration
    • Spark history server
    • Spark processing
    • Spark SQL Expression processor
    • Spark SQL processor
    • Spark SQL query
    • Spark SQL Query processor
    • Spark web UI
    • SQL query
      • guidelines for the Snowflake origin[1]
    • SQL Server 2019 BDC
      • cluster[1]
      • JDBC connection information[1]
      • master instance details for JDBC[1]
      • quick start deployment script[1]
      • retrieving information[1]
    • SQL Server JDBC Table origin
      • configuring[1]
      • custom offset queries[1]
      • default offset queries[1]
      • null offset value handling[1]
      • SQL Server JDBC driver[1]
      • supported data types[1]
      • supported offset data types[1]
    • SSL/TLS encryption
      • Kafka destination[1]
      • Kafka origin[1]
    • statistics
    • streaming pipelines
    • Stream Selector processor
    • StreamSets Control Hub
      • disconnected mode[1]
    • StreamSets for Databricks
      • installation on Azure[1]
    • string functions
  • T
    • tarball
      • uninstallation[1]
    • Technology Preview functionality
    • time functions
    • Transformer
      • activation code[1]
      • architecture[1]
      • description[1]
      • directories[1]
      • disconnected mode[1]
      • environment variables[1]
      • execution engine[1][2]
      • for Data Collector users[1]
      • heap dump creation[1]
      • installation[1]
      • Java configuration options[1]
      • launching[1]
      • proxy users[1]
      • registering[1]
      • release notes[1]
      • remote debugging[1]
      • restarting[1]
      • spark-submit[1]
      • starting[1]
      • starting as service[1]
      • starting manually[1]
      • uninstallation[1]
      • viewing and downloading log data[1][2]
      • viewing configuration properties[1]
    • TRANSFORMER_CONF
      • environment variable[1]
    • TRANSFORMER_DATA
      • environment variable[1]
    • TRANSFORMER_DIST
      • environment variable[1]
    • TRANSFORMER_JAVA_OPTS
      • Java environment variable[1]
    • TRANSFORMER_LOG
      • environment variable[1]
    • TRANSFORMER_RESOURCES
      • environment variable[1]
    • TRANSFORMER_ROOT_CLASSPATH
      • Java environment variable[1]
    • Transformer libraries
      • removing from Databricks[1]
    • Transformer metrics
    • troubleshooting
      • origin errors[1]
    • Type Converter processor
      • configuring[1]
      • field type conversion[1]
      • overview[1]
  • U
    • ulimit
    • uninstallation
    • Union processor
    • Update Table write mode
      • Delta Lake destination[1]
    • upgrade
      • installation from RPM[1]
      • installation from tarball[1]
      • troubleshooting[1]
    • Upsert Using Merge write mode
      • Delta Lake destination[1]
    • usage statistics
    • users
      • creating for file-based authentication[1]
      • default for file-based authentication[1]
      • roles for file-based authentication[1]
  • V
    • validation
      • implicit and explicit[1]
  • W
    • Whole Directory origin
    • Window processor
    • window types
      • Window processor[1]
    • write mode
      • Delta Lake destination[1]
      • Google Big Query destination[1]
      • Snowflake destination[1]
© Copyright IBM Corporation