• A
    • activation code
    • ADLS Gen2 destination
      • data formats[1]
      • prerequisites[1]
      • retrieve configuration details[1]
      • write mode[1]
    • ADLS Gen2 origin
      • data formats[1]
      • partitions[1]
      • prerequisites[1]
      • retrieve configuration details[1]
      • schema requirement[1]
    • Aggregate processor
      • aggregate functions[1]
      • configuring[1]
      • default output fields[1]
      • example[1]
      • overview[1]
      • shuffling of data[1]
    • Amazon Redshift destination
      • AWS credentials and write requirements[1]
      • configuring[1]
      • installing the JDBC driver[1]
      • partitions[1]
      • server-side encryption[1]
      • write mode[1]
    • Amazon S3 destination
      • authentication method[1]
      • AWS credentials[1]
      • data formats[1]
      • overview[1]
      • server-side encryption[1]
      • write mode[1]
    • Amazon S3 origin
      • authentication method[1]
      • AWS credentials[1]
      • data formats[1]
      • overview[1]
      • partitions[1]
    • Append Data write mode
      • Delta Lake destination[1]
    • authentication
    • authentication method
    • authentication properties
    • AWS credentials
    • AWS Secrets Manager
      • credential store[1]
      • properties file[1]
    • AWS Secrets Manager access
    • Azure
      • StreamSets for Databricks[1]
    • Azure Event Hubs destination
    • Azure Event Hubs origin
      • configuring[1]
      • default and specific offsets[1]
      • overview[1]
      • prerequisites[1]
    • Azure Key Vault
      • credential store[1]
      • credential store, prerequisites[1]
      • properties file[1]
    • Azure Key Vault access
    • Azure SQL destination
  • B
    • Base64 functions
    • basic syntax
      • for expressions[1]
    • batch pipelines
    • browser
      • requirements[1]
    • bulk edit mode
  • C
    • case study
      • batch pipelines[1]
      • streaming pipelines[1]
    • CDC writes
      • Delta Lake destination[1]
    • classloader
    • client deployment mode
      • Hadoop YARN cluster[1]
    • cloud service provider
    • cluster
      • Dataproc[1]
      • Hadoop YARN[1]
      • running pipelines[1]
      • SQL Server 2019 BDC[1]
    • cluster configuration
      • Databricks instance pool[1]
      • Databricks pipelines[1]
    • cluster deployment mode
      • Hadoop YARN cluster[1]
    • command line interface
      • jks-credentialstore command[1]
      • stagelib-cli command[1]
    • conditions
      • Delta Lake destination[1]
      • Filter processor[1]
      • Join processor[1]
      • Stream Selector processor[1]
      • Window processor[1]
    • constants
      • in the StreamSets expression language[1]
    • Control Hub
      • HTTP or HTTPS proxy[1]
    • credential stores
      • AWS Secrets Manager[1]
      • Azure Key Vault[1]
      • CyberArk[1]
      • enabling[1]
      • functions to access[1]
      • Java keystore[1]
      • overview[1]
    • cross join
      • Join processor[1]
    • custom schemas
      • application to JSON and delimited data[1]
      • DDL schema format[1][2]
      • error handling[1]
      • JSON schema format[1][2]
      • origins[1]
    • CyberArk
      • credential store[1]
      • properties file[1]
    • CyberArk access
  • D
    • Databricks
      • init scripts for provisioned clusters[1]
      • provisioned cluster configuration[1]
      • provisioned cluster with instance pool[1]
      • uninstalling old Transformer libraries[1]
    • Databricks init scripts
      • access keys for ABFSS[1]
    • Databricks pipelines
    • Data Collectors
    • data formats
      • ADLS Gen2 destination[1]
      • ADLS Gen2 origin[1]
      • Amazon S3 destination[1]
      • Amazon S3 origin[1]
      • Azure Event Hubs destination[1]
      • File destination[1]
      • File origin[1]
      • Whole Directory origin[1]
    • data preview
      • data type display[1]
      • overview[1]
    • Dataproc
      • cluster[1]
      • credentials[1]
      • credentials in a file[1]
      • credentials in a property[1]
      • default credentials[1]
    • Dataproc pipelines
      • existing cluster[1]
    • data types
    • datetime variables
      • in the StreamSets expression language[1]
    • Deduplicate processor
    • default output fields
      • Aggregate processor[1]
    • default stream
      • Stream Selector[1]
    • Delete from Table write mode
      • Delta Lake destination[1]
    • Delta Lake destination
      • ADLS Gen2 prerequisites[1]
      • Amazon S3 credential mode[1]
      • Append Data write mode[1]
      • CDC example[1]
      • creating a managed table[1]
      • creating a table[1]
      • creating a table or managed table[1]
      • Delete from Table write mode[1]
      • overview[1]
      • overwrite condition[1]
      • Overwrite Data write mode[1]
      • partitions[1]
      • retrieve ADLS Gen2 authentication information[1]
      • Update Table write mode[1]
      • Upsert Using Merge write mode[1]
      • write mode[1]
      • writing to a local file system[1]
    • Delta Lake Lookup processor
      • ADLS Gen2 prerequisites[1]
      • Amazon S3 credential mode[1]
      • retrieve ADLS Gen2 authentication information[1]
      • using from a local file system[1]
    • Delta Lake origin
      • ADLS Gen2 prerequisites[1]
      • Amazon S3 credential mode[1]
      • reading from a local file system[1]
      • retrieve ADLS Gen2 authentication information[1]
    • deployment mode
      • Hadoop YARN cluster[1]
    • destinations
    • directories
    • directory path
      • File destination[1]
      • File origin[1]
    • disconnected mode
    • drivers
      • JDBC destination[1]
      • JDBC Lookup processor[1]
      • JDBC origin[1]
      • JDBC Table origin[1]
      • MySQL JDBC Table origin[1]
      • Oracle JDBC Table origin[1]
  • E
    • EMR
      • authentication method[1]
      • Kerberos stage limitation[1]
      • server-side encryption[1]
      • SSE Key Management Service (KMS) requirement[1]
      • Transformer installation location[1]
    • EMR pipelines
    • encryption zones
      • using KMS to access HDFS encryption zones[1]
    • environment variables
    • execution engines
    • execution mode
    • expression language
      • constants[1]
      • datetime variables[1]
      • functions[1]
      • literals[1]
      • operator precedence[1]
      • operators[1]
      • reserved words[1]
    • external libraries
      • manual installation[1][2]
      • Package Manager installation[1]
      • stage properties installation[1]
  • F
    • Field Flattener processor
    • Field Order processor
    • Field Remover processor
    • Field Renamer processor
    • fields
    • file descriptors
    • File destination
    • file functions
    • File origin
      • configuring[1]
      • custom schema[1]
      • data formats[1]
      • directory path[1]
      • overview[1]
      • partitions[1]
      • schema requirement[1]
    • Filter processor
    • force stop
      • EMR pipelines[1]
    • full outer join
      • Join processor[1]
    • full read
      • Snowflake origin[1]
    • functions
      • Base64 functions[1]
      • credential[1]
      • file functions[1]
      • in the StreamSets expression language[1]
      • job functions[1]
      • math functions[1]
      • miscellaneous functions[1]
      • pipeline functions[1]
      • string functions[1]
      • time functions[1]
  • G
    • garbage collection
    • Google Big Query destination
      • merge properties[1]
      • prerequisite[1]
      • write mode[1]
    • Google Big Query origin
      • incremental and full query mode[1]
      • offset column and supported types[1]
      • supported data types[1]
  • H
    • Hadoop clusters
      • post-upgrade task[1]
    • Hadoop impersonation mode
      • configuring KMS for encryption zones[1]
      • lowercasing user names[1]
      • overview[1]
    • Hadoop YARN
      • cluster[1]
      • deployment mode[1]
      • directory requirements[1]
      • driver requirement[1]
      • impersonation[1]
      • Kerberos authentication[1]
    • heap dump creation
    • heap size
    • history
      • pipeline run[1]
    • Hive destination
      • additional Hive configuration properties[1]
      • configuring[1]
      • data drift column order[1]
    • Hive origin
      • reading Delta Lake managed tables[1]
    • Hortonworks clusters
      • post-upgrade task[1]
    • HTTP or HTTPS proxy
      • for Control Hub[1]
  • I
    • impersonation mode
    • incremental read
      • Snowflake origin[1]
    • init scripts
      • Databricks provisioned clusters[1]
    • inner join
      • Join processor[1]
    • inputs variable
    • installation
      • Azure[1]
      • cloud[1]
      • local[1]
      • overview[1]
      • requirements[1]
      • Scala, Spark, and Java JDK requirements[1]
      • Spark shuffle service requirement[1]
      • Transformer[1]
    • installation package
      • choosing Scala version[1]
    • installation requirements
    • install from RPM
    • install from tarball
  • J
    • Java
      • garbage collection[1]
    • Java configuration options
      • heap size[1]
      • Transformer environment configuration[1]
    • Java keystore
      • credential store[1]
      • properties file[1]
    • JDBC destination
      • configuring[1]
      • driver installation[1]
      • overview[1]
      • partitions[1]
      • tested versions and drivers[1]
    • JDBC Lookup processor
      • driver installation[1]
      • overview[1]
      • tested versions and drivers[1]
    • JDBC Query origin
      • configuring[1]
      • driver installation[1]
      • overview[1]
      • tested versions and drivers[1]
    • JDBC Table origin
      • configuring[1]
      • driver installation[1]
      • offset column[1]
      • overview[1]
      • partitions[1]
      • supported offset data types[1]
      • tested versions and drivers[1]
    • job functions
    • Join processor
      • condition[1]
      • configuring[1]
      • criteria[1]
      • cross join[1]
      • full outer join[1]
      • inner join[1]
      • join types[1]
      • left anti join[1]
      • left outer join[1]
      • left semi join[1]
      • matching fields[1]
      • overview[1]
      • right anti join[1]
      • right outer join[1]
      • shuffling of data[1]
    • join types
      • Join processor[1]
    • JSON Parser processor
      • configuring[1]
      • custom schema[1]
      • error handling[1]
      • overview[1]
      • schema inference[1]
  • K
    • Kafka destination
      • Kerberos authentication[1]
      • security[1]
      • SSL/TLS encryption[1]
    • Kafka origin
      • custom schemas[1]
      • Kerberos authentication[1]
      • overview[1]
      • security[1]
      • SSL/TLS encryption[1]
    • Kafka stages
      • enabling SASL[1]
      • enabling SASL on SSL/TLS[1]
      • enabling security[1]
      • enabling SSL/TLS security[1]
      • providing Kerberos credentials[1]
      • security prerequisite tasks[1]
    • Kerberos
      • credentials for Kafka stages[1]
      • enabling[1]
    • Kerberos authentication
      • Hadoop YARN cluster[1]
      • Kafka destination[1]
      • Kafka origin[1]
    • Kerberos keytab
      • configuring in pipelines[1]
    • Kudu origin
  • L
    • LDAP authentication
    • left anti join
      • Join processor[1]
    • left outer join
      • Join processor[1]
    • left semi join
      • Join processor[1]
    • literals
      • in the StreamSets expression language[1]
    • log files
    • log level
    • logs
      • modifying log level[1]
      • pipelines[1]
      • Spark driver[1]
      • Transformer[1]
    • lookups
      • streaming example[1]
  • M
    • MapR clusters
      • dynamic allocation requirement[1]
      • Hadoop impersonation prerequisite[1]
      • pipeline start prerequisite[1]
    • master instance
      • retrieving details[1]
    • math functions
    • miscellaneous functions
    • monitoring
    • MySQL JDBC Table origin
      • custom offset queries[1]
      • default offset queries[1]
      • driver installation[1]
      • MySQL data types[1]
      • null offset value handling[1]
      • supported offset data types[1]
  • O
    • offset column
      • Google Big Query origin[1]
      • JDBC Table[1]
    • offsets
      • resetting for the pipeline[1]
      • skipping tracking[1]
    • open file limit
    • operators
      • in the StreamSets expression language[1]
      • precedence[1]
    • Oracle JDBC Table origin
      • custom offset queries[1]
      • default offset queries[1]
      • driver installation[1]
      • null offset value handling[1]
      • Oracle data types[1]
      • supported offset data types[1]
    • origins
    • output order
    • output variable
    • Overwrite Data write mode
      • Delta Lake destination[1]
  • P
    • parameters
      • starting pipelines with[1]
    • partitions
      • ADLS Gen2 origin[1]
      • Amazon Redshift destination[1]
      • Amazon S3 origin[1]
      • Azure SQL destination[1]
      • based on origins[1]
      • Delta Lake destination[1]
      • File origin[1]
      • initial[1]
      • JDBC destination[1]
      • JDBC Table origin[1]
      • Rank processor[1]
    • pipeline functions
    • pipeline run
    • pipelines
      • comparison with Data Collector[1]
      • logs[1]
      • monitoring[1]
      • pause monitoring[1]
      • previewing[1]
      • run history[1]
      • Spark configuration[1]
      • starting with parameters[1]
    • ports
    • PostgreSQL JDBC Table origin
      • custom offset queries[1]
      • default offset queries[1]
      • null offset value handling[1]
      • PostgreSQL JDBC driver[1]
      • supported data types[1]
      • supported offset data types[1]
    • post-upgrade tasks
      • access Databricks job details[1]
      • enable the Spark shuffle service on clusters[1]
      • update ADLS stages in HDInsight pipelines[1]
      • update drivers on older Hadoop clusters[1]
      • update keystore and truststore location[1]
    • preprocessing script
      • pipeline[1]
      • prerequisites[1]
      • requirements[1]
      • Spark-Scala prerequisites[1]
    • prerequisites
      • Azure Event Hubs destination[1]
      • Azure Event Hubs origin[1]
      • for the Scala processor and preprocessing script[1]
      • PySpark processor[1]
    • preview
      • availability[1]
      • color codes[1]
      • configured cluster[1]
      • editing properties[1]
      • embedded Spark[1]
      • output order[1]
      • overview[1]
      • pipeline[1]
      • writing to destinations[1]
    • processor
      • output order[1]
    • processors
    • Profile processor
    • proxy users
    • PySpark processor
      • configuring[1]
      • custom code[1]
      • Databricks prerequisites[1]
      • EMR prerequisites[1]
      • examples[1]
      • input and output variables[1]
      • other cluster and local pipeline prerequisites[1]
      • overview[1]
      • prerequisites[1][2]
      • referencing fields[1]
    • PySpark processor requirements for provisioned Databricks clusters[1]
  • Q
    • query mode
      • Google Big Query origin[1]
  • R
    • Rank processor
    • read mode
      • Snowflake origin[1]
    • register
    • release notes 4.0.x[1]
    • release notes 4.1.x[1]
    • remote debugging
    • repartitioning
    • Repartition processor
      • coalesce by number repartition method[1]
      • configuring[1]
      • methods[1]
      • overview[1]
      • repartition by field range repartition method[1]
      • repartition by number repartition method[1]
      • shuffling of data[1]
      • use cases[1]
    • reserved words
      • in the StreamSets expression language[1]
    • reverse proxy
      • configuring for Transformer[1]
    • right anti join
      • Join processor[1]
    • right outer join
      • Join processor[1]
    • roles
      • for users with file-based authentication[1]
    • RPM package
      • uninstallation[1]
    • runtime parameters
      • calling from a pipeline[1]
      • calling from checkboxes and drop-down menus[1]
      • calling from scripting processors[1]
      • calling from text boxes[1]
      • defining[1]
      • monitoring[1]
      • viewing[1]
    • runtime properties
    • runtime resources
      • calling from a pipeline[1]
      • defining[1]
    • runtime values
  • S
    • Scala
      • choosing a Transformer installation package[1]
    • Scala, Spark, and Java JDK requirements
      • installation[1]
    • Scala processor
      • configuring[1]
      • custom code[1]
      • examples[1]
      • input and output variables[1]
      • inputs variable[1]
      • output variable[1]
      • overview[1]
      • prerequisites[1]
      • requirements[1]
      • Spark-Scala prerequisite[1]
      • Spark SQL queries[1]
    • scripting processors
      • calling runtime values[1]
    • scripts
      • preprocessing[1]
    • security
      • Kafka destination[1]
      • Kafka origin[1]
    • server-side encryption
      • Amazon Redshift destination[1]
      • Amazon S3 destination[1]
      • EMR clusters[1]
    • shuffling
    • simple edit mode
    • Slowly Changing Dimension processor
      • configuring[1]
      • pipeline processing[1]
    • Snowflake destination
    • Snowflake Lookup processor
    • Snowflake origin
      • full query guidelines[1]
      • incremental or full read[1]
      • incremental query guidelines[1]
      • overview[1]
      • read mode[1]
      • SQL query guidelines[1]
    • sorting
      • multiple fields[1]
    • Sort processor
    • Spark configuration
    • Spark history server
    • Spark processing
    • Spark SQL Expression processor
    • Spark SQL processor
    • Spark SQL query
    • Spark SQL Query processor
    • Spark web UI
    • SQL query
      • guidelines for the Snowflake origin[1]
    • SQL Server 2019 BDC
      • cluster[1]
      • JDBC connection information[1]
      • master instance details for JDBC[1]
      • quick start deployment script[1]
      • retrieving information[1]
    • SQL Server JDBC Table origin
      • configuring[1]
      • custom offset queries[1]
      • default offset queries[1]
      • null offset value handling[1]
      • SQL Server JDBC driver[1]
      • supported data types[1]
      • supported offset data types[1]
    • SSL/TLS encryption
      • Kafka destination[1]
      • Kafka origin[1]
    • statistics
    • streaming pipelines
    • Stream Selector processor
    • StreamSets Control Hub
      • disconnected mode[1]
    • StreamSets for Databricks
      • installation on Azure[1]
    • string functions
  • T
    • tarball
      • uninstallation[1]
    • Technology Preview functionality
    • time functions
    • Transformer
      • activation code[1]
      • architecture[1]
      • description[1]
      • directories[1]
      • disconnected mode[1]
      • environment variables[1]
      • execution engine[1][2]
      • for Data Collector users[1]
      • heap dump creation[1]
      • installation[1]
      • Java configuration options[1]
      • launching[1]
      • proxy users[1]
      • registering[1]
      • release notes[1]
      • remote debugging[1]
      • restarting[1]
      • spark-submit[1]
      • starting[1]
      • starting as service[1]
      • starting manually[1]
      • uninstallation[1]
      • viewing and downloading log data[1][2]
      • viewing configuration properties[1]
    • TRANSFORMER_CONF
      • environment variable[1]
    • TRANSFORMER_DATA
      • environment variable[1]
    • TRANSFORMER_DIST
      • environment variable[1]
    • TRANSFORMER_JAVA_OPTS
      • Java environment variable[1]
    • TRANSFORMER_LOG
      • environment variable[1]
    • TRANSFORMER_RESOURCES
      • environment variable[1]
    • TRANSFORMER_ROOT_CLASSPATH
      • Java environment variable[1]
    • Transformer libraries
      • removing from Databricks[1]
    • Transformer metrics
    • troubleshooting
      • origin errors[1]
    • Type Converter processor
      • configuring[1]
      • field type conversion[1]
      • overview[1]
  • U
    • ulimit
    • uninstallation
    • Union processor
    • Update Table write mode
      • Delta Lake destination[1]
    • upgrade
      • installation from RPM[1]
      • installation from tarball[1]
      • troubleshooting[1]
    • Upsert Using Merge write mode
      • Delta Lake destination[1]
    • usage statistics
    • users
      • creating for file-based authentication[1]
      • default for file-based authentication[1]
      • roles for file-based authentication[1]
  • V
    • validation
      • implicit and explicit[1]
  • W
    • Whole Directory origin
    • Window processor
    • window types
      • Window processor[1]
    • write mode
      • Delta Lake destination[1]
      • Google Big Query destination[1]
      • Snowflake destination[1]
© Copyright IBM Corporation