Transformer User Guide
Index
A
activation code
  Transformer
ADLS Gen2 destination
  data formats
  prerequisites
  retrieve configuration details
  write mode
ADLS Gen2 origin
  data formats
  partitions
  prerequisites
  retrieve configuration details
  schema requirement
Aggregate processor
  aggregate functions
  configuring
  default output fields
  example
  overview
  shuffling of data
Amazon Redshift destination
  AWS credentials and write requirements
  configuring
  installing the JDBC driver
  partitions
  server-side encryption
  write mode
Amazon S3 destination
  authentication method
  AWS credentials
  data formats
  overview
  server-side encryption
  write mode
Amazon S3 origin
  authentication method
  AWS credentials
  data formats
  overview
  partitions
Append Data write mode
  Delta Lake destination
authentication
  Transformer
authentication method
  Amazon S3
authentication properties
  configuring
AWS credentials
  Amazon S3
AWS Secrets Manager
  credential store
  properties file
AWS Secrets Manager access
  overview
Azure
  StreamSets for Databricks
Azure Event Hubs destination
  configuring
  data formats
  overview
  prerequisites
Azure Event Hubs origin
  configuring
  default and specific offsets
  overview
  prerequisites
Azure Key Vault
  credential store
  credential store, prerequisites
  properties file
Azure Key Vault access
  overview
  prerequisites
Azure SQL destination
  partitions
B
Base64 functions
  description
basic syntax
  for expressions
batch pipelines
  case study
  description
browser
  requirements
bulk edit mode
  description
C
case study
  batch pipelines
  streaming pipelines
CDC writes
  Delta Lake destination
classloader
  root
client deployment mode
  Hadoop YARN cluster
cloud service provider
  Azure
cluster
  Dataproc
  Hadoop YARN
  running pipelines
  SQL Server 2019 BDC
cluster configuration
  Databricks instance pool
  Databricks pipelines
cluster deployment mode
  Hadoop YARN cluster
command line interface
  jks-credentialstore command
  stagelib-cli command
conditions
  Delta Lake destination
  Filter processor
  Join processor
  Stream Selector processor
  Window processor
constants
  in the StreamSets expression language
Control Hub
  HTTP or HTTPS proxy
credential stores
  AWS Secrets Manager
  Azure Key Vault
  CyberArk
  enabling
  functions to access
  Java keystore
  overview
cross join
  Join processor
custom schemas
  application to JSON and delimited data
  DDL schema format
  error handling
  JSON schema format
  origins
CyberArk
  credential store
  properties file
CyberArk access
  overview
D
Databricks
  init scripts for provisioned clusters
  provisioned cluster configuration
  provisioned cluster with instance pool
  uninstalling old Transformer libraries
Databricks init scripts
  access keys for ABFSS
Databricks pipelines
  existing cluster
  job details
  provisioned cluster
Data Collectors
  versions
data formats
  ADLS Gen2 destination
  ADLS Gen2 origin
  Amazon S3 destination
  Amazon S3 origin
  Azure Event Hubs destination
  File destination
  File origin
  Whole Directory origin
data preview
  data type display
  overview
Dataproc
  cluster
  credentials
  credentials in a file
  credentials in a property
  default credentials
Dataproc pipelines
  existing cluster
data types
  in preview
datetime variables
  in the StreamSets expression language
Deduplicate processor
  configuring
  overview
default output fields
  Aggregate processor
default stream
  Stream Selector
Delete from Table write mode
  Delta Lake destination
Delta Lake destination
  ADLS Gen2 prerequisites
  Amazon S3 credential mode
  Append Data write mode
  CDC example
  creating a managed table
  creating a table
  creating a table or managed table
  Delete from Table write mode
  overview
  overwrite condition
  Overwrite Data write mode
  partitions
  retrieve ADLS Gen2 authentication information
  Update Table write mode
  Upsert Using Merge write mode
  write mode
  writing to a local file system
Delta Lake Lookup processor
  ADLS Gen2 prerequisites
  Amazon S3 credential mode
  retrieve ADLS Gen2 authentication information
  using from a local file system
Delta Lake origin
  ADLS Gen2 prerequisites
  Amazon S3 credential mode
  reading from a local file system
  retrieve ADLS Gen2 authentication information
deployment mode
  Hadoop YARN cluster
destinations
  Amazon S3
  Azure Event Hubs
  Delta Lake
  File
  JDBC
  Snowflake
directories
  internal
  protected
  Transformer
directory path
  File destination
  File origin
disconnected mode
  Control Hub
drivers
  JDBC destination
  JDBC Lookup processor
  JDBC origin
  JDBC Table origin
  MySQL JDBC Table origin
  Oracle JDBC Table origin
E
EMR
  authentication method
  Kerberos stage limitation
  server-side encryption
  SSE Key Management Service (KMS) requirement
  Transformer installation location
EMR pipelines
  force stop
encryption zones
  using KMS to access HDFS encryption zones
environment variables
  directories
  modifying
execution engines
  Transformer
execution mode
  pipelines
expression language
  constants
  datetime variables
  functions
  literals
  operator precedence
  operators
  reserved words
external libraries
  manual installation
  Package Manager installation
  stage properties installation
F
Field Flattener processor
  configuring
Field Order processor
  configuring
  overview
Field Remover processor
  configuring
  overview
Field Renamer processor
  configuring
  overview
  rename methods
fields
  referencing
file descriptors
  increasing
File destination
  configuring
  data formats
  directory path
  overview
  write mode
file functions
  description
File origin
  configuring
  custom schema
  data formats
  directory path
  overview
  partitions
  schema requirement
Filter processor
  configuring
  filter condition
  overview
force stop
  EMR pipelines
full outer join
  Join processor
full read
  Snowflake origin
functions
  Base64 functions
  credential
  file functions
  in the StreamSets expression language
  job functions
  math functions
  miscellaneous functions
  pipeline functions
  string functions
  time functions
G
garbage collection
  Java
Google Big Query destination
  merge properties
  prerequisite
  write mode
Google Big Query origin
  incremental and full query mode
  offset column and supported types
  supported data types
H
Hadoop clusters
  post-upgrade task
Hadoop impersonation mode
  configuring KMS for encryption zones
  lowercasing user names
  overview
Hadoop YARN
  cluster
  deployment mode
  directory requirements
  driver requirement
  impersonation
  Kerberos authentication
heap dump creation
  Transformer
heap size
  configuring
history
  pipeline run
Hive destination
  additional Hive configuration properties
  configuring
  data drift column order
Hive origin
  reading Delta Lake managed tables
Hortonworks clusters
  post-upgrade task
HTTP or HTTPS proxy
  for Control Hub
I
impersonation mode
  Hadoop
incremental read
  Snowflake origin
init scripts
  Databricks provisioned clusters
inner join
  Join processor
inputs variable
  PySpark processor
  Scala processor
installation
  Azure
  cloud
  local
  overview
  requirements
  Scala, Spark, and Java JDK requirements
  Spark shuffle service requirement
  Transformer
installation package
  choosing Scala version
installation requirements
  system
install from RPM
  upgrade
install from tarball
  upgrade
J
Java
  garbage collection
Java configuration options
  heap size
  Transformer environment configuration
Java keystore
  credential store
  properties file
JDBC destination
  configuring
  driver installation
  overview
  partitions
  tested versions and drivers
JDBC Lookup processor
  driver installation
  overview
  tested versions and drivers
JDBC Query origin
  configuring
  driver installation
  overview
  tested versions and drivers
JDBC Table origin
  configuring
  driver installation
  offset column
  overview
  partitions
  supported offset data types
  tested versions and drivers
job functions
  description
Join processor
  condition
  configuring
  criteria
  cross join
  full outer join
  inner join
  join types
  left anti join
  left outer join
  left semi join
  matching fields
  overview
  right anti join
  right outer join
  shuffling of data
join types
  Join processor
JSON Parser processor
  configuring
  custom schema
  error handling
  overview
  schema inference
K
Kafka destination
  Kerberos authentication
  security
  SSL/TLS encryption
Kafka origin
  custom schemas
  Kerberos authentication
  overview
  security
  SSL/TLS encryption
Kafka stages
  enabling SASL
  enabling SASL on SSL/TLS
  enabling security
  enabling SSL/TLS security
  providing Kerberos credentials
  security prerequisite tasks
Kerberos
  credentials for Kafka stages
  enabling
Kerberos authentication
  Hadoop YARN cluster
  Kafka destination
  Kafka origin
Kerberos keytab
  configuring in pipelines
Kudu origin
  configuring
  overview
L
LDAP authentication
  configuring
left anti join
  Join processor
left outer join
  Join processor
left semi join
  Join processor
literals
  in the StreamSets expression language
log files
  viewing and downloading
log level
  modifying
logs
  modifying log level
  pipelines
  Spark driver
  Transformer
lookups
  streaming example
M
MapR clusters
  dynamic allocation requirement
  Hadoop impersonation prerequisite
  pipeline start prerequisite
master instance
  retrieving details
math functions
  description
miscellaneous functions
  description
monitoring
  overview
  pausing
  Spark web UI
  viewing statistics
MySQL JDBC Table origin
  custom offset queries
  default offset queries
  driver installation
  MySQL data types
  null offset value handling
  supported offset data types
O
offset column
  Google Big Query origin
  JDBC Table
offsets
  resetting for the pipeline
  skipping tracking
open file limit
  configuring
operators
  in the StreamSets expression language
  precedence
Oracle JDBC Table origin
  custom offset queries
  default offset queries
  driver installation
  null offset value handling
  Oracle data types
  supported offset data types
origins
  Amazon S3
  Azure Event Hubs
  File
  JDBC Query
  JDBC Table
  Kafka
  Kudu
  multiple
  Snowflake
  Whole Directory
output order
  preview
output variable
  PySpark processor
  Scala processor
Overwrite Data write mode
  Delta Lake destination
P
parameters
  starting pipelines with
partitions
  ADLS Gen2 origin
  Amazon Redshift destination
  Amazon S3 origin
  Azure SQL destination
  based on origins
  Delta Lake destination
  File origin
  initial
  JDBC destination
  JDBC Table origin
  Rank processor
pipeline functions
  description
pipeline run
  history
  summary
pipelines
  comparison with Data Collector
  logs
  monitoring
  pause monitoring
  previewing
  run history
  Spark configuration
  starting with parameters
ports
  default
PostgreSQL JDBC Table origin
  custom offset queries
  default offset queries
  null offset value handling
  PostgreSQL JDBC driver
  supported data types
  supported offset data types
post-upgrade tasks
  access Databricks job details
  enable the Spark shuffle service on clusters
  update ADLS stages in HDInsight pipelines
  update drivers on older Hadoop clusters
  update keystore and truststore location
preprocessing script
  pipeline
  prerequisites
  requirements
  Spark-Scala prerequisites
prerequisites
  Azure Event Hubs destination
  Azure Event Hubs origin
  for the Scala processor and preprocessing script
  PySpark processor
preview
  availability
  color codes
  configured cluster
  editing properties
  embedded Spark
  output order
  overview
  pipeline
  writing to destinations
processor
  output order
processors
  Aggregate
  Deduplicate
  Field Order
  Field Remover
  Field Renamer
  Filter
  JDBC Lookup
  Join
  JSON Parser
  Profile
  PySpark
  Rank
  referencing fields
  Repartition
  Scala
  shuffling of data
  Snowflake Lookup
  Sort
  Spark SQL Expression
  Spark SQL Query
  Stream Selector
  Type Converter
  union
  Window
Profile processor
  configuring
  output records
  overview
  statistics
proxy users
  Transformer
PySpark processor
  configuring
  custom code
  Databricks prerequisites
  EMR prerequisites
  examples
  input and output variables
  other cluster and local pipeline prerequisites
  overview
  prerequisites
  referencing fields
PySpark processor requirements for provisioned Databricks clusters
Q
query mode
  Google Big Query origin
R
Rank processor
  configuring
  example
  order by
  overview
  partition by
  rank functions
  shuffling of data
read mode
  Snowflake origin
register
  Transformer
release notes 4.0.x
release notes 4.1.x
remote debugging
  Transformer
repartitioning
  methods
  overview
Repartition processor
  coalesce by number repartition method
  configuring
  methods
  overview
  repartition by field range repartition method
  repartition by number repartition method
  shuffling of data
  use cases
reserved words
  in the StreamSets expression language
reverse proxy
  configuring for Transformer
right anti join
  Join processor
right outer join
  Join processor
roles
  for users with file-based authentication
RPM package
  uninstallation
runtime parameters
  calling from a pipeline
  calling from checkboxes and drop-down menus
  calling from scripting processors
  calling from text boxes
  defining
  monitoring
  viewing
runtime properties
  defining
  overview
runtime resources
  calling from a pipeline
  defining
runtime values
  overview
S
Scala
  choosing a Transformer installation package
Scala, Spark, and Java JDK requirements
  installation
Scala processor
  configuring
  custom code
  examples
  input and output variables
  inputs variable
  output variable
  overview
  prerequisites
  requirements
  Spark-Scala prerequisite
  Spark SQL queries
scripting processors
  calling runtime values
scripts
  preprocessing
security
  Kafka destination
  Kafka origin
server-side encryption
  Amazon Redshift destination
  Amazon S3 destination
  EMR clusters
shuffling
  overview
simple edit mode
  description
Slowly Changing Dimension processor
  configuring
  pipeline
  pipeline processing
Snowflake destination
  merge properties
  overview
  write mode
Snowflake Lookup processor
  overview
Snowflake origin
  full query guidelines
  incremental or full read
  incremental query guidelines
  overview
  read mode
  SQL query guidelines
sorting
  multiple fields
Sort processor
  configuring
  multiple fields
  overview
Spark configuration
  pipelines
Spark history server
  monitoring
Spark processing
  description
Spark SQL Expression processor
  overview
Spark SQL processor
  configuring
Spark SQL query
  syntax
Spark SQL Query processor
  configuring
  examples
  overview
  query syntax
  referencing fields
Spark web UI
  monitoring
SQL query
  guidelines for the Snowflake origin
SQL Server 2019 BDC
  cluster
  JDBC connection information
  master instance details for JDBC
  quick start deployment script
  retrieving information
SQL Server JDBC Table origin
  configuring
  custom offset queries
  default offset queries
  null offset value handling
  SQL Server JDBC driver
  supported data types
  supported offset data types
SSL/TLS encryption
  Kafka destination
  Kafka origin
statistics
  pipeline
  Profile processor
  stages
streaming pipelines
  case study
  description
Stream Selector processor
  conditions
  configuring
  default stream
  overview
StreamSets Control Hub
  disconnected mode
StreamSets for Databricks
  installation on Azure
string functions
  description
T
tarball
  uninstallation
Technology Preview functionality
  description
time functions
  description
Transformer
  activation code
  architecture
  description
  directories
  disconnected mode
  environment variables
  execution engine
  for Data Collector users
  heap dump creation
  installation
  Java configuration options
  launching
  proxy users
  registering
  release notes
  remote debugging
  restarting
  spark-submit
  starting
  starting as a service
  starting manually
  uninstallation
  viewing and downloading log data
  viewing configuration properties
TRANSFORMER_CONF
  environment variable
TRANSFORMER_DATA
  environment variable
TRANSFORMER_DIST
  environment variable
TRANSFORMER_JAVA_OPTS
  Java environment variable
TRANSFORMER_LOG
  environment variable
TRANSFORMER_RESOURCES
  environment variable
TRANSFORMER_ROOT_CLASSPATH
  Java environment variable
Transformer libraries
  removing from Databricks
Transformer metrics
  viewing
troubleshooting
  origin errors
Type Converter processor
  configuring
  field type conversion
  overview
U
ulimit
  configuring
uninstallation
  RPM package
  tarball
  Transformer
union processor
  overview
Update Table write mode
  Delta Lake destination
upgrade
  installation from RPM
  installation from tarball
  troubleshooting
Upsert Using Merge write mode
  Delta Lake destination
usage statistics
  opting out
users
  creating for file-based authentication
  default for file-based authentication
  roles for file-based authentication
V
validation
  implicit and explicit
W
Whole Directory origin
  data formats
  overview
Window processor
  conditions
  configuring
  overview
  window types
window types
  Window processor
write mode
  Delta Lake destination
  Google Big Query destination
  Snowflake destination
© Copyright IBM Corporation