StreamSets Platform - Transformer Engine Guide
Index
A
ADLS Gen1 destination
configuring
[1]
data formats
[1]
overview
[1]
prerequisites
[1]
retrieve authentication information
[1]
write mode
[1]
ADLS Gen1 origin
configuring
[1]
data formats
[1]
overview
[1]
partitions
[1]
prerequisites
[1]
retrieve authentication information
[1]
schema requirement
[1]
ADLS Gen2 destination
configuring
[1]
data formats
[1]
overview
[1]
prerequisites
[1]
retrieve configuration details
[1]
write mode
[1]
ADLS Gen2 origin
configuring
[1]
data formats
[1]
overview
[1]
partitions
[1]
prerequisites
[1]
retrieve configuration details
[1]
schema requirement
[1]
ADLS stages
local pipeline prerequisites
[1]
Aggregate processor
aggregate functions
[1]
configuring
[1]
default output fields
[1]
example
[1]
overview
[1]
shuffling of data
[1]
Amazon EMR
[1]
Amazon Redshift destination
AWS credentials and write requirements
[1]
configuring
[1]
installing the JDBC driver
[1]
partitions
[1]
server-side encryption
[1]
write mode
[1]
Amazon S3 destination
authentication method
[1]
AWS credentials
[1]
data formats
[1]
overview
[1]
server-side encryption
[1]
write mode
[1]
Amazon S3 origin
authentication method
[1]
AWS credentials
[1]
data formats
[1]
overview
[1]
partitions
[1]
Amazon S3 stages
local pipeline prerequisites
[1]
Append Data write mode
Delta Lake destination
[1]
authentication method
Amazon S3
[1]
[2]
AWS credentials
Amazon S3
[1]
[2]
AWS Secrets Manager
credential store
[1]
properties file
[1]
stage library
[1]
AWS Secrets Manager access
overview
[1]
Azure Event Hubs destination
configuring
[1]
data formats
[1]
overview
[1]
prerequisites
[1]
Azure Event Hubs origin
configuring
[1]
default and specific offsets
[1]
overview
[1]
prerequisites
[1]
Azure Key Vault
credential store
[1]
credential store, prerequisites
[1]
properties file
[1]
stage library
[1]
[2]
Azure Key Vault access
overview
[1]
prerequisites
[1]
Azure SQL destination
partitions
[1]
Azure SQL destination
write mode
[1]
B
Base64 functions
description
[1]
basic syntax
for expressions
[1]
batch pipelines
case study
[1]
description
[1]
bootstrap actions
EMR provisioned clusters
[1]
bulk edit mode
description
[1]
C
caching
for origins and processors
[1]
ludicrous mode
[1]
case study
batch pipelines
[1]
streaming pipelines
[1]
CDC writes
Delta Lake destination
[1]
client deployment mode
Hadoop YARN cluster
[1]
cluster
callback URL
[1]
Dataproc
[1]
EMR
[1]
Hadoop YARN
[1]
running pipelines
[1]
SQL Server 2019 BDC
[1]
cluster configuration
Databricks instance pool
[1]
Databricks pipelines
[1]
cluster deployment mode
Hadoop YARN cluster
[1]
command line interface
jks-credentialstore command
[1]
stagelib-cli command
[1]
conditions
Delta Lake destination
[1]
Filter processor
[1]
Join processor
[1]
Stream Selector processor
[1]
Window processor
[1]
configuring
Snowflake origin
[1]
constants
in the StreamSets expression language
[1]
credential stores
AWS Secrets Manager
[1]
Azure Key Vault
[1]
CyberArk
[1]
enabling
[1]
functions to access
[1]
Java keystore
[1]
overview
[1]
cross join
Join processor
[1]
custom schemas
application to JSON and delimited data
[1]
DDL schema format
[1]
[2]
error handling
[1]
JSON schema format
[1]
[2]
origins
[1]
CyberArk
credential store
[1]
properties file
[1]
CyberArk access
overview
[1]
D
Databricks
init scripts for provisioned clusters
[1]
provisioned cluster configuration
[1]
provisioned cluster with instance pool
[1]
uninstalling old Transformer libraries
[1]
Databricks init scripts
access keys for ABFSS
[1]
Databricks pipelines
existing cluster
[1]
job details
[1]
provisioned cluster
[1]
[2]
data formats
ADLS Gen1 destination
[1]
ADLS Gen1 origin
[1]
ADLS Gen2 destination
[1]
ADLS Gen2 origin
[1]
Amazon S3 destination
[1]
Amazon S3 origin
[1]
Azure Event Hubs destination
[1]
File destination
[1]
File origin
[1]
Whole Directory origin
[1]
Dataproc
cluster
[1]
credentials
[1]
credentials in a file
[1]
credentials in a property
[1]
default credentials
[1]
Dataproc pipelines
existing cluster
[1]
data types
in preview
[1]
Transformer
[1]
datetime variables
in the StreamSets expression language
[1]
Deduplicate processor
configuring
[1]
overview
[1]
default output fields
Aggregate processor
[1]
default stream
Stream Selector
[1]
Delete from Table write mode
Delta Lake destination
[1]
delivery guarantee
pipelines
[1]
Delta Lake destination
ADLS Gen1 prerequisites
[1]
ADLS Gen2 prerequisites
[1]
Amazon S3 credential mode
[1]
Append Data write mode
[1]
CDC example
[1]
configuring
[1]
creating a managed table
[1]
creating a table
[1]
creating a table or managed table
[1]
Delete from Table write mode
[1]
overview
[1]
overwrite condition
[1]
Overwrite Data write mode
[1]
partitions
[1]
retrieve ADLS Gen1 authentication information
[1]
retrieve ADLS Gen2 authentication information
[1]
Update Table write mode
[1]
Upsert Using Merge write mode
[1]
write mode
[1]
writing to a local file system
[1]
Delta Lake Lookup processor
ADLS Gen2 prerequisites
[1]
Amazon S3 credential mode
[1]
configuring
[1]
overview
[1]
retrieve ADLS Gen1 authentication information
[1]
retrieve ADLS Gen2 authentication information
[1]
storage systems
[1]
[2]
using from a local file system
[1]
Delta Lake origin
ADLS Gen1 prerequisites
[1]
[2]
ADLS Gen2 prerequisites
[1]
Amazon S3 credential mode
[1]
overview
[1]
[2]
reading from a local file system
[1]
retrieve ADLS Gen1 authentication information
[1]
retrieve ADLS Gen2 authentication information
[1]
storage systems
[1]
deployment mode
Hadoop YARN cluster
[1]
destinations
ADLS Gen1
[1]
ADLS Gen2
[1]
Amazon S3
[1]
Azure Event Hubs
[1]
Delta Lake
[1]
Elasticsearch
[1]
File
[1]
JDBC
[1]
Snowflake
[1]
directories
internal
[1]
protected
[1]
Transformer
[1]
directory path
File destination
[1]
File origin
[1]
drivers
JDBC destination
[1]
JDBC Lookup processor
[1]
JDBC origin
[1]
JDBC Table origin
[1]
MySQL JDBC Table origin
[1]
Oracle JDBC Table origin
[1]
E
Elasticsearch destination
configuring
[1]
overview
[1]
overwrite partition prerequisite
[1]
partitions
[1]
write mode
[1]
EMR
authentication method
[1]
base URI and staging directory
[1]
bootstrap actions for provisioned clusters
[1]
cluster
[1]
Kerberos stage limitation
[1]
provisioned cluster
[1]
server-side encryption
[1]
SSE Key Management Service (KMS) requirement
[1]
Transformer installation location
[1]
EMR jobs
force stop
[1]
encryption zones
using KMS to access HDFS encryption zones
[1]
execution engines
Transformer
[1]
execution mode
pipelines
[1]
executors
Spark
[1]
expression language
constants
[1]
datetime variables
[1]
functions
[1]
literals
[1]
operator precedence
[1]
operators
[1]
reserved words
[1]
expressions
in pipeline and stage properties
[1]
F
Field Flattener processor
configuring
[1]
Field Order processor
configuring
[1]
overview
[1]
Field Remover processor
configuring
[1]
overview
[1]
Field Renamer processor
configuring
[1]
overview
[1]
rename methods
[1]
fields
referencing
[1]
file descriptors
increasing
[1]
File destination
configuring
[1]
data formats
[1]
directory path
[1]
overview
[1]
write mode
[1]
file functions
description
[1]
File origin
configuring
[1]
custom schema
[1]
data formats
[1]
directory path
[1]
overview
[1]
partitions
[1]
schema requirement
[1]
Filter processor
configuring
[1]
filter condition
[1]
overview
[1]
force stop
EMR jobs
[1]
full outer join
Join processor
[1]
full read
Snowflake origin
[1]
functions
Base64 functions
[1]
credential
[1]
file functions
[1]
in the StreamSets expression language
[1]
job functions
[1]
math functions
[1]
miscellaneous functions
[1]
pipeline functions
[1]
string functions
[1]
time functions
[1]
G
garbage collection
Java
[1]
Google Big Query destination
merge properties
[1]
prerequisite
[1]
write mode
[1]
Google Big Query origin
incremental and full query mode
[1]
offset column and supported types
[1]
supported data types
[1]
H
Hadoop impersonation mode
configuring KMS for encryption zones
[1]
lowercasing user names
[1]
overview
[1]
Hadoop YARN
cluster
[1]
deployment mode
[1]
directory requirements
[1]
driver requirement
[1]
impersonation
[1]
Kerberos authentication
[1]
heap size
configuring
[1]
Hive destination
additional Hive configuration properties
[1]
configuring
[1]
data drift column order
[1]
Hive origin
reading Delta Lake managed tables
[1]
HTTPS protocol
enabling
[1]
I
impersonation mode
Hadoop
[1]
incremental read
Snowflake origin
[1]
init scripts
Databricks provisioned clusters
[1]
inner join
Join processor
[1]
inputs variable
PySpark processor
[1]
Scala processor
[1]
[2]
installation
overview
[1]
requirements
[1]
Scala, Spark, and Java JDK requirements
[1]
Spark shuffle service requirement
[1]
installation package
choosing Scala version
[1]
installation requirements
system
[1]
J
Java
garbage collection
[1]
Java configuration options
heap size
[1]
Java keystore
credential store
[1]
properties file
[1]
JDBC destination
configuring
[1]
driver installation
[1]
overview
[1]
partitions
[1]
tested versions and drivers
[1]
write mode
[1]
JDBC Lookup processor
configuring
[1]
driver installation
[1]
overview
[1]
tested versions and drivers
[1]
JDBC Query origin
configuring
[1]
driver installation
[1]
overview
[1]
tested versions and drivers
[1]
JDBC Table origin
configuring
[1]
driver installation
[1]
offset column
[1]
overview
[1]
partitions
[1]
supported offset data types
[1]
tested versions and drivers
[1]
job functions
description
[1]
Join processor
condition
[1]
configuring
[1]
criteria
[1]
cross join
[1]
full outer join
[1]
inner join
[1]
join types
[1]
left anti join
[1]
left outer join
[1]
left semi join
[1]
matching fields
[1]
overview
[1]
right anti join
[1]
right outer join
[1]
shuffling of data
[1]
join types
Join processor
[1]
JSON Parser processor
configuring
[1]
custom schema
[1]
error handling
[1]
overview
[1]
schema inference
[1]
K
Kafka destination
Kerberos authentication
[1]
security
[1]
SSL/TLS encryption
[1]
Kafka origin
custom schemas
[1]
Kerberos authentication
[1]
overview
[1]
security
[1]
SSL/TLS encryption
[1]
Kafka stages
enabling SASL
[1]
enabling SASL on SSL/TLS
[1]
enabling security
[1]
enabling SSL/TLS security
[1]
providing Kerberos credentials
[1]
security prerequisite tasks
[1]
Kerberos
credentials for Kafka stages
[1]
enabling
[1]
Kerberos authentication
Hadoop YARN cluster
[1]
Kafka destination
[1]
Kafka origin
[1]
Kerberos keytab
configuring in pipelines
[1]
Kudu origin
configuring
[1]
overview
[1]
L
left anti join
Join processor
[1]
left outer join
Join processor
[1]
left semi join
Join processor
[1]
literals
in the StreamSets expression language
[1]
log files
viewing and downloading
[1]
[2]
logs
pipelines
[1]
Spark driver
[1]
Transformer
[1]
lookups
streaming example
[1]
ludicrous mode
caching
[1]
optimizing pipeline performance
[1]
pipeline statistics
[1]
M
MapR cluster
dynamic allocation requirement
[1]
MapR clusters
Hadoop impersonation prerequisite
[1]
pipeline start prerequisite
[1]
master instance
retrieving details
[1]
math functions
description
[1]
miscellaneous functions
description
[1]
monitoring
Spark web UI
[1]
MySQL JDBC Table origin
custom offset queries
[1]
default offset queries
[1]
driver installation
[1]
MySQL data types
[1]
null offset value handling
[1]
supported offset data types
[1]
O
offset column
Google Big Query origin
[1]
JDBC Table
[1]
offsets
overview
[1]
resetting for the pipeline
[1]
skipping tracking
[1]
open file limit
configuring
[1]
operators
in the StreamSets expression language
[1]
precedence
[1]
Oracle JDBC Table origin
custom offset queries
[1]
default offset queries
[1]
driver installation
[1]
null offset value handling
[1]
Oracle data types
[1]
supported offset data types
[1]
origins
ADLS Gen1
[1]
ADLS Gen2
[1]
Amazon S3
[1]
Azure Event Hubs
[1]
caching
[1]
Delta Lake
[1]
Delta Lake origin
[1]
File
[1]
JDBC Query
[1]
JDBC Table
[1]
Kafka
[1]
Kudu
[1]
Kudu origin
[1]
multiple
[1]
overview
[1]
Snowflake
[1]
Whole Directory
[1]
output variable
PySpark processor
[1]
Scala processor
[1]
[2]
Overwrite Data write mode
Delta Lake destination
[1]
P
partitioning
overview
[1]
partitions
ADLS Gen1 origin
[1]
ADLS Gen2 origin
[1]
Amazon Redshift destination
[1]
Amazon S3 origin
[1]
Azure SQL destination
[1]
based on origins
[1]
changing
[1]
Delta Lake destination
[1]
Elasticsearch destination
[1]
File origin
[1]
initial
[1]
initial number
[1]
JDBC destination
[1]
JDBC Table origin
[1]
Rank processor
[1]
pipeline functions
description
[1]
pipeline offsets
[1]
pipeline properties
using expressions
[1]
pipelines
delivery guarantee
[1]
logs
[1]
Spark configuration
[1]
Spark executors
[1]
stage library match requirement
[1]
ports
default
[1]
PostgreSQL JDBC Table origin
custom offset queries
[1]
default offset queries
[1]
null offset value handling
[1]
PostgreSQL JDBC driver
[1]
supported data types
[1]
supported offset data types
[1]
post-upgrade tasks
access Databricks job details
[1]
update ADLS stages in HDInsight pipelines
[1]
update keystore and truststore location
[1]
preprocessing script
pipeline
[1]
prerequisites
[1]
requirements
[1]
Spark-Scala prerequisites
[1]
prerequisites
ADLS and Amazon S3 stages
[1]
Azure Event Hubs destination
[1]
Azure Event Hubs origin
[1]
for the Scala processor and preprocessing script
[1]
PySpark processor
[1]
stage-related
[1]
processing mode
ludicrous mode versus standard
[1]
processors
Aggregate
[1]
caching
[1]
Deduplicate
[1]
Delta Lake Lookup
[1]
Field Order
[1]
Field Remover
[1]
Field Renamer
[1]
Filter
[1]
JDBC Lookup
[1]
Join
[1]
JSON Parser
[1]
Profile
[1]
PySpark
[1]
Rank
[1]
referencing fields
[1]
Repartition
[1]
Scala
[1]
shuffling of data
[1]
Snowflake Lookup
[1]
Sort
[1]
Spark SQL Expression
[1]
Spark SQL Query
[1]
Stream Selector
[1]
Type Converter
[1]
Union
[1]
Window
[1]
Profile processor
configuring
[1]
output records
[1]
overview
[1]
statistics
[1]
proxy server
Transformer
[1]
proxy users
Transformer
[1]
PySpark processor
configuring
[1]
custom code
[1]
Databricks prerequisites
[1]
EMR prerequisites
[1]
examples
[1]
input and output variables
[1]
other cluster and local pipeline prerequisites
[1]
overview
[1]
prerequisites
[1]
[2]
referencing fields
[1]
PySpark processor requirements for provisioned Databricks clusters
[1]
Q
query mode
Google Big Query origin
[1]
R
Rank processor
configuring
[1]
example
[1]
order by
[1]
overview
[1]
partition by
[1]
rank functions
[1]
shuffling of data
[1]
read mode
Snowflake origin
[1]
release notes 4.0.x
[1]
release notes 4.1.x
[1]
remote debugging
Transformer
[1]
repartitioning
methods
[1]
overview
[1]
Repartition processor
coalesce by number repartition method
[1]
configuring
[1]
methods
[1]
overview
[1]
repartition by field range repartition method
[1]
repartition by number repartition method
[1]
shuffling of data
[1]
use cases
[1]
reserved words
in the StreamSets expression language
[1]
right anti join
Join processor
[1]
right outer join
Join processor
[1]
runtime parameters
calling from scripting processors
[1]
runtime properties
calling from a pipeline
[1]
defining
[1]
overview
[1]
runtime resources
calling from a pipeline
[1]
defining
[1]
runtime values
overview
[1]
S
Scala
choosing a Transformer engine version
[1]
Scala, Spark, and Java JDK requirements
installation
[1]
Scala processor
configuring
[1]
custom code
[1]
examples
[1]
input and output variables
[1]
inputs variable
[1]
output variable
[1]
overview
[1]
prerequisites
[1]
requirements
[1]
Spark-Scala prerequisite
[1]
Spark SQL queries
[1]
scripting processors
calling runtime values
[1]
scripts
preprocessing
[1]
security
Kafka destination
[1]
Kafka origin
[1]
server-side encryption
Amazon Redshift destination
[1]
Amazon S3 destination
[1]
EMR clusters
[1]
shuffling
overview
[1]
simple edit mode
description
[1]
Slowly Changing Dimension processor
configuring
[1]
pipeline processing
[1]
Slowly Changing Dimension processor
pipeline
[1]
Snowflake destination
configuring
[1]
merge properties
[1]
overview
[1]
required privileges
[1]
role
[1]
write mode
[1]
Snowflake Lookup processor
configuring
[1]
overview
[1]
pushdown optimization
[1]
required privileges
[1]
role
[1]
Snowflake origin
configuring
[1]
full query guidelines
[1]
incremental or full read
[1]
incremental query guidelines
[1]
overview
[1]
pushdown optimization
[1]
read mode
[1]
required privileges
[1]
role
[1]
SQL query guidelines
[1]
sorting
multiple fields
[1]
Sort processor
configuring
[1]
multiple fields
[1]
overview
[1]
Spark cluster
callback URL
[1]
Transformer URL
[1]
Spark configuration
pipelines
[1]
Spark executors
maximum
[1]
Spark processing
description
[1]
Spark SQL Expression processor
overview
[1]
Spark SQL processor
configuring
[1]
Spark SQL query
syntax
[1]
Spark SQL Query processor
configuring
[1]
examples
[1]
overview
[1]
query syntax
[1]
referencing fields
[1]
Spark web UI
monitoring
[1]
SQL query
guidelines for the Snowflake origin
[1]
SQL Server 2019 BDC
cluster
[1]
JDBC connection information
[1]
master instance details for JDBC
[1]
retrieving information
[1]
SQL Server JDBC Table origin
configuring
[1]
custom offset queries
[1]
default offset queries
[1]
null offset value handling
[1]
SQL Server JDBC driver
[1]
supported data types
[1]
supported offset data types
[1]
SSL/TLS encryption
Kafka destination
[1]
Kafka origin
[1]
stage libraries
AWS Secrets Manager Credentials Store
[1]
Azure Key Vault Credentials Store
[1]
[2]
stage library match requirement
in a pipeline
[1]
stage properties
using expressions
[1]
staging directory
EMR pipelines
[1]
statistics
Profile processor
[1]
streaming pipelines
case study
[1]
description
[1]
Stream Selector processor
conditions
[1]
configuring
[1]
default stream
[1]
overview
[1]
string functions
description
[1]
T
time functions
description
[1]
Transformer
architecture
[1]
description
[1]
directories
[1]
execution engine
[1]
Java configuration options
[1]
proxy server
[1]
proxy users
[1]
release notes
[1]
remote debugging
[1]
spark-submit
[1]
starting manually
[1]
viewing and downloading log data
[1]
[2]
Transformer libraries
removing from Databricks
[1]
troubleshooting
origin errors
[1]
pipeline errors
[1]
Type Converter processor
configuring
[1]
field type conversion
[1]
overview
[1]
U
ulimit
configuring
[1]
Union processor
overview
[1]
Update Table write mode
Delta Lake destination
[1]
Upsert Using Merge write mode
Delta Lake destination
[1]
URL
cluster callback
[1]
W
Whole Directory origin
data formats
[1]
overview
[1]
Window processor
conditions
[1]
configuring
[1]
overview
[1]
window types
[1]
window types
Window processor
[1]
write mode
Azure SQL destination
[1]
Delta Lake destination
[1]
Google Big Query destination
[1]
JDBC destination
[1]
Snowflake destination
[1]
© 2023 StreamSets, Inc.