Overview
A processor stage represents a type of data processing that you want to perform. You can use as many processors in a pipeline as you need.
You can use the following processors in a Transformer pipeline (a short PySpark sketch after this list illustrates a few of these operations):
- Aggregate - Performs aggregate calculations.
- Deduplicate - Removes duplicate records.
- Delta Lake Lookup - Performs a lookup on a Delta Lake table.
- Field Order - Orders the specified fields and drops unlisted fields from the pipeline.
- Field Remover - Removes fields from a record.
- Field Renamer - Renames fields in a record.
- Filter - Passes only the records that match a filter condition.
- JDBC Lookup - Performs a lookup on a database table.
- Join - Joins data from two input streams.
- Profile - Calculates descriptive statistics for string and numeric data.
- PySpark - Uses custom PySpark code to transform data.
- Rank - Performs rank calculations for every input record based on a group of records.
- Repartition - Changes how pipeline data is partitioned.
- Scala - Uses custom Scala code to transform data.
- Slowly Changing Dimension - Generates updates for a slowly changing dimension.
- Snowflake Lookup - Performs a lookup on a Snowflake table.
- Sort - Sorts incoming data based on specified fields.
- Spark SQL Expression - Performs record-level calculations using Spark SQL expressions.
- Spark SQL Query - Runs a Spark SQL query to transform data.
- Stream Selector - Routes data to output streams based on conditions.
- Type Converter - Converts the data types of specified fields to compatible types.
- Union - Combines data with matching schemas from two or more incoming data streams.
- Window - Produces new batches of data from incoming batches based on a specified window type. Use in streaming pipelines only.
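
Because Transformer pipelines run as Spark applications, each processor corresponds to one or more Spark transformations. The following is a minimal PySpark sketch, not Transformer code itself, showing operations analogous to the Filter, Join, and Aggregate processors; the data, column names, and application name are hypothetical:

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("processor-analogy").getOrCreate()

# Hypothetical data standing in for two pipeline input streams.
orders = spark.createDataFrame(
    [(1, "books", 12.50), (2, "games", 30.00), (3, "books", 7.25)],
    ["order_id", "category", "amount"],
)
categories = spark.createDataFrame(
    [("books", "media"), ("games", "entertainment")],
    ["category", "department"],
)

# Filter analogy: pass only the records that match a condition.
filtered = orders.filter(F.col("amount") > 10.0)

# Join analogy: join data from two input streams.
joined = filtered.join(categories, on="category", how="inner")

# Aggregate analogy: perform aggregate calculations per group.
totals = joined.groupBy("department").agg(F.sum("amount").alias("total_amount"))

totals.show()
```

In a Transformer pipeline you configure these operations in the processor properties rather than writing Spark code directly; the PySpark and Scala processors are the stages intended for custom code.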