Overview
A processor stage represents a type of data processing that you want to perform. You can use as many processors in a pipeline as you need.
You can use the following processors in a Transformer pipeline (a short PySpark sketch after this list illustrates a few of these operations):
- Aggregate - Performs aggregate calculations.
- Deduplicate - Removes duplicate records.
- Delta Lake Lookup - Performs a lookup on a Delta Lake table.
- Field Order - Orders the specified fields and drops unlisted fields from the pipeline.
- Field Remover - Removes fields from a record.
- Field Renamer - Renames fields in a record.
- Filter - Passes only the records that match a filter condition.
- JDBC Lookup - Performs a lookup on a database table.
- Join - Joins data from two input streams.
- Profile - Calculates descriptive statistics for string and numeric data.
- PySpark - Uses custom PySpark code to transform data.
- Rank - Performs rank calculations for every input record based on a group of records.
- Repartition - Changes how pipeline data is partitioned.
- Scala - Uses custom Scala code to transform data.
- Slowly Changing Dimension - Generates updates for a slowly changing dimension.
- Snowflake Lookup - Performs a lookup on a Snowflake table.
- Sort - Sorts incoming data based on specified fields.
- Spark SQL Expression - Performs record-level calculations using Spark SQL expressions.
- Spark SQL Query - Runs a Spark SQL query to transform data.
- Stream Selector - Routes data to output streams based on conditions.
- Type Converter - Converts the data types of specified fields to compatible types.
- Union - Combines data with matching schemas from two or more incoming data streams.
- Window - Produces new batches of data from incoming batches based on a specified window type. Use in streaming pipelines only.
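
Because Transformer pipelines run as Spark applications, each processor corresponds to one or more Spark transformations. The following is a minimal PySpark sketch, not Transformer code itself, showing operations analogous to the Filter, Join, and Aggregate processors; the data, column names, and application name are hypothetical:

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("processor-analogy").getOrCreate()

# Hypothetical data standing in for two pipeline input streams.
orders = spark.createDataFrame(
    [(1, "books", 12.50), (2, "games", 30.00), (3, "books", 7.25)],
    ["order_id", "category", "amount"],
)
categories = spark.createDataFrame(
    [("books", "media"), ("games", "entertainment")],
    ["category", "department"],
)

# Filter analogy: pass only the records that match a condition.
filtered = orders.filter(F.col("amount") > 10.0)

# Join analogy: join data from two input streams.
joined = filtered.join(categories, on="category", how="inner")

# Aggregate analogy: perform aggregate calculations per group.
totals = joined.groupBy("department").agg(F.sum("amount").alias("total_amount"))

totals.show()
```

In a Transformer pipeline you configure these operations in the processor properties rather than writing Spark code directly; the PySpark and Scala processors are the stages intended for custom code.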