Overview
A processor stage represents a type of data processing that you want to perform. You can use as many processors in a pipeline as you need.
You can use the following processors in a Transformer pipeline:
- Aggregate - Performs aggregate calculations.
- Deduplicate - Removes duplicate records.
- Delta Lake Lookup - Performs a lookup on a Delta Lake table.
- Field Flattener - Flattens map fields.
- Field Order - Orders the specified fields and drops unlisted fields from the pipeline.
- Field Remover - Removes fields from a record.
- Field Renamer - Renames fields in a record.
- Field Replacer - Replaces field values in a record.
- Filter - Passes only the records that match a filter condition.
- JDBC Lookup - Performs a lookup on a database table.
- Join - Joins data from two input streams.
- JSON Parser - Parses a JSON object embedded in a string field.
- Pivot - Pivots data in a list field and creates a record for each item in the field.
- Profile - Calculates descriptive statistics for string and numeric data.
- PySpark - Uses custom PySpark code to transform data.
- Rank - Performs rank calculations for every input record based on a group of records.
- Repartition - Changes how pipeline data is partitioned.
- Scala - Uses custom Scala code to transform data.
- Slowly Changing Dimension - Generates updates for a slowly changing dimension.
- Snowflake Lookup - Performs a lookup on a Snowflake table.
- Sort - Sorts incoming data based on specified fields.
- Spark SQL Expression - Performs record-level calculations using Spark SQL expressions.
- Spark SQL Query - Runs a Spark SQL query to transform data.
- Stream Selector - Routes data to output streams based on conditions.
- Surrogate Key Generator - Generates a unique surrogate key for each record.
- Type Converter - Converts the data types of specified fields to compatible types.
- Union - Combines data with matching schemas from two or more incoming data streams.
- Window - Produces new batches of data from incoming batches based on a specified window type. Use in streaming pipelines only.
- XML Parser - Parses an XML object embedded in a string field.
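
To make the idea of chaining processors concrete, the following is a minimal sketch in plain Python of what three of the listed processors (Filter, Pivot, and Field Order) do to a batch of records. It is not Transformer or Spark code: the helper functions, the sample records, and the field names are all hypothetical, and the handling of empty list fields in the pivot helper is an assumption made for the illustration.

```python
# Illustrative only: plain-Python stand-ins for three processors from the list.
# These helpers are hypothetical and are not Transformer or Spark APIs.

def filter_records(records, condition):
    """Filter: pass only the records that match the condition."""
    return [r for r in records if condition(r)]

def pivot(records, list_field):
    """Pivot: create one record for each item in the list field.

    Assumption: a record whose list field is empty yields no output records.
    """
    out = []
    for r in records:
        for item in r[list_field]:
            new_record = dict(r)          # copy the other fields unchanged
            new_record[list_field] = item  # replace the list with a single item
            out.append(new_record)
    return out

def field_order(records, fields):
    """Field Order: keep the listed fields in the given order, drop the rest."""
    return [{f: r[f] for f in fields} for r in records]

# Hypothetical sample batch.
records = [
    {"id": 1, "tags": ["a", "b"], "active": True, "note": "x"},
    {"id": 2, "tags": ["c"], "active": False, "note": "y"},
]

# Chain the stages the way processors are chained in a pipeline.
filtered = filter_records(records, lambda r: r["active"])
pivoted = pivot(filtered, "tags")
result = field_order(pivoted, ["tags", "id"])

print(result)
# [{'tags': 'a', 'id': 1}, {'tags': 'b', 'id': 1}]
```

Each function takes a batch and returns a new batch, which mirrors how the output of one processor stage becomes the input of the next.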