Overview

A processor stage represents a type of data processing that you want to perform. You can use as many processors in a pipeline as you need.
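As a rough mental model only, a pipeline can be pictured as an ordered chain of processor stages, where each stage receives a batch of records and passes its output to the next stage. The sketch below is plain Python, not Transformer's API: records are modeled as dicts (Transformer actually processes Spark data), and `run_pipeline`, `drop_inactive`, and `add_full_name` are hypothetical names used only for illustration.

```python
from functools import reduce
from typing import Callable, Dict, List

# Assumption for illustration: a record is a plain dict of field name -> value.
Record = Dict[str, object]
# Assumption for illustration: a processor stage maps a batch of records to a new batch.
Processor = Callable[[List[Record]], List[Record]]


def run_pipeline(batch: List[Record], processors: List[Processor]) -> List[Record]:
    """Feed the batch through each stage in order; any number of stages may be chained."""
    return reduce(lambda records, stage: stage(records), processors, batch)


# Hypothetical stage: keep only records whose "active" field is truthy.
def drop_inactive(records: List[Record]) -> List[Record]:
    return [r for r in records if r.get("active")]


# Hypothetical stage: derive a new field from two existing fields.
def add_full_name(records: List[Record]) -> List[Record]:
    return [{**r, "full_name": f"{r['first']} {r['last']}"} for r in records]


batch = [
    {"first": "Ada", "last": "Lovelace", "active": True},
    {"first": "Alan", "last": "Turing", "active": False},
]
result = run_pipeline(batch, [drop_inactive, add_full_name])
print(result)
```

Because each stage only depends on the batch it receives, stages can be added, removed, or reordered without changing the others.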

You can use the following processors in a Transformer pipeline:
  • Aggregate - Performs aggregate calculations.
  • Deduplicate - Removes duplicate records.
  • Delta Lake Lookup - Performs a lookup on a Delta Lake table.
  • Field Flattener - Flattens map fields.
  • Field Order - Orders the specified fields and drops unlisted fields from the pipeline.
  • Field Remover - Removes fields from a record.
  • Field Renamer - Renames fields in a record.
  • Field Replacer - Replaces field values in a record.
  • Filter - Passes only the records that match a filter condition.
  • JDBC Lookup - Performs a lookup on a database table.
  • Join - Joins data from two input streams.
  • JSON Parser - Parses a JSON object embedded in a string field.
  • Pivot - Pivots data in a list field and creates a record for each item in the field.
  • Profile - Calculates descriptive statistics for string and numeric data.
  • PySpark - Uses custom PySpark code to transform data.
  • Rank - Performs rank calculations for every input record based on a group of records.
  • Repartition - Changes how pipeline data is partitioned.
  • Scala - Uses custom Scala code to transform data.
  • Slowly Changing Dimension - Generates updates for a slowly changing dimension.
  • Snowflake Lookup - Performs a lookup on a Snowflake table.
  • Sort - Sorts incoming data based on specified fields.
  • Spark SQL Expression - Performs record-level calculations using Spark SQL expressions.
  • Spark SQL Query - Runs a Spark SQL query to transform data.
  • Stream Selector - Routes data to output streams based on conditions.
  • Surrogate Key Generator - Generates a unique surrogate key for each record.
  • Type Converter - Converts the data types of specified fields to compatible types.
  • Union - Combines data with matching schemas from two or more incoming data streams.
  • Window - Produces new batches of data from incoming batches based on a specified window type. Use in streaming pipelines only.
  • XML Parser - Parses an XML object embedded in a string field.
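To make the record-level behavior of a few of the processors above concrete, the following is a simplified sketch in plain Python. It is not how Transformer implements these processors (they run on Spark) and it does not reproduce their configuration options; records are assumed to be dicts, and every function name is hypothetical. It models Filter, Deduplicate, Field Renamer, Sort, Pivot, and Union.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Assumption for illustration: a record is a dict of field name -> hashable value.
Record = Dict[str, object]


def filter_records(records: List[Record], condition: Callable[[Record], bool]) -> List[Record]:
    """Filter: pass only the records that match the condition."""
    return [r for r in records if condition(r)]


def deduplicate(records: List[Record], fields: Optional[Sequence[str]] = None) -> List[Record]:
    """Deduplicate: keep the first record for each distinct key.

    With fields=None, all fields are compared; otherwise only the listed fields.
    """
    seen = set()
    out = []
    for r in records:
        keys = fields if fields is not None else sorted(r)
        key = tuple((f, r.get(f)) for f in keys)
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out


def rename_fields(records: List[Record], mapping: Dict[str, str]) -> List[Record]:
    """Field Renamer: rename fields listed in mapping; other fields keep their names."""
    return [{mapping.get(k, k): v for k, v in r.items()} for r in records]


def sort_records(records: List[Record], fields: Sequence[str], descending: bool = False) -> List[Record]:
    """Sort: order records by the specified fields."""
    return sorted(records, key=lambda r: tuple(r[f] for f in fields), reverse=descending)


def pivot(records: List[Record], list_field: str) -> List[Record]:
    """Pivot: create one record per item in a list field, copying the other fields."""
    return [{**r, list_field: item} for r in records for item in r[list_field]]


def union(streams: List[List[Record]]) -> List[Record]:
    """Union: combine streams, requiring every record to have the same set of fields."""
    schemas = {frozenset(r) for stream in streams for r in stream}
    if len(schemas) > 1:
        raise ValueError("Union requires matching schemas")
    return [r for stream in streams for r in stream]


orders = [
    {"id": 1, "customer": "a", "amount": 30},
    {"id": 2, "customer": "b", "amount": 10},
    {"id": 1, "customer": "a", "amount": 30},
]
unique = deduplicate(orders)
large = filter_records(unique, lambda r: r["amount"] > 15)
renamed = rename_fields(unique, {"amount": "total"})
by_amount = sort_records(unique, ["amount"])
items = pivot([{"id": 3, "items": ["x", "y"]}], "items")
combined = union([unique, unique])
```

In this model, Deduplicate keeps the first occurrence it sees; how Transformer chooses among duplicates is not specified here.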