Processors Overview

Processors transform data based on the incoming data and the configuration properties that you define. Processors are named based on their primary function.

Some processors, such as Cube and Pivot, perform calculations using Snowflake functions. Some simplify common tasks, such as the Column Type Converter processor. Some help you construct the pipeline, such as the Join and Stream Selector processors. Others, such as Apply Functions and Snowflake SQL Query, let you extend processing to tasks that the other processors do not cover.

You can connect processors in the order that best suits the processing that you want to perform. For example, if you want to remove columns from the data, you might use the Column Remover processor early in the pipeline to avoid performing unnecessary processing on columns that are going to be dropped.
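The effect of stage order can be illustrated with a small, self-contained Python sketch. It does not use any Transformer for Snowflake API and does not reflect how Snowflake executes a pipeline. The functions `remove_columns` and `uppercase_strings` are hypothetical stand-ins for a Column Remover stage and an arbitrary transformation stage, and the counter exists only to show how many values each ordering touches.

```python
# Hypothetical stand-ins for pipeline stages; not Transformer for Snowflake APIs.
def remove_columns(rows, columns):
    """Drop the named columns from every row."""
    drop = set(columns)
    return [{k: v for k, v in row.items() if k not in drop} for row in rows]


def uppercase_strings(rows, counter):
    """Uppercase every string value, counting how many values are visited."""
    out = []
    for row in rows:
        new_row = {}
        for k, v in row.items():
            counter["values_processed"] += 1
            new_row[k] = v.upper() if isinstance(v, str) else v
        out.append(new_row)
    return out


rows = [
    {"id": 1, "name": "ada", "notes": "temp", "debug": "x"},
    {"id": 2, "name": "lin", "notes": "temp", "debug": "y"},
]

# Order A: transform first, then remove the unwanted columns.
late = {"values_processed": 0}
result_late = remove_columns(uppercase_strings(rows, late), ["notes", "debug"])

# Order B: remove the unwanted columns first, then transform.
early = {"values_processed": 0}
result_early = uppercase_strings(remove_columns(rows, ["notes", "debug"]), early)

print(result_late == result_early)
print(late["values_processed"], early["values_processed"])
```

Both orderings produce the same rows, but the second ordering visits half as many values in this sample because the dropped columns never reach the transformation.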

Transformer for Snowflake currently provides the following processors:
  • Aggregate - Performs aggregate calculations.
  • Apply Functions - Applies built-in Snowflake functions or user-defined functions to all column names that match a regular expression.
  • Column Order - Reorders the specified columns into the listed order.
  • Column Remover - Removes columns from rows.
  • Column Renamer - Renames columns.
  • Column Transformer - Performs row-based calculations to transform column data.
  • Column Type Converter - Converts the data types of specified columns to compatible types.
  • Cube - Performs Snowflake Group By Cube calculations on specified columns.
  • Deduplicate - Removes duplicate rows.
  • Filter - Passes only the rows that match a filter condition.
  • Join - Joins related data from two inputs, generating rows that include data from both inputs.
  • JSON Parser - Parses JSON data embedded in a column.
  • Null Handling - Replaces null values.
  • Pivot - Rotates a table by turning unique values in a column into new columns and aggregating the results.
  • Rollup - Performs Snowflake Group By Rollup aggregate calculations.
  • Sample - Generates a sample subset of the incoming data.
  • Slowly Changing Dimension - Generates updates for a slowly changing dimension.
  • Snowflake SQL Query - Performs a Snowflake SQL query to transform data.
  • Sort - Sorts data based on specified columns.
  • Stream Selector - Routes data to output streams based on conditions.
  • Union - Combines data from two or more incoming data streams.
  • Unpivot - Rotates a table by converting specified columns into rows.
  • Window Function - Performs window calculations on groups of rows.
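The rotations described for the Pivot and Unpivot processors can be hard to picture from a one-line summary. The following Python sketch shows the general idea on a list of rows. It is an illustration only: the `pivot` and `unpivot` functions are hypothetical, sum is assumed as the aggregate, and the way missing combinations are handled here (the column is simply absent) may differ from the processors and from Snowflake itself.

```python
from collections import defaultdict


def pivot(rows, key, pivot_col, value_col):
    """Turn each unique value of pivot_col into a column, summing value_col."""
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        totals[row[key]][row[pivot_col]] += row[value_col]
    return [{key: k, **dict(cols)} for k, cols in totals.items()]


def unpivot(rows, key, columns, name_col, value_col):
    """Turn the listed columns back into one (name, value) row per column."""
    return [
        {key: row[key], name_col: col, value_col: row[col]}
        for row in rows
        for col in columns
        if col in row  # skip combinations that have no value in this sketch
    ]


sales = [
    {"region": "east", "quarter": "Q1", "amount": 10},
    {"region": "east", "quarter": "Q1", "amount": 5},
    {"region": "east", "quarter": "Q2", "amount": 7},
    {"region": "west", "quarter": "Q1", "amount": 3},
]

# Unique quarter values become columns; amounts are summed per region.
wide = pivot(sales, "region", "quarter", "amount")

# The quarter columns are converted back into rows.
long = unpivot(wide, "region", ["Q1", "Q2"], "quarter", "amount")

print(wide)
print(long)
```

Note that unpivoting the pivoted data does not restore the original rows: the two east Q1 rows were aggregated into one value during the pivot.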

Column Selection

When configuring processors, you can select the columns to use or you can enter them manually.

Selecting columns requires previewing pipeline data. When you select the columns to use, you specify whether to run preview for the entire pipeline or up to the stage that you are configuring.

Transformer for Snowflake caches preview data for reuse, so you can use the same preview data to specify columns in multiple stages. When necessary, you can re-run the preview to refresh preview data. You might re-run a preview if you change important upstream details, such as the table that the pipeline reads or the columns included in the read.
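The trade-off between reusing cached preview data and re-running the preview can be sketched in Python. This is a conceptual model only, not the product's implementation: the `PreviewCache` class, the `fake_preview` function, and the stage name `filter_1` are all hypothetical, and keying the cache by stage name is an assumption made for the example.

```python
class PreviewCache:
    """Hypothetical cache of preview schemas, keyed by stage name."""

    def __init__(self, run_preview):
        # run_preview is a callable: stage name -> list of column names
        self._run_preview = run_preview
        self._schemas = {}

    def columns(self, stage):
        """Return cached columns, running the preview only on a cache miss."""
        if stage not in self._schemas:
            self._schemas[stage] = self._run_preview(stage)
        return self._schemas[stage]

    def rerun(self, stage):
        """Discard the cached schema and run the preview again."""
        self._schemas[stage] = self._run_preview(stage)
        return self._schemas[stage]


table_columns = ["id", "name"]  # stands in for the table that the pipeline reads
calls = []                      # records each simulated preview run


def fake_preview(stage):
    calls.append(stage)
    return list(table_columns)


cache = PreviewCache(fake_preview)
first = cache.columns("filter_1")   # runs the preview
second = cache.columns("filter_1")  # served from the cache, no new run

table_columns.append("email")       # an upstream change to the columns read
stale = cache.columns("filter_1")   # cached schema does not include the change
fresh = cache.rerun("filter_1")     # re-running the preview picks it up

print(stale, fresh, len(calls))
```

The sketch shows why a re-run is worthwhile after an upstream change: cached data keeps column selection fast, but it reflects the pipeline as it was when the preview last ran.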

To select one or more columns for a processor property, click the Select column from schema icon to the right of the property. If preview data is not available, specify the type of preview to run.

If preview data is available, the columns are displayed for selection. If the displayed columns are not correct, click Rerun preview to update schema.

To preview data, all required properties for the pipeline must be defined. For more information about preview availability or additional details about selecting columns from preview data, see the Control Hub documentation.