Window Function

The Window Function processor performs window calculations and writes the results to a specified column. Snowflake window functions apply to a window, or group, of rows. The Window Function processor performs calculations on data in windows or window frames.

When you configure the Window Function processor, you specify the window function to use, optional partition columns, and the output column name. When you specify a window function that supports window frames, you can also configure the processor to use window frames. When you configure a frame, you define the boundaries of the frame and the columns to order by.

Supported Window Functions

The following table lists the Snowflake functions that you can use with the Window Function processor. For more information about these functions, see the Snowflake documentation.
Function Supports Frames Description
Any Value Returns any value from the specified column.

Requires defining one column as an argument.

Average X Returns the average of non-null values in a specified column.

Requires defining a numeric column as the argument.

Correlation Returns the correlation coefficient for non-null pairs, based on the specified columns and using the following expression:
COVAR_POP(y, x) / (STDDEV_POP(x) * STDDEV_POP(y))

Requires defining two numeric columns, the first for the dependent variable, the second for the independent variable.

Count X Returns a count of all rows.
Count Distinct Returns a count of all distinct rows, based on one or more specified columns.

Requires defining one or more columns to use as arguments.

Maximum X Returns the minimum value in the specified column.

Requires defining a numeric column as the argument.

Median Returns the median of the rows in the specified column.

Requires defining a numeric column as the argument.

Minimum X Returns the maximum value in the specified column.

Requires defining a numeric column as the argument.

Ratio to Report Returns the ratio of a value, to the sum of all values, for the specified column.

Requires defining a numeric column as the argument.

Standard Deviation X Returns a sample standard deviation for the specified column.

Requires defining a numeric column as the argument.

Sum X Returns the sum of the values in the specified column.

Requires defining a numeric column as the argument.

Variance X Returns a sample variance for the specified column.

Requires defining a numeric column as the argument.

Custom Calculation X Returns the results of a custom SQL expression that includes the arguments and columns to use.

Requires defining a valid Snowflake SQL expression.

Windows and Partitions

By default, the Window Function processor treats all incoming data as a single window and performs the configured calculation using the entire data set. Then, the processor writes the result of that calculation to the specified output column for every row in the window.

You can specify one or more partition columns to create logical groups of data for calculations. When you specify partition columns, the Window Function processor treats each resulting partition as a window: it performs the configured calculation for each partition, then writes the results to every row in the partition.

When needed, you can perform calculations on smaller subsets of data using window frames.

Example: Window

Say you have the following incoming data:
STORE_ID STATE MONTH M_TOTAL
10 MD 1/2022 10503
20 MD 1/2022 9055
30 CA 1/2022 6303
10 MD 2/2022 8504
20 MD 2/2022 1224
30 CA 2/2022 12355
10 MD 3/2022 9583
20 MD 3/2022 8302
30 CA 3/2022 1360

You configure the Window Function processor to find the sum of the M_TOTAL column and to write the result to a new TOTAL column.

By default, the Window Function processor performs the calculation using the entire window of data, as follows:

STORE_ID STATE MONTH M_TOTAL TOTAL
10 MD 1/2022 10503 67189
20 MD 1/2022 9055 67189
30 CA 1/2022 6303 67189
10 MD 2/2022 8504 67189
20 MD 2/2022 1224 67189
30 CA 2/2022 12355 67189
10 MD 3/2022 9583 67189
20 MD 3/2022 8302 67189
30 CA 3/2022 1360 67189

Notice how the calculation is performed across the entire data set, and the result of that single calculation is written to the TOTAL column for each row.

Example: Partition

Now, say you configure the Window Function processor to use STORE_ID as a partition column for the same incoming data.

In this case, the processor partitions the data by store IDs and generates the following output:

STORE_ID STATE MONTH M_TOTAL TOTAL
10 MD 1/2022 10503 28590
10 MD 2/2022 8504 28590
10 MD 3/2022 9583 28590
20 MD 1/2022 9055 18581
20 MD 2/2022 1224 18581
20 MD 3/2022 8302 18581
30 CA 1/2022 6303 20018
30 CA 2/2022 12355 20018
30 CA 3/2022 1360 20018

Notice how calculations are performed across each store ID partition of data. The result of each calculation is written to the TOTAL column for each row in the partition.

Window Frames

A window frame is a subset of rows with a specified start and end point that is in relationship to the current row.

You can configure the Window Function processor to perform calculations with window frames when the processor uses a window function that supports frames. When you define a window frame, you specify the boundaries of the frame, which defines where the frame conceptually starts and ends. You also specify the columns to order by, which determines the actual rows included in the frame for the calculation.

For example, you might configure the processor to calculate a moving average of sales transactions for a frame that includes ten rows before and after the current row, and is ordered by the transaction time.
Note: The processor does not support the RANGE operator with window frames.

Order Columns

Order columns determine the order of rows in a window. The order of the rows in a window determines the actual rows included in window frames, which affects the results of the window function calculation.

For example, say you configure a frame that includes the data from the start to the current row, and the processor finds the sum of the TOTAL column to generate a running total. If the data is ordered by a STREET_ADDRESS column, those results provide questionable value. But if you order by a TRANSACTION_TIME column, the processor provides a running total over time.

You can specify one or more columns to order by, as needed. When you specify order by columns, define the sort order to use. You can use the following options to order the data:
  • Ascending
  • Descending
  • Ascending - Nulls First
  • Descending - Nulls Last

Frame Definition

The boundaries of a window frame define where the frame starts and ends. In the Window Function processor, you use the From and To properties to define frame boundaries.

The From and To properties provide similar options, as follows:

  • Current Row - Uses the current row as the specified boundary.
  • Start - Uses the start of the window as the first row of the frame. Available only with the From property.
  • End - Uses the end of the window as the last row of the frame. Available only with the To property.
  • Preceding - Uses a specific row before the current row as the specified boundary.
  • Following - Uses a specific row after the current row as the specified boundary.
  • Snowflake default - Uses the Snowflake default values for the start and end of the frame.

    Snowflake defaults differ based on the function that you select. For information, see the documentation for the function in the Snowflake documentation.

Frame Definition Guidelines

Note the following guidelines and information about configuring window frames:
Frames must include one or more rows
To enable a frame with one row, you can specify the same option for the From and To properties. For example, you can define the equivalent of from the current row to the current row.
To enable a frame with two or more rows, the row associated with the From property must appear before the row associated with the To property. If this is not the case, adjust the From and To property definitions appropriately.
For example, the equivalent of from the current row to the preceding 5 rows is an invalid combination. However, the reverse, from the preceding 5 rows to the current row, creates a frame that includes the current row and the five rows before it.
Frames include the specified boundary row
When you configure the From and To properties, the boundary rows are included in the frame.
For example, if you specify the equivalent of from the preceding 10 rows to the current row, the frame includes the current row and the ten rows before it.
Similarly, from the current row to the following 10 rows generates a frame that includes the current row and the ten rows after it.
Frames can exclude the current row
You can configure the From and To properties to create a frame that does not include the current row. For example, from the following 1 row to the following 3 rows creates a frame that includes the three rows after the current row, excluding the current row.
Calculations occur for incomplete or empty frames
The processor performs a calculation for each row, even if the frame is empty or incomplete. Incomplete or empty frames can occur when the specified frame includes rows before or after the current row.
For example, say the processor calculates the sum of a TOTAL column, and uses from the preceding 2 rows to the preceding 1 row as the frame.
For the first row that is processed, the frame is empty, so the processor writes a null value to the TOTAL column. For the second row, the output is simply the value of the TOTAL column in the row before the current row. For the third row and all subsequent rows, the output is the sum of the TOTAL column for the two rows before the current row.

Examples

Let's use the following data set to look at a couple scenarios:

OFFICE_ID STATE ORDER_ID MONTH TOTAL
1 MD 000234 1/2022 1050
1 MD 000235 1/2022 9005
1 MD 000236 1/2022 8500
2 CA 500050 1/2022 6000
2 CA 500051 1/2022 12000
1 MD 000237 2/2022 3000
1 MD 000238 2/2022 5050
2 CA 500052 2/2022 1003
2 CA 500053 2/2022 2000
1 MD 000239 3/2022 9300
2 CA 500054 3/2022 8000

Frame: Start to Current Row

Say you want to generate a running total of the data based on order ID.

To do this, you configure the following processor properties:
  • Window Function: SUM
  • Argument: TOTAL
  • Window Frame: Enabled
  • Order Columns:
    • Column: ORDER_ID
    • Order: Ascending
  • From: Start
  • To: Current Row
  • Output Column: R_TOTAL
This results in the following output:
OFFICE_ID STATE ORDER_ID MONTH TOTAL R_TOTAL
1 MD 000234 1/2022 1050 1050
1 MD 000235 1/2022 9005 10055
1 MD 000236 1/2022 8500 18555
1 MD 000237 2/2022 3000 21555
1 MD 000238 2/2022 5050 26605
1 MD 000239 3/2022 9300 35905
2 CA 500050 1/2022 6000 41905
2 CA 500051 1/2022 12000 53905
2 CA 500052 2/2022 1003 54908
2 CA 500053 2/2022 2000 56908
2 CA 500054 3/2022 8000 64908

Notice how the R_TOTAL column contains the sum of the TOTAL column for the current row and the rows before it, with the data ordered by the ORDER_ID column

Partition and Window Frame

Now say you want separate running totals of the Maryland and California data. To do this, you can define the partitioning property as follows:
  • Partition Columns: STATE
This results in the following output:
OFFICE_ID STATE ORDER_ID MONTH TOTAL R_TOTAL
2 CA 500050 1/2022 6000 6000
2 CA 500051 1/2022 12000 18000
2 CA 500052 2/2022 1003 19003
2 CA 500053 2/2022 2000 21003
2 CA 500054 3/2022 8000 29003
1 MD 000234 1/2022 1050 1050
1 MD 000235 1/2022 9005 10055
1 MD 000236 1/2022 8500 18555
1 MD 000237 2/2022 3000 21555
1 MD 000238 2/2022 5050 26605
1 MD 000239 3/2022 9300 35905

Notice how the running total is now calculated separately for each state.

Configuring a Window Function Processor

Configure a Window Function processor to perform window calculations and write the results to a specified column. The processor performs calculations on data in windows or window frames.

  1. On the General tab, configure the following properties:
    General Property Description
    Name Stage name.
    Description Optional description.
    Cache Data Caches processed data.
  2. On the Functions tab, configure the following properties:
    Function Property Description
    Window Function Window function to use.
    Argument Column to use as an argument for the specified function.

    Available when using a function that requires an argument.

    First Argument Column to use as the first argument for the specified function.

    Available when using a function that requires two arguments.

    Second Argument Column to use as the second argument for the specified function.

    Available when using a function that requires two arguments.

    Arguments One or more columns to use as the arguments for the specified function.

    Available when using the Count Distinct function.

    SQL Expression Snowflake SQL expression that defines a window function and its arguments.

    Available when using Custom Calculation as the window function.

    Partition Columns Optional columns used to group data into partitions.
    Window Frame Enables using window frames for calculations.

    Available when a window function is selected that supports window frames.

    Order Columns One or more columns to order by. Specify the following properties for each column:
    • Column - Column to order by.
    • Order - Sort order to use:
      • Ascending
      • Descending
      • Ascending - Nulls First
      • Descending - Nulls Last

    The processor orders multiple columns based on the listed order.

    Available when defining a window frame.

    From Defines the first row in the frame:
    • Start - Starts the frame with the first row of the window or partition.
    • Preceding - Starts the frame with the specified row before the current row.
    • Following - Starts the frame with the specified row after the current row.
    • Current Row - Starts the frame with the current row.
    • Snowflake Default - Uses the Snowflake default. Valid only when using Snowflake defaults for both the From and To properties.

    Available when defining a window frame.

    To Defines the last row in the frame:
    • End - Ends the frame with the last row of the window or partition.
    • Preceding - Ends the frame with the specified row before the current row.
    • Following - Ends the frame with the specified row after the current row.
    • Current Row - Ends the frame with the current row.
    • Snowflake Default - Uses the Snowflake default. Valid only when using Snowflake defaults for both the From and To properties.

    Available when defining a window frame.

    Output Column Column for the results of the calculations.