Window Function
The Window Function processor performs window calculations and writes the results to a specified column. Snowflake window functions apply to a window, or group, of rows. The Window Function processor performs calculations on data in windows or window frames.
When you configure the Window Function processor, you specify the window function to use, optional partition columns, and the output column name. When you specify a window function that supports window frames, you can also configure the processor to use window frames. When you configure a frame, you define the boundaries of the frame and the columns to order by.
Supported Window Functions
Function | Supports Frames | Description |
---|---|---|
Any Value | Returns any value from the specified column. Requires defining one column as an argument. |
|
Average | X | Returns the average of non-null values in a specified
column. Requires defining a numeric column as the argument. |
Correlation | Returns the correlation coefficient for non-null pairs, based on
the specified columns and using the following
expression:
Requires defining two numeric columns, the first for the dependent variable, the second for the independent variable. |
|
Count | X | Returns a count of all rows. |
Count Distinct | Returns a count of all distinct rows, based on one or more
specified columns. Requires defining one or more columns to use as arguments. |
|
Maximum | X | Returns the minimum value in the specified column. Requires defining a numeric column as the argument. |
Median | Returns the median of the rows in the specified
column. Requires defining a numeric column as the argument. |
|
Minimum | X | Returns the maximum value in the specified column. Requires defining a numeric column as the argument. |
Ratio to Report | Returns the ratio of a value, to the sum of all values, for the
specified column. Requires defining a numeric column as the argument. |
|
Standard Deviation | X | Returns a sample standard deviation for the specified
column. Requires defining a numeric column as the argument. |
Sum | X | Returns the sum of the values in the specified column. Requires defining a numeric column as the argument. |
Variance | X | Returns a sample variance for the specified column. Requires defining a numeric column as the argument. |
Custom Calculation | X | Returns the results of a custom SQL expression that includes the
arguments and columns to use. Requires defining a valid Snowflake SQL expression. |
Windows and Partitions
By default, the Window Function processor treats all incoming data as a single window and performs the configured calculation using the entire data set. Then, the processor writes the result of that calculation to the specified output column for every row in the window.
You can specify one or more partition columns to create logical groups of data for calculations. When you specify partition columns, the Window Function processor treats each resulting partition as a window: it performs the configured calculation for each partition, then writes the results to every row in the partition.
When needed, you can perform calculations on smaller subsets of data using window frames.
Example: Window
STORE_ID | STATE | MONTH | M_TOTAL |
10 | MD | 1/2022 | 10503 |
20 | MD | 1/2022 | 9055 |
30 | CA | 1/2022 | 6303 |
10 | MD | 2/2022 | 8504 |
20 | MD | 2/2022 | 1224 |
30 | CA | 2/2022 | 12355 |
10 | MD | 3/2022 | 9583 |
20 | MD | 3/2022 | 8302 |
30 | CA | 3/2022 | 1360 |
You configure the Window Function processor to find the sum of the
M_TOTAL
column and to write the result to a new
TOTAL
column.
By default, the Window Function processor performs the calculation using the entire window of data, as follows:
STORE_ID | STATE | MONTH | M_TOTAL | TOTAL |
10 | MD | 1/2022 | 10503 | 67189 |
20 | MD | 1/2022 | 9055 | 67189 |
30 | CA | 1/2022 | 6303 | 67189 |
10 | MD | 2/2022 | 8504 | 67189 |
20 | MD | 2/2022 | 1224 | 67189 |
30 | CA | 2/2022 | 12355 | 67189 |
10 | MD | 3/2022 | 9583 | 67189 |
20 | MD | 3/2022 | 8302 | 67189 |
30 | CA | 3/2022 | 1360 | 67189 |
Notice how the calculation is performed across the entire data set, and the result of
that single calculation is written to the TOTAL
column for each
row.
Example: Partition
Now, say you configure the Window Function processor to use STORE_ID
as a partition column for the same incoming data.
In this case, the processor partitions the data by store IDs and generates the following output:
STORE_ID | STATE | MONTH | M_TOTAL | TOTAL |
10 | MD | 1/2022 | 10503 | 28590 |
10 | MD | 2/2022 | 8504 | 28590 |
10 | MD | 3/2022 | 9583 | 28590 |
20 | MD | 1/2022 | 9055 | 18581 |
20 | MD | 2/2022 | 1224 | 18581 |
20 | MD | 3/2022 | 8302 | 18581 |
30 | CA | 1/2022 | 6303 | 20018 |
30 | CA | 2/2022 | 12355 | 20018 |
30 | CA | 3/2022 | 1360 | 20018 |
Notice how calculations are performed across each store ID partition of data. The
result of each calculation is written to the TOTAL
column for each
row in the partition.
Window Frames
A window frame is a subset of rows with a specified start and end point that is in relationship to the current row.
You can configure the Window Function processor to perform calculations with window frames when the processor uses a window function that supports frames. When you define a window frame, you specify the boundaries of the frame, which defines where the frame conceptually starts and ends. You also specify the columns to order by, which determines the actual rows included in the frame for the calculation.
RANGE
operator with window frames.Order Columns
Order columns determine the order of rows in a window. The order of the rows in a window determines the actual rows included in window frames, which affects the results of the window function calculation.
For example, say you configure a frame that includes the data from the start to the
current row, and the processor finds the sum of the TOTAL
column to
generate a running total. If the data is ordered by a STREET_ADDRESS
column, those results provide questionable value. But if you order by a
TRANSACTION_TIME
column, the processor provides a running total
over time.
- Ascending
- Descending
- Ascending - Nulls First
- Descending - Nulls Last
Frame Definition
The boundaries of a window frame define where the frame starts and ends. In the Window Function processor, you use the From and To properties to define frame boundaries.
The From and To properties provide similar options, as follows:
- Current Row - Uses the current row as the specified boundary.
- Start - Uses the start of the window as the first row of the frame. Available only with the From property.
- End - Uses the end of the window as the last row of the frame. Available only with the To property.
- Preceding - Uses a specific row before the current row as the specified boundary.
- Following - Uses a specific row after the current row as the specified boundary.
- Snowflake default - Uses the Snowflake default values for the start and end of
the frame.
Snowflake defaults differ based on the function that you select. For information, see the documentation for the function in the Snowflake documentation.
Frame Definition Guidelines
- Frames must include one or more rows
- To enable a frame with one row, you can specify the same option for the From
and To properties. For example, you can define the equivalent of
from the current row to the current row
. - Frames include the specified boundary row
- When you configure the From and To properties, the boundary rows are included in the frame.
- Frames can exclude the current row
- You can configure the From and To properties to create a frame that does not
include the current row. For example,
from the following 1 row to the following 3 rows
creates a frame that includes the three rows after the current row, excluding the current row. - Calculations occur for incomplete or empty frames
- The processor performs a calculation for each row, even if the frame is empty or incomplete. Incomplete or empty frames can occur when the specified frame includes rows before or after the current row.
Examples
Let's use the following data set to look at a couple scenarios:
OFFICE_ID | STATE | ORDER_ID | MONTH | TOTAL |
1 | MD | 000234 | 1/2022 | 1050 |
1 | MD | 000235 | 1/2022 | 9005 |
1 | MD | 000236 | 1/2022 | 8500 |
2 | CA | 500050 | 1/2022 | 6000 |
2 | CA | 500051 | 1/2022 | 12000 |
1 | MD | 000237 | 2/2022 | 3000 |
1 | MD | 000238 | 2/2022 | 5050 |
2 | CA | 500052 | 2/2022 | 1003 |
2 | CA | 500053 | 2/2022 | 2000 |
1 | MD | 000239 | 3/2022 | 9300 |
2 | CA | 500054 | 3/2022 | 8000 |
Frame: Start to Current Row
Say you want to generate a running total of the data based on order ID.
- Window Function:
SUM
- Argument:
TOTAL
- Window Frame: Enabled
- Order Columns:
- Column:
ORDER_ID
- Order:
Ascending
- Column:
- From:
Start
- To:
Current Row
- Output Column:
R_TOTAL
OFFICE_ID | STATE | ORDER_ID | MONTH | TOTAL | R_TOTAL |
1 | MD | 000234 | 1/2022 | 1050 | 1050 |
1 | MD | 000235 | 1/2022 | 9005 | 10055 |
1 | MD | 000236 | 1/2022 | 8500 | 18555 |
1 | MD | 000237 | 2/2022 | 3000 | 21555 |
1 | MD | 000238 | 2/2022 | 5050 | 26605 |
1 | MD | 000239 | 3/2022 | 9300 | 35905 |
2 | CA | 500050 | 1/2022 | 6000 | 41905 |
2 | CA | 500051 | 1/2022 | 12000 | 53905 |
2 | CA | 500052 | 2/2022 | 1003 | 54908 |
2 | CA | 500053 | 2/2022 | 2000 | 56908 |
2 | CA | 500054 | 3/2022 | 8000 | 64908 |
Notice how the R_TOTAL
column contains the sum of the
TOTAL
column for the current row and the rows before it, with
the data ordered by the ORDER_ID
column
Partition and Window Frame
- Partition Columns:
STATE
OFFICE_ID | STATE | ORDER_ID | MONTH | TOTAL | R_TOTAL |
2 | CA | 500050 | 1/2022 | 6000 | 6000 |
2 | CA | 500051 | 1/2022 | 12000 | 18000 |
2 | CA | 500052 | 2/2022 | 1003 | 19003 |
2 | CA | 500053 | 2/2022 | 2000 | 21003 |
2 | CA | 500054 | 3/2022 | 8000 | 29003 |
1 | MD | 000234 | 1/2022 | 1050 | 1050 |
1 | MD | 000235 | 1/2022 | 9005 | 10055 |
1 | MD | 000236 | 1/2022 | 8500 | 18555 |
1 | MD | 000237 | 2/2022 | 3000 | 21555 |
1 | MD | 000238 | 2/2022 | 5050 | 26605 |
1 | MD | 000239 | 3/2022 | 9300 | 35905 |
Notice how the running total is now calculated separately for each state.
Configuring a Window Function Processor
Configure a Window Function processor to perform window calculations and write the results to a specified column. The processor performs calculations on data in windows or window frames.
-
On the General tab, configure the following
properties:
General Property Description Name Stage name. Description Optional description. Cache Data Caches processed data. -
On the Functions tab, configure the following
properties:
Function Property Description Window Function Window function to use. Argument Column to use as an argument for the specified function. Available when using a function that requires an argument.
First Argument Column to use as the first argument for the specified function. Available when using a function that requires two arguments.
Second Argument Column to use as the second argument for the specified function. Available when using a function that requires two arguments.
Arguments One or more columns to use as the arguments for the specified function. Available when using the Count Distinct function.
SQL Expression Snowflake SQL expression that defines a window function and its arguments. Available when using Custom Calculation as the window function.
Partition Columns Optional columns used to group data into partitions. Window Frame Enables using window frames for calculations. Available when a window function is selected that supports window frames.
Order Columns One or more columns to order by. Specify the following properties for each column: - Column - Column to order by.
- Order - Sort order to use:
- Ascending
- Descending
- Ascending - Nulls First
- Descending - Nulls Last
The processor orders multiple columns based on the listed order.
Available when defining a window frame.
From Defines the first row in the frame: - Start - Starts the frame with the first row of the window or partition.
- Preceding - Starts the frame with the specified row before the current row.
- Following - Starts the frame with the specified row after the current row.
- Current Row - Starts the frame with the current row.
- Snowflake Default - Uses the Snowflake default. Valid only when using Snowflake defaults for both the From and To properties.
Available when defining a window frame.
To Defines the last row in the frame: - End - Ends the frame with the last row of the window or partition.
- Preceding - Ends the frame with the specified row before the current row.
- Following - Ends the frame with the specified row after the current row.
- Current Row - Ends the frame with the current row.
- Snowflake Default - Uses the Snowflake default. Valid only when using Snowflake defaults for both the From and To properties.
Available when defining a window frame.
Output Column Column for the results of the calculations.