Null Handling
The Null Handling processor replaces null values using a specified function. Available functions correspond to Snowflake functions. You can also define a custom function using a Snowflake SQL expression.
When you configure the processor, you specify one or more columns to replace nulls in, the function to use, and related properties.
For example, you can use the Median Column Value function to replace the null values in a
column with the median of the values in that column. Or, on a row level, you can use the
Replace function to replace null values in a total
field with the
product of the price
and quantity
fields by defining
the following expression: price * quantity
.
Null Replacement Functions
The Null Handling processor provides several functions to replace null values. The functions correspond to Snowflake functions.
Some functions operate on individual rows of data. For example, say you use the Greatest function and specify three columns to compare. When there is a null value to be replaced, the processor evaluates the values in the three columns in that row, then uses the highest value to replace the null value.
Other functions operate on entire columns of data. For example, you can use the Frequent Column Value function to replace nulls in a column with the most frequent value that appears in the column.
Row-based Function | Snowflake Function | Description |
---|---|---|
First Not Null - Coalesce |
COALESCE |
Replaces nulls with the first not-null value of a specified column in the row, or the result of a specified expression or function. |
Replace | NVL | Replaces nulls with the value of a specified column in the row, or the result of a specified expression. |
Replace Nulls and Not-Nulls | NVL2 | Replaces all values:
|
Zero if Null | ZEROIFNULL | Replaces nulls with zeros. |
Greatest | GREATEST | Replaces nulls with the greatest value. The comparison values can
be values in specified columns, the results of specified expressions
or functions, or any combination of the above. Note: If a specified column includes a
null value, or if an expression or function returns a null
value, the column value remains null.
|
Least | LEAST | Replaces nulls with the lowest value. The comparison values can
be values in specified columns, the results of specified expressions
or functions, or any combination of the above. Note: If a specified column includes a null
value, or if an expression or function returns a null value, the
column value remains null. |
Custom Function | Not applicable | Replaces nulls with the value returned by a specified SQL expression. |
Column-based Function | Snowflake Function | Description |
---|---|---|
Min Column Value | MIN | Replaces nulls with the minimum value in the column. |
Max Column Value | MAX | Replaces nulls with the maximum value in the column. |
Median Column Value | MEDIAN | Replaces nulls with the median of all column values. |
Frequent Column Value | MODE | Replaces nulls with the most frequent column value. |
Average Column Value | AVERAGE | Replaces nulls with the average of all column values. |
Quantile Column Value | Nth QUANTILE | Replaces nulls with the specified quantile of the values in the column. |
For more information about these Snowflake functions, see the Snowflake documentation.
Configuring a Null Handling Processor
Configure the Null Handling processor to replace null values.
-
On the General tab, configure the following
properties:
General Property Description Name Stage name. Description Optional description. Cache Data Caches processed data. -
On the Null Handling tab, configure the following
properties:
Null Handling Property Description Column Column with null values to replace. Function Function to use to replace null values. Expression Column or Snowflake SQL expression to replace null values. Available for the Replace function.
Expressions One or more columns to use with the function. You can select or specify the column names. Or, you can enter a regular expression or StreamSets function that evaluates to the columns to use. Available for function options that can compare multiple results for the null replacement, such as Greatest and Least.
SQL Expression Snowflake SQL expression to replace nulls. Available for the Custom Function option.
When Not Null Column or expression to replace non-null values. Available for the Replace Nulls and Not-Nulls function option.
When Null Column or expression to replace null values. Available for the Replace Nulls and Not-Nulls function option.