Null Handling

The Null Handling processor replaces null values using a specified function. Available functions correspond to Snowflake functions. You can also define a custom function using a Snowflake SQL expression.

When you configure the processor, you specify one or more columns to replace nulls in, the function to use, and related properties.

For example, you can use the Median Column Value function to replace the null values in a column with the median of the values in that column. Or, on a row level, you can use the Replace function to replace null values in a total field with the product of the price and quantity fields by defining the following expression: price * quantity.

Null Replacement Functions

The Null Handling processor provides several functions to replace null values. The functions correspond to Snowflake functions.

Some functions operate on individual rows of data. For example, say you use the Greatest function and specify three columns to compare. When there is a null value to be replaced, the processor evaluates the values in the three columns in that row, then uses the highest value to replace the null value.

Other functions operate on entire columns of data. For example, you can use the Frequent Column Value function to replace nulls in a column with the most frequent value that appears in the column.

You can use the following row-based functions with the Null Handling processor:
Row-based Function Snowflake Function Description
First Not Null - Coalesce

COALESCE

Replaces nulls with the first not-null value of a specified column in the row, or the result of a specified expression or function.
Replace NVL Replaces nulls with the value of a specified column in the row, or the result of a specified expression.
Replace Nulls and Not-Nulls NVL2 Replaces all values:
  • Replaces not-null values with the value in the specified column in the row, or the result of a specified expression.
  • Replaces null values with the value in the specified column in the row, or the result of a specified expression.
Zero if Null ZEROIFNULL Replaces nulls with zeros.
Greatest GREATEST Replaces nulls with the greatest value. The comparison values can be values in specified columns, the results of specified expressions or functions, or any combination of the above.
Note: If a specified column includes a null value, or if an expression or function returns a null value, the column value remains null.
Least LEAST Replaces nulls with the lowest value. The comparison values can be values in specified columns, the results of specified expressions or functions, or any combination of the above.
Note: If a specified column includes a null value, or if an expression or function returns a null value, the column value remains null.
Custom Function Not applicable Replaces nulls with the value returned by a specified SQL expression.
You can use the following column-based functions with the Null Handling processor:
Column-based Function Snowflake Function Description
Min Column Value MIN Replaces nulls with the minimum value in the column.
Max Column Value MAX Replaces nulls with the maximum value in the column.
Median Column Value MEDIAN Replaces nulls with the median of all column values.
Frequent Column Value MODE Replaces nulls with the most frequent column value.
Average Column Value AVERAGE Replaces nulls with the average of all column values.
Quantile Column Value Nth QUANTILE Replaces nulls with the specified quantile of the values in the column.

For more information about these Snowflake functions, see the Snowflake documentation.

Configuring a Null Handling Processor

Configure the Null Handling processor to replace null values.

  1. On the General tab, configure the following properties:
    General Property Description
    Name Stage name.
    Description Optional description.
    Cache Data Caches processed data.
  2. On the Null Handling tab, configure the following properties:
    Null Handling Property Description
    Column Column with null values to replace.
    Function Function to use to replace null values.
    Expression Column or Snowflake SQL expression to replace null values.

    Available for the Replace function.

    Expressions One or more columns to use with the function. You can select or specify the column names. Or, you can enter a regular expression or IBM StreamSets function that evaluates to the columns to use.

    Available for function options that can compare multiple results for the null replacement, such as Greatest and Least.

    SQL Expression Snowflake SQL expression to replace nulls.

    Available for the Custom Function option.

    When Not Null Column or expression to replace non-null values.

    Available for the Replace Nulls and Not-Nulls function option.

    When Null Column or expression to replace null values.

    Available for the Replace Nulls and Not-Nulls function option.