Apply Functions

The Apply Functions processor applies a Snowflake function to one or more columns. When you configure the processor, you specify the columns to apply the function to, the function to use and related details, and the output columns for the results.

You can use this processor to apply the following types of Snowflake functions:
  • Date and Time - Manipulates datetime data, such as extracting part of the date or converting the time zone.
  • Numeric - Manipulates numbers, such as finding the cosine or square root of a number.
  • Geospatial - Manipulates geospatial data, such as returning the number of points in a geospatial object or combining Geography objects.
  • String - Manipulates strings, such as trimming trailing characters or converting case.
  • User-defined functions - Manipulates data based on a custom function.

For details about Snowflake functions, see the Snowflake documentation.

Specifying Columns

When you configure the Apply Functions processor, you define a Column property that specifies the input columns to apply functions to. You can specify a single column or enter a regular expression that evaluates to a set of columns.

You also configure the Output Column property to specify the output columns for the results of the calculations. As with the Column property, you can specify a single column or enter a regular expression that evaluates to a set of columns. If you do not specify an output column, the Apply Functions processor overwrites the data in the input columns.

When specifying regular expressions for the input and output columns, ensure that each input column has a corresponding output column and that the output column names are unique.
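The two requirements above can be checked before running a pipeline. The following Python sketch is illustrative only and is not how the processor is implemented. The helper names select_columns and check_output_names are hypothetical, and the sketch assumes that a regular expression must match the entire column name, which the processor may or may not require.

```python
import re

def select_columns(columns, pattern):
    """Return the columns whose full name matches the regular expression.

    Assumption: the whole column name must match (re.fullmatch).
    """
    return [name for name in columns if re.fullmatch(pattern, name)]

def check_output_names(input_columns, output_columns):
    """Raise ValueError unless each input has one output and outputs are unique."""
    if len(input_columns) != len(output_columns):
        raise ValueError("each input column needs a corresponding output column")
    if len(set(output_columns)) != len(output_columns):
        raise ValueError("output column names must be unique")

columns = ["sales_spain", "sales_italy", "region"]
inputs = select_columns(columns, r"sales_.+")      # region is not selected
outputs = ["total_" + name for name in inputs]     # one output name per input
check_output_names(inputs, outputs)                # passes silently
```

A check like this fails early when two input columns would map to the same output column name.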

Output Column Variables

When you specify a regular expression in the Output Column property, you can use the following variables to generate output column names based on input column names:

Original column name variable
Use $0 to represent the original column name. You can then add characters before or after $0 to define a prefix or suffix for the output column name.
For example, to use customer_ as a prefix for all existing column names, enter customer_$0. If the processor applies the function to an input column named address, then it writes the results of the calculation to a customer_address output column.
Regular expression group variables
You can use the following variables to represent groups in the input column regular expression:
  • $1 - Represents the first group in the expression, reading left to right.
  • $2 - Represents the second group.
  • $3 - Represents the third group, and so on.
For example, say you use (sales)_(.+) to define the input columns to apply a function to. Then, when you define the output column, you can use $1 to represent (sales) and $2 to represent (.+), as follows: total_$1-$2.
If the processor applies the function to an input column named sales_spain, then it writes the results of the calculation to a total_sales-spain output column.
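To preview how the $0 and group variables expand, you can reproduce the substitution outside the processor. The following Python sketch is a rough approximation under stated assumptions: expand_output_name is a hypothetical helper, the input expression is assumed to match the entire column name, and the $n syntax is translated by hand because Python's re module does not use it natively.

```python
import re

def expand_output_name(input_pattern, output_template, column):
    """Return the output column name for one input column.

    Returns None when the column does not match the input expression.
    Assumption: the expression must match the whole column name.
    """
    match = re.fullmatch(input_pattern, column)
    if match is None:
        return None
    # Replace each $n in the template with group n of the match.
    # Group 0 is the entire matched column name.
    return re.sub(
        r"\$(\d+)",
        lambda var: match.group(int(var.group(1))),
        output_template,
    )

print(expand_output_name(r".+", "customer_$0", "address"))
# customer_address
print(expand_output_name(r"(sales)_(.+)", "total_$1-$2", "sales_spain"))
# total_sales-spain
```

In this sketch, a template that references a group the input expression does not define raises an IndexError. How the processor handles that case is not covered here.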