Aggregate
For example, you can use the Aggregate processor to calculate the sum of all
purchases in a batch grouped by state, and to write the results to a
State_Total
output column in each row.
When you configure the Aggregate processor to perform a calculation, you specify the aggregation column, aggregate function, and output column to use. You can specify multiple sets of calculations to perform. You can optionally specify columns to group by for the calculations.
Processing and Generated Rows
The Aggregate processor can perform one or more aggregate calculations on data in a batch. For each calculation that you want to perform, you specify the aggregate function, the column to use for the calculation, and an output column for the results.
By default, the processor performs calculations using all of the incoming data. To perform calculations on subsets of data, specify one or more columns to group by.
Rows generated by the Aggregate processor include the output columns and any specified groupby columns. All other columns from the incoming row are dropped.
Example
 Use the
COUNT
function and output the results to aStateCount
column.  Use the
SUM
function with theTotal
column, and output the results to aStateTotal
column.  Set the processor to group by the
State
column.
Let's say a batch contains the following data:
TransactionID  StoreID  State  Total 

0032355  35  MD  230.40 
0016433  20  IL  90.50 
0016434  20  IL  489.23 
0032356  35  MD  63.50 
0032357  35  MD  150.49 
StateCount
and
StateTotal
columns in each row as follows:State  StateCount  StateTotal 

MD  3  444.39 
IL  2  579.73 
Joining Aggregate Data
Rows generated by the Aggregate processor include the output columns and any specified groupby columns. All other columns from the incoming row are dropped.
If you want to perform aggregate calculations, but do not want to discard the rest of the data, you can perform the aggregate calculations in a separate branch in the pipeline, and then merge the results back into the data.
For example, the following pipeline converts a few column data types before passing the data to the Aggregate processor. Then, the Join processor joins the aggregate calculations with the data from the Column Type Converter processor.
Aggregate Functions
You can use the following aggregate functions in an Aggregate processor:
 Average  Returns the average of the values in a group.
 Count  Returns the number of items in a group.
 Count Distinct  Returns the number of unique items in a group.
 Max  Returns the maximum value of a group.
 Median  Returns the median of the values of a group.
 Min  Returns the minimum value of a group.
 Mode  Returns the most frequent value within a group.
 Standard Deviation  Returns the standard deviation of a value from the group. Standard deviation uses a sample of a group rather than the full population.
 Sum  Returns the sum of the values of a group.
 Variance  Returns a sample variance of nonNULL rows in a group.
 Custom  Returns the results of a custom aggregate calculation.
Configuring an Aggregate Processor

On the General tab, configure the following
properties:
General Property Description Name Stage name. Description Optional description. Cache Data Caches processed data. 
On the Aggregate tab, configure the following
properties:
Aggregate Property Description Aggregations The aggregate calculations to perform. Configure the following properties:  Aggregate Function  The aggregation
function to use in the calculation.
Default is Average.
 Aggregate Column  The column to use in the
aggregate calculation. You can enter the column name or select a column
from preview data.
Not available for all aggregate functions.
 Calculation  Aggregate calculation to use. Available when using Custom Calculation as the aggregate function.
 Output Column  The column for the results of the calculation.
Click the Add icon to add additional calculations.
Group By Columns Optional columns to group by. Use to perform calculations on subsets of the data. You can enter column names or select columns from preview data.
When you list more than one column, the processor uses the specified column order. If you specify multiple columns, you can drag them into the appropriate order.
When not used, the processor performs calculations across all rows in the batch.
 Aggregate Function  The aggregation
function to use in the calculation.