Aggregate
The Aggregate processor performs aggregate calculations on data within a batch. The processor does not perform calculations across multiple batches. To perform calculations on subsets of a batch, you can specify fields to group by.
For example, you can use the Aggregate processor to calculate the sum of all purchases in
a batch grouped by state, and to write the results to a State_Total
output field in each record.
Records generated by the Aggregate processor include the output fields and the fields to group by. All other fields from the input record are dropped.
When you configure the Aggregate processor to perform a calculation, you specify the aggregation field, aggregate function, and output field to use. You can specify multiple sets of calculations to perform. You can optionally specify fields to group by for the calculations.
Aggregate Processing
The Aggregate processor can perform one or more aggregate calculations on data in a batch.
For each calculation that you want to perform, you specify the aggregate function, the field to use for the calculation, and an output field for the results.
By default, the processor performs calculations using all of the data in the batch. To perform calculations on subsets of data, specify one or more fields to group by.
After performing the calculations, the processor passes the output fields and the fields to group by to the generated records. The processor drops all other fields from the input records.
Example
 Use the
COUNT
function and output the results to aStateCount
field.  Use the
SUM
function with theTotal
field, and output the results to aStateTotal
field.  Set the processor to group by the
State
field.
Let's say a batch contains the following data:
TransactionID  StoreID  State  Total 

0032355  35  MD  230.40 
0016433  20  IL  90.50 
0016434  20  IL  489.23 
0032356  35  MD  63.50 
0032357  35  MD  150.49 
StateCount
and
StateTotal
fields in each record as follows:State  StateCount  StateTotal 

MD  3  444.39 
IL  2  579.73 
Aggregate Functions
 Approximate Count Version  Returns the approximate number of distinct items in a group.
 Approximate Variance  Returns the variance of a values from a group. Approximate variance uses a sample of a group rather than the full population.
 Average  Returns the average of the values in a group.
 Count  Returns the number of items in a group.
 First  Returns the value of the first record in a group.
 Last  Returns the value of the last record in a group.
 Max  Returns the maximum value of a group.
 Median  Returns the median of the values of a group.
 Min  Returns the minimum value of a group.
 Sum  Returns the sum of the values of a group.
 Standard Deviation  Returns the standard deviation of a value from the group. Standard deviation uses a sample of a group rather than the full population.
Default Output Fields
When you configure the Aggregate processor, you can specify the output field to use. When you do not specify an output field, the Aggregate processor uses a default output field.
Aggregate Function  Default Output Field Name 

Approximate Count Version  approx_count_distinct 
Approximate Variance  variance 
Average  avg 
Count  count 
First  first 
Last  last 
Max  max 
Median  median 
Min  min 
Sum  sum 
Standard Deviation  std_dev 
Configuring an Aggregate Processor

In the Properties panel, on the
General tab, configure the following properties:
General Property Description Name Stage name. Description Optional description. Cache Data Caches data processed for a batch so the data can be reused for multiple downstream stages. Use to improve performance when the stage passes data to multiple stages. Caching can limit pushdown optimization when the pipeline runs in ludicrous mode.

On the Aggregate tab, configure the following
properties:
Aggregate Property Description Aggregate Type and Field The aggregate calculations to perform. Configure the following properties:  Aggregate function  The aggregation function to use
in the calculation. Use one of the following
functions:
 Approximate Count Version  Returns the approximate number of distinct items in a group.
 Approximate Variance  Returns the variance of a values from a group. Approximate variance uses a sample of a group rather than the full population.
 Average  Returns the average of the values in a group.
 Count  Returns the number of items in a group.
 First  Returns the value of the first record in a group.
 Last  Returns the value of the last record in a group.
 Max  Returns the maximum value of a group.
 Median  Returns the median of the values of a group.
 Min  Returns the minimum value of a group.
 Sum  Returns the sum of the values of a group.
 Standard Deviation  Returns the standard deviation of a value from the group. Standard deviation uses a sample of a group rather than the full population.
 Aggregate Field  The field to use in the aggregate calculation.
 Output Field  The field for the results of the
calculation.
When you do not enter an output field, the Aggregate processor uses a default output field.
Click the Add icon to add additional calculations.
Group By Fields Optional fields to group by. Use to perform calculations on subsets of the data. When you list more than one field, the processor uses the specified field order. When not used, the processor performs calculations across all records in the batch.
 Aggregate function  The aggregation function to use
in the calculation. Use one of the following
functions: