# Aggregate

The Aggregate processor performs aggregate calculations on data within a batch. The processor does not perform calculations across multiple batches. To perform calculations on subsets of a batch, you can specify fields to group by.

For example, you can use the Aggregate processor to calculate the sum of all purchases in
a batch grouped by state, and to write the results to a `State_Total`

output field in each record.

Records generated by the Aggregate processor include the output fields and the fields to group by. All other fields from the input record are dropped.

When you configure the Aggregate processor to perform a calculation, you specify the aggregation field, aggregate function, and output field to use. You can specify multiple sets of calculations to perform. You can optionally specify fields to group by for the calculations.

## Aggregate Processing

The Aggregate processor can perform one or more aggregate calculations on data in a batch.

For each calculation that you want to perform, you specify the aggregate function, the field to use for the calculation, and an output field for the results.

By default, the processor performs calculations using all of the data in the batch. To perform calculations on subsets of data, specify one or more fields to group by.

After performing the calculations, the processor passes the output fields and the fields to group by to the generated records. The processor drops all other fields from the input records.

### Example

- Use the
`COUNT`

function and output the results to a`StateCount`

field. - Use the
`SUM`

function with the`Total`

field, and output the results to a`StateTotal`

field. - Set the processor to group by the
`State`

field.

Let's say a batch contains the following data:

TransactionID | StoreID | State | Total |
---|---|---|---|

0032355 | 35 | MD | 230.40 |

0016433 | 20 | IL | 90.50 |

0016434 | 20 | IL | 489.23 |

0032356 | 35 | MD | 63.50 |

0032357 | 35 | MD | 150.49 |

`StateCount`

and
`StateTotal`

fields in each record as follows:State | StateCount | StateTotal |
---|---|---|

MD | 3 | 444.39 |

IL | 2 | 579.73 |

## Aggregate Functions

- Approximate Count Version - Returns the approximate number of distinct items in a group.
- Approximate Variance - Returns the variance of a values from a group. Approximate variance uses a sample of a group rather than the full population.
- Average - Returns the average of the values in a group.
- Count - Returns the number of items in a group.
- First - Returns the value of the first record in a group.
- Last - Returns the value of the last record in a group.
- Max - Returns the maximum value of a group.
- Median - Returns the median of the values of a group.
- Min - Returns the minimum value of a group.
- Sum - Returns the sum of the values of a group.
- Standard Deviation - Returns the standard deviation of a value from the group. Standard deviation uses a sample of a group rather than the full population.

## Default Output Fields

When you configure the Aggregate processor, you can specify the output field to use. When you do not specify an output field, the Aggregate processor uses a default output field.

Aggregate Function | Default Output Field Name |
---|---|

Approximate Count Version | approx_count_distinct |

Approximate Variance | variance |

Average | avg |

Count | count |

First | first |

Last | last |

Max | max |

Median | median |

Min | min |

Sum | sum |

Standard Deviation | std_dev |