Profile

The Profile processor calculates descriptive statistics for string and numeric data. Use the processor to profile data and understand its contents.

The processor calculates count, mean, standard deviation, minimum, and maximum statistics across all records in the batch. The processor calculates the statistics for string and numeric fields only, ignoring all other fields in the record.

The processor generates a total of five output records for each batch, one record for each calculated statistic. Each output record includes a summary field that lists the type of statistic calculated for the record. Each remaining field contains the statistic calculated for that field across all records in the batch.

When you configure the Profile processor, you define whether the processor profiles all fields or specific fields in each record.
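To make the output shape concrete, the following is a minimal Python sketch of the behavior described above. It is an illustration, not the processor's implementation. The function name profile_batch, the fields parameter, and the sample records are hypothetical. The sketch also assumes that mean and standard deviation are left empty (None) for string fields and that standard deviation is the sample standard deviation; the documentation above does not state either detail.

```python
import statistics

# The five statistics, one output record per entry.
STATISTICS = ("count", "mean", "stddev", "min", "max")


def profile_batch(records, fields=None):
    """Return five summary records for a batch of dict records.

    fields=None profiles every string or numeric field; otherwise only
    the named fields are profiled (hypothetical parameter).
    """
    columns = {}
    for record in records:
        for name, value in record.items():
            if fields is not None and name not in fields:
                continue
            # Ignore anything that is not a string or a number.
            # bool is excluded explicitly because it subclasses int.
            if isinstance(value, bool) or not isinstance(value, (int, float, str)):
                continue
            columns.setdefault(name, []).append(value)

    output = [{"summary": stat} for stat in STATISTICS]
    for name, values in columns.items():
        is_numeric = not any(isinstance(v, str) for v in values)
        if not is_numeric:
            # Assumption: a field holding any string is compared as strings.
            values = [str(v) for v in values]
        stats = {
            "count": len(values),
            # Assumption: mean and stddev are undefined for string fields.
            "mean": statistics.mean(values) if is_numeric else None,
            # Assumption: sample standard deviation, needs two or more values.
            "stddev": statistics.stdev(values)
            if is_numeric and len(values) > 1
            else None,
            "min": min(values),
            "max": max(values),
        }
        for row in output:
            row[name] = stats[row["summary"]]
    return output


# Hypothetical batch: "tags" is a list field, so it is ignored.
batch = [
    {"id": 1, "name": "a", "price": 10.0, "tags": ["x"]},
    {"id": 2, "name": "b", "price": 20.0, "tags": []},
    {"id": 3, "name": "c", "price": 30.0, "tags": []},
]

for summary_record in profile_batch(batch):
    print(summary_record)
```

Calling profile_batch(batch, fields=["price"]) in this sketch corresponds to profiling specific fields: each of the five output records then contains only the summary field and the price field.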

Tip: In streaming pipelines, you can use a Window processor upstream from this processor to generate larger batch sizes for evaluation.