Sample

The Sample processor generates a sample subset of the incoming data.

When you configure the Sample processor, you specify how you want to define the sample size: by fraction or by number of rows. Then, you specify the fraction or number of rows, respectively.

Example

Say your pipeline processes an extremely large table of data, and you want to evaluate one-tenth of the data and write the results to a separate table for subsequent review. To do this, you add the Sample processor as a separate branch off the main processing branch and configure it as follows:
  • Sample Type property: Fraction
  • Fraction property: .1
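
To make the idea of sampling by fraction concrete, here is a minimal Python sketch. It is not the processor's implementation, which this page does not describe. The function name `sample_by_fraction` and the `seed` parameter are hypothetical, and the sketch assumes each row is kept independently with probability equal to the fraction, so the result size is approximate rather than exact.

```python
import random


def sample_by_fraction(rows, fraction, seed=None):
    """Keep each row independently with probability `fraction`.

    Assumption: per-row random selection. The resulting subset has
    roughly fraction * len(rows) rows, not an exact count.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # seeded generator for repeatable runs
    return [row for row in rows if rng.random() < fraction]


# Stand-in for a large table: 100,000 rows.
rows = list(range(100_000))

# Equivalent in spirit to Sample Type = Fraction, Fraction = .1
subset = sample_by_fraction(rows, 0.1, seed=42)
print(len(subset))  # close to 10,000, but not necessarily exact
```

In this sketch, the sampled rows would be written to the separate review table while the main branch continues to process the full data.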

Configuring a Sample Processor

Configure a Sample processor to generate a sample subset of the incoming data to pass downstream.

  1. On the General tab, configure the following properties:
    General Property   Description
    Name               Stage name.
    Description        Optional description.
    Cache Data         Caches processed data.
  2. On the Sample tab, configure the following properties:
    Sample Property    Description
    Sample Type        Method to specify the sample to take:
                       • Fraction - Fraction of the incoming data.
                       • Number - Number of rows from the incoming data.
    Fraction           The fraction of the incoming data to pass downstream. Specify the fraction as a decimal.
                       For example, to sample a quarter of the data, enter .25.
    Number of Rows     The number of rows from the incoming data to pass downstream.
                       The processor passes the specified number of rows downstream unless the incoming data contains fewer rows than the specified number.
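
The Number sample type can be sketched in the same way. The following Python example is illustrative only: the function name `sample_by_number` and the `seed` parameter are hypothetical. It assumes that rows are chosen at random and that, when the incoming data has fewer rows than requested, all incoming rows are passed. This page does not state how the processor selects the rows.

```python
import random


def sample_by_number(rows, number_of_rows, seed=None):
    """Return up to `number_of_rows` rows chosen without replacement.

    Assumptions: random selection, and a request larger than the
    input is capped at the number of incoming rows.
    """
    if number_of_rows < 0:
        raise ValueError("number_of_rows must not be negative")
    rng = random.Random(seed)  # seeded generator for repeatable runs
    count = min(number_of_rows, len(rows))  # cap at the available rows
    return rng.sample(rows, count)


incoming = list(range(10))

print(len(sample_by_number(incoming, 3, seed=7)))   # 3
print(len(sample_by_number(incoming, 50, seed=7)))  # 10
```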