Data Generation Functions

Data generation functions are included in the Protector stage library. You can use data generation functions only in the Protector: Expression processor.

Data generation functions generate random fake data that you can use as a replacement for sensitive data. There are two kinds of data generation functions:
  • Faker functions - Generate specific types of fake data, such as addresses, names, and credit card numbers.
  • Xeger functions - Generate fake data based on user-defined regular expressions.
Both sets of functions provide two types of output:
Random
Values are generated randomly, regardless of the input value. Each value that is replaced receives a new random value.
For example, say you use randomFaker:email() to generate fake email addresses to replace user email addresses. In this case, a random email address is generated for each user email address that is replaced.
Deterministic
Generated values are reused when the same input values appear. This allows you to determine in downstream processing that values recur, while ensuring that the data is protected.
When you configure the function, you specify the input value that enables the reuse of generated values.
You can use any logical expression to define the input value, but here are some common cases:
  • To use the value that is being replaced, use the field function, f:value().

    For example, say you use the following expression to deterministically generate a replacement URL: ${deterministicFaker:url(f:value())}. If the expression replaces www.RealCompanyName.com with www.fakename.com, then each time www.RealCompanyName.com appears in a URL field, it is replaced with the same URL, www.fakename.com.

  • To use values in other fields in the record, use the record:value() function.
    For example, say you want to generate the same fake name for matching UserIDs. Then, you might use the following expressions to generate the first and last names:
    ${deterministicFaker:firstName(record:value('/UserID'))}
    ${deterministicFaker:lastName(record:value('/UserID'))}

    So if the functions generate Mia Lakier for the R2204 user ID, each time R2204 reappears as an ID, the functions replace the real name with Mia Lakier.

Use deterministic functions to provide insight into the frequency of repeated values while protecting sensitive data.
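The difference between random and deterministic output can be illustrated outside of the product with a short Python sketch. This is a conceptual illustration only, not how the Protector stage library implements these functions: the name pool, the hashing scheme, and the helper names random_first_name and deterministic_first_name are all hypothetical.

```python
import hashlib
import random
from collections import Counter

# Hypothetical pool of fake values, used only for this illustration.
FIRST_NAMES = ["Mia", "Noah", "Ava", "Liam", "Zoe", "Owen"]


def random_first_name() -> str:
    """Random output: takes no input, so every call is independent."""
    return random.choice(FIRST_NAMES)


def deterministic_first_name(input_value: str) -> str:
    """Deterministic output: the same input always selects the same fake value."""
    # Hash the input and use the digest to pick a stable index into the pool.
    digest = hashlib.sha256(input_value.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(FIRST_NAMES)
    return FIRST_NAMES[index]


# The user ID plays the role of the input value, as with record:value('/UserID').
user_ids = ["R2204", "R3150", "R2204", "R2204"]
replacements = [deterministic_first_name(uid) for uid in user_ids]

# Recurrence survives the replacement, so it can still be counted downstream.
frequency = Counter(replacements)
```

In this sketch, every occurrence of R2204 receives the same fake name, so frequency still reflects how often that ID appears. Because the sample pool is tiny, two different IDs can map to the same name; that is a limitation of the sketch, not a statement about the product.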
Note: Deterministic functions reuse generated values for the exact same input value. Data with differences in formatting, such as a dash instead of parentheses for area codes, are not considered the same. For best results, ensure that the data uses uniform formatting. When necessary, you can use category functions to standardize certain categories of data.
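To see why uniform formatting matters for deterministic output, consider the following Python sketch. It is an assumption-laden illustration rather than product behavior: normalize_phone and deterministic_token are hypothetical helpers, and the digits-only normalization stands in for whatever standardization you apply to your data.

```python
import hashlib
import re


def normalize_phone(raw: str) -> str:
    """Strip every non-digit character so formatting differences disappear."""
    return re.sub(r"\D", "", raw)


def deterministic_token(input_value: str) -> str:
    """Hypothetical stand-in for a deterministic replacement value."""
    return hashlib.sha256(input_value.encode("utf-8")).hexdigest()[:12]


# The same phone number written with two different area code formats.
with_parentheses = "(415) 555-0100"
with_dashes = "415-555-0100"

# Without normalization, the inputs differ, so the replacements differ.
raw_tokens_match = (
    deterministic_token(with_parentheses) == deterministic_token(with_dashes)
)

# After normalization, both inputs are identical and share one replacement.
normalized_tokens_match = (
    deterministic_token(normalize_phone(with_parentheses))
    == deterministic_token(normalize_phone(with_dashes))
)
```

Here the raw values are treated as two different inputs, while the normalized values produce a single, reused replacement.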