Functions

The following table lists all available functions. For details about each function, see the related function type:

Function Type Functions
Record functions
  • record:attribute(<attribute name>)
  • record:attributeOrDefault(<attribute name>, <default value>)
  • record:creator()
  • record:eventCreation()
  • record:eventType()
  • record:eventVersion()
  • record:exists(<field path>)
  • record:fieldAttribute(<field path>, <field attribute name>)
  • record:fieldAttributeOrDefault(<field path>, <field attribute name>, <default value>)
  • record:id()
  • record:path()
  • record:type(<field path>)
  • record:value(<field path>)
  • record:valueOrDefault(<field path>, <default value>)
Delimited data record functions
  • record:dExists(<header>)
  • record:dHasDupHeader()
  • record:dIndex(<header>)
  • record:dIsDupHeader(<header>)
  • record:dToMap()
  • record:dValue(<header>)
  • record:dValueAt(<header index>)
Error record functions
  • record:errorCode()
  • record:errorCollectorId()
  • record:errorMessage()
  • record:errorPipeline()
  • record:errorStackTrace()
  • record:errorStage()
  • record:errorStageLabel()
  • record:errorTime()
Base64 functions
  • base64:decodeBytes(<string>)
  • base64:decodeString(<string>, <charset>)
  • base64:encodeBytes(<byte array>, <urlSafe: true | false>)
  • base64:encodeString(<string>, <urlSafe: true | false>, <charset>)
Category functions
  • CREDIT_CARD:type()
  • CREDIT_CARD:firstGroup()
  • CREDIT_CARD:secondGroup()
  • CREDIT_CARD:thirdGroup()
  • CREDIT_CARD:fourthGroup()
  • CREDIT_CARD:lastPart()
  • CREDIT_CARD:std()
  • EMAIL:domain()
  • EMAIL:domainTLD()
  • EMAIL:domainWithoutTLD()
  • EMAIL:localPart()
  • EMAIL:std()
  • US_PHONE:areaCode()
  • US_PHONE:areaCodeOrDefault(<default value>)
  • US_PHONE:exchangeCode()
  • US_PHONE:lineNumber()
  • US_PHONE:extension()
  • US_PHONE:extensionOrDefault()
  • US_PHONE:std()
  • US_SSN:areaNumber()
  • US_SSN:groupNumber()
  • US_SSN:serialNumber()
  • US_SSN:std()
  • US_ZIP_CODE:stateGroup()
  • US_ZIP_CODE:region()
  • US_ZIP_CODE:cityArea()
  • US_ZIP_CODE:specificLocation()
  • US_ZIP_CODE:specificLocationOrDefault(<default value>)
  • US_ZIP_CODE:std()
Credential functions
  • credential:get(<cstoreId>, <userGroup>, <name>)
  • credential:getWithOptions(<cstoreId>, <userGroup>, <name>, <storeOptions>)
Data drift functions
  • drift:names(<field path>, <ignoreWhenMissing: true | false>)
  • drift:order(<field path>, <ignoreWhenMissing: true | false>)
  • drift:size(<field path>, <ignoreWhenMissing: true | false>)
  • drift:type(<field path>, <ignoreWhenMissing: true | false>)
Data generation functions Address related faker functions:
  • deterministicFaker:country(<input>)
  • randomFaker:country()
  • deterministicFaker:UsCity(<input>)
  • randomFaker:UsCity
  • deterministicFaker:UsFullAddress(<input>)
  • randomFaker:UsFullAddress()
  • deterministicFaker:UsState(<input>)
  • randomFaker:UsState()
  • deterministicFaker:UsStreetAddress(<input>)
  • randomFaker:UsStreetAddress()
  • deterministicFaker:UsZipCode(<input>)
  • randomFaker:UsZipCode()
Finance related faker functions:
  • deterministicFaker:creditCard(<input>)
  • randomFaker:creditCard()
  • deterministicFaker:creditCardOfType(<type>, <input>)
  • randomFaker:creditCardOfType(<type>)
  • deterministicFaker:iban(<input>)
  • randomFaker:iban()
  • deterministicFaker:ibanForCountry(<country code>, <input>)
  • randomFaker:ibanForCountry(<country code>)
Internet related faker functions:
  • deterministicFaker:email(<input>)
  • randomFaker:email()
  • deterministicFaker:url(<input>)
  • randomFaker:url()
Person related faker functions:
  • deterministicFaker:race(<input>)
  • randomFaker:race()
  • deterministicFaker:UsFirstName(<input>)
  • randomFaker:UsFirstName()
  • deterministicFaker:UsLastName(<input>)
  • randomFaker:UsLastName()
  • deterministicFaker:UsPhone(<input>)
  • randomFaker:UsPhone()
  • deterministicFaker:UsSsn(<input>)
  • randomFaker:UsSsn()
Xeger functions:
  • xeger:deterministic(<regular expression>, <input>)
  • xeger:random(<regular expression>)
Field functions
  • f:attribute(<attribute name>)
  • f:getSiblingWithName(<field name>)
  • f:hasSiblingWithName(<field name>)
  • f:hasSiblingWithValue(<field name>, <field value>)
  • f:index()
  • f:name()
  • f:parent()
  • f:parentPath()
  • f:path()
  • f:type()
  • f:value()
File functions
  • file:fileExtension(<filepath>)
  • file:fileName(<filepath>)
  • file:parentPath(<filepath>)
  • file:pathElement(<filepath>, <integer>)

  • file:removeExtension(<filepath>)
Job functions
  • job:id()
  • job:name()
  • job:startTime()
  • job:user()
Math functions
  • math:abs(<number>)
  • math:ceil(<number>)
  • math:floor (<number>)
  • math:max(<number1>, <number2>)
  • math:min(<number1>, <number2>)
  • math:round(<number>)
Pipeline functions
  • pipeline:email()
  • pipeline:id()
  • pipeline:name()
  • pipeline:startTime()
  • pipeline:title()
  • pipeline:user()
  • pipeline:version()
String functions
  • str:concat(<string1>, <string2>)
  • str:contains(<string>, <subset>)
  • str:endsWith(<string>, <subset>)
  • str:escapeXML10(<string>)
  • str:escapeXML11(<string>)
  • str:indexOf(<string>, <subset>)
  • str:isNullOrEmpty(<string>)
  • str:lastIndexOf(<string>, <subset>)
  • str:length(<string>)
  • str:matches(<string>, <regEx>)
  • str:regExCapture(<string>, <regEx>, <group>)
  • str:replace(<string>, <oldChar>, <newChar>)
  • str:replaceAll(<string>, <regEx>, <newString>)
  • str:split(<string>, <separator>)
  • str:splitKV(<string>, <pairSeparator>, <keyValueSeparator>)
  • str:startsWith(<string>, <subset>)
  • str:substring(<string>, <beginIndex>, <endIndex>)
  • str:toLower(<string>)
  • str:toUpper(<string>)
  • str:trim(<string>)
  • str:truncate(<string>, <length>)
  • str:unescapeJava(<string>)
  • str:unescapeXML(<string>)
  • str:urlDecode(<URL>, <charset>)
  • str:urlEncode(<infoforURL>, <charset>)
Time functions
  • time:createDateFromStringTZ(<string>, <time zone>, <date format>)
  • time:dateAddition(<intervalType>, <number>, <date>)
  • time:dateDifference(<intervalType>, <datetime1>, <datetime2>)
  • time:dateTimeToMilliseconds(<Date object>)
  • time:dateTimeZoneOffset(<Date object>, <time zone>)
  • time:extractDateFromString(<string>, <format string>)
  • time:extractLongFromDate(<Date object>, <format string>)
  • time:extractStringFromDate(<Date object>, <format string>)
  • time:extractStringFromDateTZ(<Date object>, <time zone>, <format string>)
  • time:millisecondsToDateTime(<long>)
  • time:now()
  • time:nowNanoInstant()
  • time:nowNanoTimestampNumber()
  • time:nowNanoTimestampString()
  • time:nowNanoZonedInstant()
  • time:nowNumber()
  • time:timeZoneOffset(<time zone>)
  • time:trimDate(<datetime>)
  • time:trimTime(<datetime>)
Timestamp functions
  • timestamp:createDateFromString(<date>, <format>)
  • timestamp:createDateFromStringAndZone(<date>, <zone>, <format>)
  • timestamp:extractLongFromDate(<date>, <format>)
  • timestamp:extractStringFromDate(<date>, <format>)
  • timestamp:extractStringFromDateAndZone(<date>, <zone>, <format>)
  • timestamp:nowDate()
  • timestamp:nowLocal()
  • timestamp:nowMilliseconds()
  • timestamp:nowMillisecondsString()
  • timestamp:nowNanoseconds()
  • timestamp:nowNanosecondsString()
  • timestamp:nowZoned()
Miscellaneous functions
  • alert:info()
  • avro:decode(<schema>, <byte array>)
  • emptyList()
  • emptyMap()
  • every(<interval>, <mm() | ss()>)
  • field:field()
  • isEmptyList()
  • isEmptyMap()
  • jvm:maxMemoryMB()
  • length()
  • list:join(<list field>, <separator>)
  • list:joinSkipNulls(<list field>, <separator>)
  • offset:column(<position>)
  • runtime:availableProcessors()
  • runtime:conf(<runtime property>)
  • runtime:loadResource(<file name>, <restricted: true | false>)
  • runtime:loadResourceRaw(<file name>, <restricted: true | false>)
  • runtime:resourcesDirPath()
  • sdc:hostname()
  • sdc:id()
  • size()
  • uuid:uuid()
  • vault:read(<path>, <key>) - Deprecated
  • vault:readWithDelay(<path>, <key>, <delay>) - Deprecated

Record Functions

Use record functions to determine information about a record, such as the stage that created it or whether a field exists in the record.

You can replace any argument with a literal or an expression that evaluates to the argument. String literals must be enclosed in single or double quotation marks.

The expression language provides the following general record functions:

record:attribute(<attribute name>)
Returns the value of the specified record attribute. Use to return data, for example, for the "tag" attribute that you set using the File Tail origin.
Uses the following argument:
  • attribute name - Name of the record header attribute.
Return type: String.
For example, the following expression returns the value of a tag attribute added by a File Tail origin:
${record:attribute('tag')}
record:attributeOrDefault(<attribute name> , <default value>)
Returns the value of the specified record header attribute. When the attribute does not exist or has no value, returns the specified default value.
Uses the following arguments:
  • attribute name - Name of the record header attribute.
  • default value - Value to use when the record header attribute does not exist or has no value.
Return type: Returns a string value when returning the field attribute value. Otherwise, returns the type of the specified default value.
For example, the following expression returns the value of the tag attribute, and returns NA when no value exists:
${record:attributeOrDefault('tag', 'NA')}
record:creator()
Returns the instance name of the stage that created the record, typically an origin stage. Data is stored in the stageCreator field of a record.
Return type: String.
For example, the following expression returns the instance name of the stage that creates each record:
${record:creator()}
record:eventCreation()
Returns the epoch timestamp when an event occurred. Use to return the creation timestamp from an event record. Returns null values when applied to other records.
Return type: String.
For example, the following expression returns the time that the event-generating stage generated an event:
${record:eventCreation()}
record:eventType()
Returns the event type from an event record. Returns null values when applied to other records.
Event generating stages create different event types. For information about the possible event types, see "Event Records" in the stage documentation.
Return type: String.
For example, the following expression returns the event type from an event record:
${record:eventType()}
record:eventVersion()
Returns the event version from an event record. Returns null values when applied to other records.
Return type: String.
For example, the following expression returns the event version from an event record:
${record:eventVersion()}
record:exists(<field path>)
Determines if the specified field is in a record. When the field is not in the record, returns false. When the field is in the record, returns true.
Return type: Boolean.
For example, the following expression returns true when the Name field exists in a record:
${record:exists('/Name')}
It returns false when the Name field is not part of the record.
record:fieldAttribute(<field path>, <field attribute name>)
Returns the value for the specified field attribute.
Uses the following arguments:
  • field path - Path to the field with the attribute.
  • field attribute name - Name of the field attribute.
Return type: String
For example, the following expression returns the value for the pk attribute on the ID field:
record:fieldAttribute('/ID', 'pk')
record:fieldAttributeOrDefault(<field path>, <field attribute name>, <default value>)
Returns the value for the field attribute when it exists. If the attribute does not exist, returns the specified default value.
Uses the following arguments:
  • field path - Path to the field with the attribute.
  • field attribute name - Name of the field attribute.
  • default value - Value to use when the field attribute does not exist or has no value.
Return type: Returns a string value when returning the field attribute value. Otherwise, returns the type of the specified default value.
For example, the following expression returns the value for the pk attribute on the ID field when available, otherwise, it returns "n".
record:fieldAttributeOrDefault('/ID', 'pk', 'n')
record:id()
Returns the record ID. The record ID is also stored in the sourceId record header attribute.
Return type: String.
record:path()
Returns the stages in the pipeline that processed a record. Data is stored in the stagesPath field of a record. The field path uses the following format:
<origin instance name>:<next stage instance name>:<next stage instance name>:... 
Return type: String.
record:type(<field path>)
Checks the data type of a field. Possible return values are:
  • Boolean
  • Char
  • Byte
  • Short
  • Integer
  • Long
  • Float
  • Double
  • Date
  • Datetime
  • Decimal
  • String
  • Time
  • Byte-Array
  • Map
  • List
Return type: String.
record:value(<field path>)
Returns the value of the field in the appropriate data type.
Return type: When the field includes a value, returns the type based on the field value. When the field is null, the function returns no value.

Nulls are coerced based on the expected return type of the function. For example, if an EL is expected to return a string, null is coerced to an empty string. However, a null integer is coerced to 0.

record:valueOrDefault(<field path> , <default value>)
Returns the value of the field in the appropriate data type. If the field does not exist or if the field is null, returns the default value.
Use to provide a default value when a field does not exist or when a field is null.
Return type: Variable, based on whether the function returns the field value or default value.

Delimited Data Record Functions

Use delimited data record functions to process delimited data with the list root field type. If you configured an origin to process the delimited data with the list-map root field type, you can use standard record functions.

For more information about the delimited data root field types, see Delimited Data Root Field Type.

You can replace any argument with a literal or an expression that evaluates to the argument. String literals must be enclosed in single or double quotation marks.

The expression language provides the following delimited data record functions:
record:dExists(<header name>)
Determines if the specified header name exists in the record.
Return type: Boolean.
record:dHasDupHeader()
Determines if the record has duplicate header names.
Return type: Boolean.
record:dIndex(<header name>)
Returns the index associated with the specified header name. Use to find the position of a header in the record.
Return type: Integer, starting with 0 for the first position.
record:dIsDupHeader(<header name>)
Determines if the specified header name is used more than once in a record.
Return type: Boolean.
record:dToMap()
Converts a record with a List root field to a List-Map root field. Use to convert delimited List records to List-Map to allow easier use with standard record functions. For more information about root field types, see Delimited Data Root Field Type.
Return type: Records with a List-Map root field.
To use record:dToMap(), use a slash as the root output field as follows:
record:dValue(<header name>)
Returns the value associated with the specified header name.
Return type: String.
record:dValueAt(<header index>)
Returns the value at the specified header index position.
Return type: String.

Error Record Functions

Error record functions provide information about error records. Use error functions to process error records.

For example, you might use error functions in a Stream Selector to pass data to different error handling pipelines based on the type of error that occurred.

The expression language provides the following error functions:

record:errorCode()
Returns the error code for the error record.
Return type: String.
record:errorCollectorId()
Returns the ID of the Data Collector that sent the record to error.
By default, the Data Collector ID uses the format <hostname>:<port>. You can verify the Data Collector ID in the Data Collector Configuration page.
Return type: String.
record:errorMessage()
Returns the error message for the error record.
Return type: String.
record:errorPipeline()
The name of pipeline that sent the record to error.
Return type: String.
record:errorStackTrace()
Returns the error stack trace for the error record.
Return type: String.
record:errorStage()
Returns the instance name of the stage that sent the record to error.
You can verify the instance name on the General tab of the stage.
Return type: String.
record:errorStageLabel()
Returns the user-defined label for the stage that sent the record to error.
Return type: String.
record:errorTime()
The time the record was sent to error.
Return type: Long.

Base64 Functions

Use Base64 functions to encode or decode data using Base64.

You can replace any argument with a literal or an expression that evaluates to the argument. String literals must be enclosed in single or double quotation marks.

The expression language provides the following Base64 functions:

base64:decodeBytes(<string>)
Returns a decoded byte array from a Base64 encoded string.
Return type: Byte Array.
Uses the following argument:
  • string - The Base64 encoded string to decode.
For example, ${base64:decodeBytes(record:value("/ID"))} decodes the Base64 encoded string in the ID field as a byte array value.
base64:decodeString(<string>, <charset>)
Returns a decoded string from a Base64 encoded string using the specified character set.
Return type: String.
Uses the following arguments:
  • string - The Base64 encoded string to decode.
  • charset - The character set to use to decode the data.
For example, ${base64:decodeString(record:value("/ID"), "UTF-8")} decodes the Base64 encoded string in the ID field using the UTF-8 character set as a string value.
base64:encodeBytes(<byte array>, <urlSafe: true | false>)
Returns a Base64 encoded string value of the specified byte array.
Return type: String.
Uses the following arguments:
  • byte array - The byte array to encode using Base64.
  • urlSafe - When set to true, encodes the data so that it can be safely sent in a URL.
For example, ${base64:encodeBytes(record:value("/ID"), true)} uses Base64 to encode the byte array in the ID field such that the encoded data is URL safe.
base64:encodeString(<string>, <urlSafe: true | false>, <charset>)

Returns a Base64 encoded string value of the specified string.

Return type: String.
Uses the following arguments:
  • string - The string to encode using Base64.
  • urlSafe - When set to true, encodes the data so that it can be safely sent in a URL.
  • charset - The character set to use to encode the data.
For example, ${base64:encodeString(record:value("/ID"), false, "UTF-8")} uses Base64 to encode the string in the ID field using the UTF-8 character set such that the encoded data is not URL safe.

Credential Functions

Use credential functions to retrieve sensitive information stored as secrets from supported credential stores.

Before you use a credential function, you must configure a credential store. For more information about credential stores, see Credential Stores in the Data Collector documentation..

You can use credential functions in stages that require sensitive information, such as user names or passwords, to securely access data in external systems. Credential functions allow pipelines to access external systems without exposing sensitive details.

You cannot use credential functions in all stages. For example, you cannot use credential functions in the Expression Evaluator processor. If credential functions were allowed in stages such as the Expression Evaluator, any user with access to the pipeline could access or print sensitive values, compromising the security of the external system.

Important: When you use a credential function in a stage property, the function must be the only value defined in the property.

String literals must be enclosed in single or double quotation marks.

The expression language provides the following credential functions:

credential:get(<cstoreId>, <userGroup>, <name>)
Returns the secret from the credential store. Uses the following arguments:
  • cstoreId - Unique ID of the credential store to use. Use the ID specified in the $SDC_CONF/credential-stores.properties file. For more information, see Enabling Credential Stores in the Data Collector.
  • userGroup - Group that a user must belong to in order to access the secret. Only users that have execute permission on the pipeline and that belong to this group can validate, preview, or run the pipeline that retrieves the secret.

    If working with Control Hub, specify the group using the required naming convention: <group ID>@<organization ID>.

    To grant access to all users, specify the default all group when working only with Data Collector. When working with Control Hub and Data Collector version 3.16.0 or later, you can specify the default group using all or all@<organization ID>. StreamSets recommends using all so that you do not need to modify credential functions when migrating pipelines from Data Collector to Control Hub.
    Note: When working with Control Hub and a Data Collector version earlier than 3.16.0, you must use the default all@<organization ID> group.
  • name - Name of the secret to retrieve from the credential store. Use the required format for the credential store:
    • AWS Secrets Manager - Enter the secret name using the following format:

      "<name><separator><key>"

      • <name> is the name of the secret to read.
      • <separator> is the separator defined in the $SDC_CONF/credential-stores.properties file.
      • <key> is the key for the value that you want returned.
    • CyberArk - Enter the secret name using the following format:

      "<safe><separator><folder><separator><object name>[<separator><element name>]"

      • <safe> is the CyberArk safe to read. <separator> is the separator defined in the $SDC_CONF/credential-stores.properties file.
      • <folder> is the CyberArk folder to read.
      • <object name> is the CyberArk object or secret to read.
      • <element name> is an optional name for the value that you want returned.

        If you do not specify <element name>, Data Collector uses Content.

    • Google Secret Manager - Enter the secret name using the following format:

      "<name><delimiter><version ID>"

      • <name> is the secret name.
      • <delimiter> is the delimiter defined in the $SDC_CONF/credential-stores.properties file.
      • <version ID> is the version of the secret to return.
    • Hashicorp Vault - Enter the secret name using the following format:

      "<path><separator><key>"

      • <path> is the path in Vault to read,
      • <separator> is the separator defined in the $SDC_CONF/credential-stores.properties file.
      • <key> is the key for the value that you want returned.
    • Java keystore - Enter the name of the secret added to the Java keystore file using the jks-credentialstore add command.
    • Azure Key Vault - Enter the name of the key or secret to retrieve from Azure Key Vault.
Return type: String.
For example, the following expression returns the value of the OracleDBPassword credential defined in the devjks Java keystore credential store and allows any user belonging to the devops group access to the credential when validating, previewing, or running the pipeline:
${credential:get("devjks", "devops@MyCompany", "OracleDBPassword")}
credential:getWithOptions(<cstoreId>, <userGroup>, <name>, <storeOptions>)
Returns the secret from the credential store using additional options to communicate with the credential store. Not applicable for the Java keystore or Google Secret Manager credential stores.
For example, use this function with the CyberArk credential store to define the separator for the secret name within the function. Or use this function with the Hashicorp Vault credential store when you might need a delay of 10 seconds or more before the credentials can be used successfully.
Uses the following arguments:
  • cstoreId - Unique ID of the credential store to use. Use the ID specified in the $SDC_CONF/credential-stores.properties file. For more information, see Enabling Credential Stores in the Data Collector.
  • userGroup - Group that a user must belong to in order to access the secret. Only users that have execute permission on the pipeline and that belong to this group can validate, preview, or run the pipeline that retrieves the secret.

    If working with Control Hub, specify the group using the required naming convention: <group ID>@<organization ID>.

    To grant access to all users, specify the default all group when working only with Data Collector. When working with Control Hub and Data Collector version 3.16.0 or later, you can specify the default group using all or all@<organization ID>. StreamSets recommends using all so that you do not need to modify credential functions when migrating pipelines from Data Collector to Control Hub.
    Note: When working with Control Hub and a Data Collector version earlier than 3.16.0, you must use the default all@<organization ID> group.
  • name - Name of the secret to retrieve from the credential store. Use the required format for the credential store:
    • AWS Secrets Manager - Enter the secret name using the following format:

      "<name><separator><key>"

      • <name> is the name of the secret to read.
      • <separator> is the separator defined in the $SDC_CONF/credential-stores.properties file or in the store options passed with the function.
    • <key> is the key for the value that you want returned.

    • CyberArk - Enter the secret name using the following format:

      "<safe><separator><folder><separator><object name><separator><element name>"

      • <safe> is the CyberArk safe to read.
      • <separator> is the separator defined in the $SDC_CONF/credential-stores.properties file.
      • <folder> is the CyberArk folder to read.
      • <object name> is the CyberArk object to read.
      • <element name> is an optional name for the value that you want returned.

        If you do not specify <element name>, Data Collector uses Content.

    • Hashicorp Vault - Enter the secret name using the following format:

      "<path><separator><key>"

      • <path> is the path in Vault to read.
      • <separator> is the separator defined in the $SDC_CONF/credential-stores.properties file.
      • <key> is the key for the value that you want returned.
    • Azure Key Vault - Enter the name of the key or secret to retrieve from Azure Key Vault.
  • storeOptions - Additional options to communicate with the credential store.
    For AWS Secrets Manager, you can use the following options:
    • separator - Specifies the separator for name and key values in the credential functions, overriding the credentialStore.aws.config.nameKey.separator property.
    • alwaysRefresh - When set to true, forces the key to refresh its cached value before Data Collector retrieves the value, overriding the credentialStore.aws.config.cache.ttl.millis property. Be aware that always refreshing the cached value significantly increases the pipeline run time.

    For CyberArk, you can use the following options:

    • separator - Separator to use for the secret name.
    • ConnectionTimeout - Connection timeout value.
    • FailRequestOnPasswordChange - Whether to fail the request on a password change, set to true or false.

    For Hashicorp Vault, you can use the delay option to enter a delay in milliseconds to allow time for external processing. Use the delay option when using the Vault AWS secret backend to generate AWS access credentials based on IAM policies. According to Vault documentation, you might need a delay of 10 seconds or more before the credentials can be used successfully.

    For Azure Key Vault, you can use the following options:
    • url - Overrides the credentialStore.azure.config.vault.url property in the $SDC_CONF/credential-stores.properties file.
    • retry - Overrides the credentialStore.azure.config.credential.retry.millis property in the $SDC_CONF/credential-stores.properties file.
    • refresh - Overrides the credentialStore.azure.config.credential.refresh.millis property in the $SDC_CONF/credential-stores.properties file.
    • credentialType=certificate - Instructs Azure Key Vault to retrieve the stored PEM certificate as a certificate rather than as a secret. Use to retrieve a PEM certificate stored in Azure Key Vault when you configure a stage to use a remote keystore or truststore for SSL/TLS encryption.
    Use the following format to specify options:
    "<option1>=<value>,<option2>=<value>"
    For example, to use the pipe symbol (|) as the CyberArk separator, enter the following for the options argument:
    "separator=|"
    Or to set the Vault delay to 1,000 milliseconds, enter the following for the options argument:
    "delay=1000"
Return type: String.
For example, the following expression returns the value of the key password stored in the Vault path /secret/databases/oracle from the vault credential store after waiting for a delay of 1,000 milliseconds. The name argument uses the default ampersand (&) as the separator. The expression allows any user belonging to the devops group access to the secret when validating, previewing, or running the pipeline:
${credential:getWithOptions("vault", "devops@MyCompany", "/secret/databases/oracle&password", "delay=1000")}

Data Drift Functions

Use data drift functions to create alerts when data drift occurs. You can use these functions in data drift rules.

You can replace any argument with a literal or an expression that evaluates to the argument. String literals must be enclosed in single or double quotation marks.

The expression language provides the following data drift functions:
drift:names(<field path>, <ignoreWhenMissing: true | false>)
Triggers an alert when key values change in a map or list-map field. When key values are field names, triggers an alert when field names change. Use with list-map or map fields.

Use the ignoreWhenMissing flag to determine the behavior when the field is missing. When set to "true", a missing field causes no errors. When set to "false", a missing field generates an alert for the record missing the field, and for the next record that includes the field.

For example, to trigger an alert when any field name in the record changes and ignore empty records, you can use the root field in the expression as follows:
${drift:names('/', true)}
drift:order(<field path>, <ignoreWhenMissing: true | false>)
Triggers an alert when the field order changes for fields within a list-map field. Use only with list-map fields.
Note that this function triggers alerts when field names change as well.

Use the ignoreWhenMissing flag to determine the behavior when the field is missing. When set to "true", a missing field causes no errors. When set to "false", a missing field generates an alert for the record missing the field, and for the next record that includes the field.

To alert on field order or name changes within a delimited record and ignore empty records, use the root field as follows:
${drift:order('/', true)}
drift:size(<field path>, <ignoreWhenMissing: true | false>)
Triggers an alert when the number of fields changes within a list, list-map, or map field. Use with list, list-map, or map fields.

Use the ignoreWhenMissing flag to determine the behavior when the field is missing. When set to "true", a missing field causes no errors. When set to "false", a missing field generates an alert for the record missing the field, and for the next record that includes the field.

To alert when the number of columns change within the Depts map field and when a record is missing the Depts field, use the following expression:
${drift:size('/Depts', false)}
drift:type(<field path>, <ignoreWhenMissing: true | false>)
Triggers an alert when the data type of the specified field changes. Evaluates only the specified field.

Use the ignoreWhenMissing flag to determine the behavior when the field is missing. When set to "true", a missing field causes no errors. When set to "false", a missing field generates an alert for the record missing the field, and for the next record that includes the field.

For example, to receive alerts when the data type of the ID field changes and when the ID field does not exist in the record, use the following expression:
${drift:type('/ID', false)}

Data Generation Functions

Data generation functions are included in the Protector stage library. You can use data generation functions only in the Protector: Expression processor.

Data generation functions generate random fake data that you can use as a replacement for sensitive data. There are two kinds of data generation functions:
  • Faker functions - Generate specific types of fake data, such as addresses, names, and credit card numbers.
  • Xeger functions - Generate fake data based on user-defined regular expressions.
Both sets of functions provide two types of output:
Random
Values are generated randomly, regardless of input value. Each value that is replaced is replaced with a new, random value.
For example, say you use randomFaker:email() to generate fake email addresses to replace user email addresses. In this case, a random email address is generated for each user email address that is replaced.
Deterministic
Generated values are reused when the same input values appear. This allows you to determine in downstream processing that values recur, while ensuring that the data is protected.
When you configure the function, you specify the input value that enables the reuse of generated values.
You can use any logical expression to define the input value, but here are some common cases:
  • To use the value that is being replaced, use the field function, f:value().

    For example, say you use the following expression to deterministically generate a replacement URL ${deterministicFaker:url(f:value())}. And say the expression replaces www.RealCompanyName.com with www.fakename.com. Each time www.RealCompanyName.com appears in a URL field, it is replaced with the same URL, www.fakename.com.

  • To use values in other fields in the record, use the record:value() function.
    For example, say you want to generate the same fake name for matching UserIDs. Then, you might use the following expressions to generate the first and last names:
    ${deterministicFaker:firstName(record:value('/UserID'))} ${deterministicFaker:lastName(record:value('/UserID'))}

    So if the function generates Mia Lakier for the R2204 user ID, each time R2204 reappears as an ID, the function replaces the real name with Mia Lakier.

Use deterministic functions to provide insight into the frequency of repeated values while protecting sensitive data.
Note: Deterministic functions reuse generated values for the exact same input value. Data with differences in formatting, such as a dash instead of parentheses for area codes, are not considered the same. For best results, ensure that the data uses uniform formatting. When necessary, you can use category functions to standardize certain categories of data.

Faker Functions

The Protector stage library includes the following faker functions to generate fake data. You can use these functions only in stages included in the Protector stage library.

Faker functions have two types of output, random and deterministic.

The following faker functions are listed by category.

Address-Related Data

For address-related data, you can use the following faker functions:
deterministicFaker:country(<input>)
Generates a fake country name. Reuses the same generated country name for the same input value. Uses the following argument:
  • input - Value to determine when to reuse generated country names.
Return type: String.
For example, the following expression generates fake country names and reuses the same fake name for the same input value:
${deterministicFaker:country(f:value)}
So, if the expression generates Indonesia to replace Mexico, each time that Mexico appears in the data, it is replaced by Indonesia.
randomFaker:country()
Generates a fake country name. Uses a random country name for every input value.
Return type: String.
deterministicFaker:UsCity(<input>)
Generates a fake city in the United States. Reuses the same city name for the same input value. Uses the following argument:
  • input - Value to determine when to reuse generated city names.
Return type: String.
randomFaker:UsCity()
Generates a fake city in the United States. Uses a random city name for every input value.
Return type: String.
deterministicFaker:UsFullAddress(<input>)
Generates a fake full address in the United States. Reuses the same address for the same input value. Uses the following argument:
  • input - Value to determine when to reuse generated addresses.
Return type: String.
randomFaker:UsFullAddress()
Generates a fake full address in the United States. Uses a random address for every input value.
Return type: String.
deterministicFaker:UsState(<input>)
Generates a fake United States state name. Reuses the same name for the same input value. Uses the following argument:
  • input - Value to determine when to reuse generated state names.
Return type: String.
randomFaker:UsState()
Generates a fake United States state name. Uses a random state name for every input value.
Return type: String.
deterministicFaker:UsStreetAddress(<input>)
Generates a fake street address in the United States. Reuses the same address for the same input value. Uses the following argument:
  • input - Value to determine when to reuse generated addresses.
Return type: String
randomFaker:UsStreetAddress()
Generates a fake street address in the United States. Uses a random street address for every input value.
Return type: String.
deterministicFaker:UsZipCode(<input>)
Generates a fake United States 9-digit zip code. Reuses the same zip code for the same input value. Uses the following argument:
  • input - Value to determine when to reuse generated zip codes.
Return type: String.
randomFaker:UsZipCode()
Generates a fake United States 9-digit zip code. Uses a random zip code for every input value.
Return type: String.

Finance-Related Data

For finance-related data, you can use the following faker functions:
deterministicFaker:creditCard(<input>)
Generates a fake credit card number. Reuses the same generated number for the same input value. Uses the following argument:
  • input - Value to determine when to reuse generated numbers.
Return type: String.
For example, the following expression generates a fake credit card number and reuses the fake number when the id field in the record has the same ID:
${deterministicFaker:creditCard(record:value('/id'))}
So say a record has an ID of 0034 and a fake credit card number replaces the real card number in the record. Then, each time a subsequent record has the 0034 ID, the function uses the same fake credit card number to replace the real credit card number.
randomFaker:creditCard()
Generates a fake credit card number. Uses a random number for every input value.
Return type: String.
deterministicFaker:creditCardOfType(<type>, <input>)
Generates a fake credit card number of the specified type. Reuses the same generated number for the same input value. Uses the following arguments:
  • type - Credit card type to generate. Use one of the following values: AMERICAN_EXPRESS, MASTERCARD, or VISA.
  • input - Value to determine when to reuse generated numbers. Uses a random number for every input value.
Return type: String.
For example, the following expression generates a fake American Express credit card number and reuses the same number each time the same credit card number appears:
${deterministicFaker:creditCard('AMERICAN_EXPRESS', 'f:value()')}
randomFaker:creditCardOfType(<type>)
Generates a fake credit card number of the specified type. Uses a random number for every input value. Uses the following argument:
  • type - Credit card type to generate. Use of the following values: AMERICAN_EXPRESS, MASTERCARD, or VISA.
Return type: String.
For example, the following expression generates a random Visa credit card number:
${randomFaker:creditCardOfType('VISA')}
deterministicFaker:iban(<input>)
Generates a fake international bank account number (IBAN). Reuses the same generated number for the same input value. Uses the following argument:
  • input - Value to determine when to reuse generated numbers.
Return type: String.
randomFaker:iban()
Generates a fake IBAN. Uses a random number for every input value.
Return type: String.
deterministicFaker:ibanForCountry(<country code>, <input>)
Generates a fake IBAN for the specified country. Reuses the same generated number for the same input value. Uses the following arguments:
  • country code - An ISO-3166-1 alpha-2 country code to indicate the IBAN country format to generate. Note that only a subset of all countries have their own IBAN format. For a list, see this Wikipedia page.
    Tip: For the country code, you can also use the first two characters listed in the IBAN Fields column in the second link. For example, the country code for Albania is AL, as shown below:

  • input - Value to determine when to reuse generated codes.
Return type: String.
For example, the following expression creates a fake IBAN for Costa Rica and reuses the same IBAN each time the same country code appears:
${deterministicFaker:ibanForCountry('CR','f:value()')}
randomFaker:ibanForCountry(<country code>)
Generates a fake IBAN for the specified country. Uses a random IBAN for every input value. Uses the following argument:
  • country code - An ISO-3166-1 alpha-2 country code to indicate the IBAN country format to generate. Note that only a subset of all countries have their own IBAN format. For a list, see this Wikipedia page.
    Tip: For the country code, you can also use the first two characters listed in the IBAN Fields column in the second link. For example, the country code for Albania is AL, as shown below:

Return type: String.
For example, the following expression generates a random Albanian IBAN for every input value:
${randomFaker:ibanForCountry('AL')}

Internet-Related Data

For internet-related data, you can use the following faker functions:
deterministicFaker:email(<input>)
Generates a fake email address. Reuses the same generated address for the same input value. Uses the following argument:
  • input - Value to determine when to reuse generated addresses.
Return type: String.
For example, the following expression generates fake email addresses and reuses the fake address when it sees the same input email address:
${deterministicFaker:email(f:value)}
So, if the expression generates jo@nys.com to replace an email address, each time that same email address appears in the data, it is replaced by jo@nys.com.
randomFaker:email()
Generates a fake email address. Uses a random email address for every input value.
Return type: String.
deterministicFaker:url(<input>)
Generates a fake URL. Reuses the same generated URL for the same input value. Uses the following argument:
  • input - Value to determine when to reuse generated URLs.
Return type: String.
randomFaker:url()
Generates a fake URL. Uses a random URL for every input value.
Return type: String.

Person-Related Data

For person-related data, you can use the following faker functions:

deterministicFaker:race(<input>)
Generates a fake race. Reuses the same generated race for the same input value. Uses the following argument:
  • input - Value to determine when to reuse generated values.
Return type: String.
For example, the following expression generates fake races and uses the same fake value when it sees the same input value:
${deterministicFaker:race(f:value)}
So, if the expression generates Japanese to replace Thai, each time that Thai appears in the data, it is replaced by Japanese.
randomFaker:race()
Generates a fake race. Uses a random race for every input value.
Return type: String.
deterministicFaker:UsFirstName(<input>)
Generates a fake United States first name. Reuses the same generated name for the same input value. Uses the following argument:
  • input - Value to determine when to reuse generated names.
Return type: String.
For example, the following expression generates a fake first name and reuses the fake name when the ID field in the record has the same ID:
${deterministicFaker:UsFirstName(record:value('/id'))}
So if the expression generates Joan in a record with an ID of 1234, then each time a subsequent record has the 1234 ID, the function returns Joan.
randomFaker:UsFirstName()
Generates a fake United States first name. Uses a random name for every input value.
Return type: String.
deterministicFaker:UsLastName(<input>)
Generates a fake United States last name. Reuses the same generated name for the same input value. Uses the following argument:
  • input - Value to determine when to reuse generated names.
Return type: String.
randomFaker:UsLastName()
Generates a fake United States last name. Uses a random name for every input value.
Return type: String.
deterministicFaker:UsPhone(<input>)
Generates a fake United States phone number. Reuses the same number for the same input value. Uses the following argument:
  • input - Value to determine when to reuse generated numbers.
Return type: String.
randomFaker:UsPhone()
Generates a fake United States phone number. Uses a random number for every input value.
Return type: String.
deterministicFaker:UsSsn(<input>)
Generates a fake United States social security number. Reuses the same number for the same input value. Uses the following argument:
  • input - Value to determine when to reuse generated numbers.
Return type: String.
randomFaker:UsSsn()
Generates a fake United States social security number. Uses a random number for every input value.
Return type: String.

Xeger Functions

Xeger functions generate fake data based on user-defined regular expressions. You can use xeger functions only in stages included in the Protector stage library.

Before using a xeger function, check if a faker function generates the type of data that you are looking for.

You can use xeger functions with category functions to generate random data while retaining some parts of the original data. For example, the following expression generates a fake user name based on [a-z]{5,10} and reuses it when the same email address appears in the email field. Then, it appends the fake user name to the real domain associated with the input email address:
${xeger:deterministic(‘[a-z]{5,10}’, record:value('/email')}@${EMAIL:domain()}

This generates user names as follows: <fake username>@<real domain>.

You can use the following xeger functions:

xeger:deterministic(<regular expression>, <input>)
Generates fake data based on the structure of the data defined by the regular expression. Reuses the same fake data when the input value is the same. Uses the following arguments:
  • regular expression - A regular expression that defines the structure of the fake data to generate.
  • input - Value to determine when to reuse generated values.
Return type: String.
For example, the following expression generates a fake social security number and reuses the number in records that have the same user ID:
${xeger:deterministic('[0-9]{3}-[0-9]{2}-[0-9]{4}', record:value('/userId')}

So when the expression generates a fake social security number for a record with a user ID of 55343, each time the 55343 user ID appears, the expression returns the same fake social security number. Though the social security numbers may not be strictly valid, the data is protected.

xeger:random(<regular expression>)
Generates fake data based on the structure of the data defined by the regular expression. Generates new fake data for every input value. Uses the following argument:
  • regular expression - A regular expression that defines the structure of the fake data to generate.
Return type: String.

Field Functions

You can use field functions in field path expressions that determine the set of fields that a processor uses. Each function is evaluated against a set of matching fields individually.

You can use the following field functions in Protector stages, such as the Protector: Expression processor:
  • f:attribute()
  • f:name()
  • f:type()
  • f:path()
  • f:value()

For more information about field path expressions, see Field Path Expressions. For a list of stages where you can use field functions, see Supported Stages.

f:attribute(<attribute name>)
Returns the value of the specified field attribute.
Uses the following argument:
  • attribute name - A string that specifies a field attribute name.
Return type: String.
The following field path expression returns only the fields where the region attribute is set to west:
/*[${f:attribute('region') == 'west'}]
f:getSiblingWithName(<field name>)
Returns the sibling field - a field with the same parent field - that has the name matching <field name>, if the field exists.
Uses the following argument:
  • field name - A string that specifies a field name.
Return type: Field.
Note: The returned field object supports the value() and type() field functions.
For example, the following expression returns the value from the sibling field named SalePrice:
${f:getSiblingWithName('SalePrice').value}
f:hasSiblingWithName(<field name>)
Returns true if there is a sibling field with a name matching <field name>, otherwise returns false.
Uses the following argument:
  • field name - A string that specifies a field name.
Return type: Boolean.
For example, the following conditional expression returns true if there is a sibling field named SalePrice:
${f:hasSiblingWithName('SalePrice')}
f:hasSiblingWithValue(<field name>, <field value>)
Returns true if there is a sibling field with a name matching <field name> that has value matching <field value>, otherwise returns false.
Uses the following arguments:
  • field name - A string that specifies a field name.
  • field value - The value stored in the field. The field determines the data type.
Return type: Boolean.
For example, the following conditional expression returns true if there is a sibling field named SalePrice that has a Float value of 0.99:
${f:hasSiblingWithValue('SalePrice',0.99)}
f:index()
Returns the index within the parent list field. Returns -1 if the field is not in a list.
Return type: Integer.
For example, the following conditional expression returns true for the first field - the field with an index of 0 - from a RankedItem list field:
${str:endsWith(f:parentPath(),"/RankedItem") && f:index() == 0}
f:name()
Returns the field name.
Return type: String.
For example, you might use this function in a Protector: Expression processor to replace sensitive data such as social security numbers with the name of the field, e.g. ssn.
f:parent()
Returns the parent field.
Return type: Field.
Note: The returned field object supports the value() and type() field functions.
For example, the following conditional expression returns true if the parent field is a map field:
${f:parent().type =='MAP')}
f:parentPath()
Returns the path of the parent field.
Return type: String.
For example, the following conditional expression returns true if the parent field path ends in Item:
${str:endsWith(f:parentPath(),'Item')}
f:path()
Returns the path of a field. Use to return the fields with a specified path or where the path is as defined in the expression.
Return type: String.
For example, the following field path expression returns all datetime fields except for the audit_date field:
/*[${f:type() == 'DATETIME' && f:path() != '/audit_date'}]
f:type()
Returns the data type of a field. Use to return the fields with a specified type or where the field type is as defined in the expression.
Important: Use all caps when specifying a data type in an expression and enclose the string in quotation marks. For a list of valid data types, see Data Collector Data Types.
Return type: String.
For example, the following field path expression returns all boolean fields:
/*[${f:type() == 'BOOLEAN'}]
f:value()
Returns the value of the field. Use to return the fields with a specified value or where the value is as defined in the expression.
Return type: Dependent on the field type.
For example, the following field path expression returns only the fields where the value is greater than 0.
/*[${f:value() > 0}]

File Functions

Use file functions to return information about a file name or path. For example, you might use a file function to remove a file extension from a file path or to return part of the path.

You can replace any argument with a literal or an expression that evaluates to the argument. String literals must be enclosed in single or double quotation marks.

The expression language provides the following file functions:
file:fileExtension(<filepath>)
Returns the file extension from a file path. Uses the following argument:
  • filepath - An absolute path to a file.
Return type: String.
For example, the following expression returns "txt":
${file:fileExtension('/logs/weblog.txt')}
file:fileName(<filepath>)
Returns the file name from a file path. Uses the following argument:
  • filepath - An absolute path to a file.
Return type: String.
For example, the following expression returns the filename, "weblog.txt":
${file:fileName('/logs/weblog.txt')}
file:parentPath(<filepath>)
When used with a path to a file, returns the path to the file without the final separator, such as /files for /files/file.log.
When used with a path to a directory, returns the path to the directory without the final separator, such as /serverA/logs for /serverA/logs/2016.
Uses the following argument:
  • filepath - An absolute path to a file.
Return type: String.
For example, the following expression, which includes a path to a file, returns "/serverB/logs":
${file:parentPath('/serverB/logs/weblog.txt')}
Similarly, the following expression, which includes a path to a directory, returns the parent directory, "/serverB/logs":
${file:parentPath('/serverB/logs/weblogs')}
file:pathElement(<filepath>, <integer>)
Returns the part of a path based on the specified integer. Uses the following arguments:
  • filepath - An absolute path to a file.
  • integer - The section of a path to return. Can return parts starting from the left or right side of the path:
    • To return a section of a path, counting from the left side of the path, use 0 and positive integers and start with 0.
    • To return a section of a path, counting from the right side of the path, use negative integers and start with -1.
Return type: String.
For example, to return the logs portion of the path, you can use either of the following expressions:
${file:pathElement('/logs/weblog.txt',0)}
${file:pathElement('/logs/weblog.txt',-2)}
file:removeExtension(<filepath>)
Returns the file path without the file extension. Uses the following argument:
  • filepath - An absolute path to a file.
Return type: String.
For example, the following expression returns "/logs/weblog":
${file:removeExtension('/logs/weblog.txt')}

Job Functions

Use job functions to return information about a Control Hub job. For example, you might use a job function to return the name of the job running a pipeline.

You can replace any argument with a literal or an expression that evaluates to the argument. String literals must be enclosed in single or double quotation marks.

The expression language includes the following job functions:
job:id()
Returns the ID of the job if the pipeline was run from a Control Hub job. Otherwise, returns "UNDEFINED".
Return type: String.
job:name()
Returns the name of the job if the pipeline was run from a Control Hub job. Otherwise, returns "UNDEFINED".
Return type: String.
job:startTime()
Returns the start time of the job if the pipeline was run from a Control Hub job. Otherwise, returns the start time of the pipeline.
Return type: Datetime.
job:user()
Returns the user who started the job if the pipeline was run from a Control Hub job. Otherwise, returns "UNDEFINED".
Return type: String.

Math Functions

Use math functions to perform math on numeric values.

You can replace any argument with a literal or an expression that evaluates to the argument. String literals must be enclosed in single or double quotation marks.

You can use the following data types with math functions:
  • Double
  • Float
  • Integer
  • Long
  • String
    Note: Data Collector parses a string as a Double value if the numeric value in a String field contains a dot ( . ). Otherwise, Data Collector parses the string as a Long value.
The expression language provides the following math functions:
math:abs(<number>)
Returns the absolute value, or positive version, of the argument. If the argument is already positive, returns the original number.
Return type: Double, Float, Int, or Long, based on the data type of the argument.
math:ceil(<number>)
Returns the smallest integer greater than or equal to the argument.
For example, ${math:ceil(8.0+3.6)} returns 12.0.
Return type: Double.
math:floor (<number>)
Returns the largest integer less than or equal to the argument.
Return type: Double.
math:max(<number1>, <number2>)
Returns the greater of two arguments.
For example, if the value of the tip_amount field is 2.5 and the value of the fare_amount field is 12.5, then ${math:max(record:value("/tip_amount"), record:value("/fare_amount"))} returns 12.5.
Return type: Double, Float, Int, or Long, based on the data type of the argument.
math:min(<number1>, <number2>)
Returns the lesser of two arguments.
Return type: Double, Float, Int, or Long, based on the data type of the argument.
math:round(<number>)
Returns the closest number to the argument, rounding up for ties.
Return type: Double or Long.

Pipeline Functions

Use pipeline functions to determine information about a pipeline, such as the pipeline title or ID. The expression language provides the following pipeline functions:

pipeline:email()
Returns the email address of the user who started the pipeline.
Return type: String.
pipeline:id()
Returns the ID of the pipeline. The ID is a UUID automatically generated when the pipeline is created and is used by Data Collector to identify the pipeline. The pipeline ID cannot be changed.
Return type: String.
For example, you can include the function in the directory template for the Hadoop FS destination to include the pipeline ID in the output directory name. The following expression creates output directories based on the pipeline ID and timestamp of a record:
/outputfiles/${pipeline:id()}/${YYYY()}-${MM()}-${DD()}-${hh()}
pipeline:name()
Like pipeline:id, this function returns the ID of the pipeline. The ID is a UUID automatically generated when the pipeline is created and is used by Data Collector to identify the pipeline. The pipeline ID cannot be changed.
In Data Collector versions earlier than 2.3.0.0, this function returned the pipeline name. This function is retained for backward compatibility. Use pipeline:id for new development.
Return type: String.
For example, you can include the function in the directory template for the Hadoop FS destination to include the pipeline ID in the output directory name. The following expression creates output directories based on the pipeline ID and timestamp of a record:
/outputfiles/${pipeline:name()}/${YYYY()}-${MM()}-${DD()}-${hh()}
pipeline:startTime()
Returns the start time of the pipeline.

Return type: Datetime.

pipeline:title()
Returns the title or name of the pipeline.
Return type: String.
pipeline:user()
Returns the user who started the pipeline.
Return type: String.
pipeline:version()
Returns the pipeline version when the pipeline has been published to StreamSets Control Hub. Returns "UNDEFINED" if the pipeline has not been published to Control Hub. Use this function when you have registered Data Collector to work with Control Hub.
Return type: String.

String Functions

Use string functions to transform string data.

You can replace any argument with a literal or an expression that evaluates to the argument. String literals must be enclosed in single or double quotation marks.

The expression language provides the following string functions:

str:concat(<string1>, <string2>)
Concatenates two strings together.
Uses the following arguments:
  • string1 - The first string to concatenate.
  • string2 - The second string to concatenate.
Use the str:concat function to concatenate strings within an expression. For example, to concatenate the FirstName field with the LastName field and then check if that result matches a pattern, use the following expression:
${str:matches(str:concat(record:value("/FirstName"), record:value("/LastName")), "[A-Z][a-z]+[A-Z][a-z]+")}
You can concatenate more than two strings by nesting multiple str:concat functions in a single expression. For example, to concatenate the FirstName, MiddleName, and LastName fields, use the following expression:
${str:concat(str:concat(record:value("/FirstName"), record:value("/MiddleName")), record:value("/LastName"))}
str:contains(<string>, <subset>)
Returns true or false based on whether the string contains the configured subset of characters.
Uses the following arguments:
  • string - The string to evaluate.
  • subset - The subset of characters to look for.
For example, ${str:contains("Jane", "boo")} returns: false.
str:endsWith(<string>, <subset>)
Returns true or false based on whether the string ends with the configured subset of characters.
Uses the following arguments:
  • string - The string to evaluate.
  • subset - The subset of characters to look for.
For example, ${str:endsWith("32403-1001", "1001")} returns: true.
str:escapeXML10(<string>)
Returns a string that you can embed in an XML 1.0 or 1.1 document.
Uses the following argument:
  • string - The string to escape.
Return type: String.
For example, you can use the following expression to generate XML-safe string data based on the XML in the XMLdata field: ${str:escapeXML10(record:value('/XMLdata'))}
str:escapeXML11(<string>)
Returns a string that you can embed in an XML 1.1 document.
Uses the following argument:
  • string - The string to escape.
Return type: String.
For example, you can use the following expression to generate XML-safe string data based on the XML in the XMLdata field: ${str:escapeXML11(record:value('/XMLdata'))}
str:indexOf(<string>, <subset>)
Returns the index within a string of the first occurrence of the specified subset of characters.
Return type: Integer.
Uses the following arguments:
  • string - The string to return the index of.
  • subset - The subset of characters to look for.
For example, ${str:indexOf("pepper", "pe")} returns: 0.
str:isNullOrEmpty(<string>)
Returns true or false based on whether a string is null or is the empty string.
Uses the following argument:
  • string - The string to evaluate.
For example, ${str:isNullOrEmpty(record:value("/work_phone"))} returns true if the work_phone field is the empty string.
str:lastIndexOf(<string>, <subset>)
Returns the index within a string of the last occurrence of the specified subset of characters.
Return type: Integer.
Uses the following arguments:
  • string - The string to return the index of.
  • subset - The subset of characters to look for.
For example, ${str:lastIndexOf("pepper", "pe")} returns: 3.
str:length(<string>)
Returns the length of a string.
Return type: Integer.
Uses the following argument:
  • string - The string to return the length for.
For example, ${str:length("tomorrow")} returns: 8.
str:matches(<string>, <regEx>)
Returns true or false based on whether a string matches a Java regex pattern.
Uses the following arguments:
  • string - The string to evaluate.
  • regEx - Regular expression that describes the pattern of the string.
For example, ${str:matches(record:value("/HostName"), ".*\.com$")} checks if the string in the HostName field ends with the characters ".com".
str:regExCapture(<string>, <regEx>, <group>)
Parses a complex string into groups based on a Java regex pattern and returns the specified group.
Uses the following arguments:
  • string - The string that contains a pattern of characters.
  • regEx - Regular expression that describes the pattern of the string, separating it into groups. Use the backslash as an escape character for special characters in the expression. For example, to represent a digit in the expression with the characters \d, use \\d.
  • group - The number of the group to return, where 1 represents the first group, 2 represents the second group, etc. 0 returns the entire string.

For example, to extract the month and day from the Date field that uses the format 'mm-dd-yyyy', use the following function: ${str:regExCapture(record:value('/Date'), '(\\w{2}\\-\\w{2})-\\w*', 1)}

str:replace(<string>, <oldChar>, <newChar>)
Replaces all instances of a specified character in a string with a new character.
Uses the following arguments:
  • string - The string that contains the character to replace.
  • oldChar - Character to replace.
  • newChar - Character to use for replacement.
For example, ${str:replace("lecucereche", "e", "a")} returns: lacucaracha.
str:replaceAll(<string>, <regEx>, <newString>)
Replaces a set of characters in a string with a new set of characters.
Uses the following arguments:
  • string - The string that contains the group of characters to replace.
  • regEx - A regular expression that describes the string to replace.
  • newString - The set of characters to use for replacement.

For example, ${str:replaceAll("shoes_sandals","^shoes","footwear")} returns: footwear_sandals.

str:split(<string>, <separator>)
Splits a string into a list of strings based on the specified separator.
Uses the following arguments:
  • string - An input string.
  • separator - The set of characters that designate a string split.
For example, suppose a record contains a dimensions string field that records a height, width, and length: "height=32 width=24 length=36". The function ${str:split(record:value("/dimensions"), " ")} returns the following list of strings:
"height-32", "width=24", "length=36"
str:splitKV(<string>, <pairSeparator>, <keyValueSeparator>)
Splits key-value pairs in a string into a map of string values.
Uses the following arguments:
  • string - The string containing the key-value pairs.
  • pairSeparator - The set of characters that separate the key-value pairs.
  • keyValueSeparator - The set of characters that separate each key and value.
For example, suppose a record contains a dimensions string field that records a height, width, and length: "height=32 width=24 length=36". The function ${str:splitKV(record:value("/dimensions"), " ", "=")} returns the following map of string values:
"dimensions": {
      "height": "32",
      "width": "24",
      "length": "36"
    }
str:startsWith(<string>, <subset>)
Returns true or false based on whether the string starts with the configured subset of characters.
Uses the following arguments:
  • string - The string to evaluate.
  • subset - The subset of characters to look for.
For example, ${str:startsWith("Data Collector", "Data")} returns: true.
str:substring(<string>, <beginIndex>, <endIndex>)
Returns a subset of the string value that starts with the beginIndex character and ends one character before the endIndex.
Uses the following arguments:
  • string - The string that contains the return substring that you want.
  • beginIndex - An integer that represents the beginning position of the returned substring. Start the count from the left with 0.
  • endIndex - An integer that represents one position past the end of the substring.
For example, ${str:substring("Chewing Gum", 0, 5)} returns: Chew.
str:toLower(<string>)
Converts string data to all lowercase letters.
Uses the following argument:
  • string - The string to lower case.
For example, ${str:toLower("FALSE")} returns: false.
str:toUpper(<string>)
Converts string data to all capital letters.
Uses the following argument:
  • string - The string to capitalize.
For example, ${str:toUpper("true")} returns: TRUE.
str:trim(<string>)
Trims leading and trailing white space characters from a string, including spaces and return characters.
Uses the following argument:
  • string - The string to return without additional white space characters.
For example, ${str:trim(record:value("/Address"))} trims leading and trailing white space characters from the Address field.
str:truncate(<string>, <length>)
Returns a string truncated to the specified length. Use an integer to specify the length.
Uses the following arguments:
  • string - The string to truncate.
  • length - An integer that represents the number of characters to keep.
For example, when a phone number field includes string numbers such as 415-555-5555, ${str:truncate(record:value('/phone'),3)} returns the area code of the phone number.
str:unescapeJava(<string>)
Returns an unescaped string from a string with special Java characters. Use to include binary or non-printable characters in any location where you can enter an expression.
Uses the following argument:
  • string - The string to process.
str:unescapeXML(<string>)
Returns an unescaped string from a string that had XML data escaped.
Uses the following argument:
  • string - The string that includes escaped XML data.
For example, say you have escaped XML data embedded within an XML document in an additionalXML field. The following expression unescapes the XML: ${str:unescapeXML(record:value('/additionalXML'))}
str:urlDecode(<URL>, <charset>)
Converts characters from a URL to the specified character set, such as UTF-8.
Uses the following arguments:
Return value: String.
For example, to convert a URL to UTF-8, use the following expression:
${str:urlDecode(record:value('/URL'), 'UTF-8')}
str:urlEncode(<infoforURL>, <charset>)
Converts invalid characters to help create a valid URL based on the specified character set, such as UTF-8. You might use this function when using record data to add additional information, like a fragment, to a URL.
Uses the following arguments:
Return value: String.
For example, to encode a search result for use in a URL, converting spaces to valid characters, you might use the following expression:
${str:urlEncode(record:value('/result'), 'UTF-8')}

Time Functions

Use time functions to return the current time or to transform datetime data.

You can replace any datetime argument with an expression that evaluates to a datetime value. You cannot replace a datetime argument with a datetime literal.

You can replace any long or string argument with a literal or an expression that evaluates to the argument. String literals must be enclosed in single or double quotation marks.

The expression language provides the following time functions:

time:createDateFromStringTZ(<string>, <time zone>, <date format>)
Creates a Date object based on a datetime in a String field and using the specified time zone. The datetime string should not include the time zone.
Uses the following arguments:
  • string - String with datetime values, not including the time zone.
  • time zone - Time zone associated with the datetime values. The time zone is used when creating the Date object.
    You can use the following time zone formats:
    • <area>/<location> - For example, America/Chicago or Europe/Madrid.
    • Numeric time zones with the GMT prefix, such as GMT-0500 or GMT-8:00. Note that numeric-only time zones such as -500 are not supported.
    • Short time zone IDs such as EST and CST - These time zones should generally be avoided because they can stand for multiple time zones, e.g. CST stands for both Central Standard Time and China Standard Time.
  • date format - The date format used by the string data. For information about date formats, see https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.
For example, the following expression converts string values in the Datetime field to a Date object using the time zone specified in the TZ field. It assumes the date is in the format, "03/30/2016 12:15:33 AM":
${time:createDateFromStringTZ(record:value('/Datetime'), 'record:value('/TZ')', 'MM/dd/yyyy hh:mm:ss a')}
time:dateAddition(<intervalType>, <number>, <date>)
Adds an interval to a datetime value.
Uses the following arguments:
  • intervalType- String that defines the type of interval to add. Specify one of the following interval types:
    • YEAR

    • QUARTER
    • MONTH
    • DAY_OF_YEAR
    • DAY

    • WEEK

    • WEEKDAY

    • HOUR

    • MINUTE

    • SECOND

    • MILLISECOND
  • number - Number to add to the datetime value. Specify a negative number to subtract from the datetime value.
  • date - Datetime value to add the interval to. Use one of the following functions to specify the datetime value:
    • Enter the time:createDateFromStringTZ(<string>, <time zone>, <date format>) function to specify a fixed datetime.
    • Enter the time:now() function to use the current time of the Data Collector machine.
    • Enter the record:value(<field path>) function to use a date that is in a record field.
Return type: Date.
For example, the following expression adds ten months to the datetime value and returns the value 2012-03-05 00:00:00:
${time:dateAddition("MONTH", 10, time:createDateFromStringTZ("2011-05-05 00:00:00", "Pacific/Honolulu","yyyy-MM-dd HH:mm:ss"))}
The following expression adds ten months to the datetime value and returns a value that is ten months from the time of the Data Collector machine when the expression is run:
${time:dateAddition("MONTH", 10, time:now())}
time:dateDifference(<intervalType>, <datetime1>, <datetime2>)
Determines the interval between two datetime values.
Uses the following arguments:
  • intervalType - String that defines the type of interval to return. Specify one of the following interval types:
    • YEAR

    • QUARTER
    • MONTH
    • DAY_OF_YEAR
    • DAY

    • WEEK

    • WEEKDAY

    • HOUR

    • MINUTE

    • SECOND

    • MILLISECOND
  • datetime1 - Starting datetime value.
  • datetime2 - Ending datetime value.
Use one of the following functions to specify both datetime values:
  • Enter the time:createDateFromStringTZ(<string>, <time zone>, <date format>) function to specify a fixed datetime.
  • Enter the time:now() function to use the current time of the Data Collector machine.
  • Enter the record:value(<field path>) function to use a date that is in a record field.
Return type: Long.
For example, the following expression calculates the number of years between the two datetime values and returns the value 12:
${time:dateDifference("YEAR", time:createDateFromStringTZ("2023-05-05 00:00:00", "Pacific/Honolulu","yyyy-MM-dd HH:mm:ss"), time:createDateFromStringTZ("2011-05-05 00:00:00", "Pacific/Honolulu","yyyy-MM-dd HH:mm:ss"))}
time:dateTimeToMilliseconds(<Date object>)

Converts a Date object to an epoch or UNIX time in milliseconds.

For example, the following expression converts the current time to epoch or UNIX time in seconds, and then multiplies the value by 1000 to convert the value to milliseconds:
${time:dateTimeToMilliseconds(time:now())}

Return type: Long.

time:dateTimeZoneOffset(<Date object>, <time zone>)

Returns the time zone offset in milliseconds for the specified date and time zone. The time zone offset is the difference in hours and minutes from Coordinated Universal Time (UTC).

Uses the following arguments:
  • Date object - Date object to use.
  • time zone - Time zone associated with the Date object.
    You can use the following time zone formats:
    • <area>/<location> - For example, America/Chicago or Europe/Madrid.
    • Numeric time zones with the GMT prefix, such as GMT-0500 or GMT-8:00. Note that numeric-only time zones such as -500 are not supported.
    • Short time zone IDs such as EST and CST - These time zones should generally be avoided because they can stand for multiple time zones, e.g. CST stands for both Central Standard Time and China Standard Time.
For example, the following expression returns the time zone offset of the Date object stored in the Date field using the time zone specified in the TZ field:
${time:dateTimeZoneOffset(record:value('/Date'), record:value('/TZ'))}

Return type: Long.

time:extractDateFromString(<string>, <format string>)

Extracts a Date object from a String, based on the specified date format.

Uses the following arguments:
For example, the following expression converts the string '2017-05-01 20:15:30.915' to a Date object:
${time:extractDateFromString('2017-05-01 20:15:30.915','yyyy-MM-dd HH:mm:ss.SSS')}
Return type: Date object.
time:extractLongFromDate(<Date object>, <format string>)
Extracts a long value from a Date object, based on the specified date format.
Uses the following arguments:
For example, the following expression returns the month as a long value from the Date object stored in the Timestamp field:
${time:extractLongFromDate(record:value('/Timestamp'), "MM")}
Return type: Long.
Note: Because the function returns a long value, you cannot specify non-numeric data in the data format string. For example, the date format "MMM" returns the three character abbreviation for the month (such as "Jul"), which causes the function to return incorrect results.
time:extractStringFromDate(<Date object>, <format string>)
Extracts a string value from a Date object based on the specified date format.
Uses the following arguments:
For example, the following expression returns the month as a string value from the Date object stored in the Timestamp field:
${time:extractStringFromDate(record:value('/Timestamp'), "MMM")}
Return type: String.
time:extractStringFromDateTZ(<Date object>, <time zone>, <format string>)
Extracts a string value from a Date object, converting the GMT time in the Date object to the specified date format and time zone. The function adjusts for daylight savings when given the time zone in the appropriate format.
Uses the following arguments:
  • Date object - Date object to use.
  • time zone - Time zone to use in the conversion.

    To convert the time zone and adjust for daylight savings, use the <Area>/<Location> format. For example, America/Chicago or Europe/Madrid.

    Short time zone IDs such as EST and CST return data, but note that these time zones do not adjust for daylight savings, and should be avoided because they can stand for multiple time zones, e.g. CST stands for both Central Standard Time and China Standard Time.

    You can use numeric time zones with the GMT prefix, such as GMT-0500 or GMT-8:00, but these time zones also do not account for daylight savings. Note that numeric-only time zones such as -500 are not supported.

  • format string - String that specifies the format to use to express the date. For information about creating a date format, see https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.
Return type: String.
Returns an empty string when the time zone is not specified or invalid.
For example, the following expression converts a Date object to a string date while adjusting to the America/New_York time zone and daylight savings. It expresses the date in the format, "03/30/2016 12:15:33 AM":
${time:extractStringFromDateTZ(record:value('/datetime'), 'America/New_York', 'MM-dd-yyyy hh:mm:ss a')}
time:millisecondsToDateTime(<long>)
Converts an epoch or UNIX time in milliseconds to a Date object.
If the epoch or UNIX time is in seconds, multiply the value by 1000 to produce a value in the milliseconds range. For example, the following expression multiplies the epoch time in seconds stored in the epochTime field and then converts the value to a Date object:
${time:millisecondsToDateTime(record:value('/epochTime')* 1000)}
Return type: Date object.
time:now()
Returns the current time of the Data Collector machine as a java.util.Date object.
For example, you might use the function in the Expression Evaluator to create a field with the time of processing.
Return type: Datetime.
time:nowNanoInstant()
Returns a LocalDateTime object set to the current date and time of the Data Collector machine with nanoseconds precision.
Return type: LocalDateTime object.
time:nowNanoTimestampNumber()
Returns a Number object set to the current date and time of the Data Collector machine, specified as epoch time, with nanoseconds precision.
Return type: Number object.
time:nowNanoTimestampString()
Returns a String object set to the current date and time of the Data Collector machine, specified as epoch time, with nanoseconds precision.
Return type: String object.
time:nowNanoZonedInstant()
Returns a ZonedDateTime object set to the current date and time of the Data Collector machine with nanoseconds precision.
Return type: ZonedDateTime object.
time:nowNumber()
Returns the current time of the Data Collector machine as a Long with milliseconds precision.
Return type: Long.
time:timeZoneOffset(<time zone>)

Returns the time zone offset in milliseconds for the specified time zone. The time zone offset is the difference in hours and minutes from Coordinated Universal Time (UTC).

Uses the following argument:
  • time zone - Time zone to use.
    You can use the following time zone formats:
    • <area>/<location> - For example, America/Chicago or Europe/Madrid.
    • Numeric time zones with the GMT prefix, such as GMT-0500 or GMT-8:00. Note that numeric-only time zones such as -500 are not supported.
    • Short time zone IDs such as EST and CST - These time zones should generally be avoided because they can stand for multiple time zones, e.g. CST stands for both Central Standard Time and China Standard Time.
For example, the following expression returns the time zone offset using the time zone specified in the TZ field:
${time:timeZoneOffset(record:value('/TZ'))}

Return type: Long.

time:trimDate(<datetime>)
Trims the date portion of a datetime value by setting the date portion to January 1, 1970.
For example, if the current time of the Data Collector machine is Jul 25, 2016 5:18:05 PM, then ${time.trimDate(time:now())} returns: Jan 1, 1970 5:18:05 PM.
Return type: Datetime.
time:trimTime(<datetime>)
Trims the time portion of a datetime value by setting the time portion to 00:00:00.
For example, if the value of the pickup_datetime field is Jul 22, 2016 11:16:10 AM, then ${time:trimTime(record:value("/pickup_datetime"))} returns: Jul 22, 2016 12:00:00 AM.
Return type: Datetime.

Timestamp Functions

Use timestamp functions to return the current time or to transform datetime data.

Some of these functions have corresponding time functions. StreamSets recommends using timestamp functions for all new development for better performance.

You can replace any argument with a literal or an expression that evaluates to the argument. String literals must be enclosed in single or double quotation marks.

The expression language provides the following timestamp functions:

timestamp:createDateFromString(<date>, <format>)
Creates a Date object from a String, based on the specified date format.
Uses the following arguments:
For example, the following expression converts the string 2017-05-01 20:15:30.915 to a Date object:
${timestamp:createDateFromString('2017-05-01 20:15:30.915','yyyy-MM-dd HH:mm:ss.SSS')}
Return type: Date object.
timestamp:createDateFromStringAndZone(<date>, <zone>, <format>)
Creates a Date object based on a datetime in a String field and using the specified time zone. The datetime string should not include the time zone.
Uses the following arguments:
  • string - String with datetime values, not including the time zone.
  • zone - Time zone associated with the datetime values. The time zone is used when creating the Date object.
    You can use the following time zone formats:
    • <area>/<location> - For example, America/Chicago or Europe/Madrid.
    • Numeric time zones with the GMT prefix, such as GMT-0500 or GMT-8:00. Note that numeric-only time zones such as -500 are not supported.
    • Short time zone IDs such as EST and CST - These time zones should generally be avoided because they can stand for multiple time zones, e.g. CST stands for both Central Standard Time and China Standard Time.
  • format - The date format used by the string data. For information about date formats, see https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.
For example, the following expression converts string values in the Datetime field to a Date object using the time zone specified in the TZ field. It assumes the date is in the format, "03/30/2016 12:15:33 AM":
${timestamp:createDateFromStringAndZone(record:value('/Datetime'), 'record:value('/TZ')', 'MM/dd/yyyy hh:mm:ss a')}
Return type: Date object.
timestamp:extractLongFromDate(<date>, <format>)
Creates a long value from a Date object, based on the specified date format.
Uses the following arguments:
For example, the following expression returns the month as a long value from the Date object stored in the Timestamp field:
${timestamp:extractLongFromDate(record:value('/Timestamp'), "MM")}
Return type: Long.
Note: Because the function returns a long value, you cannot specify non-numeric data in the data format string. For example, the date format MMM returns the three character abbreviation for the month (such as "Jul"), which causes the function to return incorrect results.
timestamp:extractStringFromDate(<date>, <format>)
Creates a string value from a Date object based on the specified date format.
Uses the following arguments:
For example, the following expression returns the month as a string value from the Date object stored in the Timestamp field:
${timestamp:extractStringFromDate(record:value('/Timestamp'), "MMM")}
Return type: String.
timestamp:extractStringFromDateAndZone(<date>, <zone>, <format>)
Creates a string value from a Date object, converting the GMT time in the Date object to the specified date format and time zone. The function adjusts for daylight savings when given the time zone in the appropriate format.
Uses the following arguments:
  • date - Date object to use.
  • zone - Time zone to use in the conversion.

    To convert the time zone and adjust for daylight savings, use the <Area>/<Location> format. For example, America/Chicago or Europe/Madrid.

    Short time zone IDs such as EST and CST return data, but note that these time zones do not adjust for daylight savings, and should be avoided because they can stand for multiple time zones, e.g. CST stands for both Central Standard Time and China Standard Time.

    You can use numeric time zones with the GMT prefix, such as GMT-0500 or GMT-8:00, but these time zones also do not account for daylight savings. Note that numeric-only time zones such as -500 are not supported.

  • format - String that specifies the format to use to express the date. For information about creating a date format, see https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.
Returns an empty string when the time zone is not specified or invalid.
For example, the following expression converts a Date object to a string date while adjusting to the America/New_York time zone and daylight savings. It expresses the date in the format, 03/30/2016 12:15:33 AM:
${timestamp:extractStringFromDateTZ(record:value('/datetime'), 'America/New_York', 'MM-dd-yyyy hh:mm:ss a')}
Return type: String.
timestamp:nowDate()
Returns the current time of the Data Collector machine as a java.util.Date object.
For example, you might use the function in the Expression Evaluator processor to create a field with the time of processing.
Return type: Datetime.
timestamp:nowLocal()
Returns the current time of the Data Collector machine as a LocalDateTime object.
Return type: LocalDateTime.
timestamp:nowMilliseconds()
Returns the current time of the Data Collector machine as a Long with milliseconds precision.
Return type: Long.
timestamp:nowMillisecondsString()
Returns a String object set to the current date and time of the Data Collector machine with milliseconds precision.
Return type: String object.
timestamp:nowNanoseconds()
Returns the current time of the Data Collector machine as a Double with nanoseconds precision.
Return type: Double.
timestamp:nowNanosecondsString()
Returns a String object set to the current date and time of the Data Collector machine, with nanoseconds precision.
Return type: String object.
timestamp:nowZoned()
Returns a ZonedDateTime object set to the current date and time of the Data Collector machine with nanoseconds precision.
Return type: ZonedDateTime object.

Miscellaneous Functions

In miscellaneous functions, you can replace any argument with a literal or an expression that evaluates to the argument. String literals must be enclosed in single or double quotation marks.

The expression language provides the following miscellaneous functions:

alert:info()
Returns information about the trigger for a data drift alert. Use only in alert text for data drift alerts.
avro:decode(<schema>, <byte array>)

Returns an Avro record by using the specified schema to decode the specified byte array. You can use this function in the Kafka Message Key property when the Kafka stage processes Avro data.

Uses the following arguments:
  • schema - Avro schema to use to decode the specified byte array.
  • byte array - Byte array to decode.
emptyList()
Creates an empty list.
emptyMap()
Creates an empty map.
every(<interval>, < hh() | mm() | ss() >)
Represents the interval of hours, minutes, or seconds for generating output directories for the Hadoop FS, Local FS, or MapR FS destination.
When used, a destination generates output directories for the specified interval beginning on the hour. For example, when generating directories every 30 minutes, it generates a directory on the hour and on the half-hour.
You can use the function once in the Directory Template property to replace the hour, minute, or second datetime variables.
Use the function to replace the smallest time interval in the directory template.
Note: Destinations generate a directory for the smallest unit of measure by default, so do not use the every function to generate a directory every hour, minute, or second. For more information, see Directory Templates.
Uses the following arguments:
  • interval - An integer factor or submultiple of 60 that represents the interval of minutes or seconds to wait between directory generation. Use one of the following values: 1, 2, 3, 4, 5, 6, 10, 12, 15, 20, or 30.
  • < hh() | mm() | ss() > - Use hh() for hours, mm() for minutes, and ss() for seconds.
For example, the following Directory Template generates a new directory every twelve hours, beginning on the hour:
/outputfiles/${YY()}-${MM()}-${DD()}-${every(12,hh())}
The following Directory Template generates a new directory every fifteen minutes, beginning on the hour:
/outputfiles/${YY()}-${MM()}-${DD()}-${hh()}-${every(15,mm())}
The following Directory Template generates a new directory every 30 seconds, beginning on the hour:
/outputfiles/${YY()}-${MM()}-${DD()}-${hh()}-${mm()}-${every(30,ss())}
field:field()
Returns the field name. Available only in the Decimal field expression properties of the Hive Metadata processor.
Return type: String.
isEmptyList()
Returns true or false based on whether a list is empty.
Return type: Boolean.
isEmptyMap()
Returns true or false based on whether a map is empty.
Return type: Boolean.
jvm:maxMemoryMB()
Returns the Java heap size allocated to the Data Collector in MB.
length()
Returns the length of a list.
Return type: Integer.
list:join(<list field>, <separator>)
Merges elements in a List field into a String field, using the specified separator between elements.
Uses the following arguments:
  • list field - The List field that you want to merge.
  • separator - The string to use to separate the elements in the merged field.
For example, to merge the list in a colors field using a semicolon as a separator character, you can use the following expression:
${list:join(record:value('/colors'), ";")}
And if the list field includes "red", "blue", and "yellow" as elements, the expression produces the following string data:
red;blue;yellow
Return type: String.
list:joinSkipNulls(<list field>, <separator>)
Merges elements in a List field into a String field, using the specified separator between elements and skipping null values.
Uses the following arguments:
  • list field - The List field that you want to merge.
  • separator - The string to use to separate the elements in the merged field.
For example, say you use the following expression to merge the list in the colors field:
${list:join(record:value('/colors'), ";")}
And if the list field includes "red", "blue, null, "yellow", the expression ignores the null value and produces the following string data:
red;blue;yellow
Return type: String.
offset:column(<position>)
Returns the value of the positioned offset column for the current table. Available only in the additional offset column conditions of the JDBC Multitable Consumer origin.
Uses the following argument:
  • position - Position of the offset column. For example, enter 0 for the first offset column defined in the table configuration. Enter 1 for the second defined offset column.
Return type: String.
runtime:availableProcessors()

Returns the number of processors available to the Java virtual machine. You can use this function when you want to configure multithreaded processing based on the number of processors available to Data Collector.

Return type: Integer.

runtime:conf(<runtime property name>)
Returns the value for the specified runtime property. Use to call a runtime property.
Uses the following argument:
  • runtime property name - Name of the runtime property to call. The property must be defined in the Data Collector configuration file or in a separate runtime properties file specified in the Data Collector configuration file.
For more information, see Runtime Properties.
runtime:loadResource(<file name>, <restricted: true | false>)
Returns the value in the specified file, trimming any leading or trailing whitespace characters from the file. Use to call a runtime resource.
Uses the following arguments:
  • file name - Name of the file that contains the information to be loaded. The file must reside in the $SDC_RESOURCES directory.
  • restricted - Whether the file has restricted permissions. If set to true, the file must be owned by the system user who runs the Data Collector and read and writable only by the owner.
For example, the following expression returns the contents of the restricted JDBCpassword.txt file, trimming any leading or trailing whitespace characters:
${runtime:loadResource("JDBCpassword.txt", true)}
For more information about runtime resources, see Runtime Resources.
runtime:loadResourceRaw(<file name>, <restricted: true | false>)
Returns the entire contents in the specified file, including any leading or trailing whitespace characters in the file. Use to call a runtime resource.
Uses the following arguments:
  • file name - Name of the file that contains the information to be loaded. The file must reside in the $SDC_RESOURCES directory.
  • restricted - Whether the file has restricted permissions. If set to true, the file must be owned by the system user who runs the Data Collector and read and writable only by the owner.
For example, the following expression returns the entire contents of the restricted JDBCpassword.txt file, including any leading or trailing whitespace characters:
${runtime:loadResourceRaw("JDBCpassword.txt", true)}
For more information about runtime resources, see Runtime Resources.
runtime:resourcesDirPath()
Returns the full path to the directory for runtime resource files.
For example, when configuring a stage to use SSL/TLS encryption, use the following expression to define the name and location of the keystore file stored in the $SDC_RESOURCES directory:
${runtime:resourcesDirPath()}/keystore.jks
Return type: String.
sdc:hostname()
Returns the host name of the Data Collector machine.
For example, you might use the function in the directory template for the Hadoop FS destination to write to a directory that includes the Data Collector host name.
Return type: String.
sdc:id()
Returns the Data Collector ID.

For a pipeline that runs in standalone execution mode, the ID is a unique identifier associated with the Data Collector, such as 58efbb7c-faf4-4d8e-a056-f38667e325d0. The ID is stored in the following file: $SDC_DATA/sdc.id.

For a pipeline that runs in cluster mode, the ID is the Data Collector worker partition ID generated by a cluster application, such as Spark or MapReduce.

size()
Returns the size of a map.
Return type: Integer.
uuid:uuid()
Returns a randomly generated UUID. For example, you might use the function in an Expression Evaluator processor to generate a UUID for an ID field added to each record.
The uuid:uuid() function uses a lot of entropy on Linux systems and can cause your entropy pools to run dry. When this happens, your pipelines slow to a halt but continue to run. Throughput effectively goes to zero while the system waits for entropy to again become available. As a best practice, we recommend running the haveged daemon on any Data Collector node where the uuid:uuid() function is used. The haveged daemon regenerates your entropy pools.
Return type: String.
vault:read(<path>, <key>)
Returns the value for the key on the specified path. You can use the function in username, password, and similar properties such as AWS access key IDs and secret access keys. You can also use the function in HTTP headers and bodies when using HTTPS.
Important: This function is now deprecated and will be removed in a future release. We recommend using the credential functions available with the Vault credential store integration in pipelines that include JDBC stages. For more information, see Credential Stores in the Data Collector documentation.
Return type: String.
Uses the following arguments:
  • path - The path in Vault to read.
  • key - The key for the value that you want returned.
vault:readWithDelay(<path>, <key>, <delay>)
Returns the value for the key on the specified path after waiting the specified amount of time. Use when you want a delayed response to allow time for external processing.
Important: This function is now deprecated and will be removed in a future release. We recommend using the credential functions available with the Vault credential store integration in pipelines that include JDBC stages. For more information, see Credential Stores in the Data Collector documentation.

You can use the function in username, password, and similar properties such as AWS access key IDs and secret access keys. You can also use the function in HTTP headers and bodies when using HTTPS.

Return type: String.
Uses the following arguments:
  • path - The path in Vault to read.
  • key - The key for the value that you want returned.
  • delay - Milliseconds to wait before returning the value.