Functions
When using a StreamSets function, you can replace any argument with a literal or an expression that evaluates to the argument. String literals must be enclosed in single or double quotation marks.
The following table lists all available functions. For details about each function, see the related function type:
Function Type | Functions |
---|---|
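Base64 functions | base64:decodeBytes, base64:decodeString, base64:encodeBytes, base64:encodeString |
Batch functions | batch:attribute, batch:attributeOrDefault |
Credential functions | credential:get, credential:getWithOptions |
File functions | file:fileExtension, file:fileName, file:parentPath, file:pathElement, file:removeExtension |
Job functions | job:id, job:name, job:startTime, job:user |
Math functions | math:abs, math:ceil, math:floor, math:max, math:min, math:round |
Pipeline functions | pipeline:id, pipeline:name, pipeline:startTime, pipeline:title, pipeline:user, pipeline:version |
String functions | str:concat, str:contains, str:endsWith, str:escapeXML10, str:escapeXML11, str:indexOf, str:isNullOrEmpty, str:lastIndexOf, str:length, str:matches, str:regExCapture, str:replace, str:replaceAll, str:split, str:splitKV, str:startsWith, str:substring, str:toLower, str:toUpper, str:trim, str:truncate, str:unescapeJava, str:unescapeXML, str:urlDecode, str:urlEncode |
Time functions | time:createDateFromStringTZ, time:dateTimeToMilliseconds, time:dateTimeZoneOffset, time:extractDateFromString, time:extractLongFromDate, time:extractStringFromDate, time:extractStringFromDateTZ, time:millisecondsToDateTime, time:now, time:timeZoneOffset, time:trimDate, time:trimTime |
Miscellaneous functions | runtime:availableProcessors, runtime:conf, runtime:loadResource, runtime:loadResourceRaw, sdc:hostname, sdc:id, size, uuid:uuid |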
Base64 Functions
Use Base64 functions to encode or decode information using Base64.
You can replace any argument with a literal or an expression that evaluates to the argument. String literals must be enclosed in single or double quotation marks.
The StreamSets expression language provides the following Base64 functions:
- base64:decodeBytes(<string>)
- Returns a decoded byte array from a Base64 encoded string.
- base64:decodeString(<string>, <charset>)
- Returns a decoded string from a Base64 encoded string using the specified character set.
- base64:encodeBytes(<byte array>, <urlSafe: true | false>)
- Returns a Base64 encoded string value of the specified byte array.
- base64:encodeString(<string>, <urlSafe: true | false>, <charset>)
- Returns a Base64 encoded string value of the specified string.
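To make the urlSafe argument concrete, the following Python sketch contrasts the standard and URL-safe Base64 alphabets. It uses Python's base64 module as a stand-in for the expression language; the helper names encode_string and decode_string are hypothetical, and this is an illustration of the encoding itself, not of how the functions are implemented.

```python
import base64

def encode_string(text: str, url_safe: bool, charset: str = "UTF-8") -> str:
    """Encode text to Base64, mirroring the argument order of base64:encodeString."""
    raw = text.encode(charset)
    # The URL-safe alphabet swaps '+' and '/' for '-' and '_'.
    encoder = base64.urlsafe_b64encode if url_safe else base64.b64encode
    return encoder(raw).decode("ascii")

def decode_string(encoded: str, charset: str = "UTF-8") -> str:
    """Decode a standard-alphabet Base64 string back to text."""
    return base64.b64decode(encoded).decode(charset)

standard = encode_string("??>", False)   # uses '+' from the standard alphabet
url_safe = encode_string("??>", True)    # uses '-' instead
```

The two results differ only in the characters that are unsafe in URLs, which is the reason to set urlSafe to true when the encoded value ends up in a URL or file name.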
Batch Functions
Use batch functions to retrieve information about a batch when writing to most destinations.
In pipelines that have an origin configured to read from more than one table, a batch attribute stores the name of the table that the origin reads for the batch. You can use batch functions to retrieve that name and use it in the name of the table or directory where the destination writes data from that batch, such as in the Directory Path property of the File or ADLS destinations or in the Table property of the Hive or JDBC destinations.
You can replace any argument with a literal or an expression that evaluates to the argument. String literals must be enclosed in single or double quotation marks.
- batch:attribute(<attribute name>)
- Returns the value of the specified batch attribute. Uses the following argument:
- attribute name - Name of the batch attribute.
Return type: String
For example, the following expression returns the name of the table processed, specified in the jdbc.table attribute:
${batch:attribute("jdbc.table")}
- batch:attributeOrDefault(<attribute name>, <default value>)
- Returns the value of the specified batch attribute. When the attribute does not exist or has no value, returns the specified default value. Uses the following arguments:
- attribute name - Name of the batch attribute.
- default value - Value to return when the batch attribute does not exist or has no value.
Return type: String when returning the batch attribute value. The data type of the default value when returning the specified default value.
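For example, the following expression returns the name of the table processed or returns NA if no value exists:
${batch:attributeOrDefault("jdbc.table", 'NA')}
To write each batch to a directory named for its source table, a directory path might include the attribute as follows:
file:///Transformer/output/${batch:attribute("jdbc.table")}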
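The fallback behavior of batch:attributeOrDefault can be sketched in Python as a dictionary lookup with a default. The function name attribute_or_default and the sample attribute values are hypothetical, and treating an empty string as "no value" is an assumption made for this sketch.

```python
def attribute_or_default(attributes: dict, name: str, default):
    """Return the attribute value, or the default when it is missing or empty."""
    value = attributes.get(name)
    # Assumption: a missing key, None, and "" all count as "no value".
    if value is None or value == "":
        return default
    return value

# Hypothetical batch attributes for a batch read from the "orders" table.
batch_attributes = {"jdbc.table": "orders"}

directory = "file:///Transformer/output/" + attribute_or_default(
    batch_attributes, "jdbc.table", "NA"
)
fallback = attribute_or_default({}, "jdbc.table", "NA")
```

Here directory ends in the table name, while fallback holds the default because the attribute is absent.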
Credential Functions
Credential functions provide access to sensitive information, such as user names and passwords, that is secured in a credential store. Use credential functions in pipeline and stage properties to enable Transformer to access external systems without exposing those values.
Before you use a credential function, you must configure Transformer to use one of the supported credential stores.
You can use credential functions in any property that displays a key icon next to the property name. The StreamSets expression language provides the following credential functions:
- credential:get(<cstoreId>, <userGroup>, <name>)
- Returns the secret from the credential store. Uses the following arguments:
- cstoreId - Unique ID of the credential store to use. Use the ID specified in the $TRANSFORMER_CONF/credential-stores.properties file. For more information, see Enabling Credential Stores.
- userGroup - Group that a user must belong to in order to access the secret. Only users that have execute permission on the pipeline and that belong to this group can validate, preview, or run the pipeline that retrieves the secret.
If working with Control Hub, specify the group using the required naming convention: <group ID>@<organization ID>.
To grant access to all users, specify the default all group when working only with Transformer. When working with Control Hub and Transformer version 3.14.0 or later, you can specify the default group using all or all@<organization ID>. StreamSets recommends using all so that you do not need to modify credential functions when migrating pipelines from Transformer to Control Hub.
Note: When working with Control Hub and a Transformer version earlier than 3.14.0, you must use the default all@<organization ID> group.
- name - Name of the secret to retrieve from the credential store. Use the required format for the credential store:
- AWS Secrets Manager - Enter the name of the secret to retrieve from Secrets Manager. Use the following format: "<name><separator><key>", where:
<name> is the name of the secret in Secrets Manager to read.
<separator> is the separator defined in the $TRANSFORMER_CONF/credential-stores.properties file.
<key> is the key for the value that you want returned.
- Azure Key Vault - Enter the name of the key or secret to retrieve from Azure Key Vault.
- CyberArk - Enter the name of the secret to retrieve from CyberArk. Use the following format: "<safe><separator><folder><separator><object name>[<separator><element name>]", where:
<safe> is the CyberArk safe to read.
<separator> is the separator defined in the $TRANSFORMER_CONF/credential-stores.properties file.
<folder> is the CyberArk folder to read.
<object name> is the CyberArk object or secret to read.
<element name> is an optional name for the value that you want returned. If you do not specify <element name>, Transformer uses Content.
- Google Secret Manager - Enter the secret name using the following format: "<name><delimiter><version ID>", where:
<name> is the secret name.
<delimiter> is the delimiter defined in the $TRANSFORMER_CONF/credential-stores.properties file.
<version ID> is the version of the secret to return.
- Hashicorp Vault - Enter the secret name using the following format: "<path><separator><key>", where:
<path> is the path in Vault to read.
<separator> is the separator defined in the $TRANSFORMER_CONF/credential-stores.properties file.
<key> is the key for the value that you want returned.
- Java keystore - Enter the name of the secret added to the Java keystore file using the jks-cs add command.
- credential:getWithOptions(<cstoreId>, <userGroup>, <name>, <storeOptions>)
- Returns the secret from the credential store using additional options to communicate with the credential store. Not applicable for the Java keystore or Google Secret Manager credential stores.
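The name formats described for credential:get are plain strings joined by a configured separator. As a rough illustration, the Python sketch below splits a CyberArk-style name into its parts and applies the documented Content default when the element name is omitted. The function parse_cyberark_name, the sample safe, folder, and object names, and the "&" separator are all hypothetical; the real separator is whatever the credential-stores.properties file defines.

```python
def parse_cyberark_name(name: str, separator: str) -> dict:
    """Split "<safe><sep><folder><sep><object name>[<sep><element name>]"."""
    parts = name.split(separator)
    if len(parts) not in (3, 4):
        raise ValueError(f"expected 3 or 4 parts, found {len(parts)}")
    safe, folder, object_name = parts[:3]
    # The element name is optional; the text above says Content is used by default.
    element_name = parts[3] if len(parts) == 4 else "Content"
    return {
        "safe": safe,
        "folder": folder,
        "object_name": object_name,
        "element_name": element_name,
    }

# "&" is only a placeholder separator chosen for this example.
default_element = parse_cyberark_name("mySafe&root&dbPassword", "&")
explicit_element = parse_cyberark_name("mySafe&root&dbPassword&UserName", "&")
```

A practical consequence of this format is that the separator character should not appear inside the safe, folder, or object names themselves.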
File Functions
Use file functions to return information about a file name or path. For example, you might use a file function to remove a file extension from a file path or to return part of the path.
You can replace any argument with a literal or an expression that evaluates to the argument. String literals must be enclosed in single or double quotation marks.
- file:fileExtension(<filepath>)
- Returns the file extension from a file path. Uses the following argument:
- filepath - An absolute path to a file.
- file:fileName(<filepath>)
- Returns the file name from a file path. Uses the following argument:
- filepath - An absolute path to a file.
- file:parentPath(<filepath>)
- When used with a path to a file, returns the path to the file without the final separator, such as /files for /files/file.log.
- file:pathElement(<filepath>, <integer>)
- Returns the part of a path based on the specified integer. Uses the
following arguments:
- filepath - An absolute path to a file.
- integer - The section of a path to return. Can return parts starting
from the left or right side of the path:
- To return a section of a path, counting from the left side of the path, use 0 and positive integers and start with 0.
- To return a section of a path, counting from the right side of the path, use negative integers and start with -1.
- file:removeExtension(<filepath>)
- Returns the file path without the file extension. Uses the following
argument:
- filepath - An absolute path to a file.
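The file functions can be approximated in Python to show what each one returns for a sample path. The helper names below are hypothetical stand-ins, and the sketch assumes that path sections are counted without the leading root separator, which the descriptions above do not state explicitly.

```python
import posixpath

def file_name(path: str) -> str:
    """Final component of the path, like file:fileName."""
    return posixpath.basename(path)

def parent_path(path: str) -> str:
    """Path without the final separator and file name, like file:parentPath."""
    return posixpath.dirname(path)

def remove_extension(path: str) -> str:
    """Path with the trailing extension dropped, like file:removeExtension."""
    return posixpath.splitext(path)[0]

def path_element(path: str, index: int) -> str:
    """Section of the path by position: 0 is leftmost, -1 is rightmost."""
    # Assumption: empty segments (such as the one before the root "/") are skipped.
    sections = [section for section in path.split("/") if section]
    return sections[index]

sample = "/files/logs/file.log"
first_section = path_element(sample, 0)    # counted from the left
last_section = path_element(sample, -1)    # counted from the right
```

Negative indexes are convenient when the depth of the path varies but the section of interest sits at a fixed distance from the file name.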
Job Functions
Use job functions to return information about a Control Hub job. For example, you might use a job function to return the name of the job running a pipeline.
You can replace any argument with a literal or an expression that evaluates to the argument. String literals must be enclosed in single or double quotation marks.
- job:id()
- Returns the ID of the job if the pipeline was run from a Control Hub job. Otherwise, returns UNDEFINED.
- job:name()
- Returns the name of the job if the pipeline was run from a Control Hub job. Otherwise, returns UNDEFINED.
- job:startTime()
- Returns the start time of the job if the pipeline was run from a Control Hub job. Otherwise, returns the start time of the pipeline.
- job:user()
- Returns the user who started the job if the pipeline was run from a Control Hub job. Otherwise, returns UNDEFINED.
Math Functions
Use math functions to perform math on numeric values.
You can replace any argument with a literal or an expression that evaluates to the argument. String literals must be enclosed in single or double quotation marks.
Math functions accept arguments of the following data types:
- Double
- Float
- Integer
- Long
- String
- math:abs(<number>)
- Returns the absolute value, or positive version, of the argument. If the argument is already positive, returns the original number.
- math:ceil(<number>)
- Returns the smallest integer greater than or equal to the argument.
- math:floor(<number>)
- Returns the largest integer less than or equal to the argument.
- math:max(<number1>, <number2>)
- Returns the greater of two arguments.
- math:min(<number1>, <number2>)
- Returns the lesser of two arguments.
- math:round(<number>)
- Returns the closest number to the argument, rounding up for ties.
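The difference between math:ceil, math:floor, and math:round is easiest to see on negative numbers and on ties. The Python sketch below uses the standard math module for the first two and a hypothetical round_half_up helper for the third. It assumes that "rounding up for ties" means rounding toward positive infinity, which is not the behavior of Python's built-in round.

```python
import math

def round_half_up(number: float) -> int:
    """Closest integer, with ties going toward positive infinity (an assumption)."""
    return math.floor(number + 0.5)

ceil_positive = math.ceil(2.1)     # smallest integer >= 2.1
floor_positive = math.floor(2.9)   # largest integer <= 2.9
ceil_negative = math.ceil(-2.9)    # moves toward zero for negatives
floor_negative = math.floor(-2.1)  # moves away from zero for negatives
tie_positive = round_half_up(2.5)
tie_negative = round_half_up(-2.5)
```

Under this reading, a tie such as -2.5 rounds to -2 rather than -3, because -2 is the greater of the two nearest integers.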
Pipeline Functions
Use pipeline functions to determine information about a pipeline, such as the pipeline title or ID. The StreamSets expression language provides the following pipeline functions:
- pipeline:id()
- Returns the ID of the pipeline. The ID is a UUID automatically generated when the pipeline is created and is used by Transformer to identify the pipeline. The pipeline ID cannot be changed.
- pipeline:name()
- Like pipeline:id, this function returns the ID of the pipeline. The ID is a UUID automatically generated when the pipeline is created and is used by Transformer to identify the pipeline. The pipeline ID cannot be changed.
- pipeline:startTime()
- Returns the start time of the pipeline.
Return type: Datetime.
- pipeline:title()
- Returns the title or name of the pipeline.
- pipeline:user()
- Returns the user who started the pipeline.
- pipeline:version()
- Returns the pipeline version when the pipeline has been published to StreamSets Control Hub. Returns UNDEFINED if the pipeline has not been published to Control Hub. Use this function only when you have registered Transformer to work with Control Hub.
String Functions
Use string functions to transform string data.
You can replace any argument with a literal or an expression that evaluates to the argument. String literals must be enclosed in single or double quotation marks.
The StreamSets expression language provides the following string functions:
- str:concat(<string1>, <string2>)
- Concatenates two strings together.
- str:contains(<string>, <subset>)
- Returns true or false based on whether the string contains the configured subset of characters.
- str:endsWith(<string>, <subset>)
- Returns true or false based on whether the string ends with the configured subset of characters.
- str:escapeXML10(<string>)
- Returns a string that you can embed in an XML 1.0 or 1.1 document.
- str:escapeXML11(<string>)
- Returns a string that you can embed in an XML 1.1 document.
- str:indexOf(<string>, <subset>)
- Returns the index within a string of the first occurrence of the specified subset of characters.
- str:isNullOrEmpty(<string>)
- Returns true or false based on whether a string is null or is the empty string.
- str:lastIndexOf(<string>, <subset>)
- Returns the index within a string of the last occurrence of the specified subset of characters.
- str:length(<string>)
- Returns the length of a string.
- str:matches(<string>, <regEx>)
- Returns true or false based on whether a string matches a Java regex pattern.
- str:regExCapture(<string>, <regEx>, <group>)
- Parses a complex string into groups based on a Java regex pattern and returns the specified group.
- str:replace(<string>, <oldChar>, <newChar>)
- Replaces all instances of a specified character in a string with a new character.
- str:replaceAll(<string>, <regEx>, <newString>)
- Replaces a set of characters in a string with a new set of characters.
- str:split(<string>, <separator>)
- Splits a string into a list of strings based on the specified separator. Uses the following arguments:
- string - An input string.
- separator - The set of characters that designate a string split.
- str:splitKV(<string>, <pairSeparator>, <keyValueSeparator>)
- Splits key-value pairs in a string into a map of string values.
- str:startsWith(<string>, <subset>)
- Returns true or false based on whether the string starts with the configured subset of characters.
- str:substring(<string>, <beginIndex>, <endIndex>)
- Returns a subset of the string value that starts with the beginIndex character and ends one character before the endIndex.
- str:toLower(<string>)
- Converts string data to all lowercase letters.
- str:toUpper(<string>)
- Converts string data to all capital letters.
- str:trim(<string>)
- Trims leading and trailing white space characters from a string, including spaces and return characters.
- str:truncate(<string>, <length>)
- Returns a string truncated to the specified length. Use an integer to specify the length.
- str:unescapeJava(<string>)
- Returns an unescaped string from a string with special Java characters. Use to include binary or non-printable characters in any location where you can enter an expression.
- str:unescapeXML(<string>)
- Returns an unescaped string from a string that had XML data escaped.
- str:urlDecode(<URL>, <charset>)
- Converts characters from a URL to the specified character set, such as UTF-8.
- str:urlEncode(<infoforURL>, <charset>)
- Converts invalid characters to help create a valid URL based on the specified character set, such as UTF-8. You might use this function when using record data to add additional information, like a fragment, to a URL.
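A few of the string functions have index or grouping rules that are easier to follow with concrete values. The Python sketch below mimics str:substring, str:regExCapture, and str:splitKV with hypothetical helpers. It relies on Python's re module, whose syntax differs from Java regex in some details, and it assumes the pattern may match anywhere in the string and that a failed match yields None; neither point is confirmed by the descriptions above.

```python
import re

def substring(text: str, begin_index: int, end_index: int) -> str:
    """Characters from begin_index up to, but not including, end_index."""
    return text[begin_index:end_index]

def regex_capture(text: str, pattern: str, group: int):
    """Return the numbered capture group of the first match, or None."""
    match = re.search(pattern, text)
    return match.group(group) if match else None

def split_kv(text: str, pair_separator: str, key_value_separator: str) -> dict:
    """Split "k1=v1&k2=v2"-style text into a map of string values."""
    result = {}
    for pair in text.split(pair_separator):
        if not pair:
            continue  # skip empty pairs produced by repeated separators
        key, _, value = pair.partition(key_value_separator)
        result[key] = value
    return result

prefix = substring("streamsets", 0, 6)
month = regex_capture("2024-05-17", r"(\d{4})-(\d{2})-(\d{2})", 2)
pairs = split_kv("dept=sales&region=emea", "&", "=")
```

Because the end index is exclusive, the length of the substring result is always endIndex minus beginIndex.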
Time Functions
Use time functions to return the current time or to transform datetime information.
You can replace any datetime argument with an expression that evaluates to a datetime value. You cannot replace a datetime argument with a datetime literal.
You can replace any other argument with a literal or an expression that evaluates to the argument. String literals must be enclosed in single or double quotation marks.
The StreamSets expression language provides the following time functions:
- time:createDateFromStringTZ(<string>, <time zone>, <date format>)
- Creates a Date object based on a datetime in a String field and using the specified time zone. The datetime string should not include the time zone.
- time:dateTimeToMilliseconds(<Date object>)
- Converts a Date object to an epoch or UNIX time in milliseconds.
For example, the following expression converts the current time to epoch or UNIX time in milliseconds:
${time:dateTimeToMilliseconds(time:now())}
Return type: Long.
- time:dateTimeZoneOffset(<Date object>, <time zone>)
- Returns the time zone offset in milliseconds for the specified date and time zone. The time zone offset is the difference in hours and minutes from Coordinated Universal Time (UTC).
Uses the following arguments:
- Date object - Date object to use.
- time zone - Time zone associated with the Date object. You can use the following time zone formats:
- <area>/<location> - For example, America/Chicago or Europe/Madrid.
- Numeric time zones with the GMT prefix, such as GMT-0500 or GMT-8:00. Note that numeric-only time zones such as -500 are not supported.
- Short time zone IDs such as EST and CST - These time zones should generally be avoided because they can stand for multiple time zones, e.g. CST stands for both Central Standard Time and China Standard Time.
- time:extractDateFromString(<string>, <format string>)
- Extracts a Date object from a String, based on the specified date format.
Uses the following arguments:
- string - String to extract the Date object from.
- format string - String that specifies the date format of the data in the <string> argument. For information about creating a date format, see https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.
- time:extractLongFromDate(<Date object>, <format string>)
- Extracts a long value from a Date object, based on the specified date format.
- time:extractStringFromDate(<Date object>, <format string>)
- Extracts a string value from a Date object based on the specified date format.
- time:extractStringFromDateTZ(<Date object>, <time zone>, <format string>)
- Extracts a string value from a Date object, converting the GMT time in the Date object to the specified date format and time zone. The function adjusts for daylight saving time when given the time zone in the appropriate format.
- time:millisecondsToDateTime(<long>)
- Converts an epoch or UNIX time in milliseconds to a Date object.
- time:now()
- Returns the current time of the Transformer machine as a java.util.Date object.
- time:timeZoneOffset(<time zone>)
- Returns the time zone offset in milliseconds for the specified time zone. The time zone offset is the difference in hours and minutes from Coordinated Universal Time (UTC).
Uses the following argument:
- time zone - Time zone to use. You can use the following time zone formats:
- <area>/<location> - For example, America/Chicago or Europe/Madrid.
- Numeric time zones with the GMT prefix, such as GMT-0500 or GMT-8:00. Note that numeric-only time zones such as -500 are not supported.
- Short time zone IDs such as EST and CST - These time zones should generally be avoided because they can stand for multiple time zones, e.g. CST stands for both Central Standard Time and China Standard Time.
Return type: Long.
- time:trimDate(<datetime>)
- Trims the date portion of a datetime value by setting the date portion to January 1, 1970.
- time:trimTime(<datetime>)
- Trims the time portion of a datetime value by setting the time portion to 00:00:00.
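The millisecond conversions, offsets, and trim functions above can be illustrated with Python's datetime module. All helper names are hypothetical, the offset parser handles only the numeric GMT-prefixed form, and the trim helpers work on UTC values, which is an assumption about how the real functions treat time zones.

```python
import re
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def date_time_to_milliseconds(moment: datetime) -> int:
    """Whole milliseconds between the UNIX epoch and an aware datetime."""
    return (moment - EPOCH) // timedelta(milliseconds=1)

def milliseconds_to_date_time(millis: int) -> datetime:
    """Inverse conversion, from epoch milliseconds to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)

def gmt_offset_millis(zone: str) -> int:
    """Offset in milliseconds for zones written like GMT-0500 or GMT-8:00."""
    match = re.fullmatch(r"GMT([+-])(\d{1,2}):?(\d{2})", zone)
    if match is None:
        # Numeric-only forms such as -500 are rejected, as noted above.
        raise ValueError(f"unsupported time zone format: {zone}")
    sign = -1 if match.group(1) == "-" else 1
    minutes = int(match.group(2)) * 60 + int(match.group(3))
    return sign * minutes * 60_000

def trim_time(moment: datetime) -> datetime:
    """Keep the date and set the time portion to 00:00:00."""
    return moment.replace(hour=0, minute=0, second=0, microsecond=0)

def trim_date(moment: datetime) -> datetime:
    """Keep the time and set the date portion to January 1, 1970."""
    return moment.replace(year=1970, month=1, day=1)

sample = datetime(2024, 5, 17, 13, 45, 30, tzinfo=timezone.utc)
date_only = trim_time(sample)
time_only = trim_date(sample)
```

An offset of five hours behind UTC comes out as a negative number of milliseconds, so adding the offset to a UTC epoch value gives the local wall-clock value.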
Miscellaneous Functions
You can replace any argument with a literal or an expression that evaluates to the argument. String literals must be enclosed in single or double quotation marks.
The StreamSets expression language provides the following miscellaneous functions:
- runtime:availableProcessors()
- Returns the number of processors available to the Java virtual machine. You can use this function when you want to configure multithreaded processing based on the number of processors available to Transformer.
Return type: Integer.
- runtime:conf(<runtime property name>)
- Returns the value for the specified runtime property. Use to call a runtime property.
- runtime:loadResource(<file name>, <restricted: true | false>)
- Returns the value in the specified file, trimming any leading or trailing whitespace characters from the file. Use to call a runtime resource.
- runtime:loadResourceRaw(<file name>, <restricted: true | false>)
- Returns the entire contents in the specified file, including any leading or trailing whitespace characters in the file. Use to call a runtime resource.
- sdc:hostname()
- Returns the host name of the Transformer machine.
- sdc:id()
- Returns the Transformer ID.
For a pipeline that runs in standalone execution mode, the ID is a unique identifier associated with the Transformer, such as 58efbb7c-faf4-4d8e-a056-f38667e325d0. The ID is stored in the following file: $TRANSFORMER_DATA/transformer.id.
For a pipeline that runs in cluster mode, the ID is the Transformer worker partition ID generated by a cluster application, such as Spark or MapReduce.
- size()
- Returns the size of a map.
- uuid:uuid()
- Returns a randomly generated UUID.
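The distinction between runtime:loadResource and runtime:loadResourceRaw comes down to whitespace trimming. The Python sketch below shows that difference on a temporary file and includes rough analogues of runtime:availableProcessors and uuid:uuid. The helper names and the file contents are hypothetical, the restricted argument is not modeled, and Python reports processors for the host rather than for a Java virtual machine.

```python
import os
import tempfile
import uuid
from pathlib import Path

def load_resource_raw(path: str) -> str:
    """Entire file contents, including leading and trailing whitespace."""
    return Path(path).read_text(encoding="utf-8")

def load_resource(path: str) -> str:
    """File contents with leading and trailing whitespace removed."""
    return load_resource_raw(path).strip()

def available_processors() -> int:
    """Processor count; falls back to 1 when the platform cannot report it."""
    return os.cpu_count() or 1

def random_uuid() -> str:
    """Randomly generated UUID in its canonical 36-character form."""
    return str(uuid.uuid4())

# Write a hypothetical resource file whose value is padded with whitespace.
with tempfile.TemporaryDirectory() as folder:
    resource = os.path.join(folder, "password.txt")
    Path(resource).write_text("  s3cret\n", encoding="utf-8")
    trimmed = load_resource(resource)
    raw = load_resource_raw(resource)
```

Trimming matters for values such as passwords, where a trailing newline added by a text editor would otherwise become part of the value.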