In miscellaneous functions, you can
replace any argument with a literal or an expression that evaluates to the argument. String
literals must be enclosed in single or double quotation marks.
Some functions may not be valid in Data Collector Edge pipelines.
The expression language provides the following miscellaneous functions:
- alert:info()
- Returns information about the trigger for a data drift alert. Use only in alert text for
data drift alerts.
- avro:decode(<schema>, <byte array>)
- Returns an Avro record by using the specified schema to decode the specified byte array. You can use this function in the Kafka Message Key property when the Kafka stage processes Avro data.
- Uses the following arguments:
- schema - Avro schema to use to decode the specified byte array.
- byte array - Byte array to decode.
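- For example, the following expression (the field paths are illustrative) uses an Avro schema stored in the /schema field to decode the byte array in the /key field:
${avro:decode(record:value('/schema'), record:value('/key'))}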
- emptyList()
- Creates an empty list.
- emptyMap()
- Creates an empty map.
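- For example, in an Expression Evaluator processor, you might use the following expression to initialize a new List field with no elements; emptyMap() works the same way for Map fields:
${emptyList()}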
- every(<interval>, < hh() | mm() | ss() >)
- Represents the interval of hours, minutes, or seconds for generating output directories
for the Hadoop FS, Local FS, or MapR FS destination.
- When used, a destination generates output directories for the specified interval
beginning on the hour. For example, when generating directories every 30 minutes, it
generates a directory on the hour and on the half-hour.
- You can use the function once in the Directory Template property to replace the hour,
minute, or second datetime variables.
- Use the function to replace the smallest time interval in the directory template.
- Note: Destinations generate a directory for the smallest unit of measure by default, so do not use the every function to generate a directory every hour, minute, or second. For more information, see Directory Templates.
- Uses the following arguments:
- interval - An integer divisor of 60 that represents the interval of hours, minutes, or seconds to wait between directory generation. Use one of the following values: 1, 2, 3, 4, 5, 6, 10, 12, 15, 20, or 30.
- < hh() | mm() | ss() > - Use hh() for hours, mm() for minutes, or ss() for seconds.
- For example, the following Directory Template generates a new directory every twelve
hours, beginning on the
hour:
/outputfiles/${YY()}-${MM()}-${DD()}-${every(12,hh())}
- The following Directory Template generates a new directory every fifteen minutes,
beginning on the
hour:
/outputfiles/${YY()}-${MM()}-${DD()}-${hh()}-${every(15,mm())}
- The following Directory Template generates a new directory every 30 seconds, beginning
on the
hour:
/outputfiles/${YY()}-${MM()}-${DD()}-${hh()}-${mm()}-${every(30,ss())}
- field:field()
- Returns the field name. Available only in the Decimal field expression properties of the
Hive Metadata processor.
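- For example, an expression along the following lines (a sketch based on the processor's default decimal precision expression, which reads precision from a JDBC header attribute) uses field:field() to build the attribute name for each decimal field:
${record:attribute(str:concat(str:concat('jdbc.', field:field()), '.precision'))}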
- Return type: String.
- isEmptyList()
- Returns true or false based on whether a list is empty.
- Return type: Boolean.
- isEmptyMap()
- Returns true or false based on whether a map is empty.
- Return type: Boolean.
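- For example, assuming that the list or map to test is passed in as an argument (the field path here is illustrative), an expression such as the following returns true when the /colors List field contains no elements:
${isEmptyList(record:value('/colors'))}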
- jvm:maxMemoryMB()
- Returns the Java heap size allocated to the Data Collector in MB.
You can use this function in an expression to specify the maximum amount of memory a
pipeline can use.
- For example, since 65% of the Java heap size is the recommended maximum, the following
expression is the default memory limit for a pipeline:
${jvm:maxMemoryMB() * 0.65}
- length()
- Returns the length of a list.
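- For example, assuming that the list is passed in as an argument (the field path is illustrative), the following expression returns the number of elements in the /colors List field:
${length(record:value('/colors'))}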
- Return type: Integer.
- list:join(<list field>, <separator>)
- Merges elements in a List field into a String field, using the specified separator
between elements.
- Uses the following arguments:
- list field - The List field that you want to merge.
- separator - The string to use to separate the elements in the merged field.
- For example, to merge the list in a colors field using a semicolon as a separator
character, you can use the following
expression:
${list:join(record:value('/colors'), ";")}
- And if the list field includes "red", "blue", and "yellow" as elements, the expression
produces the following string data:
red;blue;yellow
- Return type: String.
- list:joinSkipNulls(<list field>, <separator>)
- Merges elements in a List field into a String field, using the specified separator
between elements and skipping null values.
- Uses the following arguments:
- list field - The List field that you want to merge.
- separator - The string to use to separate the elements in the merged field.
- For example, say you use the following expression to merge the list in the colors field:
${list:joinSkipNulls(record:value('/colors'), ";")}
- And if the list field includes "red", "blue", null, and "yellow" as elements, the expression ignores the null value and produces the following string data:
red;blue;yellow
- Return type: String.
- offset:column(<position>)
- Returns the value of the offset column at the specified position for the current table. Available only in the additional offset column conditions of the JDBC Multitable Consumer origin.
- Uses the following argument:
- position - Position of the offset column. For example, enter 0 for the first offset
column defined in the table configuration. Enter 1 for the second defined offset
column.
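- For example, you might use the function in a condition along the following lines (a sketch; the timestamp literal is illustrative) so that the origin processes only rows where the first offset column exceeds the given value:
${offset:column(0)} > '2017-01-01 00:00:00'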
- Return type: String.
- runtime:availableProcessors()
- Returns the number of processors available to the Java virtual machine. You can use this function when you want to configure multithreaded processing based on the number of processors available to Data Collector.
- Return type: Integer.
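- For example, for a multithreaded origin, you might enter the following expression for the Number of Threads property so that the origin creates one thread for each available processor:
${runtime:availableProcessors()}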
- runtime:conf(<runtime property>)
- Returns the value for the specified runtime configuration. Use to call a runtime
property.
- Uses the following argument:
- runtime property - Name of the runtime property to use. The property must be defined in the Data Collector configuration file or in a runtime configuration file specified in the sdc.properties file.
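- For example, if a runtime configuration file defines a property named HDFSDirTemplate (the property name is illustrative), the following expression returns its value:
${runtime:conf('HDFSDirTemplate')}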
- For more information, see Using Runtime Properties.
- runtime:loadResource(<file name>, <restricted: true | false>)
- Returns the value in the specified file, trimming any leading or trailing whitespace
characters from the file. Use to call a runtime resource.
- Uses the following arguments:
- file name - Name of the file that contains the information to be loaded. The file must reside in the $SDC_RESOURCES directory.
- restricted - Whether the file has restricted permissions. If set to true, the file must be owned by the system user who runs the Data Collector and readable and writable only by the owner.
- For example, the following expression returns the contents of the restricted
JDBCpassword.txt file, trimming any leading or trailing whitespace
characters:
${runtime:loadResource("JDBCpassword.txt", true)}
- For more information about runtime resources, see Using Runtime Resources.
- runtime:loadResourceRaw(<file name>, <restricted: true | false>)
- Returns the entire contents of the specified file, including any leading or trailing whitespace characters in the file. Use to call a runtime resource.
- Uses the following arguments:
- file name - Name of the file that contains the information to be loaded. The file must reside in the $SDC_RESOURCES directory.
- restricted - Whether the file has restricted permissions. If set to true, the file must be owned by the system user who runs the Data Collector and readable and writable only by the owner.
- For example, the following expression returns the entire contents of the restricted
JDBCpassword.txt file, including any leading or trailing whitespace
characters:
${runtime:loadResourceRaw("JDBCpassword.txt", true)}
- For more information about runtime resources, see Using Runtime Resources.
- runtime:resourcesDirPath()
- Returns the full path to the directory for runtime resource files.
- For example, when configuring a stage to use SSL/TLS encryption, use the following expression to define the name and location of the keystore file stored in the $SDC_RESOURCES directory:
${runtime:resourcesDirPath()}/keystore.jks
- Return type: String.
- sdc:hostname()
- Returns the host name of the Data Collector or Data Collector Edge machine.
- For example, you might use the function in the directory template for the Hadoop FS
destination to write to a directory that includes the Data Collector host
name.
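- For example, a directory template along the following lines (the path is illustrative) writes to a directory that includes the Data Collector host name:
/outputfiles/${sdc:hostname()}/${YY()}-${MM()}-${DD()}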
- Return type: String.
- sdc:id()
- Returns the Data Collector ID.
- For a pipeline that runs in standalone execution mode, the ID is a unique identifier associated with the Data Collector, such as 58efbb7c-faf4-4d8e-a056-f38667e325d0. The ID is stored in the following file: $SDC_DATA/sdc.id.
- For a pipeline that runs in cluster mode, the ID is the Data Collector worker partition ID generated by a cluster application, such as Spark or MapReduce.
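- For example, in an Expression Evaluator processor, you might use the following expression to write the Data Collector ID to a new field added to each record:
${sdc:id()}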
- size()
- Returns the size of a map.
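- For example, assuming that the map is passed in as an argument (the field path is illustrative), the following expression returns the number of entries in the /attributes Map field:
${size(record:value('/attributes'))}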
- Return type: Integer.
- uuid:uuid()
- Returns a randomly generated UUID. For example, you might use the function in an
Expression Evaluator processor to generate a UUID for an ID field added to each
record.
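- In that case, you might set the expression for the ID field to:
${uuid:uuid()}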
- The uuid:uuid() function consumes a lot of entropy on Linux systems and can exhaust the entropy pool. When this happens, pipelines continue to run, but throughput effectively drops to zero while the system waits for entropy to become available. As a best practice, we recommend running the haveged daemon, which replenishes the entropy pool, on any Data Collector node where the uuid:uuid() function is used.
- Return type: String.
- vault:read(<path>, <key>)
- Returns the value for the key on the specified path. You can use the function in username, password, and similar properties, such as AWS access key IDs and secret access keys. You can also use the function in HTTP headers and bodies when using HTTPS.
- Important: This function is now deprecated and will be removed in a future release. We recommend using the credential functions available with the Vault credential store integration. For more information, see Credential Stores in the Data Collector documentation.
- Return type: String.
- Uses the following arguments:
- path - The path in Vault to read.
- key - The key for the value that you want returned.
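- For example, the following expression (the path and key names are illustrative) returns the value of the password key stored at secret/hello in Vault:
${vault:read('secret/hello', 'password')}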
- vault:readWithDelay(<path>, <key>, <delay>)
- Returns the value for the key on the specified path after waiting the specified amount of time. Use when you want a delayed response to allow time for external processing. You can use the function in username, password, and similar properties, such as AWS access key IDs and secret access keys. You can also use the function in HTTP headers and bodies when using HTTPS.
- Important: This function is now deprecated and will be removed in a future release. We recommend using the credential functions available with the Vault credential store integration. For more information, see Credential Stores in the Data Collector documentation.
- Return type: String.
- Uses the following arguments:
- path - The path in Vault to read.
- key - The key for the value that you want returned.
- delay - Milliseconds to wait before returning the value.
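- For example, the following expression (the path and key names are illustrative) waits 1,000 milliseconds before returning the value of the password key stored at secret/hello:
${vault:readWithDelay('secret/hello', 'password', 1000)}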