Customization with Environment Variables

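Transformer includes several environment variables that you can modify to customize the following areas:
  • Transformer directories
  • User and group for service start
  • Java configuration options
  • Security Manager
  • Root classloader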

Modifying Environment Variables

The method that you use to modify environment variables depends on the Transformer installation type:
Tarball installation started manually from the command line
When you start Transformer manually from the command line on any operating system, edit the $TRANSFORMER_DIST/libexec/transformer-env.sh file to modify environment variables.

Use a text editor to edit the transformer-env.sh file. Some of the environment variables in the file are commented out and do not reflect the default values. Be sure to uncomment the line when you change a variable value.

After you edit the file, restart Transformer from the command prompt to enable the changes.

Note: Do not restart Transformer from the user interface after modifying environment variables.
RPM installation started as a service on operating systems that use the SysV init system
When you start Transformer as a service on CentOS 6, Oracle Linux 6, or Red Hat Enterprise Linux 6, edit the $TRANSFORMER_DIST/libexec/transformerd-env.sh file to modify environment variables.

Use a text editor to edit the transformerd-env.sh file.

After you edit the file, restart Transformer to enable the changes.

RPM installation started as a service on operating systems that use the systemd init system
When you start Transformer as a service on CentOS 7, Oracle Linux 7, or Red Hat Enterprise Linux 7, edit the /usr/lib/systemd/system/transformer.service file to modify environment variables.
Override the default values in the transformer.service file using the same procedure that you use to override unit configuration files on a systemd init system. For an example, see "Example 2. Overriding vendor settings" in the systemd.unit man page.
After overriding the default values, use the following command to reload the systemd manager configuration:
systemctl daemon-reload

Then restart Transformer to enable the changes.
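
As a rough sketch of the override procedure, the following shell commands write a drop-in file that sets one environment variable for the service. The file name override.conf, the example log path, and the DROPIN_DIR variable are assumptions for illustration. On a real host, the drop-in directory for this unit would be /etc/systemd/system/transformer.service.d, and writing to it requires root privileges.

```shell
# Sketch only: DROPIN_DIR defaults to a scratch directory so that the
# commands are safe to try. On a real host, point it at
# /etc/systemd/system/transformer.service.d (requires root).
DROPIN_DIR="${DROPIN_DIR:-$(mktemp -d)}"
mkdir -p "$DROPIN_DIR"

# The [Service] section and the Environment= directive are standard
# systemd syntax. The variable value is a hypothetical example.
cat > "$DROPIN_DIR/override.conf" <<'EOF'
[Service]
Environment=TRANSFORMER_LOG=/data/transformer/log
EOF

# Show the resulting drop-in file.
cat "$DROPIN_DIR/override.conf"
```

After the file is in the real drop-in directory, reload the systemd manager configuration and restart Transformer as described above.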

Transformer Directories

Transformer includes environment variables that define the directories used to store files used by Transformer, such as configuration files, log files, and runtime resources.

Note: StreamSets does not recommend using NFS or NAS to store Transformer files.

The TRANSFORMER_DIST environment variable defines the Transformer runtime directory. The runtime directory is the base Transformer directory that stores the executables and related files. This environment variable is set during installation.

When you start Transformer manually, the default values of the remaining directory variables are relative to the $TRANSFORMER_DIST runtime directory. When you start Transformer as a service, the default values of the remaining directory variables are absolute paths that are outside of the $TRANSFORMER_DIST runtime directory.

Modify environment variables using the method required by your installation type.

You can configure the following environment variables that define directories:

Environment Variable Description
TRANSFORMER_CONF

Defines the configuration directory for the Transformer configuration file, transformer.properties, and related realm properties files and keystore files. Also includes the log4j properties file.

Default directories:

  • Manual start: $TRANSFORMER_DIST/etc
  • Service start: /etc/transformer
TRANSFORMER_DATA

Defines the data directory for pipeline configuration and run details.

Default directories:

  • Manual start: $TRANSFORMER_DIST/data
  • Service start: /var/lib/transformer
TRANSFORMER_LOG

Defines the log directory.

Default directories:

  • Manual start: $TRANSFORMER_DIST/log
  • Service start: /var/log/transformer
TRANSFORMER_EXTERNAL_RESOURCES
Defines an optional external resources directory. By default, this directory contains the following directories:
  • TRANSFORMER_RESOURCES
  • STREAMSETS_LIBRARIES_EXTRA_DIR

To define this directory, you must add the environment variable to the appropriate file. Set the variable to a directory outside of the $TRANSFORMER_DIST runtime directory.

Default directory: $TRANSFORMER_DIST/externalResources

TRANSFORMER_RESOURCES
Defines the directory for runtime resource files.

To configure this environment variable, you must uncomment the variable in the appropriate file. Set the variable to a directory outside of the $TRANSFORMER_DIST runtime directory.

Default directories:

  • Manual start for tarball installations:

    $TRANSFORMER_EXTERNAL_RESOURCES/resources

    This resolves to the following directory unless you define the TRANSFORMER_EXTERNAL_RESOURCES environment variable:

    $TRANSFORMER_DIST/externalResources/resources

  • Service start: /var/lib/transformer-resources
STREAMSETS_LIBRARIES_EXTRA_DIR
Defines the directory for external libraries.

To configure this environment variable, you must add the environment variable to the appropriate file. Set the variable to a directory outside of the $TRANSFORMER_DIST runtime directory.

Default directory:

$TRANSFORMER_EXTERNAL_RESOURCES/streamsets-libs-extras

This resolves to the following directory unless you define the TRANSFORMER_EXTERNAL_RESOURCES environment variable:

$TRANSFORMER_DIST/externalResources/streamsets-libs-extras
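
The way the manual-start defaults for the external resources directories nest can be sketched in shell. The helper name resolve_external_dirs and the paths /opt/transformer and /srv/transformer-external are hypothetical and only illustrate how the defaults described above depend on each other.

```shell
# resolve_external_dirs is a hypothetical helper, not part of Transformer.
# $1 = value of TRANSFORMER_DIST
# $2 = optional value of TRANSFORMER_EXTERNAL_RESOURCES
resolve_external_dirs() {
  dist="$1"
  # Fall back to the documented default when no second argument is given.
  external="${2:-$dist/externalResources}"
  echo "TRANSFORMER_EXTERNAL_RESOURCES=$external"
  echo "TRANSFORMER_RESOURCES=$external/resources"
  echo "STREAMSETS_LIBRARIES_EXTRA_DIR=$external/streamsets-libs-extras"
}

# Defaults relative to an assumed installation path.
resolve_external_dirs /opt/transformer

# The same directories after defining an external resources directory
# outside of the runtime directory.
resolve_external_dirs /opt/transformer /srv/transformer-external
```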

User and Group for Service Start

When you run Transformer as a service, Transformer runs as the system user account and group defined in environment variables. The default system user and group are named transformer.

You can modify the values of the environment variables to point to another system user or group. Modify environment variables using the method required by your installation type.

If you change the system user, you must make the new system user the owner of all Transformer directories:
  • $TRANSFORMER_DIST
  • $TRANSFORMER_CONF
  • $TRANSFORMER_DATA
  • $TRANSFORMER_LOG
  • $TRANSFORMER_RESOURCES
For example, if you change the system user and group to myuser, use the following command to change the owner of the configuration directory, $TRANSFORMER_CONF, and all files in the directory to myuser:myuser:
chown -R myuser:myuser /etc/transformer
Note: When you run Transformer manually, Transformer runs as the system user account logged into the command prompt when the launch command is run. To run as another user account, see Starting Transformer Manually.
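
The ownership change shown above for the configuration directory applies to every Transformer directory. As a minimal sketch, the following hypothetical helper, print_chown_commands, only prints the commands so that they can be reviewed before being run as root. The directory values are the service-start defaults listed on this page, except for /opt/transformer, which is an assumed value for $TRANSFORMER_DIST.

```shell
# print_chown_commands is a hypothetical helper that echoes one chown
# command per directory without executing anything.
# $1 = owner in user:group form, remaining arguments = directories
print_chown_commands() {
  owner="$1"
  shift
  for dir in "$@"; do
    echo "chown -R $owner $dir"
  done
}

# /opt/transformer is an assumed runtime directory; the other paths are
# the documented service-start defaults.
print_chown_commands myuser:myuser \
  /opt/transformer \
  /etc/transformer \
  /var/lib/transformer \
  /var/log/transformer \
  /var/lib/transformer-resources
```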

Java Configuration Options

You define the Java configuration options in the TRANSFORMER_JAVA_OPTS environment variable.

When defining Java configuration options, avoid defining duplicate options. If you do define duplicates, the last option passed to the JVM usually takes precedence.
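
One way to spot duplicate options before restarting is to count how often an option prefix appears in the variable. The following sketch uses a hypothetical helper, count_option, and an example options string. It is not part of Transformer.

```shell
# count_option is a hypothetical helper that counts how many options in a
# space-separated string start with a given prefix.
# $1 = option prefix (for example -Xmx), $2 = options string
count_option() {
  count=0
  for opt in $2; do            # unquoted on purpose: split on whitespace
    case "$opt" in
      "$1"*) count=$((count + 1)) ;;
    esac
  done
  echo "$count"
}

# Example value that defines -Xmx twice.
opts="-Xmx1024m -Xms1024m -server -Xmx2048m"
if [ "$(count_option -Xmx "$opts")" -gt 1 ]; then
  echo "warning: -Xmx is defined more than once"
fi
```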

Java Heap Size

Modify the Transformer Java heap size as necessary, based on the resources available on the host machine. By default, the Java heap size is 1024 MB.

The Java heap size determines the memory allocated to Transformer and limits the amount of memory that Transformer can use when it runs a pipeline. For example, with a heap size of 2048 MB, a pipeline configured to use up to 65% of the heap can use about 1331 MB of memory.
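
The arithmetic in the example above can be reproduced with shell integer arithmetic. The helper name pipeline_memory_mb is hypothetical.

```shell
# pipeline_memory_mb is a hypothetical helper for the calculation above.
# $1 = heap size in MB, $2 = percentage of the heap used by the pipeline
pipeline_memory_mb() {
  echo $(( $1 * $2 / 100 ))   # integer division rounds down
}

pipeline_memory_mb 2048 65   # prints 1331
pipeline_memory_mb 1024 65   # prints 665
```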

Use the following Java options to define the Java heap size:
  • -Xmx - Defines the maximum heap size.
  • -Xms - Defines the minimum heap size.
Tip: To avoid constant recalculation of the allocated heap size, set both the minimum and maximum properties to the same value. To define the unit of measure, use m for MB and g for GB.
Define the heap size based on your installation:
Tarball or RPM installation

Define the heap size in the TRANSFORMER_JAVA_OPTS environment variable.

For example, to double the heap size, increase the Xmx and Xms settings as follows:

export TRANSFORMER_JAVA_OPTS="${TRANSFORMER_JAVA_OPTS} -Xmx2048m -Xms2048m -server"
Modify environment variables using the method required by your installation type.
Consider the following guidelines when you define the heap size:
  • By default, Java 8 and Java 11 enable the UseCompressedOops option, which allows a maximum of 32 GB of heap size regardless of the configured size. To allocate more than 32 GB, disable the option by adding the following Java option:

    -XX:-UseCompressedOops

  • In the pipeline properties, you can use the jvm:maxMemoryMB() function to help define the percentage of the heap size the pipeline uses.

Remote Debugging

You can enable remote debugging to debug a Transformer instance running on a remote machine.

Enable remote debugging based on your installation:
Tarball or RPM installation

Define debugging options in the TRANSFORMER_JAVA_OPTS environment variable.

Add the following debugging options to the environment variable, where port_number is an open port number on the remote machine running Transformer:
-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=<port_number>,suspend=n
For example, to debug Transformer on a remote machine using port number 2005, define TRANSFORMER_JAVA_OPTS as follows:
export TRANSFORMER_JAVA_OPTS="${TRANSFORMER_JAVA_OPTS} -Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=2005,suspend=n"
Modify environment variables using the method required by your installation type.
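
As an illustrative sketch, the debugging options can be generated from a port number so that an invalid port is caught before Transformer restarts. The helper name debug_opts is hypothetical. The option string it prints is the one shown above.

```shell
# debug_opts is a hypothetical helper that prints the debugging options
# for a given port and rejects values outside the valid TCP port range.
debug_opts() {
  case "$1" in
    ''|*[!0-9]*) echo "port must be a number" >&2; return 1 ;;
  esac
  if [ "$1" -lt 1 ] || [ "$1" -gt 65535 ]; then
    echo "port must be between 1 and 65535" >&2
    return 1
  fi
  printf '%s\n' "-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=$1,suspend=n"
}

# Append the options for port 2005 to the existing Java options.
export TRANSFORMER_JAVA_OPTS="${TRANSFORMER_JAVA_OPTS} $(debug_opts 2005)"
```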

Garbage Collector

You can define the Java garbage collector that Transformer uses. The default garbage collector depends on the Java version installed on the Transformer machine:
  • Java 8 - Default is the Concurrent Mark Sweep (CMS) garbage collector.
  • Java 11 or later - Default is the G1 garbage collector.

If you define another garbage collector, test and evaluate Transformer performance before making the same change in a production environment. Garbage collector performance depends on each particular use case.

Define the garbage collector based on your installation:
Tarball or RPM installation
Define the garbage collector in the TRANSFORMER_JAVA_OPTS environment variable.

For example, the default garbage collector is defined as follows:

export TRANSFORMER_JAVA_OPTS=${TRANSFORMER_JAVA_OPTS:-"-XX:+UseConcMarkSweepGC -XX:+UseParNewGC"}

To use the G1 garbage collector, set the option as follows:

export TRANSFORMER_JAVA_OPTS=${TRANSFORMER_JAVA_OPTS:-"-XX:+UseG1GC"}
Modify environment variables using the method required by your installation type.
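
The examples above use the shell ${VAR:-default} expansion, which applies the value after :- only when the variable is unset or empty. The following sketch shows that behavior with a stand-in variable named opts, which is an assumption used here to avoid changing TRANSFORMER_JAVA_OPTS in the current shell.

```shell
# Case 1: the variable is unset, so the default after :- is used.
unset opts
opts=${opts:-"-XX:+UseG1GC"}
printf '%s\n' "$opts"        # prints -XX:+UseG1GC

# Case 2: the variable already has a value, so the default is ignored.
opts="-Xmx2048m -Xms2048m"
opts=${opts:-"-XX:+UseG1GC"}
printf '%s\n' "$opts"        # prints -Xmx2048m -Xms2048m

# To add the collector to existing options instead, append it.
opts="${opts} -XX:+UseG1GC"
printf '%s\n' "$opts"        # prints -Xmx2048m -Xms2048m -XX:+UseG1GC
```

If TRANSFORMER_JAVA_OPTS is already defined when the line runs, the garbage collector option in the default is therefore not applied unless it is appended explicitly.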

Logging

Transformer enables garbage collector logging by default to facilitate troubleshooting. Log files are written to $TRANSFORMER_LOG/gc.log. You can disable logging.

Disable garbage collector logging based on your installation:

Tarball or RPM installation
Set the TRANSFORMER_GC_LOGGING environment variable to false. For example:
export TRANSFORMER_GC_LOGGING=false
Modify environment variables using the method required by your installation type.

Heap Dump Creation

By default, when Transformer encounters an out of memory error (OOME), it creates a heap dump.

By default, heap dump files are written to the directory defined in the TRANSFORMER_LOG environment variable and use a naming convention that allows generating multiple heap dump files, as follows: $TRANSFORMER_LOG/transformer_heapdump_${timestamp}.hprof.

You can change the name of the heap dump files, but we recommend using the ${timestamp} or similar variable to ensure that the heap dump name is unique.

Note that the Java Virtual Machine (JVM), and therefore Transformer, does not overwrite existing heap dump files. For example, if you use $TRANSFORMER_LOG/transformer_heapdump.hprof as the file name, after Transformer creates the first heap dump file, it will not create another until you remove the existing file.

Note: Depending on the number and size of the generated heap dump files, you might want to increase the Transformer Java heap size.
You can configure the following heap dump environment variables:
Heap Dump Environment Variable Description
TRANSFORMER_HEAPDUMP_ON_OOM
Specifies whether Transformer generates a heap dump upon encountering an out of memory error.

Default is true.

TRANSFORMER_HEAPDUMP_PATH
Specifies the file name and location to use for heap dump files.

By default, heap dumps are written to $TRANSFORMER_LOG/transformer_heapdump_${timestamp}.hprof.

To specify a different file name or location, uncomment the variable and enter the location and file name to use.

Tip: To write multiple heap dump files to a directory, use a function or variable to ensure that the file name is unique. If a file of the same name exists in the directory, Transformer does not create a new heap dump file.

Modify environment variables using the method required by your installation type.
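
A minimal sketch of a unique heap dump file name follows. It assumes that a timestamp variable can be populated with the date command and that /var/log/transformer is the log directory when TRANSFORMER_LOG is not already set. How the ${timestamp} variable is actually defined in your environment file may differ.

```shell
# Assumed fallback: the documented service-start log directory.
TRANSFORMER_LOG="${TRANSFORMER_LOG:-/var/log/transformer}"

# Assumption: derive the timestamp from the current date and time,
# for example 20240131235959.
timestamp="$(date +%Y%m%d%H%M%S)"

# A new name for each start time, so an existing heap dump file with a
# different timestamp does not block the creation of a new one.
export TRANSFORMER_HEAPDUMP_PATH="${TRANSFORMER_LOG}/transformer_heapdump_${timestamp}.hprof"
printf '%s\n' "$TRANSFORMER_HEAPDUMP_PATH"
```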

Security Manager

Transformer includes a Java Security Manager that is enabled by default. For enhanced security, you can enable the Transformer Security Manager, which prevents stages from accessing files in protected Transformer directories.

Transformer can use one of the following security managers:
Java Security Manager

By default, Transformer uses the Java Security Manager. The Java Security Manager restricts the runtime permissions of user libraries. This allows administrators to control the actions of user libraries on production systems. For example, by default, user libraries cannot call out to network resources and potentially cause distributed denial-of-service (DDoS) attacks.

The security policy is defined in the $TRANSFORMER_CONF/transformer-security.policy file. The file uses the standard Java policy file syntax.

Transformer Security Manager
For enhanced security, enable the Transformer Security Manager. The Transformer Security Manager prevents stages from accessing files in protected Transformer directories, regardless of how the $TRANSFORMER_CONF/transformer-security.policy file is defined.
To enable the Transformer Security Manager, uncomment the security_manager.transformer_manager.enable property in the Transformer configuration file, $TRANSFORMER_CONF/transformer.properties.
Note: If you use an older JVM version, the Transformer Security Manager might encounter some JVM known issues.

If needed, you can configure Transformer to use neither security manager by setting the TRANSFORMER_SECURITY_MANAGER_ENABLED environment variable to false.

Modify environment variables using the method required by your installation type.

Protected Directories

When the Transformer Security Manager is enabled, the following Transformer directories are protected directories:
  • $TRANSFORMER_CONF - Stages cannot access files in the configuration directory.
  • $TRANSFORMER_DATA - Stages cannot access files in the data directory.
  • $TRANSFORMER_EXTERNAL_RESOURCES - Stages can read files in the external resources directory, but cannot write to files in the directory.
  • $TRANSFORMER_RESOURCES - Stages can read files in the resources directory, but cannot write to files in the directory.

If needed, you can allow stages to access specific files in these protected directories by modifying Transformer Security Manager exception properties in the Transformer configuration file, $TRANSFORMER_CONF/transformer.properties. However, use caution when configuring exceptions to these protected directories.

You can configure exceptions to protected directories as follows:
Exceptions for all stage libraries
To allow all stage libraries access to files in protected directories, modify the security_manager.transformer_dirs.exceptions property to define files that can be accessed.
Exceptions for specific stage libraries
To allow a specific stage library access to files in protected directories, add the following property and then define the files that the stage library can access:
security_manager.transformer_dirs.exceptions.<stage_library_name>=<file_path>
For example, the default Transformer configuration file includes an exception for the Java keystore credential store stage library defined as follows:
security_manager.transformer_dirs.exceptions.lib.streamsets-transformer-jks-credentialstore-lib=$TRANSFORMER_CONF/jks-credentialStore.pkcs12

When you configure a Security Manager exception property, use the appropriate directory environment variable in the file path: $TRANSFORMER_CONF, $TRANSFORMER_DATA, or $TRANSFORMER_RESOURCES. You can enter multiple file paths separated by commas.
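
As a sketch of an exception with multiple file paths, the following shell commands assemble a property line from a list of paths. The helper join_with_commas, the stage library name example-stage-lib, and both file names are hypothetical. Only the property prefix and the comma-separated format come from the description above.

```shell
# join_with_commas is a hypothetical helper that joins its arguments
# with commas. The body runs in a subshell so that the IFS change does
# not leak into the calling shell.
join_with_commas() (
  IFS=,
  printf '%s\n' "$*"
)

# Single quotes keep $TRANSFORMER_CONF and $TRANSFORMER_RESOURCES
# unexpanded, so the variable names appear literally in the property.
paths="$(join_with_commas \
  '$TRANSFORMER_CONF/example-keystore.jks' \
  '$TRANSFORMER_RESOURCES/example-lookup.csv')"

# Print the line to add to the Transformer configuration file.
printf '%s\n' "security_manager.transformer_dirs.exceptions.lib.example-stage-lib=${paths}"
```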

Root Classloader

You can edit the TRANSFORMER_ROOT_CLASSPATH environment variable to define the path to JAR files to be added to the Transformer root classloader.

Use the variable for components that must be in the root classloader, such as Snappy. Default is $TRANSFORMER_DIST/root-lib/'*'.

Modify environment variables using the method required by your installation type.
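
The quotes around the asterisk in the default value keep the shell from expanding it into file names. The following sketch shows the default and a hypothetical second entry. It assumes that the variable accepts the usual colon-separated classpath syntax, which this page does not confirm, and that /opt/transformer is the runtime directory when TRANSFORMER_DIST is not already set.

```shell
# Assumed fallback installation path.
TRANSFORMER_DIST="${TRANSFORMER_DIST:-/opt/transformer}"

# The quoted * stays literal, so the value keeps the wildcard itself.
TRANSFORMER_ROOT_CLASSPATH="${TRANSFORMER_DIST}/root-lib/"'*'

# Assumption: additional entries can be appended with a colon separator.
# /opt/extra-root-libs is a hypothetical directory.
TRANSFORMER_ROOT_CLASSPATH="${TRANSFORMER_ROOT_CLASSPATH}:/opt/extra-root-libs/"'*'
export TRANSFORMER_ROOT_CLASSPATH
printf '%s\n' "$TRANSFORMER_ROOT_CLASSPATH"
```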