Customization with Environment Variables
- Transformer directories
- User and group used to start Transformer as a service
- Java configuration options, including the Java heap size, remote debugging, and garbage collection
- Security Manager that restricts the runtime permissions of user libraries
- Path to JAR files to be added to the root classloader
- Heap dump creation and file location
- Directory for external libraries
Modifying Environment Variables
- Tarball installation started manually from the command line
- When you start Transformer manually from the command line on any operating system, edit the $TRANSFORMER_DIST/libexec/transformer-env.sh file to modify environment variables.
- RPM installation started as a service on operating systems that use the SysV init system
- When you start Transformer as a service on CentOS 6, Oracle Linux 6, or Red Hat Enterprise Linux 6, edit the $TRANSFORMER_DIST/libexec/transformerd-env.sh file to modify environment variables.
- RPM installation started as a service on operating systems that use the systemd init system
- When you start Transformer as a service on CentOS 7, Oracle Linux 7, or Red Hat Enterprise Linux 7, edit the /usr/lib/systemd/system/transformer.service file to modify environment variables.
Transformer Directories
Transformer includes environment variables that define the directories used to store files used by Transformer, such as configuration files, log files, and runtime resources.
The TRANSFORMER_DIST environment variable defines the Transformer runtime directory. The runtime directory is the base Transformer directory that stores the executables and related files. This environment variable is set during installation.
When you start Transformer manually, the default values of the remaining directory variables are relative to the $TRANSFORMER_DIST runtime directory. When you start Transformer as a service, the default values of the remaining directory variables are absolute paths that are outside of the $TRANSFORMER_DIST runtime directory.
Modify environment variables using the method required by your installation type.
You can configure the following environment variables that define directories:
Environment Variable | Description |
---|---|
TRANSFORMER_CONF |
Defines the configuration directory for the Transformer configuration file. Default directories:
|
TRANSFORMER_DATA |
Defines the data directory for pipeline configuration and run details. Default directories:
|
TRANSFORMER_LOG |
Defines the log directory. Default directories:
|
TRANSFORMER_EXTERNAL_RESOURCES |
Defines an optional external resources directory. By default, this
directory contains the following directories:
To define this directory, you must add the environment variable to
the appropriate file. Set the variable to a directory outside of the
Default directory:
|
TRANSFORMER_RESOURCES | Defines the directory for runtime resource files. To configure
this environment variable, you must uncomment the variable in the
appropriate file. Set the variable to a directory outside of the
Default directories:
|
STREAMSETS_LIBRARIES_EXTRA_DIR | Defines the directory for external
libraries. To configure this environment variable, you
must add the environment variable to the appropriate file. Set the
variable to a directory outside of the
Default directory:
This resolves to the following directory unless you define the TRANSFORMER_EXTERNAL_RESOURCES environment variable:
|
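As a minimal sketch, assuming a tarball or SysV installation where environment variables live in a shell script such as transformer-env.sh, directory variables could be set with export lines like the following. The /opt/transformer-work paths are hypothetical placeholders, not documented defaults.

```shell
# Hypothetical locations outside the runtime directory; adjust for your host.
export TRANSFORMER_LOG=/opt/transformer-work/log
export TRANSFORMER_DATA=/opt/transformer-work/data
```

On systemd installations the variables are defined in the service file instead, so these export lines do not apply there as written.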
User and Group for Service Start
When you run Transformer as a service, Transformer runs as the system user account and group defined in environment variables. The default system user and group are named transformer.
You can modify the values of the environment variables to point to another system user or group. Modify environment variables using the method required by your installation type.
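If you change the system user or group, make sure that the new user and group own the following Transformer directories:
- $TRANSFORMER_DIST
- $TRANSFORMER_CONF
- $TRANSFORMER_DATA
- $TRANSFORMER_LOG
- $TRANSFORMER_RESOURCES
For example, for a system user and group named myuser, use the following command to change the owner of the configuration directory, $TRANSFORMER_CONF, and all files in the directory to myuser:myuser:
chown -R myuser:myuser /etc/transformer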
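To cover every directory listed above rather than only the configuration directory, the same chown command can be generated in a loop. This is a sketch, assuming the directory variables are set in the current shell and that myuser is a hypothetical user and group. The function name print_chown_commands is also made up for illustration. It prints the commands instead of running them so that they can be reviewed first.

```shell
# Hypothetical user and group; replace with your own.
new_owner="myuser:myuser"

# Print one chown command per Transformer directory variable that is set.
print_chown_commands() {
  for dir in "$TRANSFORMER_DIST" "$TRANSFORMER_CONF" "$TRANSFORMER_DATA" \
             "$TRANSFORMER_LOG" "$TRANSFORMER_RESOURCES"; do
    # Skip variables that are unset or empty in this shell.
    [ -n "$dir" ] || continue
    echo "chown -R $new_owner $dir"
  done
}

print_chown_commands
```

After checking the output, run the printed commands as a user that is permitted to change ownership.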
Java Configuration Options
You define the Java configuration options in the TRANSFORMER_JAVA_OPTS environment variable.
When defining Java configuration options, avoid defining duplicate options. If you do define duplicates, the last option passed to the JVM usually takes precedence.
Java Heap Size
Modify the Transformer Java heap size as necessary, based on the resources available on the host machine. By default, the Java heap size is 1024 MB.
The Java heap size determines the heap size allocated to Transformer and affects the amount of memory Transformer can use when it runs a pipeline. For example, with a heap size of 2048 MB, you can configure a pipeline to use up to 65% of the heap, which is about 1331 MB of memory.
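The arithmetic behind that figure can be checked with a short calculation. This is only an illustration; the variable names are made up for the example.

```shell
heap_mb=2048      # heap size in MB, as passed to -Xmx
pipeline_pct=65   # percentage of the heap that the pipeline may use

# Shell arithmetic is integer-only, so the result is rounded down.
pipeline_mb=$(( heap_mb * pipeline_pct / 100 ))
echo "${pipeline_mb} MB"   # prints "1331 MB"
```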
- Xmx - Defines the maximum heap size.
- Xms - Defines the minimum heap size.
- Tarball or RPM installation
- Define the heap size in the TRANSFORMER_JAVA_OPTS environment variable.
  For example, to double the heap size, increase the Xmx and Xms settings as follows:
  export TRANSFORMER_JAVA_OPTS="${TRANSFORMER_JAVA_OPTS} -Xmx2048m -Xms2048m -server"
- By default, Java 8 and Java 11 enable the UseCompressedOops option, which allows a maximum of 32 GB of heap size regardless of the configured size. To allocate more than 32 GB, disable the option by adding the following Java option: -XX:-UseCompressedOops
- In the pipeline properties, you can use the jvm:maxMemoryMB() function to help define the percentage of the heap size the pipeline uses.
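As a sketch of a heap larger than 32 GB, the option can be appended together with the heap settings. The 40 GB value is a hypothetical choice for illustration.

```shell
# Hypothetical 40 GB heap with compressed oops disabled explicitly.
export TRANSFORMER_JAVA_OPTS="${TRANSFORMER_JAVA_OPTS} -Xmx40g -Xms40g -XX:-UseCompressedOops"
```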
Remote Debugging
You can enable remote debugging to debug a Transformer instance running on a remote machine.
- Tarball or RPM installation
- Define debugging options in the TRANSFORMER_JAVA_OPTS environment variable.
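One possible way to do this uses the standard JDWP agent option of the JVM. This is a sketch, not necessarily the setting the product documentation recommends, and port 5005 is an arbitrary choice. On Java 9 or later, a bare port binds to localhost only, so a remote debugger may need address=*:5005 or a tunnel.

```shell
# JDWP agent: listen on a socket and do not suspend the JVM at startup.
# Port 5005 is an arbitrary, hypothetical choice.
export TRANSFORMER_JAVA_OPTS="${TRANSFORMER_JAVA_OPTS} -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005"
```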
Garbage Collector
- Java 8 - Default is the Concurrent Mark Sweep (CMS) garbage collector.
- Java 11 or later - Default is the G1 garbage collector.
If you define another garbage collector, test and evaluate Transformer performance before making the same change in a production environment. Garbage collector performance depends on each particular use case.
- Tarball or RPM installation
- Define the garbage collector in the TRANSFORMER_JAVA_OPTS environment variable.
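For instance, selecting the G1 collector on Java 8 might look like the following sketch, which uses the standard -XX:+UseG1GC JVM flag.

```shell
# Select the G1 garbage collector explicitly.
export TRANSFORMER_JAVA_OPTS="${TRANSFORMER_JAVA_OPTS} -XX:+UseG1GC"
```

If the environment file already passes a flag for a different collector, replace that flag rather than appending a second one, because the JVM refuses to start with conflicting collector options.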
Logging
Transformer enables garbage collector logging by default to facilitate troubleshooting. Log files are written to $TRANSFORMER_LOG/gc.log. You can disable logging.
Disable garbage collector logging based on your installation:
- Tarball or RPM installation
- Set the TRANSFORMER_GC_LOGGING environment variable to false.
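In a shell-based environment file, the setting could be written as follows. This is a minimal sketch that assumes the variable is read as a plain true or false value.

```shell
# Turn off garbage collector logging.
export TRANSFORMER_GC_LOGGING=false
```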
Heap Dump Creation
By default, when Transformer encounters an out of memory error (OOME), it creates a heap dump.
By default, heap dump files are written to the directory defined in the TRANSFORMER_LOG environment variable and use a naming convention that allows generating multiple heap dump files, as follows: $TRANSFORMER_LOG/transformer_heapdump_${timestamp}.hprof.
You can change the name of the heap dump files, but we recommend using the ${timestamp} or similar variable to ensure that the heap dump name is unique.
Note that the Java Virtual Machine, and therefore Transformer, does not overwrite existing heap dump files. For example, if you use $TRANSFORMER_LOG/transformer_heapdump.hprof as the file name, after Transformer creates the first heap dump file, it will not create another until you remove the existing file.
Heap Dump Environment Variable | Description |
---|---|
TRANSFORMER_HEAPDUMP_ON_OOM | Specifies whether Transformer generates a heap dump upon encountering an out of memory error. Default is true. |
TRANSFORMER_HEAPDUMP_PATH | Specifies the file name and location to use for heap dump files. By default, heap dumps are written to $TRANSFORMER_LOG/transformer_heapdump_${timestamp}.hprof. To specify a different file name or location, uncomment the property and enter the location and file name to use. Tip: To write multiple heap dump files to a directory, use a function or variable to ensure that the file name is unique. If a file of the same name exists in the directory, Transformer does not create a new heap dump file. |
Modify environment variables using the method required by your installation type.
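The following sketch shows one way to build a unique heap dump file name in a shell-based environment file. How the shipped script defines timestamp is not shown here, so deriving it from date is an assumption, and the /var/log/transformer fallback is a hypothetical path.

```shell
# Assumption: fall back to a hypothetical log directory if none is set.
TRANSFORMER_LOG="${TRANSFORMER_LOG:-/var/log/transformer}"

# A date-based stamp gives each startup a distinct heap dump file name.
timestamp=$(date +%Y%m%d%H%M%S)

export TRANSFORMER_HEAPDUMP_ON_OOM=true
export TRANSFORMER_HEAPDUMP_PATH="${TRANSFORMER_LOG}/transformer_heapdump_${timestamp}.hprof"
```

Because the stamp is computed when the file is read, uniqueness is per startup, which matches the case where an existing dump would otherwise block a new one.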
Security Manager
Transformer includes a Java Security Manager that is enabled by default. For enhanced security, you can enable the Transformer Security Manager, which prevents stages from accessing files in protected Transformer directories.
- Java Security Manager
- By default, Transformer uses the Java Security Manager. The Java Security Manager restricts the runtime permissions of user libraries. This allows administrators to control the actions of user libraries on production systems. For example, by default, user libraries cannot call out to network resources and potentially cause denial-of-service (DoS) attacks.
  The security policy is defined in the $TRANSFORMER_CONF/transformer-security.policy file. The file uses the standard Java policy file syntax.
- Transformer Security Manager
- For enhanced security, enable the Transformer Security Manager. The Transformer Security Manager prevents stages from accessing files in protected Transformer directories, regardless of how the $TRANSFORMER_CONF/transformer-security.policy file is defined.
If needed, you can configure Transformer to use neither security manager by setting the TRANSFORMER_SECURITY_MANAGER_ENABLED environment variable to false.
Modify environment variables using the method required by your installation type.
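In a shell-based environment file, turning off both security managers could look like the following sketch. It uses only the variable named above.

```shell
# Run Transformer without the Java or Transformer Security Manager.
export TRANSFORMER_SECURITY_MANAGER_ENABLED=false
```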
Protected Directories
- $TRANSFORMER_CONF - Stages cannot access files in the configuration directory.
- $TRANSFORMER_DATA - Stages cannot access files in the data directory.
- $TRANSFORMER_EXTERNAL_RESOURCES - Stages can read files in the resources directory, but cannot write to files in the directory.
- $TRANSFORMER_RESOURCES - Stages can read files in the resources directory, but cannot write to files in the directory.
If needed, you can allow stages to access specific files in these protected directories by modifying Transformer Security Manager exception properties in the $TRANSFORMER_CONF/transformer-security.policy file. However, use caution when configuring exceptions to these protected directories.
- Exceptions for all stage libraries
- To allow all stage libraries access to files in protected directories, modify the security_manager.transformer_dirs.exceptions property to define files that can be accessed.
- Exceptions for specific stage libraries
- To allow a specific stage library access to files in protected directories, add the following property and then define the files that the stage library can access:
  security_manager.transformer_dirs.exceptions.<stage_library_name>=<file_path>
When you configure a Security Manager exception property, use the appropriate directory environment variable in the file path: $TRANSFORMER_CONF, $TRANSFORMER_DATA, or $TRANSFORMER_RESOURCES. You can enter multiple file paths separated by commas.
Root Classloader
You can edit the TRANSFORMER_ROOT_CLASSPATH environment variable to define the path to JAR files to be added to the Transformer root classloader.
Use the variable for components that must be in the root classloader, such as Snappy.
Default is $TRANSFORMER_DIST/root-lib/'*'.
Modify environment variables using the method required by your installation type.
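As a sketch, an additional directory of JAR files could be appended to the default value using the usual colon-separated Java classpath form. The /opt/extra-root-jars directory and the /opt/transformer fallback are hypothetical. Whether Transformer accepts more than one entry in this variable is an assumption to verify in a test environment.

```shell
# Assumption: TRANSFORMER_DIST is set by the installation; the fallback is hypothetical.
TRANSFORMER_DIST="${TRANSFORMER_DIST:-/opt/transformer}"

# Double quotes keep each * literal here instead of letting the shell expand it.
export TRANSFORMER_ROOT_CLASSPATH="${TRANSFORMER_DIST}/root-lib/*:/opt/extra-root-jars/*"
```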