Java and Security Configuration

Transformer includes several advanced properties that you can modify to customize the following areas:
  • Java configuration options
  • Security Manager

Java Configuration Options

You define the Java configuration options in the deployment.

In Control Hub, edit the deployment. In the Configure Engine section, click Advanced Configuration. Then, click Java Configuration.

When defining Java configuration options, avoid defining duplicate options. If you do define duplicates, the last option passed to the JVM usually takes precedence.
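Duplicates are easy to miss in a long Java Options value, so a quick check can help. The following Java sketch is a hypothetical helper, not part of Transformer. It splits an options string on whitespace and groups options that set the same JVM flag, such as -XX:+UseG1GC and -XX:-UseG1GC, or two -Xmx values. It assumes that no option contains quoted spaces.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateJavaOptions {

    // Maps an option to the JVM flag it sets, so that -XX:+Foo and -XX:-Foo,
    // or -Xmx2g and -Xmx4g, share the same key.
    static String optionKey(String option) {
        if (option.startsWith("-XX:+") || option.startsWith("-XX:-")) {
            return "-XX:" + option.substring(5);
        }
        if (option.startsWith("-Xmx") || option.startsWith("-Xms") || option.startsWith("-Xss")) {
            return option.substring(0, 4);
        }
        int eq = option.indexOf('=');
        if ((option.startsWith("-XX:") || option.startsWith("-D")) && eq > 0) {
            return option.substring(0, eq);
        }
        return option;
    }

    // Returns only the keys that appear more than once, in order of first appearance.
    static Map<String, List<String>> findDuplicates(String javaOptions) {
        Map<String, List<String>> byKey = new LinkedHashMap<>();
        for (String option : javaOptions.trim().split("\\s+")) {
            if (option.isEmpty()) {
                continue;
            }
            byKey.computeIfAbsent(optionKey(option), k -> new ArrayList<>()).add(option);
        }
        byKey.values().removeIf(values -> values.size() < 2);
        return byKey;
    }

    public static void main(String[] args) {
        String javaOptions = "-Xmx2048m -XX:+UseG1GC -Xmx4096m -XX:-UseG1GC";
        findDuplicates(javaOptions).forEach((key, values) ->
                System.out.println(key + " is defined " + values.size()
                        + " times; the last value is " + values.get(values.size() - 1)));
    }
}
```

Because the last option usually wins, the sketch reports the last value for each duplicated flag as the one most likely to take effect.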

Java Heap Size

Modify the Transformer Java heap size as necessary, based on the resources available. By default, Transformer uses 50 percent of the available memory as the Java heap size. In most cases, the default percentage value is sufficient.

The Java heap size is the memory allocated to the Transformer JVM, and it limits the amount of memory Transformer can use when it runs a pipeline. For example, with a heap size of 2048 MB, a pipeline configured to use up to 65% of the heap can use up to about 1331 MB of memory.
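The arithmetic in the example can be expressed as a small Java sketch. The helper name pipelineMemoryMb is hypothetical. The sketch also reads the maximum heap of the JVM it runs in with Runtime.maxMemory(), used here only as a rough stand-in; it is not the implementation of any Transformer function.

```java
public class PipelineMemory {

    // Memory a pipeline can use when it may take `percent` of the heap, rounded down.
    static long pipelineMemoryMb(long heapMb, double percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("Percent must be between 0 and 100: " + percent);
        }
        return (long) Math.floor(heapMb * percent / 100.0);
    }

    public static void main(String[] args) {
        // The documented example: 65 percent of a 2048 MB heap.
        System.out.println(pipelineMemoryMb(2048, 65) + " MB"); // prints 1331 MB

        // Maximum heap of the JVM running this code, in MB (approximation only).
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("65 percent of this JVM's heap: " + pipelineMemoryMb(maxHeapMb, 65) + " MB");
    }
}
```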

To modify the Java heap size, first select the JVM memory strategy to use:
  • Percentage - For a tarball installation, allocates a percentage of the available memory on the host machine as the Java heap size. For a Docker image installation, allocates a percentage of the available memory in the Docker container as the Java heap size.
  • Absolute - Allocates an absolute number in megabytes as the Java heap size.

Based on your selection, configure the minimum and maximum Java heap size as a percentage or as an absolute value in megabytes. To avoid repeated recalculation of the allocated heap size, set the minimum and maximum properties to the same value.

Consider the following guidelines when you define the heap size:
  • Transformer requires a heap size of at least 1024 MB to run. As a result, the engine always uses a minimum of 1024 MB for the heap size, regardless of the configured size.
  • By default, Java 8 and Java 11 enable the UseCompressedOops option, which allows a maximum of 32 GB of heap size regardless of the configured size. To allocate more than 32 GB, disable the option by adding the following Java option to the Java Options property in the Java configuration properties:

    -XX:-UseCompressedOops

  • In the pipeline properties, you can use the jvm:maxMemoryMB() function to help define the percentage of the heap size the pipeline uses.
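The guidelines above can be summarized in a hypothetical Java sketch that models the documented behavior; it is not Transformer's code. It converts a percentage or absolute setting to megabytes, applies the 1024 MB minimum, and flags heap sizes above 32 GB, which the guidelines say require disabling UseCompressedOops.

```java
public class HeapSizing {

    enum Strategy { PERCENTAGE, ABSOLUTE }

    // Documented minimum: Transformer always uses at least 1024 MB.
    static final long MIN_HEAP_MB = 1024;

    // Documented limit while UseCompressedOops is enabled.
    static final long COMPRESSED_OOPS_LIMIT_MB = 32L * 1024;

    // Converts a configured value to megabytes and applies the minimum.
    static long resolveHeapMb(Strategy strategy, double configured, long availableMemoryMb) {
        long requestedMb;
        if (strategy == Strategy.PERCENTAGE) {
            requestedMb = (long) (availableMemoryMb * configured / 100.0);
        } else {
            requestedMb = (long) configured;
        }
        return Math.max(requestedMb, MIN_HEAP_MB);
    }

    // True when the heap exceeds 32 GB, so -XX:-UseCompressedOops is needed.
    static boolean requiresDisablingCompressedOops(long heapMb) {
        return heapMb > COMPRESSED_OOPS_LIMIT_MB;
    }

    public static void main(String[] args) {
        // Hypothetical hosts: 8 GB and 1.5 GB of available memory, default 50 percent.
        System.out.println(resolveHeapMb(Strategy.PERCENTAGE, 50, 8192)); // 4096
        System.out.println(resolveHeapMb(Strategy.PERCENTAGE, 50, 1536)); // 1024, raised from 768
        long large = resolveHeapMb(Strategy.ABSOLUTE, 49152, 0);
        System.out.println(large + " MB requires -XX:-UseCompressedOops: "
                + requiresDisablingCompressedOops(large)); // true
    }
}
```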

Remote Debugging

You can enable remote debugging to debug a Transformer instance running on a remote machine.

Enable remote debugging by modifying the Java Options property in the Java configuration properties. Add the following debugging options to the property, where <port_number> is an open port on the remote machine running Transformer:

-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=<port_number>,suspend=n

For example, to debug Transformer on a remote machine using port number 2005, define the Java options as follows:

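-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=2005,suspend=n

With Java 9 and later, when the address contains only a port number, the debugging agent listens only on the local loopback interface. To accept connections from another machine, specify the address as *:<port_number>, for example address=*:2005.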
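If you script deployment changes, a small helper can build the option string for a given port and reject invalid port numbers. The class and method names are hypothetical; the option string is the one documented for remote debugging.

```java
public class RemoteDebugOptions {

    // Builds the documented debugging options for the given port.
    static String debugOptions(int port) {
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("Port must be between 1 and 65535: " + port);
        }
        return "-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=" + port + ",suspend=n";
    }

    public static void main(String[] args) {
        // Prints the same options as the example for port 2005.
        System.out.println(debugOptions(2005));
    }
}
```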

Garbage Collector

You can define the Java garbage collector that Transformer uses. The default garbage collector depends on the Java version installed on the Transformer machine:
  • Java 8 - Default is the Concurrent Mark Sweep (CMS) garbage collector.
  • Java 11 or later - Default is the G1 garbage collector.

Define the garbage collector by modifying the Java Options property in the Java configuration properties. If you define another garbage collector, test and evaluate Transformer performance before making the same change in a production environment. Garbage collector performance varies by use case.

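To use the G1 garbage collector when Transformer uses Java 8, disable the default CMS collector and its ParNew young-generation collector, and then enable G1. Enabling CMS and G1 at the same time causes the JVM to fail at startup with a conflicting collector error. For example, add all of the following options to the property:

-XX:-UseConcMarkSweepGC -XX:-UseParNewGC -XX:+UseG1GC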
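After you change the garbage collector options, you might want to confirm which collector a JVM is actually using. The following Java sketch lists the collectors registered in the JVM it runs in, through the standard java.lang.management API. The mapping from bean names to collector families is a heuristic based on HotSpot naming (for example, G1 Young Generation for G1, and ParNew and ConcurrentMarkSweep for CMS). The sketch inspects only its own JVM, not a remote Transformer process.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class ActiveGarbageCollector {

    // Names of the collectors registered in this JVM, such as "G1 Young Generation".
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    // Heuristic mapping from HotSpot bean names to a collector family.
    static String collectorFamily(List<String> names) {
        for (String name : names) {
            if (name.startsWith("G1")) {
                return "G1";
            }
            if (name.equals("ConcurrentMarkSweep") || name.equals("ParNew")) {
                return "CMS";
            }
        }
        return "other";
    }

    public static void main(String[] args) {
        List<String> names = collectorNames();
        System.out.println(names + " -> " + collectorFamily(names));
    }
}
```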

Java Version

When you create an Azure VM deployment or edit a deactivated Azure VM deployment, you can define the Java version that Control Hub sets up on the provisioned Azure VM instance. You cannot define the Java version while a deployment is active.

In most cases, you can use the default Java version. Some stage libraries and use cases require specific Java versions, as described in Scala, Spark, and Java JDK Requirements.

To define the Java version, select a version from the Java Version property.
Note: Deployments for Transformer support selecting the Java version required for the Scala version associated with the Transformer engine version.

For a description of how all other deployment types define the Java version to deploy along with the engine, see the Control Hub documentation.

Security Manager

Transformer includes a Java Security Manager that is enabled by default. For enhanced security, you can enable the Transformer Security Manager, which prevents stages from accessing files in protected Transformer directories.

Transformer can use one of the following security managers:
Java Security Manager

By default, Transformer uses the Java Security Manager. The Java Security Manager restricts the runtime permissions of user libraries, which allows administrators to control the actions of user libraries on production systems. For example, by default, user libraries cannot call out to network resources and potentially cause denial-of-service (DoS) attacks.

The security policy is defined in the Security Policy configuration properties of the deployment. The policy uses the standard Java security policy file syntax.

Transformer Security Manager
For enhanced security, enable the Transformer Security Manager. The Transformer Security Manager prevents stages from accessing files in protected Transformer directories, regardless of how the Security Policy configuration properties of the deployment are defined.
To enable the Transformer Security Manager, uncomment the security_manager.transformer_manager.enable property in the Transformer configuration properties of the deployment.
Note: If you use an older JVM version, the Transformer Security Manager might encounter known JVM issues.

Protected Directories

When the Transformer Security Manager is enabled, the following Transformer directories are protected directories:
  • $TRANSFORMER_CONF - Stages cannot access files in the configuration directory.
  • $TRANSFORMER_DATA - Stages cannot access files in the data directory.
  • $TRANSFORMER_EXTERNAL_RESOURCES - Stages can read files in the external resources directory, but cannot write to files in the directory.
  • $TRANSFORMER_RESOURCES - Stages can read files in the resources directory, but cannot write to files in the directory.
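The access rules above can be modeled in a hypothetical Java sketch, for example to reason about whether a file path used by a stage falls inside a protected directory. This is not the Transformer Security Manager implementation, and it ignores any configured exceptions. Directory locations are passed in as a map that stands in for the environment variables.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

public class ProtectedDirectories {

    enum Access { NONE, READ_ONLY }

    // Builds the documented rules from directory variables (without the leading $).
    static Map<Path, Access> rules(Map<String, String> env) {
        Map<Path, Access> rules = new LinkedHashMap<>();
        put(rules, env, "TRANSFORMER_CONF", Access.NONE);
        put(rules, env, "TRANSFORMER_DATA", Access.NONE);
        put(rules, env, "TRANSFORMER_EXTERNAL_RESOURCES", Access.READ_ONLY);
        put(rules, env, "TRANSFORMER_RESOURCES", Access.READ_ONLY);
        return rules;
    }

    private static void put(Map<Path, Access> rules, Map<String, String> env, String var, Access access) {
        String dir = env.get(var);
        if (dir != null) {
            rules.put(Paths.get(dir).toAbsolutePath().normalize(), access);
        }
    }

    // True if a stage may read (write == false) or write (write == true) the file.
    static boolean isAllowed(Map<Path, Access> rules, String file, boolean write) {
        Path target = Paths.get(file).toAbsolutePath().normalize();
        for (Map.Entry<Path, Access> rule : rules.entrySet()) {
            // Path.startsWith compares whole path components, not string prefixes.
            if (target.startsWith(rule.getKey())) {
                return rule.getValue() == Access.READ_ONLY && !write;
            }
        }
        return true; // Outside all protected directories.
    }

    public static void main(String[] args) {
        // Hypothetical directory locations; real values come from the deployment.
        Map<String, String> env = new LinkedHashMap<>();
        env.put("TRANSFORMER_CONF", "/opt/transformer/etc");
        env.put("TRANSFORMER_DATA", "/opt/transformer/data");
        env.put("TRANSFORMER_RESOURCES", "/opt/transformer/resources");
        Map<Path, Access> rules = rules(env);
        System.out.println(isAllowed(rules, "/opt/transformer/etc/transformer.properties", false)); // false
        System.out.println(isAllowed(rules, "/opt/transformer/resources/lookup.csv", false)); // true
        System.out.println(isAllowed(rules, "/opt/transformer/resources/lookup.csv", true)); // false
    }
}
```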

If needed, you can allow stages to access specific files in these protected directories by modifying Transformer Security Manager exception properties in the Security Policy configuration properties of the deployment. However, use caution when configuring exceptions to these protected directories.

You can configure exceptions to protected directories as follows:
Exceptions for all stage libraries
To allow all stage libraries access to files in protected directories, modify the security_manager.transformer_dirs.exceptions property to define files that can be accessed.
Exceptions for specific stage libraries
To allow a specific stage library access to files in protected directories, add the following property and then define the files that the stage library can access:
security_manager.transformer_dirs.exceptions.<stage_library_name>=<file_path>
For example, the default Transformer configuration properties include an exception for the Java keystore credential store stage library, defined as follows:
security_manager.transformer_dirs.exceptions.lib.streamsets-transformer-jks-credentialstore-lib=$TRANSFORMER_CONF/jks-credentialStore.pkcs12

When you configure a Security Manager exception property, use the appropriate directory environment variable in the file path: $TRANSFORMER_CONF, $TRANSFORMER_DATA, or $TRANSFORMER_RESOURCES. You can enter multiple file paths separated by commas.
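How the exception properties combine can be sketched in Java as follows. This is a hypothetical model, not Transformer's implementation: it merges the exceptions for all stage libraries with those for one stage library, splits comma-separated paths, and expands directory variables from a supplied map. As in the credential store example, it treats everything after the security_manager.transformer_dirs.exceptions. prefix as the stage library key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SecurityManagerExceptions {

    static final String PREFIX = "security_manager.transformer_dirs.exceptions";
    private static final Pattern VAR = Pattern.compile("\\$([A-Z_]+)");

    // Replaces $VAR references with values from env; unknown variables are kept as-is.
    static String expand(String path, Map<String, String> env) {
        Matcher m = VAR.matcher(path);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = env.getOrDefault(m.group(1), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Files a stage library may access: exceptions for all libraries plus its own.
    static List<String> allowedFiles(Map<String, String> props, String stageLibrary, Map<String, String> env) {
        List<String> files = new ArrayList<>();
        addPaths(files, props.get(PREFIX), env);
        addPaths(files, props.get(PREFIX + "." + stageLibrary), env);
        return files;
    }

    // Splits a comma-separated property value and expands each path.
    private static void addPaths(List<String> files, String value, Map<String, String> env) {
        if (value == null) {
            return;
        }
        for (String path : value.split(",")) {
            if (!path.trim().isEmpty()) {
                files.add(expand(path.trim(), env));
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put(PREFIX, "");
        props.put(PREFIX + ".lib.streamsets-transformer-jks-credentialstore-lib",
                "$TRANSFORMER_CONF/jks-credentialStore.pkcs12");
        Map<String, String> env = new HashMap<>();
        env.put("TRANSFORMER_CONF", "/opt/transformer/etc"); // hypothetical location
        System.out.println(allowedFiles(props, "lib.streamsets-transformer-jks-credentialstore-lib", env));
    }
}
```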