Java and Security Configuration

Data Collector includes several advanced properties that you can modify to customize the following areas:
  • Java configuration options
  • Security Manager that restricts the runtime permissions of user libraries

Java Configuration Options

You define Java configuration options used by Data Collector in the deployment.

In Control Hub, edit the deployment. In the Configure Engine section, click Advanced Configuration. Then, click Java Configuration.

When defining Java configuration options, avoid defining duplicate options. If you do define duplicates, the last option passed to the JVM usually takes precedence.
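Before saving the Java Options property, it can help to scan the option string for duplicates. The following is a minimal sketch in Python, not part of Data Collector or Control Hub; the function names and the grouping rules are assumptions chosen for illustration.

```python
import re
import shlex


def option_key(token: str) -> str:
    """Return the name that identifies a JVM option, ignoring its value."""
    if token.startswith("-XX:"):
        # -XX:+Name, -XX:-Name, and -XX:Name=value all configure "Name".
        name = token[4:].lstrip("+-").split("=", 1)[0]
        return "-XX:" + name
    sized = re.match(r"-X(mx|ms|ss)", token)
    if sized:
        # -Xmx2g and -Xmx4g both set the same limit.
        return sized.group(0)
    # -Dkey=value, -Xrunjdwp:..., and similar: keep the part before ':' or '='.
    return re.split(r"[:=]", token, maxsplit=1)[0]


def find_duplicate_options(java_options: str) -> dict[str, list[str]]:
    """Map each option name defined more than once to its occurrences, in order."""
    seen: dict[str, list[str]] = {}
    for token in shlex.split(java_options):
        seen.setdefault(option_key(token), []).append(token)
    return {key: tokens for key, tokens in seen.items() if len(tokens) > 1}


if __name__ == "__main__":
    # Hypothetical option string with two heap limits and a toggled GC flag.
    print(find_duplicate_options("-Xmx2g -XX:+UseG1GC -Xmx4g -XX:-UseG1GC"))
```

Any option name that the function reports should appear only once in the property.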

Java Heap Size

Modify the Data Collector Java heap size as necessary, based on the resources available. By default, Data Collector uses 50 percent of the available memory as the Java heap size. In most cases, the default percentage value is sufficient.

The Java heap size determines the amount of heap memory allocated to Data Collector and limits the amount of memory that Data Collector can use when it runs a pipeline. A running pipeline can use up to 65% of the allocated heap size. For example, with a heap size of 2048 MB, you can configure a pipeline to use up to 1331 MB of memory (65% of 2048 MB).
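The 65% limit is plain arithmetic on the heap size. The following is a minimal sketch in Python; the function name is hypothetical, and rounding down to whole megabytes is an assumption that matches the 2048 MB example.

```python
PIPELINE_HEAP_PERCENT = 65  # share of the heap that a pipeline can use


def max_pipeline_memory_mb(heap_mb: int) -> int:
    """Return the largest pipeline memory limit, in whole MB, for a given heap size."""
    if heap_mb <= 0:
        raise ValueError("heap size must be positive")
    # Integer arithmetic avoids floating-point rounding surprises.
    return heap_mb * PIPELINE_HEAP_PERCENT // 100


if __name__ == "__main__":
    print(max_pipeline_memory_mb(2048))  # 65% of 2048 MB, rounded down
```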

To modify the Java heap size, first select the JVM memory strategy to use:
  • Percentage - For a tarball installation, allocates a percentage of the available memory on the host machine as the Java heap size. For a Docker image installation, allocates a percentage of the available memory in the Docker container as the Java heap size.
  • Absolute - Allocates an absolute number in megabytes as the Java heap size.

Based on your selection, configure the minimum and maximum Java heap size as a percentage or as an absolute value in megabytes. To avoid constant recalculation of the allocated heap size, set both the minimum and maximum properties to the same value.

Data Collector requires a heap size of at least 1024 MB to run. As a result, the engine always uses a minimum of 1024 MB for the heap size, regardless of the configured size.
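The sizing rules above can be sketched as a small calculation. This is an illustration in Python under stated assumptions, not the engine's actual implementation: the function name, the strategy strings, and rounding down to whole megabytes are all hypothetical.

```python
MIN_HEAP_MB = 1024  # the engine never runs with less heap than this


def effective_heap_mb(strategy: str, value: int, available_mb: int = 0) -> int:
    """Return the heap size in MB that results from a configured value.

    strategy     -- "percentage" or "absolute" (assumed labels)
    value        -- a percentage (1-100) or a size in MB, depending on strategy
    available_mb -- memory available on the host or in the container
    """
    if strategy == "percentage":
        if not 0 < value <= 100:
            raise ValueError("percentage must be between 1 and 100")
        requested = available_mb * value // 100
    elif strategy == "absolute":
        requested = value
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # Apply the 1024 MB floor regardless of the configured size.
    return max(requested, MIN_HEAP_MB)


if __name__ == "__main__":
    print(effective_heap_mb("percentage", 50, available_mb=8192))
    print(effective_heap_mb("absolute", 512))
```

With the default of 50 percent, a host with 8192 MB of available memory yields a 4096 MB heap, while an absolute setting of 512 MB is raised to the 1024 MB floor.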

Note: By default, Java 8 and Java 11 enable the UseCompressedOops option, which allows a maximum of 32 GB of heap size regardless of the configured size. To allocate more than 32 GB, disable the option by adding the following Java option to the Java Options property in the Java configuration properties: -XX:-UseCompressedOops

Remote Debugging

You can enable remote debugging to debug a Data Collector instance running on a remote machine.

Enable remote debugging by modifying the Java Options property in the Java configuration properties. Add the following debugging options to the property, where <port_number> is an open port on the remote machine running Data Collector:

-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=<port_number>,suspend=n

For example, to debug Data Collector on a remote machine using port number 2005, define the Java options as follows:

-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=2005,suspend=n
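If you generate deployment settings from a script, the debugging options can be assembled from the port number. The following is a minimal sketch in Python; the helper name is hypothetical, and the optional suspend parameter is an addition for illustration that maps to the standard JDWP suspend=y|n setting.

```python
def remote_debug_options(port: int, suspend: bool = False) -> str:
    """Build the remote debugging options for the Java Options property."""
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port number: {port}")
    # suspend=n lets the JVM start without waiting for a debugger to attach.
    suspend_flag = "y" if suspend else "n"
    return (
        "-Xdebug "
        f"-Xrunjdwp:server=y,transport=dt_socket,address={port},suspend={suspend_flag}"
    )


if __name__ == "__main__":
    print(remote_debug_options(2005))
```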

Garbage Collector

You can define the Java garbage collector that Data Collector uses. The default garbage collector depends on the Java version installed on the Data Collector machine:
  • Java 8 - Default is the Concurrent Mark Sweep (CMS) garbage collector.
  • Java 11 or later - Default is the G1 garbage collector.

Define the garbage collector by modifying the Java Options property in the Java configuration properties. If you define another garbage collector, test and evaluate Data Collector performance before making the same change in a production environment. Garbage collector performance depends on each particular use case.

To enable the G1 garbage collector when Data Collector uses Java 8, disable the UseConcMarkSweepGC and UseParNewGC options and enable the UseG1GC option. For example, add all of the following options to the property:

-XX:-UseConcMarkSweepGC -XX:-UseParNewGC -XX:+UseG1GC
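The version-dependent rule above can be expressed as a short lookup. This is a sketch in Python with a hypothetical function name; it covers only the Java versions that this section describes and rejects any other version rather than guessing.

```python
def g1_options(java_major_version: int) -> list[str]:
    """Return the Java options needed for Data Collector to use the G1 collector."""
    if java_major_version == 8:
        # Turn off the Java 8 default (CMS with ParNew), then turn on G1.
        return ["-XX:-UseConcMarkSweepGC", "-XX:-UseParNewGC", "-XX:+UseG1GC"]
    if java_major_version >= 11:
        # G1 is already the default, so no extra options are needed.
        return []
    raise ValueError(f"Java {java_major_version} is not covered by this sketch")


if __name__ == "__main__":
    print(" ".join(g1_options(8)))
```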

Java Version

When you create an Azure VM deployment or edit a deactivated Azure VM deployment, you can define the Java version that Control Hub sets up on the provisioned Azure VM instance. You cannot define the Java version while a deployment is active.

In most cases, you can use the default Java version. Some stage libraries and use cases require specific Java versions, as described in Java Versions and Available Features.

To define the Java version, select a version from the Java Version property.
Note: Deployments for Data Collector 5.9.x and earlier support selecting Java 8 only.

For a description of how all other deployment types define the Java version to deploy along with the engine, see the Control Hub documentation.

Security Manager for Java 8

When using Java 8, Data Collector includes a Java Security Manager that is enabled by default. For enhanced security, you can enable the Data Collector Security Manager, which prevents stages from accessing files in protected Data Collector directories.

Important: Oracle has deprecated and marked Java Security Manager for removal. As a result, when using Java 9 or later, Data Collector cannot use either security manager.

When using Java 8, Data Collector can use one of the following security managers:

Java Security Manager

By default, Data Collector uses the Java Security Manager, which restricts the runtime permissions of user libraries. This allows administrators to control the actions of user libraries on production systems. For example, by default, user libraries cannot call out to network resources and potentially cause denial-of-service (DoS) attacks.

The security policy is defined in the Security Policy configuration properties of the deployment. The policy uses the standard Java policy file syntax.

Data Collector Security Manager

For enhanced security, enable the Data Collector Security Manager. The Data Collector Security Manager prevents stages from accessing files in protected Data Collector directories, regardless of how you define the Security Policy configuration properties of the deployment.

To enable the Data Collector Security Manager, uncomment the security_manager.sdc_manager.enable property in the Data Collector configuration properties.

Disabling Java Security Manager

If needed, you can disable Java Security Manager for Data Collector in the Data Collector configuration properties.

Note: Disabling Java Security Manager also disables Data Collector Security Manager if it is enabled.

To disable Java Security Manager, perform the following steps:

  1. In Control Hub, edit the deployment. In the Configure Engine section, click Advanced Configuration. Then, click Data Collector Configuration.
  2. Uncomment the security_manager.sdc_manager.enable property, and set it to false.

Protected Directories

When Data Collector uses Java 8 and the Data Collector Security Manager is enabled, the following Data Collector directories are protected:
  • $SDC_CONF - Stages cannot access files in the configuration directory.
  • $SDC_DATA - Stages cannot access files in the data directory.
  • $SDC_EXTERNAL_RESOURCES - Stages can read files in the external resources directory, but cannot write to files in the directory.
  • $SDC_RESOURCES - Stages can read files in the resources directory, but cannot write to files in the directory.

If needed, you can allow stages to access specific files in these protected directories by modifying Data Collector Security Manager exception properties in the Security Policy configuration properties of the deployment. However, use caution when configuring exceptions to these protected directories.

You can configure exceptions to protected directories as follows:

Exceptions for all stage libraries

To allow all stage libraries to access files in protected directories, modify the security_manager.sdc_dirs.exceptions property to define the files that can be accessed.

Exceptions for specific stage libraries

To allow a specific stage library to access files in protected directories, add the following property and then define the files that the stage library can access:

security_manager.sdc_dirs.exceptions.<stage_library_name>=<file_path>

For example, the default Data Collector configuration properties include an exception for the Java keystore credential store stage library, defined as follows:

security_manager.sdc_dirs.exceptions.lib.streamsets-datacollector-jks-credentialstore-lib=$SDC_CONF/jks-credentialStore.pkcs12

When you configure a Security Manager exception property, use the appropriate directory environment variable in the file path: $SDC_CONF, $SDC_DATA, or $SDC_RESOURCES. You can enter multiple file paths separated by commas.
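To check an exception value before adding it to the deployment, you can split it on commas and confirm that each path starts with one of the directory environment variables. The following is a minimal sketch in Python. It does not reflect how Data Collector resolves the property internally; the function name, the directory locations, and the lookup.csv file are hypothetical.

```python
from string import Template

# Directory variables that an exception path is expected to start with.
ALLOWED_VARIABLES = ("$SDC_CONF", "$SDC_DATA", "$SDC_RESOURCES")


def parse_exception_paths(value: str, directories: dict[str, str]) -> list[str]:
    """Split a comma-separated exception value and expand directory variables."""
    paths = []
    for raw in value.split(","):
        raw = raw.strip()
        if not raw:
            continue  # ignore empty entries such as a trailing comma
        if not any(raw == var or raw.startswith(var + "/") for var in ALLOWED_VARIABLES):
            raise ValueError(f"path does not start with a directory variable: {raw}")
        # Replace $SDC_CONF and similar with the supplied directory location.
        paths.append(Template(raw).substitute(directories))
    return paths


if __name__ == "__main__":
    # Hypothetical directory locations, for illustration only.
    dirs = {"SDC_CONF": "/etc/sdc", "SDC_DATA": "/var/lib/sdc", "SDC_RESOURCES": "/opt/sdc/resources"}
    value = "$SDC_CONF/jks-credentialStore.pkcs12, $SDC_RESOURCES/lookup.csv"
    for path in parse_exception_paths(value, dirs):
        print(path)
```

A path that does not begin with one of the three variables raises an error, which catches absolute paths entered by mistake.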