Java and Security Configuration

Data Collector includes several advanced properties that you can modify to customize the following areas:
  • Java configuration options
  • Security Manager that restricts the runtime permissions of user libraries

Java Configuration Options

You define the Java configuration options that Data Collector uses in the deployment.

In Control Hub, edit the deployment. In the Configure Engine section, click Advanced Configuration. Then, click Java Configuration.

When defining Java configuration options, avoid defining duplicate options. If you do define duplicates, the last option passed to the JVM usually takes precedence.

Java Heap Size

Modify the Data Collector Java heap size as necessary, based on the resources available on the host machine. By default, Data Collector uses 50 percent of the available memory on the host machine as the Java heap size. In most cases, the default percentage value is sufficient.

The Java heap size determines the heap allocated to Data Collector and affects the amount of memory Data Collector can use when it runs a pipeline. A running pipeline can use up to 65% of the allocated heap size. For example, with a heap size of 2048 MB, you can configure a pipeline to use up to 1331 MB of memory, which is 65% of the heap size.
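The 65% calculation can be sketched in Java as follows. This only illustrates the arithmetic. The class and method names are hypothetical, and rounding down to whole megabytes is an assumption.

```java
public final class PipelineMemory {
    // Largest share of the Data Collector heap, in percent, that a running pipeline can use.
    public static final long MAX_PIPELINE_PERCENT = 65;

    // Returns the pipeline memory limit in whole MB for a heap size in MB.
    // Integer division rounds down, so the result never exceeds 65% of the heap.
    public static long maxPipelineMemoryMb(long heapMb) {
        return heapMb * MAX_PIPELINE_PERCENT / 100;
    }

    public static void main(String[] args) {
        System.out.println(maxPipelineMemoryMb(2048)); // 1331 (65% of 2048 MB is 1331.2 MB)
    }
}
```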

To modify the Java heap size, first select the JVM memory strategy to use:
  • Percentage - Allocates a percentage of the available memory on the host machine as the Java heap size.
  • Absolute - Allocates an absolute number in megabytes as the Java heap size.

Based on your selection, configure the minimum and maximum Java heap size, either as a percentage or as an absolute value in megabytes. To keep the JVM from repeatedly resizing the allocated heap, set the minimum and maximum properties to the same value.

Data Collector requires a heap size of at least 1024 MB to run. As a result, the engine always uses a minimum of 1024 MB for the heap size, regardless of the configured size.
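The following minimal sketch models how the memory strategy and the 1024 MB floor combine to produce an effective heap size. The HeapSizing class, its method, and the integer rounding of percentages are hypothetical, and the engine's actual calculation might differ.

```java
public final class HeapSizing {
    // The two JVM memory strategies described above.
    public enum Strategy { PERCENTAGE, ABSOLUTE }

    // Data Collector always uses at least this much heap, whatever is configured.
    public static final long MIN_HEAP_MB = 1024;

    // Hypothetical helper: converts a configured value into an effective heap size in MB.
    // PERCENTAGE: value is a percentage of hostMemoryMb (integer division rounds down).
    // ABSOLUTE:   value is already a size in MB.
    public static long effectiveHeapMb(Strategy strategy, long value, long hostMemoryMb) {
        long requested = (strategy == Strategy.PERCENTAGE)
                ? hostMemoryMb * value / 100
                : value;
        // Apply the 1024 MB floor regardless of the configured size.
        return Math.max(requested, MIN_HEAP_MB);
    }

    public static void main(String[] args) {
        long hostMb = 8192; // hypothetical host with 8 GB of memory
        System.out.println(effectiveHeapMb(Strategy.PERCENTAGE, 50, hostMb)); // 4096 (the 50% default)
        System.out.println(effectiveHeapMb(Strategy.ABSOLUTE, 512, hostMb));  // 1024 (raised to the floor)
        System.out.println(effectiveHeapMb(Strategy.ABSOLUTE, 3072, hostMb)); // 3072
    }
}
```

Setting the same value for the minimum and maximum means both inputs resolve to the same effective size, which is the point of the recommendation above.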

Note: In the pipeline properties, you can use the jvm:maxMemoryMB() function to help define the percentage of the heap size the pipeline uses.

Remote Debugging

You can enable remote debugging to debug a Data Collector instance running on a remote machine.

Enable remote debugging by modifying the Java Options property in the Java configuration properties. Add the following debugging options to the property, where port_number is an open port number on the remote machine running Data Collector:

-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=<port_number>,suspend=n

For example, to debug Data Collector on a remote machine using port number 2005, define the Java options as follows:

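-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=2005,suspend=n

Note: On Java 9 and later, an address that contains only a port number makes the debug agent listen on localhost only. To accept connections from another machine, specify the address as *:<port_number>, for example address=*:2005.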
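Before adding the options, you might confirm on the Data Collector machine that the port is not already in use. The sketch below, with hypothetical class and method names, tries to bind a server socket to the port and then builds the options string in the format shown above. A free port does not guarantee that a firewall allows remote connections, and another process could claim the port after the check.

```java
import java.io.IOException;
import java.net.ServerSocket;

public final class RemoteDebugSetup {
    // Builds the debugging options in the documented format.
    // The address is passed through unchanged, for example "2005".
    public static String debugOptions(String address) {
        return "-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=" + address + ",suspend=n";
    }

    // Hypothetical pre-check: returns true if the port can be bound on this machine,
    // which means no other process is currently listening on it.
    public static boolean isPortFree(int port) {
        try (ServerSocket probe = new ServerSocket(port)) {
            return true; // bind succeeded; the socket is closed immediately
        } catch (IOException e) {
            return false; // typically a BindException because the port is taken
        }
    }

    public static void main(String[] args) {
        int port = 2005;
        if (isPortFree(port)) {
            System.out.println(debugOptions(String.valueOf(port)));
        } else {
            System.out.println("Port " + port + " is already in use; choose another port.");
        }
    }
}
```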

Garbage Collector

You can define the Java garbage collector that Data Collector uses. The default garbage collector depends on the Java version installed on the Data Collector machine:
  • Java 8 - Default is the Concurrent Mark Sweep (CMS) garbage collector.
  • Java 11 or later - Default is the G1 garbage collector.

Define the garbage collector by modifying the Java Options property in the Java configuration properties. If you define a different garbage collector, test and evaluate Data Collector performance before making the same change in a production environment, because garbage collector performance varies by use case.

To enable the G1 garbage collector when Data Collector uses Java 8, disable the CMS and ParNew collectors and enable G1 by adding all of the following options to the property:

-XX:-UseConcMarkSweepGC -XX:-UseParNewGC -XX:+UseG1GC
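When you test a garbage collector change, one way to confirm which collector a JVM is actually running is to read its garbage collector MXBeans through the standard java.lang.management API. The same beans are exposed over JMX, so a JMX client attached to Data Collector should report the same names. This sketch reads them from the JVM it runs in. The family() mapping is a hypothetical helper based on HotSpot bean names, which other JVMs might not use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public final class GcCheck {
    // Maps a HotSpot collector bean name to the collector family it belongs to.
    public static String family(String beanName) {
        if (beanName.startsWith("G1")) {
            return "G1"; // e.g. "G1 Young Generation", "G1 Old Generation"
        }
        if (beanName.equals("ConcurrentMarkSweep") || beanName.equals("ParNew")) {
            return "CMS"; // CMS old generation paired with the ParNew young collector
        }
        if (beanName.startsWith("PS ")) {
            return "Parallel"; // "PS Scavenge", "PS MarkSweep"
        }
        return "Other";
    }

    // Lists the names of the collectors the running JVM is using.
    public static List<String> activeCollectors() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : activeCollectors()) {
            System.out.println(name + " -> " + family(name));
        }
    }
}
```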

Security Manager

Data Collector includes a Java Security Manager that is enabled by default. For enhanced security, you can enable the Data Collector Security Manager, which prevents stages from accessing files in protected Data Collector directories.

Data Collector can use one of the following security managers:
Java Security Manager

By default, Data Collector uses the Java Security Manager, which restricts the runtime permissions of user libraries. This allows administrators to control the actions of user libraries on production systems. For example, by default, user libraries cannot call out to network resources and potentially cause denial-of-service (DoS) attacks.

The security policy is defined in the Security Policy configuration properties of the deployment and uses standard Java policy file syntax.
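From a user library's point of view, a permission that the policy does not grant surfaces as a SecurityException. The hypothetical guard below asks the installed security manager whether a connection would be allowed before opening one. It is a sketch only. The SecurityManager API is deprecated for removal as of Java 17, which is why the method suppresses removal warnings.

```java
public final class ConnectCheck {
    // Hypothetical guard a user library might run before opening a socket.
    // Returns false if the installed security manager would deny the connection.
    @SuppressWarnings("removal") // SecurityManager is deprecated for removal since Java 17
    public static boolean connectPermitted(String host, int port) {
        SecurityManager sm = System.getSecurityManager();
        if (sm == null) {
            // No security manager is installed, so no permission check applies.
            return true;
        }
        try {
            // Throws SecurityException if the policy does not grant the matching SocketPermission.
            sm.checkConnect(host, port);
            return true;
        } catch (SecurityException denied) {
            return false;
        }
    }

    public static void main(String[] args) {
        String host = "api.example.com"; // hypothetical endpoint
        int port = 443;
        if (connectPermitted(host, port)) {
            System.out.println("Connecting to " + host + ":" + port + " is permitted");
        } else {
            System.out.println("The security policy denies connecting to " + host + ":" + port);
        }
    }
}
```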

Data Collector Security Manager
For enhanced security, enable the Data Collector Security Manager. The Data Collector Security Manager prevents stages from accessing files in protected Data Collector directories, regardless of how you define the Security Policy configuration properties of the deployment.
To enable the Data Collector Security Manager, uncomment the security_manager.sdc_manager.enable property in the Data Collector configuration properties.
Note: If you use an older JVM version, the Data Collector Security Manager might encounter known JVM issues.

Protected Directories

When the Data Collector Security Manager is enabled, the following Data Collector directories are protected directories:
  • $SDC_CONF - Stages cannot access files in the configuration directory.
  • $SDC_DATA - Stages cannot access files in the data directory.
  • $SDC_EXTERNAL_RESOURCES - Stages can read files in the external resources directory, but cannot write to files in the directory.
  • $SDC_RESOURCES - Stages can read files in the resources directory, but cannot write to files in the directory.
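The rules above can be modeled as a lookup from a file path to the access a stage gets. This is a hypothetical model for reasoning about paths, not the Data Collector Security Manager implementation, and the directory locations in the example are assumptions. It normalizes paths so that a path containing ".." is classified by the directory it actually points to, but it does not resolve symbolic links. NOT_PROTECTED means only that the path falls outside these directories; the Java Security Manager policy still applies.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public final class ProtectedDirs {
    public enum Access { NONE, READ_ONLY, NOT_PROTECTED }

    // Access rules from the list above, keyed by environment variable name.
    static final Map<String, Access> RULES = new LinkedHashMap<>();
    static {
        RULES.put("SDC_CONF", Access.NONE);
        RULES.put("SDC_DATA", Access.NONE);
        RULES.put("SDC_EXTERNAL_RESOURCES", Access.READ_ONLY);
        RULES.put("SDC_RESOURCES", Access.READ_ONLY);
    }

    // Hypothetical: returns the access a stage gets to 'file', where 'dirs' maps each
    // variable name to its directory. Both sides are made absolute and normalized.
    public static Access accessFor(Path file, Map<String, Path> dirs) {
        Path target = file.toAbsolutePath().normalize();
        for (Map.Entry<String, Access> rule : RULES.entrySet()) {
            Path dir = dirs.get(rule.getKey());
            // Path.startsWith compares whole path components, so /etc/sdc2 is not inside /etc/sdc.
            if (dir != null && target.startsWith(dir.toAbsolutePath().normalize())) {
                return rule.getValue();
            }
        }
        return Access.NOT_PROTECTED;
    }

    public static void main(String[] args) {
        Map<String, Path> dirs = new HashMap<>();
        dirs.put("SDC_CONF", Paths.get("/etc/sdc"));                       // hypothetical
        dirs.put("SDC_DATA", Paths.get("/var/lib/sdc"));                   // hypothetical
        dirs.put("SDC_RESOURCES", Paths.get("/opt/sdc/resources"));        // hypothetical
        dirs.put("SDC_EXTERNAL_RESOURCES", Paths.get("/opt/sdc/external")); // hypothetical
        System.out.println(accessFor(Paths.get("/etc/sdc/sdc.properties"), dirs));                   // NONE
        System.out.println(accessFor(Paths.get("/opt/sdc/resources/../external/lookup.csv"), dirs)); // READ_ONLY
        System.out.println(accessFor(Paths.get("/tmp/output.txt"), dirs));                          // NOT_PROTECTED
    }
}
```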

If needed, you can allow stages to access specific files in these protected directories by modifying Data Collector Security Manager exception properties in the Security Policy configuration properties of the deployment. However, use caution when configuring exceptions to these protected directories.

You can configure exceptions to protected directories as follows:
Exceptions for all stage libraries
To allow all stage libraries access to files in protected directories, modify the security_manager.sdc_dirs.exceptions property to define files that can be accessed.
Exceptions for specific stage libraries
To allow a specific stage library access to files in protected directories, add the following property and then define the files that the stage library can access:
security_manager.sdc_dirs.exceptions.<stage_library_name>=<file_path>
For example, the default Data Collector configuration properties include an exception for the Java keystore credential store stage library, defined as follows:
security_manager.sdc_dirs.exceptions.lib.streamsets-datacollector-jks-credentialstore-lib=$SDC_CONF/jks-credentialStore.pkcs12

When you configure a Security Manager exception property, use the appropriate directory environment variable in the file path: $SDC_CONF, $SDC_DATA, or $SDC_RESOURCES. You can enter multiple file paths separated by commas.
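The sketch below shows one way to read the exception properties: it combines the global property with the library-specific property, splits each comma-separated list, and expands $SDC_* variables from a supplied map. The class and method names, and the assumption that global and library-specific exceptions are combined, are hypothetical. The directory values are placeholders.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class SdcDirExceptions {
    // Global exception property; library-specific keys append ".<stage_library_name>".
    static final String PREFIX = "security_manager.sdc_dirs.exceptions";

    // Matches directory variables such as $SDC_CONF or $SDC_RESOURCES.
    private static final Pattern VAR = Pattern.compile("\\$(SDC_[A-Z_]+)");

    // Hypothetical: files a stage library may access in protected directories,
    // assuming global exceptions apply to every library in addition to its own.
    public static List<String> allowedFiles(Properties props, String stageLibrary, Map<String, String> dirs) {
        List<String> files = new ArrayList<>();
        addPaths(files, props.getProperty(PREFIX), dirs);
        addPaths(files, props.getProperty(PREFIX + "." + stageLibrary), dirs);
        return files;
    }

    // Splits a comma-separated value and expands each directory variable.
    private static void addPaths(List<String> out, String value, Map<String, String> dirs) {
        if (value == null) {
            return;
        }
        for (String raw : value.split(",")) {
            String path = expand(raw.trim(), dirs);
            if (!path.isEmpty()) {
                out.add(path);
            }
        }
    }

    // Replaces $SDC_* variables with their directory; unknown variables are left as written.
    static String expand(String path, Map<String, String> dirs) {
        Matcher m = VAR.matcher(path);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String dir = dirs.get(m.group(1));
            m.appendReplacement(sb, Matcher.quoteReplacement(dir != null ? dir : m.group()));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // The default exception for the Java keystore credential store stage library.
        props.load(new StringReader(PREFIX
                + ".lib.streamsets-datacollector-jks-credentialstore-lib=$SDC_CONF/jks-credentialStore.pkcs12\n"));
        Map<String, String> dirs = new HashMap<>();
        dirs.put("SDC_CONF", "/etc/sdc"); // hypothetical location
        System.out.println(allowedFiles(props, "lib.streamsets-datacollector-jks-credentialstore-lib", dirs));
        // prints [/etc/sdc/jks-credentialStore.pkcs12]
    }
}
```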