Java Heap Size

Modify the Data Collector Java heap size as necessary, based on the resources available on the host machine. By default, the Java heap size is 1024 MB. For a cloud service provider installation, Data Collector instead uses 50 percent of the available memory on the virtual machine as the Java heap size. In most cases, the default value is sufficient.

The Java heap size determines the amount of memory allocated to Data Collector and limits the memory that Data Collector can use when it runs a pipeline. A pipeline can use up to 65% of the allocated heap size. For example, with a heap size of 2048 MB, you can configure a pipeline to use up to 1331 MB of memory.
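The 65% ceiling can be checked with simple shell arithmetic. The following is a minimal sketch; the function name max_pipeline_memory_mb is hypothetical and is not part of Data Collector. Integer division rounds the result down.

```shell
# Hypothetical helper: largest pipeline memory limit, in MB, for a given
# heap size in MB, based on the 65% ceiling described above.
max_pipeline_memory_mb() {
  heap_mb=$1
  # Integer arithmetic: multiply first, then divide, so the result rounds down.
  echo $(( heap_mb * 65 / 100 ))
}

max_pipeline_memory_mb 2048   # prints 1331
```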

Use the following Java options to define the Java heap size:
  • Xmx - Defines the maximum heap size.
  • Xms - Defines the minimum heap size.

To modify the Java heap size, first select the JVM memory strategy to use:
  • Percentage - Allocates a percentage of the available memory on the host machine as the Java heap size.
  • Absolute - Allocates an absolute number in megabytes as the Java heap size.

Based on your selection, configure the minimum and maximum Java heap size as a percentage or as an absolute value in megabytes.
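To see roughly what absolute value a percentage corresponds to, you can do the arithmetic yourself. This is only an illustrative sketch: the helper name heap_from_percentage is hypothetical, the 8192 MB host is an assumed example, and the way Data Collector measures available memory may differ.

```shell
# Hypothetical helper: heap size in MB for a given amount of host memory (MB)
# and a heap percentage. Not part of Data Collector.
heap_from_percentage() {
  total_mb=$1
  percent=$2
  echo $(( total_mb * percent / 100 ))
}

# Assumed example: a host with 8192 MB of memory and a 50 percent setting.
heap_mb=$(heap_from_percentage 8192 50)

# Use the same value for the maximum and the minimum heap size.
echo "-Xmx${heap_mb}m -Xms${heap_mb}m"   # prints -Xmx4096m -Xms4096m
```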

Tip: To avoid constant recalculation of the allocated heap size, set both the minimum and maximum properties to the same value. To define the unit of measure, use m for MB and g for GB.

Data Collector requires a heap size of at least 1024 MB to run. As a result, the engine always uses a minimum of 1024 MB for the heap size, regardless of the configured size.
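The effect of the 1024 MB floor can be sketched as follows. The function name effective_heap_mb is hypothetical, and the sketch assumes that 1 GB equals 1024 MB, as the JVM treats the g suffix. It illustrates the rule only and is not how the engine implements it.

```shell
# Hypothetical helper: convert a heap value such as 2048m or 2g to MB,
# then apply the 1024 MB minimum described above.
effective_heap_mb() {
  value=$1
  case "$value" in
    *g) mb=$(( ${value%g} * 1024 )) ;;   # g suffix: gigabytes, assumed 1024 MB each
    *m) mb=${value%m} ;;                 # m suffix: megabytes
    *)  echo "unsupported unit: $value" >&2; return 1 ;;
  esac
  # Values below the minimum are raised to 1024 MB.
  if [ "$mb" -lt 1024 ]; then
    mb=1024
  fi
  echo "$mb"
}

effective_heap_mb 512m   # prints 1024
effective_heap_mb 2g     # prints 2048
```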

Define the heap size based on your installation:

Tarball or RPM installation

Define the heap size in the SDC_JAVA_OPTS environment variable.

For example, to double the heap size from the default 1024 MB to 2048 MB, set the Xmx and Xms options as follows:

export SDC_JAVA_OPTS="${SDC_JAVA_OPTS} -Xmx2048m -Xms2048m -server"
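Because the command appends to the existing SDC_JAVA_OPTS value, the variable can end up with more than one Xmx or Xms entry. The sketch below prints the last value given for a flag, on the assumption that the JVM honors the last occurrence, which is the usual HotSpot behavior. The function name last_heap_flag is hypothetical.

```shell
# Hypothetical helper: print the last value given for a heap flag (Xmx or Xms)
# in a space-separated options string.
last_heap_flag() {
  prefix="-$1"
  opts=$2
  result=""
  # Word splitting on $opts is intentional: each option is examined in turn.
  for opt in $opts; do
    case "$opt" in
      "$prefix"*) result=${opt#"$prefix"} ;;
    esac
  done
  echo "$result"
}

# Assumed example value, as it might look after appending to an earlier setting.
opts="-Xmx1024m -Xms1024m -Xmx2048m -Xms2048m -server"
last_heap_flag Xmx "$opts"   # prints 2048m
last_heap_flag Xms "$opts"   # prints 2048m
```

If the two printed values differ, the minimum and maximum heap sizes are not set to the same value.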

Modify environment variables using the method required by your installation type.

Cloud service provider installation
Define the heap size percentage in the SDC_HEAP_SIZE_PERCENTAGE environment variable. The default is 50% of the available memory on the virtual machine.
To define a specific heap size instead of a percentage, comment out the SDC_HEAP_SIZE_PERCENTAGE environment variable and then define the heap size in the SDC_JAVA_OPTS environment variable as described above for an RPM installation.
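As a rough sketch, the switch from a percentage to a specific heap size might look like the following. The value format shown for SDC_HEAP_SIZE_PERCENTAGE and the 4096 MB heap size are assumptions for illustration only. Check the environment file of your installation for the exact form.

```shell
# Percentage strategy, commented out so that it no longer applies.
# The value format on this line is an assumption.
# export SDC_HEAP_SIZE_PERCENTAGE=50

# Specific heap size: the same value for the maximum and minimum (assumed 4096 MB).
# ${SDC_JAVA_OPTS:-} keeps any existing options and tolerates an unset variable.
export SDC_JAVA_OPTS="${SDC_JAVA_OPTS:-} -Xmx4096m -Xms4096m"
```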

Modify environment variables using the method required by your installation type.

Cloudera Manager installation
Define the heap size in the Data Collector Advanced Configuration Snippet (Safety Valve) for sdc-env.sh field for the StreamSets service in Cloudera Manager.
For example, to double the heap size, add the following to the sdc-env.sh safety valve:
export SDC_JAVA_OPTS="-Xmx2048m -Xms2048m"

Note: In the pipeline properties, you can use the jvm:maxMemoryMB() function to help define the percentage of the heap size the pipeline uses.