Java Heap Size
Modify the Data Collector Java heap size as necessary, based on the resources available on the host machine. By default, Data Collector uses 50 percent of the available memory on the host machine as the Java heap size. In most cases, the default percentage value is sufficient.
The Java heap size determines the heap size allocated to Data Collector and affects the amount of memory Data Collector can use when it runs a pipeline. A running pipeline can use up to 65% of the allocated heap size. For example, with a heap size of 2048 MB, you can configure a pipeline to use up to 1331 MB of memory (65% of 2048 MB).
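The 65% limit is plain arithmetic on the heap size. The following POSIX shell sketch reproduces the example above with integer math, which rounds down. The function name `pipeline_memory_mb` is hypothetical and is not part of Data Collector.

```shell
#!/bin/sh
# Hypothetical helper: maximum memory (in MB) a pipeline can use,
# assuming the 65% limit described above.
pipeline_memory_mb() {
  heap_mb="$1"
  # Shell arithmetic is integer-only, so 2048 * 65 / 100 = 1331 (1331.2 truncated).
  echo $(( heap_mb * 65 / 100 ))
}

pipeline_memory_mb 2048   # prints 1331
```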
Configure the Java heap size with the following properties:
- Xmx - Defines the maximum heap size.
- Xms - Defines the minimum heap size.
Select one of the following methods to define the heap size:
- Percentage - Allocates a percentage of the available memory on the host machine as the Java heap size.
- Absolute - Allocates an absolute number in megabytes as the Java heap size.
Based on your selection, configure the minimum and maximum Java heap size as a percentage or as an absolute value in megabytes. To avoid constant recalculation of the allocated heap size, set the minimum and maximum properties to the same value.
Data Collector requires a heap size of at least 1024 MB to run. As a result, the engine always uses a minimum of 1024 MB for the heap size, regardless of the configured size.
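Data Collector applies these rules itself. The following sketch only mirrors them (a percentage or an absolute value, a 1024 MB minimum, and equal minimum and maximum settings) and is not Data Collector's actual implementation. The function names `effective_heap_mb` and `heap_opts` are hypothetical, and the host memory is passed in as an argument rather than detected.

```shell
#!/bin/sh
# Hypothetical sketch: compute a heap size in MB from either a percentage of
# host memory or an absolute value, then apply the 1024 MB minimum.
MIN_HEAP_MB=1024

# Usage: effective_heap_mb <percentage|absolute> <value> <host_memory_mb>
effective_heap_mb() {
  mode="$1"; value="$2"; host_mb="$3"
  case "$mode" in
    percentage) heap=$(( host_mb * value / 100 )) ;;
    absolute)   heap="$value" ;;
    *) printf 'unknown mode: %s\n' "$mode" >&2; return 1 ;;
  esac
  # Data Collector never uses less than 1024 MB of heap.
  if [ "$heap" -lt "$MIN_HEAP_MB" ]; then
    heap="$MIN_HEAP_MB"
  fi
  printf '%s\n' "$heap"
}

# Emit identical -Xmx and -Xms values so the JVM does not resize the heap.
heap_opts() {
  heap=$(effective_heap_mb "$@") || return 1
  printf '%s\n' "-Xmx${heap}m -Xms${heap}m"
}

heap_opts percentage 50 8192   # prints -Xmx4096m -Xms4096m
heap_opts absolute 512 8192    # prints -Xmx1024m -Xms1024m
```

Setting both options from a single computed value keeps the minimum and maximum in sync, which is the configuration recommended above.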
Define the heap size based on your installation:
- Tarball or RPM installation - Define the heap size in the SDC_JAVA_OPTS environment variable. For example, to set the heap size to 2048 MB, set both the Xmx and Xms options as follows:
    export SDC_JAVA_OPTS="${SDC_JAVA_OPTS} -Xmx2048m -Xms2048m -server"
  Modify environment variables using the method required by your installation type.
- Cloud service provider installation - Define the heap size percentage in the SDC_HEAP_SIZE_PERCENTAGE environment variable. The default is 50% of the available memory on the virtual machine.
- Cloudera Manager installation - Define the heap size in the Data Collector Advanced Configuration Snippet (Safety Valve) for sdc-env.sh field for the StreamSets service in Cloudera Manager.
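When you set -Xmx and -Xms yourself, as in the SDC_JAVA_OPTS example, a quick sanity check can catch mismatched or undersized values before you restart Data Collector. The following sketch is hypothetical tooling, not part of Data Collector. It assumes both values are given in megabytes with an `m` suffix and ignores other options such as `-server`.

```shell
#!/bin/sh
# Hypothetical check: fail when -Xmx and -Xms are missing, differ, or fall
# below the 1024 MB minimum. Only values like "-Xmx2048m" are recognized.
check_heap_opts() {
  xmx=""; xms=""
  # Intentionally unquoted so the option string splits into words.
  for opt in $1; do
    case "$opt" in
      -Xmx*m) xmx="${opt#-Xmx}"; xmx="${xmx%m}" ;;
      -Xms*m) xms="${opt#-Xms}"; xms="${xms%m}" ;;
    esac
  done
  if [ -z "$xmx" ] || [ -z "$xms" ]; then
    printf '%s\n' "missing -Xmx or -Xms"; return 1
  fi
  if [ "$xmx" -ne "$xms" ]; then
    printf '%s\n' "-Xmx (${xmx}m) and -Xms (${xms}m) differ"; return 1
  fi
  if [ "$xmx" -lt 1024 ]; then
    printf '%s\n' "heap below 1024 MB minimum"; return 1
  fi
  printf '%s\n' "ok: ${xmx} MB"
}

check_heap_opts "-Xmx2048m -Xms2048m -server"   # prints ok: 2048 MB
```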
You can use the jvm:maxMemoryMB() function to help define the percentage of the heap size that a pipeline uses.