Use a text editor to edit the sdc-env.sh file. Some of the environment variables in the file are commented out and do not reflect the default values. Be sure to uncomment the line when you change a variable value.
After you edit the file, restart Data Collector to enable the changes.
Use a text editor to edit the sdcd-env.sh file.
After you edit the file, restart Data Collector to enable the changes.
Override the default values in the sdc.service file using the same procedure that you use to override unit configuration files on a systemd init system. For an example, see "Example 2. Overriding vendor settings" in the systemd.unit man page.
After you override the default values, reload the systemd manager configuration:
systemctl daemon-reload
Then restart Data Collector to enable the changes.
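As a minimal sketch of the override procedure, the following writes a systemd drop-in file that overrides a single variable. The drop-in file name `override.conf`, the `Environment=` form, and the `/data/sdc/log` value are assumptions for illustration. `UNIT_ROOT` is a hypothetical variable that defaults to a scratch directory, so nothing on the host changes until you point it at `/etc/systemd/system`.

```shell
# Hypothetical root for unit overrides; on a real host this would be /etc/systemd/system.
UNIT_ROOT="${UNIT_ROOT:-$(mktemp -d)}"
DROPIN_DIR="$UNIT_ROOT/sdc.service.d"
mkdir -p "$DROPIN_DIR"

# A drop-in only needs the section header and the settings being overridden.
cat > "$DROPIN_DIR/override.conf" <<'EOF'
[Service]
Environment=SDC_LOG=/data/sdc/log
EOF

cat "$DROPIN_DIR/override.conf"
```

Running `systemctl edit sdc` creates an equivalent drop-in interactively.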
Data Collector includes environment variables that define the directories used to store configuration, data, log, and resource files.
The $SDC_DIST environment variable defines the Data Collector runtime directory. The runtime directory is the base Data Collector directory that stores the executables and related files. This environment variable is set during installation.
When you start Data Collector manually, the default values of the remaining directory variables are relative to the $SDC_DIST runtime directory. When you start Data Collector as a service, the default values of the remaining directory variables are absolute paths that are outside of the $SDC_DIST runtime directory.
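The relative defaults for a manual start can be pictured with shell default expansion. This is a sketch only: the install path, the variable names SDC_DATA and SDC_RESOURCES, and the subdirectory names are assumptions for illustration, not values confirmed by this page.

```shell
# Hypothetical install location; the real value is set during installation.
SDC_DIST="${SDC_DIST:-/opt/streamsets-datacollector}"

# Manual start: each directory falls back to a path under $SDC_DIST unless
# it is already set. The subdirectory names are assumptions.
SDC_CONF="${SDC_CONF:-$SDC_DIST/etc}"
SDC_DATA="${SDC_DATA:-$SDC_DIST/data}"
SDC_LOG="${SDC_LOG:-$SDC_DIST/log}"
SDC_RESOURCES="${SDC_RESOURCES:-$SDC_DIST/resources}"

printf 'SDC_CONF=%s\n' "$SDC_CONF"
printf 'SDC_DATA=%s\n' "$SDC_DATA"
printf 'SDC_LOG=%s\n' "$SDC_LOG"
printf 'SDC_RESOURCES=%s\n' "$SDC_RESOURCES"
```

A service start would instead assign absolute paths outside `$SDC_DIST` to the same variables.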
Modify environment variables using the method required by your installation type.
You can configure the following environment variables that define directories:
When you run Data Collector as a service, Data Collector runs as the system user account and group defined in environment variables. The default system user and group are named sdc.
You can modify the values of the environment variables to point to another system user or group.
Modify environment variables using the method required by your installation type.
chown -R myuser:myuser /etc/sdc
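After changing ownership, it can help to confirm that nothing under a directory still belongs to another account. The following is a rough sketch; `not_owned_by` is a hypothetical helper, and the demo runs against a scratch directory owned by the current user rather than a real Data Collector directory.

```shell
# Hypothetical helper: list entries under a directory that are NOT owned
# by the given user (a user name or numeric user ID).
not_owned_by() {
  user="$1"
  dir="$2"
  find "$dir" ! -user "$user" -print
}

# Demo on a scratch directory created by the current user, so the list is empty.
demo_dir="$(mktemp -d)"
touch "$demo_dir/sdc.properties"
leftover="$(not_owned_by "$(id -u)" "$demo_dir")"

if [ -z "$leftover" ]; then
  echo "ownership OK"
else
  echo "$leftover"
fi
```

On a real host, you might call `not_owned_by myuser /etc/sdc` after the chown command above.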
You define Java configuration options used by Data Collector in environment variables.
For a tarball or RPM installation, define Java configuration options in the following environment variables:
Data Collector loads the value of the version-specific environment variable and adds it to the SDC_JAVA_OPTS environment variable.
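The combination of the two variables can be sketched as follows. This is an illustration, not the actual launcher logic: the `java_major` variable is hypothetical, and the order in which the real startup script joins the values is an assumption.

```shell
# Illustrative values only.
SDC_JAVA_OPTS="-Xmx1024m -Xms1024m -server"
SDC_JAVA8_OPTS="-XX:+UseConcMarkSweepGC -XX:+UseParNewGC"

# Hypothetical: the major Java version detected at startup.
java_major=8

# Look up SDC_JAVA<version>_OPTS by name, then add its value to SDC_JAVA_OPTS.
eval "version_opts=\${SDC_JAVA${java_major}_OPTS:-}"
SDC_JAVA_OPTS="$SDC_JAVA_OPTS $version_opts"

echo "$SDC_JAVA_OPTS"
```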
For a Cloudera Manager installation, define Java configuration options by configuring the StreamSets service through Cloudera Manager.
Increase or decrease the Data Collector Java heap size as necessary, based on the resources available on the host machine. By default, the Java heap size is 1024 MB.
The Java heap size determines the heap size allocated to Data Collector and affects the amount of memory Data Collector can use when it runs a pipeline. Running a pipeline can use up to 65% of the allocated heap size.
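To make the 65% figure concrete, the following sketch computes the upper bound on pipeline memory for a given heap size. The function name `pipeline_limit_mb` is hypothetical, and the result is rounded down because shell arithmetic is integer-only.

```shell
# Upper bound on memory available to a running pipeline: 65% of the heap.
# Integer arithmetic, so the result is rounded down to whole megabytes.
pipeline_limit_mb() {
  echo $(( $1 * 65 / 100 ))
}

pipeline_limit_mb 1024   # default heap size
pipeline_limit_mb 2048   # heap size after doubling
```

With the default 1024 MB heap, the bound is 665 MB; with a 2048 MB heap, it is 1331 MB.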
Define the heap size based on your installation:
Define the heap size in the SDC_JAVA_OPTS environment variable. For example, the default heap size is defined as follows:
export SDC_JAVA_OPTS="-Xmx1024m -Xms1024m -server ${SDC_JAVA_OPTS}"
To increase the heap size to 2048 MB, change the Xmx and Xms settings as follows:
export SDC_JAVA_OPTS="-Xmx2048m -Xms2048m -server ${SDC_JAVA_OPTS}"
Modify environment variables using the method required by your installation type.
export SDC_JAVA_OPTS="-Xmx2048m -Xms2048m"
You can enable remote debugging to debug a Data Collector instance running on a remote machine.
Define debugging options in the SDC_JAVA_OPTS environment variable.
-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=<port_number>,suspend=n
export SDC_JAVA_OPTS="-Xmx1024m -Xms1024m -Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=2005,suspend=n -server ${SDC_JAVA_OPTS}"
Modify environment variables using the method required by your installation type.
-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=2005,suspend=n
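If you set the debugging port in more than one place, a small helper can keep the option string consistent. This is a sketch only; `jdwp_opts` and `DEBUG_OPTS` are hypothetical names, and the option string is the one shown above with the port substituted.

```shell
# Hypothetical helper: build the debugging options for a given port.
jdwp_opts() {
  port="$1"
  # Reject empty or non-numeric ports before building the option string.
  case "$port" in
    ''|*[!0-9]*) echo "port must be numeric" >&2; return 1 ;;
  esac
  echo "-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=${port},suspend=n"
}

DEBUG_OPTS="$(jdwp_opts 2005)"
SDC_JAVA_OPTS="-Xmx1024m -Xms1024m ${DEBUG_OPTS} -server"
echo "$SDC_JAVA_OPTS"
```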
You can define the Java garbage collector that Data Collector uses. By default, Data Collector uses the Concurrent Mark Sweep (CMS) garbage collector.
For example, if you configure Data Collector to use a large heap size, you might want to use the G1 garbage collector. If you define another garbage collector, test and evaluate Data Collector performance before making the same change in a production environment. Garbage collector performance depends on each particular use case.
Define the garbage collector based on your installation:
For example, the default garbage collector is defined as follows:
export SDC_JAVA8_OPTS=${SDC_JAVA8_OPTS:-"-XX:+UseConcMarkSweepGC -XX:+UseParNewGC"}
To use the G1 garbage collector, set the option as follows:
export SDC_JAVA8_OPTS=${SDC_JAVA8_OPTS:-"-XX:+UseG1GC"}
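The `${SDC_JAVA8_OPTS:-...}` form used in these lines only supplies a default: if the variable is already set when the file is read, the existing value is kept. The sketch below demonstrates that shell behavior; `-XX:+UseParallelGC` stands in for a value set elsewhere, and `unset_case` and `preset_case` are hypothetical names used only to capture the results.

```shell
# Case 1: the variable is unset, so the default inside ${...:-...} is used.
unset SDC_JAVA8_OPTS
export SDC_JAVA8_OPTS=${SDC_JAVA8_OPTS:-"-XX:+UseG1GC"}
unset_case="$SDC_JAVA8_OPTS"

# Case 2: the variable already has a value (for example, from the calling
# environment), so that value is kept and the default is ignored.
SDC_JAVA8_OPTS="-XX:+UseParallelGC"
export SDC_JAVA8_OPTS=${SDC_JAVA8_OPTS:-"-XX:+UseG1GC"}
preset_case="$SDC_JAVA8_OPTS"

echo "unset:  $unset_case"
echo "preset: $preset_case"
```

If a change to the default appears to have no effect, check whether the variable is already exported in the environment that starts Data Collector.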
Modify environment variables using the method required by your installation type.
export SDC_JAVA8_OPTS="-XX:+UseG1GC"
Data Collector enables garbage collector logging by default to facilitate troubleshooting. Log files are written to $SDC_LOG/gc.log. You can disable logging.
Disable garbage collector logging based on your installation:
export SDC_GC_LOGGING=false
Modify environment variables using the method required by your installation type.
export SDC_GC_LOGGING=false
Data Collector includes a Java Security Manager that is enabled by default.
The Java Security Manager restricts the runtime permissions of user libraries. This allows administrators to control what user libraries do on production systems. For example, by default, user libraries cannot call out to network resources and potentially cause denial-of-service attacks.
The security policy is defined in the $SDC_CONF/sdc-security.policy file. The file uses the standard Java policy file syntax.
To disable the Java Security Manager, modify the SDC_SECURITY_MANAGER_ENABLED environment variable.
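In the style of the other settings on this page, disabling the manager might look like the following. The value `false` is an assumption based on the variable name; this page does not state the accepted values.

```shell
# Assumption: the variable takes true or false, with true (enabled) as the default.
export SDC_SECURITY_MANAGER_ENABLED=false
echo "SDC_SECURITY_MANAGER_ENABLED=$SDC_SECURITY_MANAGER_ENABLED"
```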
Modify environment variables using the method required by your installation type.
You can edit the SDC_ROOT_CLASSPATH environment variable to define the path to JAR files to be added to the Data Collector root classloader.
Use the variable for components that must be in the root classloader, such as Snappy. Default is $SDC_DIST/root-lib/'*'.
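A sketch of extending the default with a second directory follows. The install path and `/opt/extra-root-libs` are hypothetical, and the use of a colon-separated list assumes the value is passed to Java as an ordinary Unix classpath.

```shell
# Hypothetical install location; the real value is set during installation.
SDC_DIST="${SDC_DIST:-/opt/streamsets-datacollector}"

# Keep the wildcard inside quotes so the shell passes it through literally
# instead of expanding it to file names; Java expands dir/* to the JAR files.
export SDC_ROOT_CLASSPATH="${SDC_DIST}/root-lib/*:/opt/extra-root-libs/*"
echo "$SDC_ROOT_CLASSPATH"
```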
Modify environment variables using the method required by your installation type.
By default, when Data Collector encounters an out of memory error (OOME), it creates a heap dump.
By default, heap dump files are written to the directory defined in the SDC_LOG environment variable and use a naming convention that allows generating multiple heap dump files, as follows: $SDC_LOG/sdc_heapdump_${timestamp}.hprof.
You can change the name of the heap dump files, but we recommend using the ${timestamp} variable or a similar one to ensure that each heap dump name is unique.
Note that the Java Virtual Machine, and therefore Data Collector, does not overwrite existing heap dump files. For example, if you use $SDC_LOG/sdc_heapdump.hprof as the file name, after Data Collector creates the first heap dump file, it will not create another until you remove the existing file.
Heap Dump Environment Variable | Description |
---|---|
SDC_HEAPDUMP_ON_OOM | Specifies whether Data Collector generates a heap dump upon encountering an out of memory error. Default is true. |
SDC_HEAPDUMP_PATH | Specifies the file name and location to use for heap dump files. By default, heap dumps are written to $SDC_LOG/sdc_heapdump_${timestamp}.hprof. To specify a different file name or location, uncomment the property and enter the location and file name to use. Tip: To write multiple heap dump files to a directory, use a function or variable to ensure that the file name is unique. If a file of the same name exists in the directory, Data Collector does not create a new heap dump file. |
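One way to keep heap dump names unique is to build the path from the current time, as in the sketch below. The `/var/log/sdc` fallback is a hypothetical log directory, and the way `timestamp` is computed here is an assumption rather than the method Data Collector itself uses.

```shell
SDC_LOG="${SDC_LOG:-/var/log/sdc}"   # hypothetical log directory

# A date-and-time stamp gives a different file name for each start,
# provided two starts do not fall within the same second.
timestamp="$(date +%Y%m%d%H%M%S)"
export SDC_HEAPDUMP_PATH="${SDC_LOG}/sdc_heapdump_${timestamp}.hprof"
echo "$SDC_HEAPDUMP_PATH"
```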
Modify environment variables using the method required by your installation type.