Data Collector Directories

Data Collector uses environment variables to define the directories where it stores its files, such as configuration files, log files, and runtime resources.

The SDC_DIST environment variable defines the Data Collector runtime directory. The runtime directory is the base Data Collector directory that stores the executables and related files. This environment variable is set during installation.

When you start Data Collector manually, the default values of the remaining directory variables are relative to the $SDC_DIST runtime directory. When you start Data Collector as a service, the default values of the remaining directory variables are absolute paths that are outside of the $SDC_DIST runtime directory.
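The following Python sketch shows how the documented defaults for SDC_CONF, SDC_DATA, and SDC_LOG differ between the two start modes. It only restates the default values listed in the table below and is not Data Collector's own startup logic. The function name `default_dirs`, the `start_mode` flag, and the example runtime path are hypothetical.

```python
import posixpath

# Documented defaults for the basic directory variables.
# Illustrative sketch only: not Data Collector's own startup logic.
_MANUAL_SUBDIRS = {"SDC_CONF": "etc", "SDC_DATA": "data", "SDC_LOG": "log"}
_SERVICE_PATHS = {
    "SDC_CONF": "/etc/sdc",
    "SDC_DATA": "/var/lib/sdc",
    "SDC_LOG": "/var/log/sdc",
}


def default_dirs(sdc_dist: str, start_mode: str) -> dict[str, str]:
    """Return the default SDC_CONF, SDC_DATA, and SDC_LOG directories.

    start_mode is a hypothetical flag: "manual" or "service".
    """
    if start_mode == "manual":
        # Manual start: defaults are relative to the $SDC_DIST runtime directory.
        return {name: posixpath.join(sdc_dist, sub) for name, sub in _MANUAL_SUBDIRS.items()}
    if start_mode == "service":
        # Service start: defaults are absolute paths outside $SDC_DIST.
        return dict(_SERVICE_PATHS)
    raise ValueError(f"unknown start mode: {start_mode!r}")


if __name__ == "__main__":
    # Hypothetical runtime directory used only for illustration.
    print(default_dirs("/opt/streamsets-datacollector", "manual"))
    print(default_dirs("/opt/streamsets-datacollector", "service"))
```

For a manual start, moving the runtime directory moves the default configuration, data, and log directories with it. For a service start, the defaults stay fixed regardless of where $SDC_DIST points.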

Modify environment variables using the method required by your installation type.

Note: StreamSets does not recommend using NFS or NAS to store Data Collector files.

You can configure the following environment variables that define directories:

Environment Variable Description
SDC_CONF

Defines the configuration directory, which stores the Data Collector configuration file, sdc.properties, along with the related realm properties files, keystore files, and the log4j properties file.

Default directories:

  • Manual start: $SDC_DIST/etc
  • Service start: /etc/sdc

SDC_DATA

Defines the data directory for pipeline configuration and run details.

Default directories:

  • Manual start: $SDC_DIST/data
  • Service start: /var/lib/sdc

SDC_LOG

Defines the log directory.

Default directories:

  • Manual start: $SDC_DIST/log
  • Service start: /var/log/sdc

SDC_EXTERNAL_RESOURCES

Defines an optional external resources directory. By default, this directory contains the directories defined by the following environment variables:

  • SDC_RESOURCES
  • STREAMSETS_LIBRARIES_EXTRA_DIR
  • USER_LIBRARIES_DIR

To define this directory, you must add the environment variable to the appropriate file. Set the variable to a directory outside of the $SDC_DIST runtime directory.

Default directory: $SDC_DIST/externalResources

SDC_RESOURCES

Defines the directory for runtime resource files.

To configure this environment variable, you must uncomment the variable in the appropriate file. Set the variable to a directory outside of the $SDC_DIST runtime directory.

Default directories:

  • Manual start for tarball installations:

    $SDC_EXTERNAL_RESOURCES/resources

    This resolves to the following directory unless you define the SDC_EXTERNAL_RESOURCES environment variable:

    $SDC_DIST/externalResources/resources

  • Manual start for other installations: $SDC_DIST/resources
  • Service start: /var/lib/sdc-resources

STREAMSETS_LIBRARIES_EXTRA_DIR

Defines the directory for external libraries.

To configure this environment variable, you must add the environment variable to the appropriate file. Set the variable to a directory outside of the $SDC_DIST runtime directory.

Default directory:

$SDC_EXTERNAL_RESOURCES/streamsets-libs-extras

This resolves to the following directory unless you define the SDC_EXTERNAL_RESOURCES environment variable:

$SDC_DIST/externalResources/streamsets-libs-extras

USER_LIBRARIES_DIR

Defines the directory for custom stage libraries.

To configure this environment variable, you must add it to the appropriate file. Set the variable to a directory outside of the $SDC_DIST runtime directory.

Default directory:

$SDC_EXTERNAL_RESOURCES/user-libs

This resolves to the following directory unless you define the SDC_EXTERNAL_RESOURCES environment variable:

$SDC_DIST/externalResources/user-libs
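The fallback chain described above can be sketched in Python. This is a minimal illustration under one stated assumption: it models a manual start of a tarball installation, the only case where SDC_RESOURCES defaults to a subdirectory of SDC_EXTERNAL_RESOURCES. The function name `resolve_external_dirs` and the example paths are hypothetical, and the code is not Data Collector's own resolution logic.

```python
import posixpath
from typing import Mapping


def resolve_external_dirs(sdc_dist: str, env: Mapping[str, str]) -> dict[str, str]:
    """Resolve the external resource directories for a manually started
    tarball installation (assumption), following the documented defaults.

    `env` holds any variables you have set explicitly. An explicit,
    non-empty value always wins over the default; an empty string is
    treated as unset.
    """
    # SDC_EXTERNAL_RESOURCES falls back to $SDC_DIST/externalResources.
    external = env.get("SDC_EXTERNAL_RESOURCES") or posixpath.join(sdc_dist, "externalResources")

    # The nested directories default to subdirectories of SDC_EXTERNAL_RESOURCES.
    defaults = {
        "SDC_RESOURCES": posixpath.join(external, "resources"),
        "STREAMSETS_LIBRARIES_EXTRA_DIR": posixpath.join(external, "streamsets-libs-extras"),
        "USER_LIBRARIES_DIR": posixpath.join(external, "user-libs"),
    }
    resolved = {"SDC_EXTERNAL_RESOURCES": external}
    for name, default in defaults.items():
        resolved[name] = env.get(name) or default
    return resolved


if __name__ == "__main__":
    # Nothing set: everything lands under $SDC_DIST/externalResources.
    print(resolve_external_dirs("/opt/sdc", {}))
    # Only SDC_EXTERNAL_RESOURCES set: the nested defaults follow it.
    print(resolve_external_dirs("/opt/sdc", {"SDC_EXTERNAL_RESOURCES": "/data/sdc-external"}))
```

Because the three nested defaults are derived from SDC_EXTERNAL_RESOURCES, setting only that one variable relocates all of them. Setting an individual variable overrides just that directory.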