Hadoop Impersonation Mode

You can configure how Data Collector impersonates a Hadoop user when performing tasks, such as reading or writing data, in Hadoop systems.

By default, Data Collector impersonates Hadoop users as follows:
  • As the user defined in stage properties - When configured, Data Collector uses the user defined in Hadoop-related stages.
  • As the currently logged in Data Collector user who starts the pipeline - When no user is defined in a Hadoop-related stage, Data Collector uses the user who starts the pipeline.
Note: In both cases, the Hadoop systems must be configured to allow the impersonation.

The system administrator can configure Data Collector to always use the user who starts the pipeline by enabling the stage.conf_hadoop.always.impersonate.current.user property in the Data Collector configuration file. When this property is enabled, configuring a user within a stage is not allowed.
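The following is a minimal Python sketch of how the effective Hadoop user is chosen under the default behavior and with the property enabled. It is a hypothetical model of the documented rules, not Data Collector's actual implementation. The function and exception names are illustrative assumptions. Rejecting a stage-level user with an error is also an assumption, based on the statement that configuring a user within a stage is not allowed.

```python
from typing import Optional

# Hypothetical model of the documented rules; not Data Collector's actual code.
PROPERTY = "stage.conf_hadoop.always.impersonate.current.user"


class StageConfigError(ValueError):
    """Raised when a stage defines a Hadoop user while the property is enabled."""


def effective_hadoop_user(
    stage_user: Optional[str],
    pipeline_starter: str,
    always_impersonate_current_user: bool = False,
) -> str:
    """Return the Hadoop user that would be impersonated for a stage."""
    if always_impersonate_current_user:
        # With the property enabled, a stage-level user is not allowed.
        if stage_user:
            raise StageConfigError(
                f"{PROPERTY} is enabled; remove the user from the stage"
            )
        return pipeline_starter
    # Default behavior: a user defined in the stage takes precedence.
    if stage_user:
        return stage_user
    # Otherwise fall back to the user who starts the pipeline.
    return pipeline_starter
```

For example, `effective_hadoop_user("hdfs", "operator1")` returns `"hdfs"`, and `effective_hadoop_user(None, "operator1")` returns `"operator1"`. With the third argument set to `True`, the first call raises `StageConfigError` instead. In either case, the Hadoop system must still be configured to allow the impersonation.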

Configure Data Collector to always impersonate the user who starts the pipeline when you want to prevent stage-level user properties from being used to access data in Hadoop systems.

For example, say you use roles, groups, and permissions to ensure that only authorized operators can start pipelines. You expect the operator user accounts to be used to access all external systems. However, a developer can specify an HDFS user in a Hadoop stage and bypass these security measures. To close this loophole, configure Data Collector to always use the currently logged in Data Collector user to read from or write to Hadoop systems.

To always use the user who starts the pipeline, uncomment the stage.conf_hadoop.always.impersonate.current.user property in the Data Collector configuration file and set it to true.
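The edit described above can be scripted. The following minimal Python sketch operates on the text of the configuration file. It assumes the property uses `key=value` syntax and that the shipped line is commented out with a leading `#`. Both are assumptions, so check your own file first. The helper name `enable_always_impersonate` is hypothetical.

```python
import re

PROPERTY = "stage.conf_hadoop.always.impersonate.current.user"

# Matches the property line whether or not it is commented out, for example
# "#stage.conf_hadoop.always.impersonate.current.user=false" (assumed format).
# [ \t]* is used instead of \s* so that the match never spans line breaks.
_LINE = re.compile(
    r"^[ \t]*#?[ \t]*" + re.escape(PROPERTY) + r"[ \t]*=.*$", re.MULTILINE
)


def enable_always_impersonate(config_text: str) -> str:
    """Uncomment the property and set it to true; append it if it is missing."""
    replacement = f"{PROPERTY}=true"
    new_text, count = _LINE.subn(replacement, config_text)
    if count == 0:
        # Property not present at all: add it on its own line at the end.
        if new_text and not new_text.endswith("\n"):
            new_text += "\n"
        new_text += replacement + "\n"
    return new_text
```

To apply it to a file, read the configuration file with `pathlib.Path(...).read_text()`, pass the contents through `enable_always_impersonate`, and write the result back. Any other lines in the file are left unchanged.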

With this property enabled, Data Collector prevents configuring an alternate user in the following Hadoop-related stages:
  • Hadoop FS Standalone origin and Hadoop FS destination
  • MapR FS Standalone origin and MapR FS destination
  • HBase lookup and destination
  • MapR DB destination
  • HDFS File Metadata executor
  • MapR FS File Metadata executor
  • MapReduce executor