Hadoop Impersonation Mode
When the Hadoop YARN cluster is configured for impersonation but not for Kerberos authentication, you can configure the Hadoop impersonation mode that Transformer uses when performing tasks in the Hadoop system. Transformer can perform these tasks in either of the following ways:
- As the user defined in the pipeline properties - When configured, Transformer uses the specified Hadoop user to launch the Spark application and to access files in the Hadoop system.
- As the currently logged-in Transformer user who starts the pipeline - When no Hadoop user is defined in the pipeline properties, Transformer uses the user who starts the pipeline.
The system administrator can configure Transformer to always use the user who starts the pipeline by enabling the hadoop.always.impersonate.current.user property in the Transformer configuration properties. When enabled, configuring a Hadoop user within a pipeline is not allowed.
Configure Transformer to always impersonate the user who starts the pipeline when you want to prevent a Hadoop user defined in the pipeline properties from being used to access data in Hadoop systems.
For example, say you use roles, groups, and pipeline permissions to ensure that only authorized operators can start pipelines, and you expect the operator user accounts to be used to access all external systems. A pipeline developer can still specify an HDFS user in a pipeline and bypass these controls. To close this loophole, configure Transformer to always use the user who starts the pipeline to read from or write to Hadoop systems.
To always use the user who starts the pipeline, uncomment the hadoop.always.impersonate.current.user property in the Transformer configuration properties and set it to true.
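As a minimal sketch, assuming the configuration uses standard Java-style key=value properties syntax, the uncommented entry might look like the following. The comment line is illustrative only and is not taken from the shipped configuration.

```properties
# Always access Hadoop systems as the user who starts the pipeline.
hadoop.always.impersonate.current.user=true
```

With this setting in place, a Hadoop user can no longer be configured within a pipeline, as described above.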