Transformer Proxy Users

To ensure that Transformer pipelines run as expected, ensure that all Transformer proxy users have permissions on the required directories.

By default, Transformer uses the user who starts a pipeline as a proxy user to launch the Spark application and to access files in the Hadoop system.

When the Hadoop YARN cluster uses Kerberos authentication, you must enable proxy users for Kerberos in the Transformer installation. Or, you can configure individual pipelines to use a Kerberos principal and keytab to override the default proxy user.

When Transformer uses Hadoop impersonation without Kerberos authentication, you can configure a Hadoop user in individual pipelines to override the default proxy user, the user who starts the pipeline.

You can use a Transformer configuration property to prevent overriding the proxy user. This option is highly recommended. It ensures that the user who starts the pipeline is always used as the proxy user and prevents users from entering a different user name in pipeline properties.