Kerberos Authentication
When the Hadoop YARN cluster uses Kerberos authentication, Transformer by default uses the user who starts the pipeline as the proxy user to launch the Spark application and to access files in the Hadoop system. To use a different identity, configure a Kerberos principal and keytab for the pipeline.
Configuring a Kerberos principal and keytab is strongly recommended because it enables Spark to renew Kerberos tokens as needed.
For example, configure a Kerberos principal and keytab for long-running pipelines, such as streaming pipelines, so that Spark can renew the Kerberos token. If Transformer uses a proxy user for a pipeline that runs longer than the maximum lifetime of the Kerberos token, the token expires and the proxy user can no longer be authenticated.
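To make the two modes concrete, the following Python sketch builds the kind of spark-submit command line that corresponds to each one. It is an illustration only, not how Transformer launches applications internally. The helper name build_spark_submit_args and the sample values are hypothetical. The --proxy-user, --principal, and --keytab options are standard spark-submit options, and spark-submit accepts either a proxy user or a principal, but not both.

```python
from typing import List, Optional


def build_spark_submit_args(
    app: str,
    proxy_user: Optional[str] = None,
    principal: Optional[str] = None,
    keytab: Optional[str] = None,
) -> List[str]:
    """Build a spark-submit argument list for a Kerberos-enabled YARN cluster.

    Hypothetical helper for illustration; Transformer's own launch logic
    is not shown in this documentation.
    """
    # A principal is only usable together with its keytab file.
    if (principal is None) != (keytab is None):
        raise ValueError("principal and keytab must be provided together")
    # spark-submit rejects a command that sets both a proxy user and a principal.
    if principal is not None and proxy_user is not None:
        raise ValueError("use either a proxy user or a principal and keytab")

    args = ["spark-submit", "--master", "yarn"]
    if principal is not None:
        # Principal and keytab: Spark can log in again and renew tokens.
        args += ["--principal", principal, "--keytab", keytab]
    elif proxy_user is not None:
        # Proxy user: no keytab is available, so tokens cannot be renewed
        # past their maximum lifetime.
        args += ["--proxy-user", proxy_user]
    args.append(app)
    return args
```

For a streaming pipeline, a call such as build_spark_submit_args("pipeline.jar", principal="etl@EXAMPLE.COM", keytab="/etc/security/etl.keytab") yields the renewable form. The principal, keytab path, and JAR name here are placeholder values.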
For more information about submitting Spark applications to Hadoop clusters that use Kerberos authentication, see the Apache Spark documentation.