Kerberos Authentication

When the Hadoop YARN cluster uses Kerberos authentication, Transformer by default uses the user who starts the pipeline as the proxy user to launch the Spark application and to access files in the Hadoop system. To use a specific Kerberos identity instead, configure a Kerberos principal and keytab for the pipeline.

Configuring a Kerberos principal and keytab is strongly recommended because it enables Spark to renew Kerberos tokens as needed.

For example, you should configure a Kerberos principal and keytab for long-running pipelines, such as streaming pipelines, so that Spark can renew the Kerberos token. If Transformer uses a proxy user for a pipeline that runs longer than the maximum lifetime of the Kerberos token, the token expires and the proxy user can no longer be authenticated.
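
As a rough illustration of the two modes, the following Python sketch builds the kind of spark-submit arguments that correspond to each one. It is not Transformer's actual implementation: the function name build_spark_submit_args and the example principal and keytab path are hypothetical. The --principal, --keytab, and --proxy-user options are standard spark-submit options, and spark-submit does not allow --proxy-user to be combined with --principal and --keytab.

```python
from typing import List, Optional


def build_spark_submit_args(
    pipeline_user: str,
    principal: Optional[str] = None,
    keytab: Optional[str] = None,
) -> List[str]:
    """Return illustrative spark-submit arguments for a Kerberized YARN cluster.

    Hypothetical helper for illustration only; not part of Transformer.
    """
    args = ["spark-submit", "--master", "yarn"]
    if principal and keytab:
        # With a principal and keytab, Spark can log in again and
        # obtain new tokens while the application is running.
        args += ["--principal", principal, "--keytab", keytab]
    elif principal or keytab:
        # A principal without a keytab (or the reverse) cannot authenticate.
        raise ValueError("principal and keytab must be configured together")
    else:
        # No principal or keytab: submit on behalf of the user who
        # started the pipeline. Tokens cannot be renewed past their
        # maximum lifetime in this mode.
        args += ["--proxy-user", pipeline_user]
    return args


if __name__ == "__main__":
    # Proxy user mode (default when no principal and keytab are set).
    print(build_spark_submit_args("alice"))
    # Principal and keytab mode (example values are placeholders).
    print(build_spark_submit_args(
        "alice",
        principal="etl@EXAMPLE.COM",
        keytab="/path/to/etl.keytab",
    ))
```

In this sketch, the pipeline user is ignored once a principal and keytab are supplied, which mirrors the behavior described above: the proxy user applies only when no principal and keytab are configured.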

Note: If you choose to use proxy users when the cluster uses Kerberos authentication, you must first enable proxy users for Kerberos in the Transformer installation.

For more information about submitting Spark applications to Hadoop clusters that use Kerberos authentication, see the Apache Spark documentation.