Using a Keytab for Each Pipeline

You can configure pipelines to use a Kerberos keytab and specify the source of the keytab for each pipeline. When you do not specify a keytab source, Transformer launches the Spark application and accesses files in the Hadoop system as the user who starts the pipeline.

When a pipeline uses a keytab, Transformer uses the Kerberos principal to launch the Spark application and to access files in the Hadoop system. Transformer also includes the keytab file with the launched Spark application so that Spark can renew the Kerberos token.
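Transformer handles the launch internally, so nothing here needs to be run by hand. For readers familiar with spark-submit, the behavior described above is comparable to passing the `--principal` and `--keytab` options when submitting an application to YARN. The following is a minimal sketch under that assumption: the function name, the principal, and the paths are hypothetical, and this is not Transformer's actual launch code.

```python
from typing import List


def build_spark_submit_args(principal: str, keytab_path: str, app_resource: str) -> List[str]:
    """Build a spark-submit argument list that passes a Kerberos principal and keytab.

    Illustrative only: Transformer launches the Spark application itself.
    The --principal and --keytab options are standard spark-submit options;
    every value passed in here is a hypothetical placeholder.
    """
    return [
        "spark-submit",
        "--master", "yarn",
        "--principal", principal,   # Kerberos principal used to run the application
        "--keytab", keytab_path,    # keytab shipped with the application so tokens can be renewed
        app_resource,
    ]


if __name__ == "__main__":
    # Hypothetical values for illustration.
    args = build_spark_submit_args(
        "etl_user@EXAMPLE.COM",
        "/etc/security/keytabs/etl_user.keytab",
        "my_app.jar",
    )
    print(" ".join(args))
```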

When you enable a pipeline to use a keytab, you configure one of the following keytab sources for the pipeline:
Transformer configuration properties
The pipeline uses the same Kerberos keytab and principal configured for Transformer in the Transformer configuration properties.
For information about specifying the Kerberos keytab in the Transformer configuration properties, see Enabling the Properties File as the Keytab Source.
Pipeline configuration - file
The pipeline uses the Kerberos keytab file and principal configured for the pipeline. Store the keytab file on the Transformer machine.
In the pipeline properties, you define the absolute path to the keytab file and the Kerberos principal to use for that keytab.
Define a specific keytab and principal for a pipeline to ensure that only authorized users access data stored in HDFS files.
Pipeline configuration - credential store
The pipeline uses the Kerberos keytab file and principal configured for the pipeline. Add the Base64-encoded keytab to a credential store, and then use a credential function to retrieve the keytab from the credential store.
Note: Be sure to remove unnecessary characters, such as newline characters, before encoding the keytab.
In the pipeline properties, you use the credential:get() or credential:getWithOptions() credential function to retrieve the keytab, and you define the Kerberos principal to use for that keytab.
For more information about using credential stores with Transformer, see Credential Stores.
Define a specific keytab and principal for a pipeline to ensure that only authorized users access data stored in HDFS files. When using a credential store, you can also require group access to credential store secrets for an additional layer of security.
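For the credential store option, the keytab must be Base64-encoded before it is added to the store. The following is a minimal sketch of that encoding step using the Python standard library. The function name and the example path are hypothetical, and how you upload the resulting string depends on your credential store. The sketch reads the keytab as raw bytes and emits a single-line Base64 string, so the encoding step itself introduces no newline characters.

```python
import base64
from pathlib import Path


def encode_keytab(keytab_path: str) -> str:
    """Return the contents of a keytab file as a single-line Base64 string.

    Hypothetical helper for illustration. base64.b64encode does not insert
    line breaks, so the result is safe to paste as a one-line secret value.
    """
    raw = Path(keytab_path).read_bytes()           # keytabs are binary, so read bytes
    return base64.b64encode(raw).decode("ascii")   # Base64 output is plain ASCII


if __name__ == "__main__":
    import sys

    # Usage (hypothetical path): python encode_keytab.py /path/to/pipeline.keytab
    if len(sys.argv) == 2:
        print(encode_keytab(sys.argv[1]))
```

The encoded string is the value that you store as the secret. In the pipeline properties, a credential function such as credential:get() then retrieves it from the credential store.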