Hadoop YARN Directory Requirements

When using a Hadoop YARN cluster manager, the following directories must exist:
Spark node local directories
The YARN yarn.nodemanager.local-dirs configuration property in the yarn-site.xml file defines one or more local directories that YARN uses on each Spark node.
The value of the property should be visible in the cluster manager user interface. By default, the property is set to ${hadoop.tmp.dir}/nm-local-dir.
The specified directories must meet all of the following requirements:
  • Exist on each node of the cluster.
  • Be owned by the YARN user.
  • Grant read permission to the Transformer proxy user.
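As a point of reference, a yarn-site.xml entry that sets this property explicitly might look like the following. The directory paths shown are example values, not defaults:

```xml
<property>
  <!-- Comma-separated list of local directories used by the YARN NodeManager -->
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data1/yarn/local,/data2/yarn/local</value>
</property>
```

If you list multiple directories, each one must meet the ownership and permission requirements above on every node.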
HDFS application resource directories
Spark stores resources for all Spark applications started by Transformer in the HDFS home directory of the Transformer proxy user. Home directories are named after the Transformer proxy user, as follows:
/user/<Transformer proxy user name>
Ensure that both of the following requirements are met:
  • Each resource directory exists on HDFS.
  • Each Transformer proxy user has read and write permission on its resource directory.
For example, you might use the following command to add a Transformer user, tx, to a spark user group:
usermod -aG spark tx
Then, you can use the following commands to create the /user/tx directory, assign ownership to the tx user and the spark group, and grant the group read and write access to the directory:
sudo -u hdfs hdfs dfs -mkdir /user/tx
sudo -u hdfs hdfs dfs -chown tx:spark /user/tx
sudo -u hdfs hdfs dfs -chmod -R 775 /user/tx