Spark Dynamic Allocation Prerequisite

Before you run a pipeline on a MapR cluster, you must set up Spark dynamic allocation on the cluster.

MapR provides a blog post that describes how to perform this task. Perform all of the steps described in the post, with the following change.

At this time, the "Enabling Dynamic Allocation in Apache Spark" section of the post instructs you to add the following entries to the /opt/mapr/spark/spark-1.6.1/conf/spark-defaults.conf file:

spark.dynamicAllocation.enabled = true
spark.shuffle.service.enabled = true
spark.dynamicAllocation.minExecutors = 5
spark.executor.instances = 0

Setting spark.executor.instances to 0 generates an error. Instead, set spark.executor.instances to 1 or higher, up to the maximum number of executors allowed in the Transformer instance.
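For example, the corrected entries in /opt/mapr/spark/spark-1.6.1/conf/spark-defaults.conf might look like the following. This is a sketch that keeps the values from the blog post and changes only spark.executor.instances; the value 1 is simply the lowest valid setting, and you can use any value up to the executor maximum for your Transformer instance:

spark.dynamicAllocation.enabled = true
spark.shuffle.service.enabled = true
spark.dynamicAllocation.minExecutors = 5
# Must be 1 or higher; a value of 0 generates an error.
spark.executor.instances = 1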