Extra Spark Configuration
When you create a pipeline, you can define extra Spark configuration properties that determine how the pipeline runs on Spark. Transformer passes the configuration properties to Spark when it launches the Spark application.
You can add any additional Spark configuration property, as described in the Spark configuration documentation.
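Conceptually, each extra property behaves as if it had been set on the application's SparkConf before launch. The following is a minimal Scala sketch of that idea, using hypothetical property values; Transformer supplies the properties for you at launch time, so you do not write this code yourself.

```scala
import org.apache.spark.SparkConf

// Illustration only: these key/value pairs stand in for extra configuration
// properties defined in the pipeline. The values are hypothetical examples.
val conf = new SparkConf()
  .set("spark.sql.shuffle.partitions", "200")  // any property from the Spark configuration docs
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
```

One such property is spark.home, described in the following table: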
| Configuration Property | Description | 
|---|---|
| spark.home | Overrides the SPARK_HOME environment variable set on the machine. For example, if multiple Spark versions are installed locally on the Transformer machine, you can add the spark.home property to specify which installation the pipeline uses. | 
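As a rough illustration of the same idea, Spark's own launcher API lets a launch target a specific local installation. This is not necessarily how Transformer launches pipelines; the installation path, jar, and class name below are hypothetical.

```scala
import org.apache.spark.launcher.SparkLauncher

// Sketch only: pointing a Spark launch at a specific local installation,
// which is the effect that setting spark.home has for a pipeline.
// All paths and names below are hypothetical.
val handle = new SparkLauncher()
  .setSparkHome("/opt/spark-2.4.7")          // the Spark installation to use
  .setAppResource("/opt/app/pipeline.jar")   // hypothetical application jar
  .setMainClass("com.example.PipelineMain")  // hypothetical entry point
  .startApplication()
```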
Performance Tuning Properties
By default, Transformer adds several Spark configuration properties to each pipeline with suggested values. These properties override the default Spark values in the cluster.
The defaults for these properties should work in most cases. If you are an advanced user, you can tune the performance of a specific pipeline by modifying these properties or by adding additional Spark configuration properties.
For more information about these configuration properties, see the Spark Configuration documentation.
Transformer adds the following Spark configuration properties to each pipeline:
| Spark Configuration Property | Description | 
|---|---|
| spark.driver.memory | Maximum amount of memory that the Spark driver uses to run the pipeline. | 
| spark.driver.cores | Number of cores that the Spark driver uses to run the pipeline. | 
| spark.executor.memory | Maximum amount of memory that each Spark executor uses to run the pipeline. | 
| spark.executor.cores | Number of cores that each Spark executor uses to run the pipeline. Databricks and Dataproc do not allow overrides of this property, so it is ignored when the pipeline runs on a Databricks or Dataproc cluster. | 
| spark.dynamicAllocation.enabled | Enables dynamic resource allocation, where Spark uses as many executors as required to run the pipeline. Note: Local pipelines always run on one Spark executor. | 
| spark.shuffle.service.enabled | Enables the external shuffle service. Must be true when dynamic allocation is enabled. | 
| spark.dynamicAllocation.minExecutors | Minimum number of Spark executors that the pipeline runs on when dynamic allocation is enabled. | 
| spark.dynamicAllocation.maxExecutors | Maximum number of Spark executors that the pipeline runs on when dynamic allocation is enabled. The maximum number of Spark executors allowed for each pipeline is determined by your account type. You can decrease this number to limit executor usage in the cluster, but you cannot increase it. Note: When dynamic allocation is disabled, the spark.executor.instances configuration property determines the number of Spark executors used for the pipeline. The maximum value of the spark.executor.instances property is also determined by your account type. |
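For example, a pipeline that needs more executor memory and a tighter executor cap might override a few of these values. The following is a minimal sketch, equivalent in effect to entering the key/value pairs in the pipeline configuration; all values are hypothetical starting points, not recommendations.

```scala
import org.apache.spark.sql.SparkSession

// Illustration only: Transformer applies these properties when it launches the
// Spark application; in practice you enter them as key/value pairs, not code.
val spark = SparkSession.builder()
  .appName("tuning-sketch")
  .config("spark.driver.memory", "2g")                   // driver memory ceiling
  .config("spark.executor.memory", "4g")                 // memory ceiling per executor
  .config("spark.executor.cores", "2")                   // ignored on Databricks and Dataproc
  .config("spark.dynamicAllocation.enabled", "true")     // scale executors with demand
  .config("spark.shuffle.service.enabled", "true")       // required when dynamic allocation is enabled
  .config("spark.dynamicAllocation.minExecutors", "1")
  .config("spark.dynamicAllocation.maxExecutors", "8")   // cannot exceed the account-type limit
  .getOrCreate()
```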