Performance Tuning Properties
By default, Transformer adds several Spark configuration properties to each pipeline with suggested values. These properties override the default Spark values in the cluster.
The defaults for these properties should work in most cases. If you are an advanced user, you can tune the performance of a specific pipeline by modifying these properties or by adding other Spark configuration properties.
For more information about these configuration properties, see the Spark Configuration documentation.
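Because these suggested values override the cluster defaults, it can be useful to confirm what a pipeline's Spark application actually runs with. The following is a minimal sketch in plain Spark (Scala), not a Transformer-specific API; it reads the runtime configuration for a few of the properties listed in the table below.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch (plain Spark, not a Transformer API): print the values
// currently in effect for some of the properties listed in the table below.
object ConfCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().getOrCreate()
    Seq(
      "spark.driver.memory",
      "spark.executor.memory",
      "spark.dynamicAllocation.enabled",
      "spark.dynamicAllocation.maxExecutors"
    ).foreach { key =>
      // getOption returns None when the property was never set explicitly,
      // meaning the cluster default applies.
      println(s"$key = ${spark.conf.getOption(key).getOrElse("<cluster default>")}")
    }
    spark.stop()
  }
}
```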
Transformer adds the following Spark configuration properties to each pipeline:
Spark Configuration Property | Description |
---|---|
spark.driver.memory | Maximum amount of memory that the Spark driver uses to run the pipeline. |
spark.driver.cores | Number of cores that the Spark driver uses to run the pipeline. |
spark.executor.memory | Maximum amount of memory that each Spark executor uses to run the pipeline. |
spark.executor.cores | Number of cores that each Spark executor uses to run the pipeline. Databricks and Dataproc do not allow overriding this property; it is ignored when the pipeline runs on a Databricks or Dataproc cluster. |
spark.dynamicAllocation.enabled | Enables dynamic resource allocation, where Spark uses as many executors as required to run the pipeline. Note: Local pipelines always run on one Spark executor. |
spark.shuffle.service.enabled | Enables the external shuffle service. Must be true when dynamic allocation is enabled. |
spark.dynamicAllocation.minExecutors | Minimum number of Spark executors that the pipeline runs on when dynamic allocation is enabled. |
spark.dynamicAllocation.maxExecutors | Maximum number of Spark executors that the pipeline runs on when dynamic allocation is enabled. The maximum number allowed for each pipeline is determined by your StreamSets account; you can decrease this value to limit executor usage in the cluster, but you cannot increase it. Note: When dynamic allocation is disabled, the spark.executor.instances property determines the number of Spark executors used for the pipeline. The maximum value of spark.executor.instances is also determined by your StreamSets account. |
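For reference, here is how the same properties look when assembled by hand outside of Transformer. This is a minimal plain-Spark sketch with placeholder values; they are not Transformer's suggested defaults, which vary by cluster type.

```scala
import org.apache.spark.SparkConf

object TuningSketch {
  // Placeholder values for illustration only; not Transformer's defaults.
  val conf: SparkConf = new SparkConf()
    .set("spark.driver.memory", "1g")   // honored only if set before the driver
                                        // JVM starts, e.g. via spark-submit --conf
    .set("spark.driver.cores", "1")
    .set("spark.executor.memory", "2g")
    .set("spark.executor.cores", "2")   // ignored on Databricks and Dataproc clusters
    .set("spark.dynamicAllocation.enabled", "true")
    .set("spark.shuffle.service.enabled", "true")  // must be true with dynamic allocation
    .set("spark.dynamicAllocation.minExecutors", "1")
    .set("spark.dynamicAllocation.maxExecutors", "10")

  // With dynamic allocation disabled, a fixed executor count applies instead:
  //   .set("spark.dynamicAllocation.enabled", "false")
  //   .set("spark.executor.instances", "4")
}
```

The commented lines at the end show the alternative described in the table above: when dynamic allocation is disabled, spark.executor.instances fixes the executor count for the pipeline.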