Cluster Configuration
When you provision a cluster for a pipeline, Databricks creates a new job cluster the first time the pipeline runs. You define the cluster properties in the Cluster Configuration pipeline property. For any cluster property that you do not define there, Transformer uses the Databricks default value.
When needed, you can override the Databricks default values by defining additional cluster properties in the Cluster Configuration pipeline property. For example, to provision a cluster that uses an instance pool, you can add and define the instance_pool_id property in the Cluster Configuration property.
When defining cluster configuration properties, use the property names and values as expected by Databricks. The Cluster Configuration property defines cluster properties in JSON format.
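As an illustration of the instance pool case, the fragment below sketches how the instance_pool_id property might appear in the Cluster Configuration property. The value is a placeholder, not a real pool ID, and any other cluster properties you need would sit alongside it in the same JSON object.

```json
{
  "instance_pool_id": "<your-instance-pool-id>"
}
```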
Databricks Cluster Property | Description |
---|---|
num_workers | Number of worker nodes in the cluster. |
spark_version | Databricks Runtime and Apache Spark version. |
node_type_id | Type of worker node. |
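A minimal sketch of a Cluster Configuration value that sets the three properties listed above might look like the following. The worker count and the angle-bracket strings are placeholders chosen for illustration. Replace them with a Databricks Runtime version and a node type that are available in your Databricks workspace.

```json
{
  "num_workers": 2,
  "spark_version": "<databricks-runtime-version>",
  "node_type_id": "<worker-node-type>"
}
```

Because the property names are passed through as Databricks expects them, they must match the Databricks names exactly, including the underscores.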
For information about other Databricks cluster properties, see the Databricks documentation.