Cluster-Scoped Init Scripts

When you provision a Databricks cluster, you can specify cluster-scoped init scripts that run on each cluster node during startup, before the cluster begins processing data. You might use init scripts to perform tasks such as installing a driver on the cluster or creating directories and setting permissions on them.
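
A cluster-scoped init script is a shell script. As a minimal sketch, the following installs a driver package and prepares a working directory; the package name and directory path are placeholders, not requirements:

    #!/bin/bash
    # Example cluster-scoped init script; the package and paths are illustrative.
    set -euo pipefail

    # Install a driver package with the node's package manager.
    apt-get update -y
    apt-get install -y unixodbc

    # Create a working directory and set its permissions.
    mkdir -p /local_disk0/myapp
    chmod 775 /local_disk0/myapp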

You can use cluster-scoped init scripts stored in the following locations:
  • DBFS from Pipeline - Databricks File System (DBFS) init script defined in the pipeline. When provisioning the cluster, Transformer temporarily stores the script in DBFS and removes it after the pipeline run.
  • DBFS from Location - Init script stored in DBFS on Databricks (see the staging sketch after this list).
  • S3 from Location - Amazon S3 init script stored on AWS. Use only when provisioning a Databricks cluster on AWS.
  • ABFSS from Location - Init script stored on Azure, accessed through the secure Azure Blob File System (ABFSS) scheme. Use only when provisioning a Databricks cluster on Azure.
    Note: To use this option, you must provide an access key to access the init script.
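
Scripts used with the from-location options must already exist in the storage location before you provision the cluster. The following sketch stages a script for the DBFS and S3 options; it assumes the Databricks CLI and AWS CLI are installed and configured, and the file name, DBFS path, and bucket name are placeholders:

    # DBFS from Location: copy the script into DBFS with the Databricks CLI.
    databricks fs cp ./install_driver.sh dbfs:/databricks/init-scripts/install_driver.sh

    # S3 from Location: copy the script to an S3 bucket with the AWS CLI.
    aws s3 cp ./install_driver.sh s3://example-bucket/init-scripts/install_driver.sh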

When you specify more than one init script, list the scripts in the order that you want them to run. If a script fails to run, Transformer cancels the cluster provisioning and stops the pipeline.

You can use any valid Databricks cluster-scoped init script. For more information about cluster-scoped init scripts, see the Databricks documentation.

Configure cluster-scoped init scripts on the Cluster tab of the pipeline properties. The init script properties become available after you select the Provision a New Cluster property.