Staging Directory
To run pipelines on a Databricks cluster, Transformer must store files in a staging directory on Databricks File System (DBFS).
You can configure the root directory to use as the staging directory. The default staging directory is /streamsets.
When a pipeline runs on an existing interactive cluster, configure pipelines to use the same staging directory so that each job created within Databricks can reuse the common files stored in the directory. Pipelines that run on different clusters can use the same staging directory as long as the pipelines are started by the same Transformer instance. Pipelines that are started by different instances of Transformer must use different staging directories. Different Transformer instances cannot send pipelines to the same cluster.
When a pipeline runs on a provisioned job cluster, using the same staging directory for pipelines is best practice, but not required.
- Files that can be reused across pipelines
- Transformer stores files that can be reused across pipelines, including Transformer libraries and external resources such as JDBC drivers, in the following location:
- Files specific to each pipeline
- Transformer stores files specific to each pipeline, such as the pipeline JSON file and resource files used by the pipeline, in the following directory: