Databricks Query

The Databricks Query executor runs one or more Spark SQL queries on Databricks each time it receives an event record. Use the executor as part of an event stream in the pipeline. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.

For example, you might use the Databricks Query executor to run a Spark SQL query that executes the VACUUM command to remove leftover files when the pipeline stop event is generated.
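As an illustration of that use case, the query below runs the Delta Lake VACUUM command against a table. The table name `sales.events` and the retention window are assumptions for the sketch, not values from this documentation:

```sql
-- Hypothetical example: remove files no longer referenced by the
-- Delta table after the pipeline stops. The table name sales.events
-- is a placeholder; RETAIN sets the retention threshold in hours.
VACUUM sales.events RETAIN 168 HOURS;
```

You would enter a query like this in the executor's Spark SQL query property so that it runs when the stop event record arrives.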

The Databricks Query executor uses a JDBC connection to connect to the Databricks cluster. When you configure the executor, you specify the JDBC connection string and credentials to use to connect to the Databricks cluster, and then you define the Spark SQL queries to run.
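For orientation, a Databricks JDBC connection string commonly takes a form like the following. The hostname, HTTP path, and token placeholder here are illustrative assumptions; copy the actual values from your Databricks cluster's JDBC/ODBC connection details:

```
jdbc:databricks://<server-hostname>:443/default;transportMode=http;ssl=1;httpPath=<http-path>
```

Credentials, such as a `token` user with a personal access token, are configured separately in the executor's credential properties.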

When needed, you also define the connection information that the executor uses to connect to the storage location in Amazon S3 or Azure Data Lake Storage Gen2.

You can also configure the executor to generate events for another event stream. For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.

Before you use the Databricks Query executor, you must complete the prerequisite tasks, including installing the Databricks stage library. The Databricks stage library is an Enterprise stage library. Releases of Enterprise stage libraries occur separately from Data Collector releases. For more information, see Enterprise Stage Libraries in the Data Collector documentation.