Databricks Delta Lake

Available when using an authoring Data Collector version 4.0.0 or later.

To create a Databricks Delta Lake connection, the appropriate Databricks stage library must be installed on the selected authoring Data Collector:
  • Data Collector 5.6.0 or later - Requires the Databricks stage library, streamsets-datacollector-sdc-databricks-lib.
  • Data Collector 4.0.0 to 5.5.x - Requires version 1.2.0 or later of the Databricks Enterprise stage library, streamsets-datacollector-databricks-lib.
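
The version-to-library mapping above can be sketched as a small lookup. This is only an illustration: the function name required_stage_library is hypothetical and is not part of any StreamSets API, and the sketch assumes plain numeric version strings such as "5.6.0".

```python
def required_stage_library(sdc_version: str) -> str:
    """Return the stage library name listed above for a Data Collector version.

    Hypothetical helper for illustration only; assumes numeric versions
    such as "4.0.0" or "5.6.0" (no suffixes like "-SNAPSHOT").
    """
    parts = [int(p) for p in sdc_version.split(".")]
    parts += [0] * (3 - len(parts))          # treat "5.6" as "5.6.0"
    version = tuple(parts[:3])

    if version >= (5, 6, 0):
        # Data Collector 5.6.0 or later: Databricks stage library
        return "streamsets-datacollector-sdc-databricks-lib"
    if version >= (4, 0, 0):
        # Data Collector 4.0.0 to 5.5.x: Databricks Enterprise stage library,
        # version 1.2.0 or later
        return "streamsets-datacollector-databricks-lib"
    raise ValueError(
        "Databricks Delta Lake connections require Data Collector 4.0.0 or later"
    )
```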

For a description of the Databricks Delta Lake connection properties, see Databricks Delta Lake Connection Properties.

After you create a Databricks Delta Lake connection, you can use the connection in the following stage:
Engine                           Stage
Data Collector 4.0.0 or later    Databricks Delta Lake destination

Databricks Delta Lake Connection Properties

When creating a Databricks Delta Lake connection, configure the following properties on the Databricks Delta Lake tab:
Databricks Delta Lake Property    Description
JDBC URL                          JDBC URL used to connect to the Databricks cluster.

For connections that use Data Collector 5.6.0 or later, enter the URL in the following format: jdbc:databricks://<server_hostname>:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/0/xxxx-xxxxxx-xxxxxxxx;AuthMech=3;

For connections that use earlier versions of Data Collector, enter the URL in the following format: jdbc:spark://<server_hostname>:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/0/xxxx-xxxxxx-xxxxxxxx;AuthMech=3;

Tip: In Databricks, you can locate the JDBC URL for your cluster on the JDBC/ODBC tab in the cluster configuration details. As a best practice, remove the PWD parameter from the URL, and then enter the personal access token value in the Token property below.
Token                             Personal access token used to connect to the Databricks cluster.
Tip: To secure sensitive information, you can use credential stores or runtime resources.
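
As a rough illustration of the JDBC URL guidance above, the following sketch picks the URL prefix by Data Collector version and drops a PWD parameter from a URL copied out of Databricks. The helper names (jdbc_prefix, build_jdbc_url, strip_pwd) are hypothetical and not part of any StreamSets or Databricks API. The sketch assumes numeric version strings and semicolon-separated URL parameters; the host name, HTTP path, and UID/PWD values in the usage lines are placeholders.

```python
def jdbc_prefix(sdc_version: str) -> str:
    """Return the JDBC prefix for a Data Collector version (assumed numeric)."""
    parts = [int(p) for p in sdc_version.split(".")]
    parts += [0] * (3 - len(parts))          # treat "5.6" as "5.6.0"
    # 5.6.0 and later use jdbc:databricks; earlier versions use jdbc:spark
    return "jdbc:databricks" if tuple(parts[:3]) >= (5, 6, 0) else "jdbc:spark"


def build_jdbc_url(sdc_version: str, server_hostname: str, http_path: str) -> str:
    """Assemble a URL in the format described for the JDBC URL property."""
    return (
        f"{jdbc_prefix(sdc_version)}://{server_hostname}:443/default;"
        f"transportMode=http;ssl=1;httpPath={http_path};AuthMech=3;"
    )


def strip_pwd(jdbc_url: str) -> str:
    """Remove any PWD=... parameter so the token is not stored in the URL."""
    head, *params = jdbc_url.split(";")
    kept = [p for p in params if p and not p.upper().startswith("PWD=")]
    return ";".join([head, *kept]) + ";"


# Placeholder values for illustration only.
url = build_jdbc_url("5.6.0", "example.cloud.databricks.com",
                     "sql/protocolv1/o/0/xxxx-xxxxxx-xxxxxxxx")
copied = url + "UID=token;PWD=<personal-access-token>"
print(strip_pwd(copied))
```

The token value removed from the URL would then go in the Token property, ideally through a credential store or runtime resource rather than as plain text.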