Loading Data into Databricks Delta Lake

You can use several solutions to load data into a Delta Lake table on Databricks.

Before you continue with one of the solutions, make sure that you have completed the required prerequisites in Databricks: generate a personal access token, configure and start your Databricks cluster, and locate the JDBC URL used to access the cluster.
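
As a point of reference, a cluster JDBC URL generally follows the pattern sketched below. The hostname, HTTP path, and token are placeholders, and the exact format depends on the Databricks JDBC driver version you use, so copy the real URL from the cluster's JDBC/ODBC details in Databricks rather than constructing it by hand.

    # Illustrative shape of a Databricks cluster JDBC URL (placeholders only);
    # copy the actual URL from the cluster's JDBC/ODBC details in Databricks.
    jdbc_url = (
        "jdbc:databricks://<server-hostname>:443/default;"
        "transportMode=http;ssl=1;"
        "httpPath=<http-path>;"
        "AuthMech=3;UID=token;PWD=<personal-access-token>"
    )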

For detailed prerequisite steps, see the Databricks prerequisites article for your staging location.

Then use one of the following solutions to build a pipeline that loads data into a Delta Lake table on Databricks:
  • Bulk load data into a Delta Lake table

    Build a pipeline that reads new Salesforce data, cleans some of the input data, and then passes the data to the Databricks Delta Lake destination. The Databricks Delta Lake destination first stages the data in an Amazon S3 staging location, and then uses the COPY command to copy the data from the staging location to a Delta Lake table. A sketch of this COPY step appears after this list.

  • Merge changed data into a Delta Lake table

    Build a pipeline that processes change data capture (CDC) data using the MySQL Binary Log origin and then passes the changed data to the Databricks Delta Lake destination. The Databricks Delta Lake destination first stages the changed data in an Amazon S3 staging location, and then uses the MERGE command to merge the changed data from the staging location to a Delta Lake table. A sketch of this MERGE step appears after this list.
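
The following sketch, in PySpark, illustrates the kind of COPY INTO statement that the Databricks Delta Lake destination issues after staging data in Amazon S3. The table name, bucket path, and format options are hypothetical placeholders; the destination generates the actual statement from your pipeline configuration, so you do not write this code yourself.

    # Minimal sketch of a COPY INTO statement, assuming a hypothetical
    # sales.contacts Delta table and an s3://my-staging-bucket staging path.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()  # already defined in Databricks notebooks

    spark.sql("""
        COPY INTO sales.contacts
        FROM 's3://my-staging-bucket/sdc-staging/'
        FILEFORMAT = CSV
        FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')
    """)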
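
Similarly, the following sketch illustrates the kind of MERGE statement the destination issues for changed data. The table names, the contact_id key, and the op_type field that records the CDC operation are hypothetical placeholders; again, the destination builds the actual statement for you.

    # Minimal sketch of a MERGE statement for CDC data, assuming hypothetical
    # target and staging tables, a contact_id key, and an op_type CDC field.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()  # already defined in Databricks notebooks

    spark.sql("""
        MERGE INTO sales.contacts AS target
        USING staging_contacts AS source
          ON target.contact_id = source.contact_id
        WHEN MATCHED AND source.op_type = 'DELETE' THEN DELETE
        WHEN MATCHED THEN UPDATE SET target.name = source.name, target.email = source.email
        WHEN NOT MATCHED THEN INSERT (contact_id, name, email)
          VALUES (source.contact_id, source.name, source.email)
    """)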