Azure SQL
The Azure SQL destination writes data to a table in Azure SQL Database, Azure Synapse SQL Pool, or Microsoft SQL Server version 2008 or later.
The destination can perform a bulk copy when writing to the table. When performing a bulk copy, the destination creates the table if it does not exist. When you instead use a write mode to append or overwrite data, the target table must exist before the pipeline starts.
When you configure the Azure SQL destination, you specify the database URL, database name, and credentials. You can also define any additional JDBC configuration properties that you want to use.
You specify the table to write to and whether to perform a bulk copy. When performing a bulk copy, you define related properties including the batch size, timeout, and reliability and isolation levels. You can optionally configure additional bulk copy properties.
When not performing a bulk copy, you specify the write mode for writing to an existing table.
Transformer includes the Microsoft JDBC driver for SQL Server, version 8 or later, with the destination.
Partitioning
Spark runs a Transformer pipeline just as it runs any other application, splitting the data into partitions and performing operations on the partitions in parallel.
When the pipeline starts processing a new batch, Spark determines how to split pipeline data into initial partitions based on the origins in the pipeline. Spark uses these partitions for the rest of the pipeline processing, unless a processor causes Spark to shuffle the data.
When writing to a database table, Spark creates one connection to the database for each partition.
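As a rough illustration of the one-connection-per-partition behavior described above, the following standalone Python sketch models a partitioned write. This is not Transformer or Spark code; the function names, partition count, and row data are made up for illustration only:

```python
# Toy model of a partitioned database write: one connection per partition.
# All names here are illustrative; this is not the Transformer or Spark API.

def split_into_partitions(rows, num_partitions):
    """Round-robin rows into the given number of partitions."""
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)
    return partitions

def write_batch(rows, num_partitions):
    """Write each partition over its own connection; return connection count."""
    connections_opened = 0
    for partition in split_into_partitions(rows, num_partitions):
        connections_opened += 1  # one database connection per partition
    return connections_opened

print(write_batch(list(range(10)), 4))  # 4 partitions -> 4 connections
```

In a real pipeline, the partition count is set by the origins or by upstream processors that shuffle the data, so repartitioning upstream is the way to control how many database connections the destination opens.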
Write Mode
The Azure SQL destination uses the write mode to determine how to write to an existing table. You can configure a write mode when the destination is not configured to perform a bulk copy.
- Append rows to existing table - The destination appends rows to the existing table.
- Overwrite existing table - The destination removes all rows in the existing table before writing new rows to the table. When you select overwrite mode, you also configure how the destination removes rows from the table:
  - Truncate all rows in the existing database table, and then write data to the table.
  - Drop the existing table, recreate the table, and then write data to the table.
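To make the difference between the two overwrite options concrete, the following self-contained Python sketch mimics them with plain SQL, using SQLite as a stand-in. SQLite has no TRUNCATE statement, so DELETE FROM plays that role here; against SQL Server the equivalent statements would be TRUNCATE TABLE or DROP TABLE followed by CREATE TABLE. The table name and rows are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
cur.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 9.5), (2, 3.0)])

# Append rows to existing table: existing rows are kept.
cur.execute("INSERT INTO sales VALUES (?, ?)", (3, 7.25))
count_after_append = cur.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(count_after_append)  # 3

# Overwrite, Truncate Table: remove all rows, keep the table definition.
cur.execute("DELETE FROM sales")  # TRUNCATE TABLE sales in SQL Server
cur.execute("INSERT INTO sales VALUES (?, ?)", (4, 1.0))
count_after_truncate = cur.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(count_after_truncate)  # 1

# Overwrite, Drop and Recreate Table: the table definition is rebuilt too.
cur.execute("DROP TABLE sales")
cur.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
cur.execute("INSERT INTO sales VALUES (?, ?)", (5, 2.0))
count_after_recreate = cur.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(count_after_recreate)  # 1
```

The practical distinction: truncating preserves the existing table definition, indexes, and permissions, while dropping and recreating replaces the table definition entirely.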
Configuring an Azure SQL Destination
- On the Properties panel, on the General tab, configure the following properties:
  - Name - Stage name.
  - Description - Optional description.
  - Stage Library - Stage library to use. Select the Azure SQL stage library installed on the cluster:
    - Azure SQL for Spark 3.0.x
    - Azure SQL for Spark 3.1.x
    - Azure SQL for Spark 3.3.x
    Available when creating the pipeline with Transformer prebuilt with Scala 2.12.
- On the Connection tab, configure the following properties:
  - URL - URL for the database. Use the following format: <serverName>.database.windows.net:<port>
    For example: sales-sql-db-server.database.windows.net:1433
  - Database Name - Name of the database to write to.
  - User - Database user name.
  - Password - Password.
  - Additional JDBC Configuration Properties - Additional JDBC configuration properties to use. To add properties, click Add and define the JDBC property name and value. You can use simple or bulk edit mode to configure the properties. Use the property names and values as expected by JDBC.
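As an illustration, this standalone Python sketch assembles a URL in the documented format and prints additional JDBC properties as name/value pairs. The server name, port, and property values are placeholders; encrypt and loginTimeout are connection property names defined by the Microsoft JDBC driver:

```python
def build_url(server_name, port):
    # Documented format: <serverName>.database.windows.net:<port>
    return f"{server_name}.database.windows.net:{port}"

# Placeholder values; substitute your own server name and port.
url = build_url("sales-sql-db-server", 1433)
print(url)  # sales-sql-db-server.database.windows.net:1433

# Additional JDBC configuration properties are name/value pairs, using the
# names the Microsoft JDBC driver expects (e.g. encrypt, loginTimeout).
extra_props = {"encrypt": "true", "loginTimeout": "30"}
for name, value in extra_props.items():
    print(f"{name}={value}")
```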
- On the Table tab, configure the following properties:
  - Table - Name of the table to write to. When bulk copy is enabled, the destination creates a table with the specified name if needed. Otherwise, the table must exist before you start the pipeline.
  - Enable Bulk Copy - Performs a bulk copy of the data. For more information about bulk copy, see the Azure documentation.
  - Batch Size - Batch size to use when writing the data. Default is 1000 rows. Available when performing a bulk copy.
  - Lock Table - Locks the table before writing the data, preventing other systems from writing to it. Use to improve performance for batches over 1000 rows. Available when performing a bulk copy.
  - Timeout - Seconds to allow for the copy before timing out. Available when performing a bulk copy.
  - Reliability Level - Reliability level to use for the write:
    - Best Effort - Data is written directly to the target table. This write is not idempotent and can result in duplicate records.
    - No Duplicates - Data is written using global temporary staging tables, which ensures that no duplicates are written.
    Note: Microsoft temporary table limitations may prevent using the No Duplicates option when writing to Azure. When this occurs, the destination uses Best Effort. For more information, see the Microsoft documentation.
    Available when performing a bulk copy.
  - Isolation Level - Isolation level to use for the write. Determines the consistency and concurrency levels for multiple transactions:
    - Read Uncommitted
    - Read Committed
    - Repeatable Read
    - Serializable
    - Snapshot
    For more information about isolation levels, see the Microsoft SQL documentation. Available when performing a bulk copy.
  - Schema Check - Enables a strict schema check between the dataframe and the target table. The check verifies that the number of columns, as well as the column names and types, in the dataframe match those in the target table. Available when performing a bulk copy.
  - Additional Bulk Copy Configuration Properties - Additional bulk copy properties to use. Available when performing a bulk copy.
  - Write Mode - Mode to use to write to an existing table:
    - Overwrite existing table
    - Append rows to existing table
    Available when you do not enable bulk copy.
  - Remove Rows - Determines how the destination removes rows from an existing table:
    - Truncate Table - Truncates all rows in the existing database table before writing to the table.
    - Drop and Recreate Table - Drops the existing table and then recreates the table before writing to the table.
    Default is Truncate Table. Available when the destination overwrites an existing table.
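The Batch Size bulk copy property above controls how many rows are sent per round trip. The chunking it implies can be sketched in a few lines of standalone Python; the row values are made up, and the real batching is performed by the bulk copy connector, not by code like this:

```python
def batches(rows, batch_size):
    """Yield consecutive slices of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

rows = list(range(2500))
sizes = [len(b) for b in batches(rows, 1000)]  # default batch size is 1000
print(sizes)  # [1000, 1000, 500]
```

Larger batches mean fewer round trips per partition at the cost of more memory per send, which is why the Lock Table property is suggested for batches over 1000 rows.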