Azure SQL
The Azure SQL destination writes data to a table in Azure SQL Database, Azure Synapse SQL Pool, or Microsoft SQL Server version 2008 or later.
The destination can perform a bulk copy when writing to the table. When performing a bulk copy, it can also create the table. When not performing a bulk copy, the destination can append data to or overwrite data in an existing table.
When you configure the Azure SQL destination, you specify the database URL, database name, and credentials. You can also define any additional JDBC configuration properties you want to use.
You specify the table to write to and whether to perform a bulk copy. When performing a bulk copy, you define related properties including the batch size and timeout. You can optionally configure additional bulk copy properties.
When not performing a bulk copy, you specify the mode the destination uses for writing to an existing table.
Before you use the Azure SQL destination, install a JDBC driver.
Installing the Microsoft JDBC Driver
Before using the Azure SQL destination, you must install a Microsoft JDBC driver for SQL Server 6.2 or later. Install the driver as an external library for the JDBC stage library.
Transformer bundles a JDBC driver into the launched Spark application so that the driver is available on each node in the cluster.
Partitioning
Spark runs a Transformer pipeline just as it runs any other application, splitting the data into partitions and performing operations on the partitions in parallel.
When the pipeline starts processing a new batch, Spark determines how to split pipeline data into initial partitions based on the origins in the pipeline. Spark uses these partitions for the rest of the pipeline processing, unless a processor causes Spark to shuffle the data.
When writing to a database table, Spark creates one connection to the database for each partition.
Write Mode
The Azure SQL destination uses the write mode to determine how to write to an existing table. You can configure a write mode when the destination is not configured to perform a bulk copy.
- Append rows to existing table
- The destination appends rows to the existing table.
- Overwrite existing table
- The destination removes all rows in the existing table before writing new
rows to the table. When you select overwrite mode, you also configure how
the destination removes rows from the table:
- Truncate all rows in the existing database table, and then write data to the table.
- Drop the existing table, recreate the table, and then write data to the table.
Configuring an Azure SQL Destination
-
On the Properties panel, on the General tab, configure the
following properties:
General Property Description Name Stage name. Description Optional description. -
On the Connection tab, configure the following
properties;
Connection Property Description URL URL for the database. Use the following format: <serverName>.database.windows.net:<port>
For example,
sales-sql-db-server.database.windows.net:1433
Database Name Name of the database to write to. User Database user name. Password Password. Additional JDBC Configuration Properties Additional JDBC configuration properties to use. To add properties, click Add and define the JDBC property name and value. You can use simple or bulk edit mode to configure the properties.
Use the property names and values as expected by JDBC.
-
On the Table tab, configure the following
properties:
Table Property Description Table Name of the table to write to. When bulk copy and create table are enabled, the destination creates a table using the specified name. Otherwise, the table must exist before you start the pipeline.
Enable Bulk Copy Performs a bulk copy of the data. For more information about bulk copy, see the Azure documentation.
Auto Create Table Creates a table before writing to it. Available when performing a bulk copy.
Batch Size Batch size to use when writing the data. Default is 1000 rows. Available when performing a bulk copy.
Lock Table Locks the table before writing the data, preventing other systems from writing to it. Use to improve performance for batches over 1000 rows. Available when performing a bulk copy.
Timeout Seconds to allow for the copy before timing out. Available when performing a bulk copy.
Additional Bulk Copy Configuration Properties Additional bulk copy properties to use. Available when performing a bulk copy.
Write Mode Mode to use to write to an existing table: - Overwrite existing table
- Append rows to existing table
Available when you do not enable bulk copy.
Remove Rows Determines how the destination removes rows from an existing table: - Truncate Table - Truncates all rows in the existing database table before writing to the table.
- Drop and Recreate Table - Drops the existing table and then recreates the table before writing to the table.
Available when the destination overwrites an existing table.
By default, the destination truncates the table.