Azure SQL

The Azure SQL destination writes data to a table in Azure SQL Database, Azure Synapse SQL Pool, or Microsoft SQL Server version 2008 or later.

The destination can create tables as needed. It can also perform a bulk copy when writing to a table.

When you configure the Azure SQL destination, you specify the database URL, database name, and credentials. You can also define any additional JDBC configuration properties that you want to use.

You specify the table to write to and whether to perform a bulk copy. When performing a bulk copy, you define related properties including the batch size, timeout, and reliability and isolation levels. You can optionally configure additional bulk copy properties.

When not performing a bulk copy, you specify the write mode for writing to the table.

Transformer includes a Microsoft JDBC driver for SQL Server with the destination. The destination uses Microsoft JDBC driver for SQL Server version 8 or later.

Partitioning

Spark runs a Transformer pipeline just as it runs any other application, splitting the data into partitions and performing operations on the partitions in parallel.

When the pipeline starts processing a new batch, Spark determines how to split pipeline data into initial partitions based on the origins in the pipeline. Spark uses these partitions for the rest of the pipeline processing, unless a processor causes Spark to shuffle the data.

When writing to a database table, Spark creates one connection to the database for each partition.

Write Mode

The Azure SQL destination uses the write mode to determine how to write to a table. You can configure a write mode when the destination is not configured to perform a bulk copy.

When the Azure SQL destination is configured to write to a table that doesn't exist, the destination creates the table in the specified database. The table is created based on the first batch of data that the destination processes when you start the pipeline, and that first batch is inserted to the new table.

The write mode defines how the destination writes to an existing table, including the second batch to a table that the destination created.

You can use one of the following write modes with the Azure SQL destination:
Append to table
The destination appends rows to the table.
Use append mode only when each new row has a unique primary key. If a row with the same primary key exists in the table, the pipeline fails with a primary key violation.
Overwrite table
The destination removes all rows in the table before writing new rows to the table. When you select overwrite mode, you also configure how the destination removes rows from the table:
  • Truncates all rows in the table before the write
  • Drops and recreates the table before the write.

Configuring an Azure SQL Destination

Configure an Azure SQL destination to write data to Azure.
  1. On the Properties panel, on the General tab, configure the following properties:
    General Property Description
    Name Stage name.
    Description Optional description.
    Stage Library Stage library to use. Select the Azure SQL stage library installed on the cluster:
    • Azure SQL for Spark 3.0.x
    • Azure SQL for Spark 3.1.x
    • Azure SQL for Spark 3.3.x

    Available when creating the pipeline with Transformer prebuilt with Scala 2.12.

  2. On the Connection tab, configure the following properties:
    Connection Property Description
    URL URL for the database. Use the following format:
    <serverName>.database.windows.net:<port>

    For example, sales-sql-db-server.database.windows.net:1433

    Database Name Name of the database to write to.
    User Database user name.
    Tip: To secure sensitive information, you can use credential stores or runtime resources.
    Password Password.
    Tip: To secure sensitive information, you can use credential stores or runtime resources.
    Additional JDBC Configuration Properties Additional JDBC configuration properties to use.

    To add properties, click Add and define the JDBC property name and value. You can use simple or bulk edit mode to configure the properties.

    Use the property names and values as expected by JDBC.

  3. On the Table tab, configure the following properties:
    Table Property Description
    Table Name of the database table to write to. If the table does not exist, the destination creates the table.
    Enable Bulk Copy Performs a bulk copy of the data.

    For more information about bulk copy, see the Azure documentation.

    Batch Size Batch size to use when writing the data. Default is 1000 rows.

    Available when performing a bulk copy.

    Lock Table Locks the table before writing the data, preventing other systems from writing to it. Use to improve performance for batches over 1000 rows.

    Available when performing a bulk copy.

    Timeout Seconds to allow for the copy before timing out.

    Available when performing a bulk copy.

    Reliability Level Reliability level to use for the write:
    • Best Effort - Data is written directly to the target table, which is not idempotent and can result in duplicate records.
    • No Duplicates - Data is written using global temporary staging tables, which ensures that no duplicates are written.
    Note: Microsoft temporary table limitations may prevent using the No Duplicates option when writing to Azure. When this occurs, the destination uses Best Effort. For more information, see the Microsoft documentation.

    Available when performing a bulk copy.

    Isolation Level Isolation level to use for the write. Determines the consistency and concurrency levels for multiple transactions:
    • Read Uncommitted
    • Read Committed
    • Repeatable Read
    • Serializable
    • Snapshot

    For more information about isolation levels, see the Microsoft SQL documentation. Available when performing a bulk copy.

    Schema Check Enables a strict schema check between the dataframes and the target table.

    The check verifies that the number of columns, as well as the column names and types in the dataframe, match those in the target table.

    Available when performing a bulk copy.

    Additional Bulk Copy Configuration Properties Additional bulk copy properties to use.

    Available when performing a bulk copy.

    Write Mode Mode to use to write to a table:
    • Overwrite table
    • Append to table

    Available when you do not enable bulk copy.

    Remove Rows Determines how the destination removes rows from a table:
    • Truncate table - Truncates all rows in the database table before writing to the table.
    • Drop and recreate table - Drops and recreates the table before writing to it.

    By default, the destination truncates the table.

    Available when the destination uses Overwrite write mode.