Azure SQL

The Azure SQL destination writes data to a table in Azure SQL Database, Azure Synapse SQL Pool, or Microsoft SQL Server version 2008 or later.

The destination can perform a bulk copy when writing to the table. When performing a bulk copy, the destination creates the table if it does not exist. When you instead use a write mode to append or overwrite data, the target table must exist before the pipeline starts.

When you configure the Azure SQL destination, you specify the database URL, database name, and credentials. You can also define any additional JDBC configuration properties that you want to use.
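To show how the URL, database name, and additional JDBC properties relate, the following Python sketch assembles a connection string in the `jdbc:sqlserver://<server>:<port>;property=value;...` form that the Microsoft JDBC Driver for SQL Server accepts. How the destination itself combines these values is not described here, so the helper `build_jdbc_url` and the example database name `sales` are hypothetical. Credentials are deliberately left out of the URL so they can come from a credential store or runtime resource.

```python
from typing import Dict, Optional


def build_jdbc_url(url: str, database: str,
                   extra_props: Optional[Dict[str, str]] = None) -> str:
    """Hypothetical helper: combine '<serverName>.database.windows.net:<port>'
    with the database name and any additional JDBC properties.

    User and password are intentionally not embedded in the URL.
    """
    props = {"databaseName": database}
    # Additional JDBC properties use the names the driver expects.
    props.update(extra_props or {})
    suffix = "".join(f";{name}={value}" for name, value in props.items())
    return f"jdbc:sqlserver://{url}{suffix}"


print(build_jdbc_url(
    "sales-sql-db-server.database.windows.net:1433",
    "sales",  # hypothetical database name
    {"encrypt": "true", "loginTimeout": "30"},
))
# jdbc:sqlserver://sales-sql-db-server.database.windows.net:1433;databaseName=sales;encrypt=true;loginTimeout=30
```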

You specify the table to write to and whether to perform a bulk copy. When performing a bulk copy, you define related properties including the batch size, timeout, and reliability and isolation levels. You can optionally configure additional bulk copy properties.
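The batch size controls how many round trips a bulk copy makes. Assuming each batch holds at most Batch Size rows (default 1000), the number of batches for a set of rows is the row count divided by the batch size, rounded up. The function name below is hypothetical.

```python
def bulk_copy_batches(row_count: int, batch_size: int = 1000) -> int:
    """Number of batches needed to send row_count rows, batch_size rows at a time."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Integer ceiling division avoids floating-point rounding issues.
    return (row_count + batch_size - 1) // batch_size


print(bulk_copy_batches(2500))        # 3 with the default batch size of 1000
print(bulk_copy_batches(2500, 5000))  # 1
```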

When not performing a bulk copy, you specify the write mode for writing to an existing table.

Transformer includes the Microsoft JDBC Driver for SQL Server with the destination. The destination uses version 8 or later of the driver.

Partitioning

Spark runs a Transformer pipeline just as it runs any other application, splitting the data into partitions and performing operations on the partitions in parallel.

When the pipeline starts processing a new batch, Spark determines how to split pipeline data into initial partitions based on the origins in the pipeline. Spark uses these partitions for the rest of the pipeline processing, unless a processor causes Spark to shuffle the data.

When writing to a database table, Spark creates one connection to the database for each partition.
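Because each partition gets its own connection, the partition count directly sets how many concurrent connections the database sees. The following pure-Python simulation illustrates that relationship. It does not use Spark; the round-robin split and the function names are assumptions for illustration, and Spark's actual partitioning depends on the origins and any shuffles in the pipeline.

```python
from typing import Dict, List

Row = Dict[str, int]


def split_into_partitions(rows: List[Row], num_partitions: int) -> List[List[Row]]:
    """Round-robin rows into partitions (a simplification of how Spark splits data)."""
    partitions: List[List[Row]] = [[] for _ in range(num_partitions)]
    for index, row in enumerate(rows):
        partitions[index % num_partitions].append(row)
    return partitions


def write_batch(partitions: List[List[Row]]) -> List[int]:
    """Simulate the write with one (fake) database connection per partition.

    Returns the number of rows sent over each connection, so the length of
    the result equals the number of connections opened.
    """
    rows_per_connection: List[int] = []
    for partition in partitions:
        # Stand-in for opening one connection and writing this partition's rows.
        rows_per_connection.append(len(partition))
    return rows_per_connection


rows = [{"id": i} for i in range(10)]
print(write_batch(split_into_partitions(rows, 4)))  # [3, 3, 2, 2] -> 4 connections
print(write_batch(split_into_partitions(rows, 2)))  # [5, 5] -> 2 connections
```

With the same 10 rows, reducing the number of partitions from 4 to 2 halves the number of connections, at the cost of writing more rows over each one.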

Write Mode

The Azure SQL destination uses the write mode to determine how to write to an existing table. You can configure a write mode only when the destination does not perform a bulk copy.

You can use one of the following write modes:
Append rows to existing table
The destination appends rows to the existing table.
Use append mode only when each new row has a primary key that does not already exist in the table. If a row with the same primary key already exists in the table, the pipeline fails with a primary key violation.
Overwrite existing table
The destination removes all rows in the existing table before writing new rows to the table. When you select overwrite mode, you also configure how the destination removes rows from the table:
  • Truncate all rows in the existing database table, and then write data to the table.
  • Drop the existing table, recreate the table, and then write data to the table.
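To make the difference between the write modes concrete, the following sketch uses Python's built-in `sqlite3` module as a stand-in for Azure SQL; it is not how the destination is implemented. SQLite has no `TRUNCATE TABLE` statement, so the sketch uses `DELETE FROM`, which also removes all rows; in SQL Server the corresponding statements are `TRUNCATE TABLE`, or `DROP TABLE` followed by `CREATE TABLE`. The `orders` table and the helper names are hypothetical.

```python
import sqlite3
from typing import List, Tuple

DDL = "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)"


def append_rows(conn: sqlite3.Connection, rows: List[Tuple[int, float]]) -> None:
    """Append mode: insert rows; a duplicate primary key raises IntegrityError."""
    conn.executemany("INSERT INTO orders (id, amount) VALUES (?, ?)", rows)


def overwrite_rows(conn: sqlite3.Connection, rows: List[Tuple[int, float]],
                   remove_rows: str = "truncate") -> None:
    """Overwrite mode: remove all existing rows, then write the new rows."""
    if remove_rows == "truncate":
        conn.execute("DELETE FROM orders")  # SQL Server: TRUNCATE TABLE orders
    elif remove_rows == "drop":
        conn.execute("DROP TABLE orders")
        conn.execute(DDL)  # recreate the table before writing
    else:
        raise ValueError(f"unknown remove_rows option: {remove_rows}")
    append_rows(conn, rows)


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
append_rows(conn, [(1, 9.5), (2, 20.0)])
try:
    append_rows(conn, [(2, 5.0)])  # same primary key as an existing row
except sqlite3.IntegrityError as err:
    print("append failed:", err)
overwrite_rows(conn, [(2, 5.0), (3, 7.25)])
print(conn.execute("SELECT id, amount FROM orders ORDER BY id").fetchall())
# [(2, 5.0), (3, 7.25)]
```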

Configuring an Azure SQL Destination

Configure an Azure SQL destination to write data to Azure SQL Database, Azure Synapse SQL Pool, or Microsoft SQL Server.
  1. On the Properties panel, on the General tab, configure the following properties:
    General Property - Description
    Name - Stage name.
    Description - Optional description.
    Stage Library - Stage library to use. Select the Azure SQL stage library installed on the cluster:
    • Azure SQL for Spark 3.0.x
    • Azure SQL for Spark 3.1.x
    • Azure SQL for Spark 3.3.x

    Available when creating the pipeline with Transformer prebuilt with Scala 2.12.

  2. On the Connection tab, configure the following properties:
    Connection Property - Description
    URL - URL for the database. Use the following format:
    <serverName>.database.windows.net:<port>

    For example, sales-sql-db-server.database.windows.net:1433

    Database Name - Name of the database to write to.
    User - Database user name.
    Tip: To secure sensitive information, you can use credential stores or runtime resources.
    Password - Password for the database user.
    Tip: To secure sensitive information, you can use credential stores or runtime resources.
    Additional JDBC Configuration Properties - Additional JDBC configuration properties to use.

    To add properties, click Add and define the JDBC property name and value. You can use simple or bulk edit mode to configure the properties.

    Use the property names and values that the JDBC driver expects.

  3. On the Table tab, configure the following properties:
    Table Property - Description
    Table - Name of the table to write to.

    When bulk copy is enabled, the destination creates a table with the specified name if it does not exist. Otherwise, the table must exist before you start the pipeline.

    Enable Bulk Copy - Performs a bulk copy of the data.

    For more information about bulk copy, see the Azure documentation.

    Batch Size - Number of rows to write in each batch. Default is 1000 rows.

    Available when performing a bulk copy.

    Lock Table - Locks the table before writing the data, preventing other systems from writing to it. Can improve performance for batches larger than 1000 rows.

    Available when performing a bulk copy.

    Timeout - Seconds to allow for the copy before timing out.

    Available when performing a bulk copy.

    Reliability Level - Reliability level to use for the write:
    • Best Effort - Writes data directly to the target table. This option is not idempotent and can result in duplicate records.
    • No Duplicates - Writes data using global temporary staging tables, which ensure that no duplicate records are written.
    Note: Microsoft temporary table limitations may prevent using the No Duplicates option when writing to Azure. When this occurs, the destination uses Best Effort. For more information, see the Microsoft documentation.

    Available when performing a bulk copy.

    Isolation Level - Isolation level to use for the write. Determines the consistency and concurrency levels for multiple transactions:
    • Read Uncommitted
    • Read Committed
    • Repeatable Read
    • Serializable
    • Snapshot

    For more information about isolation levels, see the Microsoft SQL documentation.

    Available when performing a bulk copy.

    Schema Check - Enables a strict schema check between the DataFrame and the target table.

    The check verifies that the number of columns, the column names, and the column types in the DataFrame match those in the target table.

    Available when performing a bulk copy.

    Additional Bulk Copy Configuration Properties - Additional bulk copy properties to use.

    Available when performing a bulk copy.

    Write Mode - Mode to use to write to an existing table:
    • Overwrite existing table
    • Append rows to existing table

    Available when you do not enable bulk copy.

    Remove Rows - Determines how the destination removes rows from an existing table:
    • Truncate Table - Truncates all rows in the existing table before writing to the table.
    • Drop and Recreate Table - Drops the existing table and recreates it before writing to the table.

    Available when the destination overwrites an existing table.

    By default, the destination truncates the table.
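The Schema Check property describes a strict comparison of column count, column names, and column types. The following Python sketch shows that kind of comparison, assuming both schemas are available as ordered lists of (column name, type name) pairs. Whether the destination compares columns by name or by position, and how it maps Spark types to SQL Server types, are not specified here; this sketch compares by name, and the type names are hypothetical plain strings.

```python
from typing import List, Tuple

Schema = List[Tuple[str, str]]  # ordered (column name, type name) pairs


def strict_schema_check(df_schema: Schema, table_schema: Schema) -> List[str]:
    """Return a list of mismatches; an empty list means the schemas match."""
    problems: List[str] = []
    if len(df_schema) != len(table_schema):
        problems.append(
            f"column count differs: dataframe has {len(df_schema)}, "
            f"table has {len(table_schema)}"
        )
    table_types = dict(table_schema)
    for name, df_type in df_schema:
        if name not in table_types:
            problems.append(f"column {name!r} is not in the target table")
        elif table_types[name] != df_type:
            problems.append(
                f"column {name!r} type differs: dataframe {df_type}, "
                f"table {table_types[name]}"
            )
    return problems


table = [("id", "int"), ("amount", "decimal"), ("region", "nvarchar")]
print(strict_schema_check(table, table))  # []
print(strict_schema_check([("id", "int"), ("amount", "float")], table))
```

The second call reports two problems: a column count mismatch and a type mismatch on `amount`.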