Snowflake
The Snowflake destination writes data to Snowflake. You can use the Snowflake destination to write to any accessible Snowflake database, including those hosted on Amazon S3, Microsoft Azure, and private Snowflake installations.
The Snowflake destination stages data to an internal table before writing it to Snowflake.
When you configure the destination, you specify the Snowflake region, database, table, and schema to use. You also specify the user account and password. You can optionally specify a custom role that overrides the default role for the user account. The user account or the custom role must have the required Snowflake privileges.
You can also use a connection to configure the destination.
You specify the write mode to use: overwrite, append, or merge. When appending data, the destination creates the table if it does not exist. To merge data, you configure additional merge properties.
You specify how to map record fields to table columns and select the behavior for data type mismatches. You can configure the origin to preserve existing capitalization for column names. You can also specify the number of connections to use and configure additional Snowflake properties.
Required Privileges and Custom Roles
The Snowflake destination requires a Snowflake role that grants the following privileges:
Object | Privilege |
---|---|
Schema | CREATE TABLE |
Table | SELECT, INSERT |
- Assign the custom role as the default role
- In Snowflake, assign the custom role as the default role for the Snowflake user account specified in the stage. A Snowflake user account is associated with a single default role.
- Override the default role with the custom role
- In the stage, use the Role property to specify the name of the custom role. The custom role overrides the role assigned to the Snowflake user account specified in the stage.
- Use a Snowflake connection
- When working with Control Hub, you can configure a Snowflake connection to provide connection details for Snowflake stages.
Write Mode
- Overwrite existing table
- If the table exists, the destination drops and recreates the table. Then, the destination inserts all data to the table.
- Append rows to existing table or create table if none exists
- If the table exists, the destination appends data to the table. If the table does not exist, the destination creates the table.
- Merge rows to existing table
- The destination merges data with data in an existing table. The destination performs inserts, updates, and deletes based on the specified merge properties.
Merge Properties
- Join Key
- One or more key columns in the table. Used to perform updates and deletes and to ensure that duplicate rows do not exist for inserts. Pipeline records must include a matching field name.
- When Clause
- Action that the destination
performs when a record meets the specified conditions. You can specify
multiple merge configurations for the destination to perform. Important: The destination performs the writes in the specified order. Best practice is to list merge configurations with the smallest number of affected records first, progressing to the largest number of affected records. When defining multiple merge configurations of the same type, carefully consider the order that you use.
Snowflake Data Types
When writing to Snowflake, the Snowflake destination converts the Spark data types used in the pipeline to Snowflake data types. The following table describes how this conversion occurs.
Spark Data Type | Snowflake Data Type |
---|---|
ArrayType | VARIANT |
BooleanType | BOOLEAN |
ByteType | INTEGER |
DateType | DATE |
DecimalType | DECIMAL |
DoubleType | DOUBLE |
FloatType | FLOAT |
IntegerType | INTEGER |
LongType | INTEGER |
MapType | VARIANT |
ShortType | INTEGER |
StringType | VARCHAR(N) for specified lengths, otherwise VARCHAR |
StructType | VARIANT |
TimestampType | TIMESTAMP |
Configuring a Snowflake Destination
Configure the Snowflake destination to write data to Snowflake.
-
On the Properties panel, on the General tab, configure the
following properties:
General Property Description Name Stage name. Description Optional description. Stage Library Stage library to use to connect to Snowflake: - Snowflake cluster-provided libraries - The cluster
where the pipeline runs has Snowflake libraries
installed, and therefore has all of the necessary
libraries to run the pipeline.
Available only in Databricks pipelines.
- Snowflake Transformer-provided libraries - Transformer passes the necessary libraries with the pipeline
to enable running the pipeline.
Use when running the pipeline locally or when the cluster where the pipeline runs does not include the Snowflake libraries for Spark.
Select the appropriate Spark version for your cluster.
Note: When using additional Snowflake stages in the pipeline, ensure that they use the same stage library. - Snowflake cluster-provided libraries - The cluster
where the pipeline runs has Snowflake libraries
installed, and therefore has all of the necessary
libraries to run the pipeline.
-
On the Connection tab, use one of the following tables to
configure connection properties:
Note: Snowflake JDBC driver versions 3.13.25 or higher convert underscores to hyphens, by default. When needed, you can bypass this behavior by setting the
allowUnderscoresInHost
driver property totrue
. For more information and alternate solutions, see this Snowflake community article.The properties that display differ depending on whether you use a Control Hub connection to provide connection details.When using a connection, configure the following properties:Connection Property - Using Connection Property Connection Connection that defines the information required to connect to an external system. To connect to an external system, you can select a connection that contains the details, or you can directly enter the details in the pipeline. When you select a connection, Control Hub hides other properties so that you cannot directly enter connection details in the pipeline.
Override Warehouse Snowflake warehouse. If the selected Snowflake connection specifies a warehouse to use, this property overrides it.
Available only when using a connection.
Override Database Snowflake database. If the selected Snowflake connection specifies a database to use, this property overrides it.
Available only when using a connection.
Override Schema Snowflake schema. If the selected Snowflake connection specifies a schema to use, this property overrides it.
Available only when using a connection.
When not using a Control Hub connection, configure the following properties:Connection Property - Not Using Connection Description Use Custom JDBC URL Enables using a custom JDBC URL. Custom JDBC URL Custom JDBC URL to use. Include Organization Enables specifying the Snowflake organization. Organization Snowflake organization. Snowflake Region Region where the Snowflake warehouse is located. Select one of the following: - An available Snowflake region.
- Other - Enables specifying a Snowflake region not listed in the property.
- Custom JDBC URL - Enables specifying a virtual private Snowflake installation.
Available when Include Organization is disabled.
Custom Snowflake Region Custom Snowflake region. Available when using Other as the Snowflake region. Account Snowflake account name. Authentication Method Authentication method to connect to Snowflake:- User Credentials
- Key Pair Path - Use to specify the location of a private key file.
- Key Pair Content - Use to provide the contents of a private key file.
- OAuth
- None
User Snowflake user name. The user account or the custom role that overrides the default role for this user account must have the required Snowflake privileges.
For details, see Required Privileges and Custom Roles.
Available when using an authentication method.
Password Snowflake password. Required when using User Credentials authentication.
Private Key Path Path to the private key file. Required when using Key Pair Path authentication.
Private Key Content Contents of the private key file. Required when using Key Pair Content authentication.
Private Key Password Optional password for the specified private key file or content. Use for encrypted private keys. Available when using Key Pair Path or Key Pair Content authentication.
OAuth Token OAuth token to use. You can use a Snowflake security integration access token, refresh token, or code grant. Required when using OAuth authentication.
OAuth Client ID Client ID for Snowflake security integration access tokens. Available when using OAuth authentication.
OAuth Client Secret Client secret for Snowflake security integration access tokens. Available when using OAuth authentication.
OAuth Redirect URI Redirect URI for Snowflake security integration access tokens. Available when using OAuth authentication.
Role Overrides the default role for the specified user account. The custom role must have the required Snowflake privileges.
Warehouse Snowflake warehouse. Database Snowflake database. Schema Snowflake schema. Use Private Link Snowflake URL Enables using a private link URL. You can specify the URL in the Custom JDBC URL property above. Or, you can define the appropriate values for the Account property and either the Snowflake Region or Organization properties.
Connection Properties Additional Snowflake Connection for Spark properties to use. For information about additional options, see the Snowflake documentation. To add properties, click the Add icon and define the property name and value. Specify the property names and values as expected by Snowflake.
You can use simple or bulk edit mode to configure the properties.
-
On the Table tab, configure the following
properties:
Table Property Description Table Name of the Snowflake table to write to. Keep Column Case Preserves the case used in field names when writing data to Snowflake. When not selected, field names are converted to all caps when writing to Snowflake, unless the names are enclosed in quotation marks.
Column Mapping Mode Mode to map record fields to Snowflake table columns: - By Order - Record fields map to table columns based on order.
- By Name - Record fields map to table columns based on names. The mapping is not case-sensitive.
Mismatch Behavior Action to take when field data is the wrong data type for the mapped table column: - Error - Generates an error that stops the pipeline.
- Ignore - Drops the invalid data from the record and inserts a null value in the table column.
Write Mode Write mode for the stage: - Overwrite existing table - If the table exists, the destination drops and recreates the table. Then, the destination inserts all data to the table.
- Append rows to existing table or create table if
none exists - If the table exists, the destination appends data to the
table. If the table does not exist, the destination creates the
table.
Append data to an existing table only when each record has a unique primary key. If a row with the same primary key exists in the table, the pipeline fails with a primary key violation.
- Merge rows to existing table - The destination merges data with data in an existing table. The destination performs inserts, updates, and deletes based on the specified merge properties.
Join Key One or more key columns in the table. Used to perform updates and deletes and to ensure that duplicate rows do not exist for inserts. Pipeline records must include a matching field name. Merge Configuration Action that the destination performs when a record meets the specified conditions. You can specify multiple merge configurations for the destination to perform. Important: The destination performs the writes in the specified order. Best practice is to list merge configurations with the smallest number of affected records first, progressing to the largest number of affected records. When defining multiple merge configurations of the same type, carefully consider the order that you use.Configure the following properties:- When Clause - Determines when an action is
performed. Select one of the following options:
- When Matched - Use with the Update and Delete operations. Updates or deletes a row upon finding a matching join key.
- When Not Matched - Use with the Insert operation. Inserts a row when no matching join key exists in the table.
- When Not Matched by Source - Do not use. Snowflake does not support this option.
- Additional Condition - An optional additional
condition for the merge. Use
$input.
to prefix field names in records, and$target.
to prefix column names in the table. - Operation - The operation to perform: Insert, Update, or Delete. The operation that you can use is dependent on the When Clause option that you select.
Click the Add icon to specify another merge clause.
Available with the Merge Rows to Existing Table write mode.
-
Optionally, on the Advanced tab, configure the following
property:
Advanced Property Description Connection Pool Maximum number of connections to Snowflake that the stage uses. Default is 4.
Increasing this property can improve performance. However, Snowflake warns that setting this property to an arbitrarily high value can adversely affect performance. The default is the recommended value.