Snowflake Lookup

The Snowflake Lookup processor performs a lookup on a Snowflake table. The processor can return the first matching row, all matching rows, a count of matching rows, or a boolean value that indicates whether a match was found. When reading data from Snowflake, the processor stages the data in an internal stage.

When you configure the Snowflake Lookup processor, you define the Snowflake region, database, table, and schema to use. You specify the user account and password to use. You can optionally specify a custom role that overrides the default role for the user account. The user account or the custom role must have the required Snowflake privileges.

You can also use a connection to configure the processor.

You configure the record field to use and the table column to match against. You also specify the operator to use. You select the information to return, then configure related properties.

When returning one or more records, you specify the columns to return and optionally define a prefix for the resulting field names to prevent adding duplicate fields to the record. You can specify columns to sort by and the sort order. When returning multiple rows, you can specify a maximum number of rows to return.

When returning a count or boolean value, you define a name for the field to contain the results. If the field does not exist, the processor creates it.
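The four lookup behaviors described above can be illustrated with a minimal in-memory sketch. This is not the processor's implementation: the `TABLE` data, the `lookup` function, its parameter names, and the behavior labels (`first`, `all`, `count`, `exists`) are hypothetical, only equality matching is modeled, and passing an unmatched record through unchanged is an assumption of the sketch.

```python
from typing import Any, Optional, Sequence

Record = dict[str, Any]

# Hypothetical in-memory stand-in for the Snowflake lookup table.
TABLE: list[Record] = [
    {"ID": 1, "CITY": "Oslo", "RANK": 2},
    {"ID": 1, "CITY": "Bergen", "RANK": 1},
    {"ID": 2, "CITY": "Lima", "RANK": 1},
]


def lookup(
    record: Record,
    behavior: str,
    *,
    lookup_field: str,
    lookup_column: str,
    return_columns: Sequence[str] = (),
    prefix: str = "",
    sort_by: Optional[str] = None,
    descending: bool = False,
    max_rows: Optional[int] = None,
    target_field: str = "lookup_result",
) -> list[Record]:
    """Return the records produced for one input record."""
    # Equality is the only operator modeled in this sketch.
    rows = [row for row in TABLE if row[lookup_column] == record[lookup_field]]

    # Count and boolean results are written to a single target field.
    if behavior == "count":
        return [{**record, target_field: len(rows)}]
    if behavior == "exists":
        return [{**record, target_field: bool(rows)}]

    # Sorting determines which row is "first" and the order of generated records.
    if sort_by is not None:
        rows = sorted(rows, key=lambda row: row[sort_by], reverse=descending)
    if behavior == "first":
        rows = rows[:1]
    elif behavior == "all":
        rows = rows if max_rows is None else rows[:max_rows]
    else:
        raise ValueError(f"unknown behavior: {behavior}")

    if not rows:
        # Assumption of this sketch: an unmatched record passes through unchanged.
        return [dict(record)]
    # The prefix keeps returned columns from colliding with existing fields.
    return [
        {**record, **{prefix + column: row[column] for column in return_columns}}
        for row in rows
    ]
```

In this sketch, a record that already has a `CITY` field keeps its own value, and the looked-up value is added as `lkp_CITY` when the prefix is `lkp_`, which is the duplicate-field problem that the prefix property is meant to avoid.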

If the lookup table is static, you can configure the processor to load the table only once, enabling the processor to cache and reuse the data for the duration of the pipeline run.

If you do not load the table only once and the processor passes data to multiple stages, you can enable caching to improve pipeline performance.
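The difference between loading the lookup table once and reading it for every batch can be pictured with a small sketch. The `LookupSource` class and its `read_table` callable are hypothetical stand-ins, not part of the product, and the sketch only models when a read happens, not how the processor caches data internally.

```python
from typing import Any, Callable, Optional


class LookupSource:
    """Hypothetical model of when the lookup table is read."""

    def __init__(self, read_table: Callable[[], list[Any]], load_once: bool) -> None:
        self._read_table = read_table
        self._load_once = load_once
        self._cached: Optional[list[Any]] = None
        self.reads = 0  # number of times the table was actually read

    def rows_for_batch(self) -> list[Any]:
        # With load-once enabled, reuse the rows read for the first batch.
        if self._load_once and self._cached is not None:
            return self._cached
        rows = self._read_table()
        self.reads += 1
        if self._load_once:
            self._cached = rows
        return rows
```

With `load_once=True`, three batches trigger one read in this model; with `load_once=False`, they trigger three, which is why loading once suits only tables that are not expected to change during the pipeline run.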

You can optionally enable pushdown optimization and configure additional Snowflake properties.

Note: When the pipeline runs on a Databricks cluster, use Databricks runtime 6.1 or above for optimal compatibility and pushdown optimization.

Required Privileges and Custom Roles

The Snowflake Lookup processor requires a Snowflake role that grants the following privileges:


Object	Privilege
Internal Snowflake Stage	READ
Table	SELECT

If you create a custom role with the required privileges, define the role that the stage uses in one of the following ways:

Assign the custom role as the default role: In Snowflake, assign the custom role as the default role for the Snowflake user account specified in the stage. A Snowflake user account is associated with a single default role.
Override the default role with the custom role: In the stage, use the Role property to specify the name of the custom role. The custom role overrides the role assigned to the Snowflake user account specified in the stage. For example, you might define custom roles in Snowflake for specific data sources, and then specify one of the roles when configuring a Snowflake stage.
Use a Snowflake connection: When working with Control Hub, you can configure a Snowflake connection to provide connection details for Snowflake stages. When a Snowflake stage is configured to use a Snowflake connection, the stage uses the role associated with the connection. You cannot define the role to use in stage properties. For more information, see Using Custom Roles.
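As a rough illustration of the privileges and the default-role option above, the following sketch builds the Snowflake SQL statements an administrator might run for a custom role. The `role_grant_statements` helper and all object names passed to it are hypothetical; the name of the internal stage in particular depends on your environment. Other grants your account may need, such as USAGE on the database and schema, are not shown.

```python
from typing import Optional


def role_grant_statements(
    role: str, table: str, stage: str, user: Optional[str] = None
) -> list[str]:
    """Build the SQL statements that grant the privileges listed above."""
    statements = [
        # Table privilege required for the lookup.
        f"GRANT SELECT ON TABLE {table} TO ROLE {role};",
        # Internal stage privilege required for reading staged data.
        f"GRANT READ ON STAGE {stage} TO ROLE {role};",
    ]
    if user is not None:
        # Optional: make the custom role the default role for the user account.
        statements.append(f"GRANT ROLE {role} TO USER {user};")
        statements.append(f"ALTER USER {user} SET DEFAULT_ROLE = {role};")
    return statements
```

If you omit the `user` argument, the sketch produces only the privilege grants, which corresponds to specifying the custom role in the stage's Role property instead of changing the user's default role.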

Pushdown Optimization

The Snowflake Lookup processor can perform pushdown optimization. When you enable pushdown, the processor pushes all possible processing to the Snowflake database, which can improve performance, especially for large data sets.

You can enable pushdown in the Snowflake Lookup processor independently of ludicrous mode. However, using pushdown together with ludicrous mode typically provides the best results. For details on the Spark SQL operators that can be pushed down to Snowflake, see the Snowflake documentation.

Use the Enable Pushdown property on the Connection tab to enable pushdown for the Snowflake Lookup processor.

Configuring a Snowflake Lookup Processor

Configure a Snowflake Lookup processor to perform a lookup on a Snowflake table.

In the Properties panel, on the General tab, configure the following properties:


General Property	Description
Name	Stage name.
Description	Optional description.

On the Lookup tab, configure the following properties:


Lookup Property	Description
Lookup Behavior	Lookup task to perform: return the first matching row; return all matching rows, generating a record for each; return a count of matching rows; or return true if matches exist, otherwise false.
Lookup Keys	Lookup keys to use to find matching records. Specify the following information: Lookup Field - Field in the record to use for the lookup. Lookup Column - Column in the table to use for the lookup. Operator - Spark SQL operator to use for the lookup. Record field names must match the lookup key column names exactly. This property is case sensitive. Click the Add icon to configure additional keys. You can use simple or bulk edit mode to configure the keys.
Return Columns	Columns in the table to return and include in the matching record. Click the Add icon to specify additional columns. You can use simple or bulk edit mode to configure the columns. This property is case sensitive. Available when Lookup Behavior is set to return the first matching row or all matching rows.
Add Prefix to Column Names	Adds the specified prefix to the returned columns before adding fields to records. Use when the original record might include a field with the same name. Using this property is strongly recommended. Available when Lookup Behavior is set to return the first matching row or all matching rows.
Prefix	Prefix to add to the returned columns. Available when Lookup Behavior is set to return the first matching row or all matching rows.
Sorting	Sorts the matching rows in the specified order to determine how records are generated. Specify the following information: Column to Sort - Column to use for the sort. Sort Order - Sort order to use: ascending or descending. Click the Add icon to add additional sort columns. You can use simple or bulk edit mode to configure the columns. Columns are sorted in the configured order. Available when Lookup Behavior is set to return the first matching row or all matching rows.
Max Rows	Maximum number of rows to return. The processor generates a record for each row. Available when Lookup Behavior is set to return all matching rows.
Target Field	Name of the field to store the results of the lookup. Available when Lookup Behavior is set to return a count of matching rows or to return true if matches exist.
Load Lookup Data Only Once	Reads the lookup table in a single batch and caches the results for reuse. Use when data in the lookup table is not expected to change. This property is ignored in batch execution mode.
Cache Data	Caches data processed for a batch so the data can be reused for multiple downstream stages. Use to improve performance when the stage passes data to multiple stages. Caching can limit pushdown optimization when the pipeline runs in ludicrous mode.

On the Table tab, configure the following property:


Table Property	Description
Table	Table to perform lookups on.

On the Connection tab, configure the following properties:

Note: By default, Snowflake JDBC driver versions 3.13.25 and higher convert underscores to hyphens. When needed, you can bypass this behavior by setting the allowUnderscoresInHost driver property to true. For more information and alternate solutions, see this Snowflake community article.
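One way to set the allowUnderscoresInHost driver property is as a parameter on a custom JDBC URL. The sketch below assumes the standard `jdbc:snowflake://<account>.snowflakecomputing.com/` URL form; the `snowflake_jdbc_url` helper and the account name in the usage note are hypothetical, and your installation might require a different host name.

```python
from urllib.parse import urlencode


def snowflake_jdbc_url(
    account: str, allow_underscores_in_host: bool = False, **params: str
) -> str:
    """Build a Snowflake JDBC URL with optional connection parameters."""
    query = dict(params)
    if allow_underscores_in_host:
        # Keeps underscores in the host name instead of converting them to hyphens.
        query["allowUnderscoresInHost"] = "true"
    base = f"jdbc:snowflake://{account}.snowflakecomputing.com/"
    return f"{base}?{urlencode(query)}" if query else base
```

For example, `snowflake_jdbc_url("my_account", allow_underscores_in_host=True)` produces a URL ending in `?allowUnderscoresInHost=true`, which you could then enter in the Custom JDBC URL property.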

The properties that display differ depending on whether you use a Control Hub connection to provide connection details.

When using a connection, configure the following properties:


Connection Property - Using Connection	Description
Connection	Connection that defines the information required to connect to an external system. To connect to an external system, you can select a connection that contains the details, or you can directly enter the details in the pipeline. When you select a connection, Control Hub hides other properties so that you cannot directly enter connection details in the pipeline.
Override Warehouse	Snowflake warehouse. If the selected Snowflake connection specifies a warehouse to use, this property overrides it. Available only when using a connection.
Override Database	Snowflake database. If the selected Snowflake connection specifies a database to use, this property overrides it. Available only when using a connection.
Override Schema	Snowflake schema. If the selected Snowflake connection specifies a schema to use, this property overrides it. Available only when using a connection.
Enable Pushdown	Pushes all possible processing to the Snowflake database, which can improve performance, especially for large data sets.

When not using a Control Hub connection, configure the following properties:


Connection Property - Not Using Connection	Description
Use Custom JDBC URL	Enables using a custom JDBC URL.
Custom JDBC URL	Custom JDBC URL to use.
Include Organization	Enables specifying the Snowflake organization.
Organization	Snowflake organization.
Snowflake Region	Region where the Snowflake warehouse is located. Select one of the following: An available Snowflake region. Other - Enables specifying a Snowflake region not listed in the property. Custom JDBC URL - Enables specifying a virtual private Snowflake installation. Available when Include Organization is disabled.
Custom Snowflake Region	Custom Snowflake region. Available when using Other as the Snowflake region.
Account	Snowflake account name.
Authentication Method	Authentication method to connect to Snowflake. Select one of the following: User Credentials. Key Pair Path - Use to specify the location of a private key file. Key Pair Content - Use to provide the contents of a private key file. OAuth. None.
User	Snowflake user name. The user account or the custom role that overrides the default role for this user account must have the required Snowflake privileges. For details, see Required Privileges and Custom Roles. Available when using an authentication method.
Password	Snowflake password. Required when using User Credentials authentication.
Private Key Path	Path to the private key file. Required when using Key Pair Path authentication.
Private Key Content	Contents of the private key file. Required when using Key Pair Content authentication.
Private Key Password	Optional password for the specified private key file or content. Use for encrypted private keys. Available when using Key Pair Path or Key Pair Content authentication.
OAuth Token	OAuth token to use. You can use a Snowflake security integration access token, refresh token, or code grant. Required when using OAuth authentication.
OAuth Client ID	Client ID for Snowflake security integration access tokens. Available when using OAuth authentication.
OAuth Client Secret	Client secret for Snowflake security integration access tokens. Available when using OAuth authentication.
OAuth Redirect URI	Redirect URI for Snowflake security integration access tokens. Available when using OAuth authentication.
Enable Pushdown	Pushes all possible processing to the Snowflake database, which can improve performance, especially for large data sets.
Role	Overrides the default role for the specified user account. The custom role must have the required Snowflake privileges.
Warehouse	Snowflake warehouse.
Database	Snowflake database.
Schema	Snowflake schema.
Use Private Link Snowflake URL	Enables using a private link URL. You can specify the URL in the Custom JDBC URL property above. Or, you can define the appropriate values for the Account property and either the Snowflake Region or Organization properties.
Connection Properties	Additional Snowflake Connection for Spark properties to use. For information about additional options, see the Snowflake documentation. To add properties, click the Add icon and define the property name and value. Specify the property names and values as expected by Snowflake. You can use simple or bulk edit mode to configure the properties. Tip: To secure sensitive information, you can use credential stores or runtime resources.

Optionally, on the Advanced tab, configure the following property:


Advanced Property	Description
Connection Pool	Maximum number of connections to Snowflake that the stage uses. Default is 4. Increasing this value can improve performance. However, Snowflake warns that setting it to an arbitrarily high value can adversely affect performance. The default is the recommended value.