Snowflake File Uploader

The Snowflake File Uploader destination writes whole files to an internal Snowflake stage. Use this destination with a whole file origin and the Snowflake executor. For details, see Snowflake File Uploader and Executor Pipelines in the Snowflake executor documentation.

Tip: To configure write details such as the data format or load method, use the Snowflake destination.

You can use the Snowflake File Uploader destination with any accessible Snowflake database, including those hosted on AWS, Microsoft Azure, and private Snowflake installations. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.

When you configure the Snowflake File Uploader destination, you specify Snowflake connection information, such as the Snowflake region and user account. You can optionally specify an organization name to use. You can also use a connection to configure the destination.

You define staging information, such as the database, schema, and name of the internal Snowflake stage to write to, as well as the number of threads to use. You can configure the destination to create the internal Snowflake stage, then specify the file format for the stage and related properties.

Before you use the Snowflake File Uploader destination, you must complete several prerequisite tasks.

You can configure the destination to generate events for another event stream. For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.

Prerequisites

Before you configure the Snowflake destination, complete the following prerequisites:
  1. Optionally, create an internal Snowflake stage.

    You can skip this step if you want to stage data on a Snowflake internal user stage or if you want the destination to create the stage.

  2. Assign required privileges.

Create an Internal Snowflake Stage (Optional)

The Snowflake File Uploader destination writes whole files to an internal Snowflake stage. You can either create the internal Snowflake stage before using the destination in a pipeline, or you can have the destination create the stage.

You can use an internal named, table, or user stage. User stages are created by default for each user. For steps on how to create a named or table stage, see CREATE STAGE in the Snowflake SQL command reference documentation.

You can use the default Snowflake configuration for all stages.

For more information about Snowflake stages, see the Snowflake documentation.

Assign Required Privileges

Writing data to Snowflake requires that the user specified in the stage or connection has the following Snowflake privileges for the internal Snowflake stage:
  • Read
  • Write
If you configure the destination to create the internal Snowflake stage, the user generally also needs following privileges:
  • Create Stage
  • Create File Format

The user does not need Create File Format when you use a custom Snowflake file format as the data format for the table.

Implementation Notes

The Snowflake File Uploader destination was designed to be used with a whole file origin and the Snowflake executor. For details on using these stages in a pipeline, see Snowflake File Uploader and Executor Pipelines in the Snowflake executor documentation.

Define a Role

The Snowflake File Uploader destination requires a Snowflake role that grants all privileges needed to load data using the configured load method. Each load method requires a different set of privileges.

Before configuring the destination, ensure that you have granted the required privileges to a Snowflake role, as explained in Assign Required Privileges.

If you create a custom role with the required privileges, define the role that the stage uses in one of the following ways:
Assign the custom role as the default role
In Snowflake, assign the custom role as the default role for the Snowflake user account specified in the stage. A Snowflake user account is associated with a single default role.
Override the default role with the custom role
In the stage, select Use Snowflake Role on the connection information tab. Then, specify the name of the custom role. The custom role overrides the default role assigned to the Snowflake user account specified in the stage.
For example, you might define custom roles in Snowflake for specific data sources, and then specify one of the roles when configuring a Snowflake stage.

Event Generation

The Snowflake File Uploader destination can generate events that you can use in an event stream. When you enable event generation, the destination generates event records each time the destination completes streaming a whole file.

Snowflake File Uploader events can be used in any logical way. For example:

For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.

Event Records

Snowflake File Uploader event records include the following event-related record header attributes. Record header attributes are stored as String values:
Record Header Attribute Description
sdc.event.type Event type. Uses the following type:
  • file-closed - Generated when the destination closes a file.
sdc.event.version Integer that indicates the version of the event record type.
sdc.event.creation_timestamp Epoch timestamp when the stage created the event.
The destination can generate the following event records:
File closure
The destination generates a file closure event record when it closes an output file.
File closure event records have the sdc.event.type record header attribute set to file-closed and include the following fields:
Field Description
filename Name of the closed file.
filepath Path to the closed file. This does not include the file name.
length Size of the closed file in bytes.

Configuring a Snowflake File Uploader Destination

Configure a Snowflake File Uploader destination to load whole files to an internal Snowflake stage. Use this destination with a whole file origin and the Snowflake executor. For details, see Snowflake File Uploader and Executor Pipelines in the Snowflake executor documentation.

Before you use the destination in a pipeline, complete the prerequisite tasks.

  1. In the Properties panel, on the General tab, configure the following properties:
    General Property Description
    Name Stage name.
    Description Optional description.
    Produce Events Generates event records when events occur.

    Enable this property to use the destination with the Snowflake executor or other event handling scenarios.

    Required Fields Fields that must include data for the record to be passed into the stage.
    Tip: You might include fields that the stage uses.

    Records that do not include all required fields are processed based on the error handling configured for the pipeline.

    Preconditions Conditions that must evaluate to TRUE to allow a record to enter the stage for processing. Click Add to create additional preconditions.

    Records that do not meet all preconditions are processed based on the error handling configured for the stage.

  2. On the Snowflake Connection Info tab, configure the following properties:
    Note: Snowflake JDBC driver versions 3.13.25 or higher convert underscores to hyphens, by default. When needed, you can bypass this behavior by setting the allowUnderscoresInHost driver property to true. For more information and alternate solutions, see this Snowflake community article.
    Snowflake Connection Property Description
    Connection Connection that defines the information required to connect to an external system.

    To connect to an external system, you can select a connection that contains the details, or you can directly enter the details in the pipeline. When you select a connection, Control Hub hides other properties so that you cannot directly enter connection details in the pipeline.

    Include Organization Enables specifying the Snowflake organization.
    Snowflake Region Region where the Snowflake warehouse is located. Select one of the following:
    • An available Snowflake region.
    • Other - Enables specifying a Snowflake region not listed in the property.
    • Custom JDBC URL - Enables specifying a virtual private Snowflake installation.

    Available when Include Organization is disabled.

    Custom Snowflake Region Custom Snowflake region. Available when using Other as the Snowflake region.
    Virtual Private Snowflake URL Custom JDBC URL to use when using a virtual private Snowflake installation.
    Organization Snowflake organization.
    Account Snowflake account name.
    User Snowflake user name.

    The user account or the custom role that overrides the default role for this user account must have the required Snowflake privileges.

    The required privileges depend on the load method that the destination uses. For details, see Assign Required Privileges.

    Password Snowflake password.
    Connection Pool Size Maximum number of connections that the destination uses for the write. The default, 0, ensures that the destination uses the same number of connections as threads used by the pipeline.
    Use Snowflake Role Overrides the default role for the specified user account.
    Snowflake Role Name Name of the custom Snowflake role that overrides the default role of the specified user account.

    The custom role must have the required Snowflake privileges. For details, see Prerequisites.

    Connection Properties Additional Snowflake connection properties to use.

    To add properties, click Add and define the property name and value. Use the property names and values as expected by Snowflake.

  3. On the Staging tab, configure the following properties:
    Staging Property Description
    Stage Database Database for the internal Snowflake stage.
    Stage Schema Schema for the internal Snowflake stage.
    Stage Internal Snowflake stage to write to.
    Upper Case Entity Names Changes the entity names specified in stage properties to all caps.

    Use when entity names in Snowflake, such as the names for schemas, tables, columns, or file formats, are in all caps.

    Auto-Create Stage Creates the internal Snowflake stage to write to.
    Data Format File format for the stage. Select one of the following options:
    • Avro
    • CSV
    • JSON
    • ORC
    • Parquet
    • XML
    • Use custom Snowflake file format

    Available when creating a stage.

    File Format Database Database containing the custom Snowflake file format.

    Available when using a custom Snowflake file format.

    File Format Schema Schema for the custom Snowflake file format.

    Available when using a custom Snowflake file format.

    File Format Name Name of the custom Snowflake file format.

    Available when using a custom Snowflake file format.

    Upload Threads Number of threads to use to upload files.
    Auto-Compress Uploaded Files Compresses files before uploading them to the internal Snowflake stage.
    Temporary Directory Path Temporary directory to use for processing.

    Used only when the pipeline origin does not read from a local directory. When not specified, the destination uses the default temporary directory on the Data Collector machine, typically /tmp.

  4. When creating the stage, you can configure the following advanced data properties on the Data Advanced tab:
    The properties that display on this tab differ based on the selected data format on the Staging tab.
    For Avro, ORC, or Parquet data, optionally configure the following properties:
    Avro/ORC/Parquet Property Description
    Trim Spaces Trims leading and trailing spaces from field data.
    Values Representing Null Sets of characters to be treated as null values in the data set.
    For CSV data, optionally configure the following properties:
    CSV Property Description
    Defaults from Snowflake File Format Allows the use of custom Snowflake CSV file formats. Should not be used unless recommended by StreamSets customer support.
    Column Separator Character to use as a column separator.
    Quoting Mode Mode for handling special characters in the data, such as the column separator and newline character:
    • Quoted - Encloses data in each field with the specified quote character.
      The following example uses asterisks to enclose the data in a field:
      *string data, more string data*
    • Escaped - Precedes a special character with the specified escape character.
      The following example uses a backtick to escape the comma column separator in a field:
      string data`, more string data
    Quote Character Character to enclose field data.

    Available when using Quoted mode.

    Escape Character Character to precede special characters in field data.

    Available when using Escape mode.

    Trim Spaces Trims leading and trailing spaces from field data.
    For JSON data, optionally configure the following properties:
    JSON Property Description
    Strip Outer Array Removes the outer JSON array before writing the data.
    Strip Null Values Removes null values before writing the data.
    Trim Spaces Trims leading and trailing spaces from field data.
    Values Representing Null Sets of characters to be treated as null values in the data set.
    For XML data, optionally configure the following properties:
    XML Property Description
    Preserve Spaces Preserves all spacing in the XML file. When not selected, all leading and trailing spaces are removed.
    Strip Outer Element Removes the outer element from the XML file.