Performance Optimization

Use the following tips to optimize for performance and cost-effectiveness when using the Snowflake destination:

Increase the batch size
The maximum batch size is determined by the origin in the pipeline and typically has a default value of 1,000 records. To take advantage of Snowflake's bulk loading abilities, increase the maximum batch size in the pipeline origin to 20,000-50,000 records. Be sure to increase the Data Collector Java heap size, as needed. For more information, see Java Heap Size in the Data Collector documentation.
Important: Increasing the batch size is strongly recommended. Using the default batch size can be slow and costly.
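If you manage pipeline definitions as exported JSON, a change like this can be scripted before re-import. The minimal sketch below assumes a typical export layout, a pipelineConfig object whose stages carry name/value configuration pairs, and a hypothetical maxBatchSize key suffix; verify both against your own export.
```python
import json

TARGET_BATCH_SIZE = 20000  # within the recommended 20,000-50,000 range

with open("pipeline_export.json") as f:
    export = json.load(f)

# Stage configurations in an export are assumed to be lists of
# {"name": ..., "value": ...} pairs; the "maxBatchSize" suffix is a
# guess and varies by origin type, so check your own export.
for stage in export["pipelineConfig"]["stages"]:
    for config in stage["configuration"]:
        if config["name"].endswith("maxBatchSize"):
            config["value"] = TARGET_BATCH_SIZE

with open("pipeline_export.json", "w") as f:
    json.dump(export, f, indent=2)

# Remember to grow the Data Collector Java heap to match, for example
# by adjusting -Xmx in the SDC_JAVA_OPTS environment variable.
```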
Configure pipeline runners to wait indefinitely when idle
With the default configuration, a pipeline runner generates an empty batch after waiting idly for 60 seconds. As a result, the destination continues to execute metadata queries against Snowflake, even though no data needs to be processed. To reduce Snowflake charges when a pipeline runner waits idly, set the Runner Idle Time pipeline property to -1. This configures pipeline runners to wait indefinitely when idle without generating empty batches, which allows Snowflake to pause processing.
Important: Configuring pipeline runners to wait indefinitely when idle is strongly recommended. Using the default pipeline runner idle time can result in unnecessary Snowflake resource consumption and runtime costs.
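The Runner Idle Time property can be adjusted the same way. In the minimal sketch below, the case-insensitive match on the configuration key is an assumption; Runner Idle Time is the property's label, but the internal key spelling can vary, so inspect your own export first.
```python
import json

with open("pipeline_export.json") as f:
    export = json.load(f)

# Pipeline-level properties are assumed to sit alongside the stages in
# the export; the loose match below is a guess at the internal key name.
for config in export["pipelineConfig"]["configuration"]:
    if "runneridletime" in config["name"].lower():
        config["value"] = -1  # -1 = wait indefinitely, no empty batches

with open("pipeline_export.json", "w") as f:
    json.dump(export, f, indent=2)
```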
Use multiple threads
When writing to Snowflake with Snowpipe or the COPY command, you can improve performance by including a multithreaded origin in the pipeline. When Data Collector resources allow, using multiple threads enables processing multiple batches of data concurrently.
As when increasing the batch size, make sure that the Data Collector Java heap size is sized appropriately when using multiple threads. For more information, see Java Heap Size in the Data Collector documentation.
Note: Do not use multiple threads to write CDC data to Snowflake with the MERGE command. When using multiple threads to process data, the original order of the data is not retained.
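Because every pipeline runner holds a batch in memory, heap demand grows with both the thread count and the batch size. The back-of-envelope sketch below shows one way to estimate the extra headroom; the record size and overhead multiplier are illustrative assumptions, not measured figures.
```python
# Rough heap estimate: each pipeline runner holds one batch in memory.
threads = 4                 # number of pipeline runners
max_batch_size = 50_000     # records per batch
avg_record_bytes = 2_048    # assumed average in-memory record size
overhead = 3                # assumed multiplier for Java object overhead

working_set_mb = threads * max_batch_size * avg_record_bytes * overhead / 1024 ** 2
print(f"Budget roughly {working_set_mb:.0f} MB of extra heap")  # ~1172 MB
```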
Enable additional connections to Snowflake
When writing to multiple Snowflake tables using the COPY or MERGE commands, increase the number of connections that the Snowflake destination makes to Snowflake. Each additional connection allows the destination to write to one additional table concurrently.
For example, when writing to 10 tables with only one connection, the destination can write to just one table at a time. With 5 connections, it can write to 5 tables at a time, and with 10 connections it can write to all 10 tables at once.

By default, the destination uses one connection for standard single-threaded pipelines. In multithreaded pipelines, the destination matches the number of threads used by the pipeline. That is, when a multithreaded origin is configured to use up to 3 threads, then by default, the Snowflake destination uses 3 connections to write to Snowflake, one for each thread.

Note that the number of connections is for the entire pipeline, not for each thread. So when using multiple threads to write to multiple tables, you can also improve performance by allocating additional connections. For example, when using 3 threads to write to 3 tables, you might increase the number of connections to 9 for maximum throughput.

Use the Connection Pool Size property to specify the maximum number of connections that the Snowflake destination can use. Use this property when writing to Snowflake with the COPY or MERGE commands. Increasing the number of connections does not improve performance when using Snowpipe.
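Applying the sizing rule above, the following sketch sets the destination's pool size in an exported pipeline definition. The match on the stage name and the connectionPoolSize key suffix are assumptions; Connection Pool Size is the property's label in the UI, so locate the corresponding entry in your own export.
```python
import json

THREADS = 3
TABLES = 3
POOL_SIZE = THREADS * TABLES  # one connection per thread per table

with open("pipeline_export.json") as f:
    export = json.load(f)

# Find the Snowflake destination stage; matching on the stage name and
# on a "connectionPoolSize" key suffix are both assumptions.
for stage in export["pipelineConfig"]["stages"]:
    if "snowflake" in stage.get("stageName", "").lower():
        for config in stage["configuration"]:
            if config["name"].lower().endswith("connectionpoolsize"):
                config["value"] = POOL_SIZE

with open("pipeline_export.json", "w") as f:
    json.dump(export, f, indent=2)
```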