Performance Optimization
Use the following tips to optimize for performance and cost-effectiveness when using the Snowflake destination:
- Increase the batch size
- The maximum batch size is determined by the origin in the pipeline and typically has a default value of 1,000 records. To take advantage of Snowflake's bulk loading abilities, increase the maximum batch size in the pipeline origin to 20,000-50,000 records. Be sure to increase the Data Collector Java heap size as needed; the sizing sketch after this list illustrates why. For more information, see Java Heap Size in the Data Collector documentation.
- Configure pipeline runners to wait indefinitely when idle
- With the default configuration, a pipeline runner generates an empty batch after waiting idly for 60 seconds. As a result, the destination continues to execute metadata queries against Snowflake even though there is no data to process. To reduce Snowflake charges when a pipeline runner waits idly, set the Runner Idle Time pipeline property to -1. This configures pipeline runners to wait indefinitely when idle without generating empty batches, which allows Snowflake to pause processing. The configuration sketch after this list shows this property being set.

  Important: Configuring pipeline runners to wait indefinitely when idle is strongly recommended. Using the default pipeline runner idle time can result in unnecessary Snowflake resource consumption and runtime costs.
- Use multiple threads
- When writing to Snowflake using Snowpipe or the COPY command, you can improve performance by including a multithreaded origin in the pipeline and configuring it to use multiple threads. When Data Collector resources allow, multiple threads enable the pipeline to process multiple batches of data concurrently.
- Enable additional connections to Snowflake
- When writing to multiple Snowflake tables using the COPY or MERGE commands, increase the number of connections that the Snowflake destination makes to Snowflake. Each additional connection allows the destination to write to one additional table concurrently.
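
To see why a larger batch size usually calls for more Java heap, here is a rough back-of-the-envelope sketch. The record size and runner count below are illustrative assumptions, not measurements:

```python
# Back-of-the-envelope sizing for the extra heap a larger batch size implies.
# All figures are illustrative assumptions, not measurements.
records_per_batch = 50_000          # increased origin batch size
avg_record_size_bytes = 5 * 1024    # assume ~5 KB per record in memory
concurrent_runners = 8              # pipeline runners processing batches in parallel

in_flight_bytes = records_per_batch * avg_record_size_bytes * concurrent_runners
print(f"~{in_flight_bytes / 1024**3:.1f} GiB of records in flight")  # ~1.9 GiB

# The Java heap must hold this on top of Data Collector's baseline usage, so a
# batch size increase of this scale usually calls for raising the maximum heap
# (-Xmx), as described in "Java Heap Size" in the Data Collector documentation.
```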
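The remaining settings, the Runner Idle Time property, origin threads, and destination connections, are normally configured in the pipeline canvas, but the sketch below shows one way they might be applied with the StreamSets SDK for Python. The SDK and its pipeline-builder pattern are real; the connection details, stage labels, attribute names, and the configuration key for Runner Idle Time are assumptions that may differ by version, so verify them against your environment.

```python
# Hedged sketch: applying the performance settings above with the
# StreamSets SDK for Python. Stage labels and attribute names are
# assumptions -- verify them against your Data Collector version.
from streamsets.sdk import DataCollector

# Connection details are placeholders.
sdc = DataCollector('http://localhost:18630', username='admin', password='admin')
builder = sdc.get_pipeline_builder()

# A multithreaded origin; 'JDBC Multitable Consumer' is one example.
origin = builder.add_stage('JDBC Multitable Consumer')
origin.set_attributes(
    max_batch_size_in_records=50_000,  # larger batches for Snowflake bulk loads (assumed attribute name)
    number_of_threads=4,               # multiple threads process batches concurrently (assumed attribute name)
)

destination = builder.add_stage('Snowflake')
destination.set_attributes(
    connection_pool_size=4,            # one connection per table written concurrently (assumed attribute name)
)

origin >> destination
pipeline = builder.build('Snowflake load (performance-tuned)')

# Runner Idle Time = -1: runners wait indefinitely when idle and do not
# generate empty batches. The configuration key name is an assumption.
pipeline.configuration['runnerIdleTIme'] = -1

sdc.add_pipeline(pipeline)
```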