Comparing Snowflake and Other StreamSets Engines

For users already familiar with Control Hub and Data Collector or Transformer, here's how working with Transformer for Snowflake is similar... and different.

Transformer for Snowflake pipelines are configured on the Control Hub canvas, just like Data Collector and Transformer pipelines. The difference lies in the available functionality within the pipelines and how the pipelines run.

As described in How It Works, Transformer for Snowflake does not perform pipeline processing itself, as Data Collector does. Instead, Transformer for Snowflake follows the Transformer model. Just as Transformer passes pipeline configuration to Spark for processing, Transformer for Snowflake generates a SQL query based on the pipeline configuration and passes the query to Snowflake for processing. This structural similarity is how Transformer for Snowflake got its name.
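
To make that model concrete, here is a minimal sketch in Python of what pushing processing to Snowflake looks like: the pipeline logic is expressed as a single SQL statement, and the client only submits the query while Snowflake does all the work. The table, column, and connection values are hypothetical, and the SQL shown is illustrative rather than an actual query generated by the engine.

    import snowflake.connector

    # Roughly what a simple pipeline (Snowflake Table origin -> Filter processor ->
    # Snowflake Table destination) might reduce to: a single INSERT ... SELECT
    # statement pushed down to Snowflake. Names and the filter are hypothetical.
    GENERATED_SQL = """
    INSERT INTO analytics.public.high_value_orders (order_id, customer_id, order_total)
    SELECT order_id, customer_id, order_total
    FROM analytics.public.orders
    WHERE order_total > 1000
    """

    conn = snowflake.connector.connect(
        account="<account>",
        user="<user>",
        password="<password>",
        warehouse="<warehouse>",
        database="ANALYTICS",
        schema="PUBLIC",
    )
    try:
        # All row processing happens inside Snowflake; no data is pulled into an engine.
        conn.cursor().execute(GENERATED_SQL)
    finally:
        conn.close()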

With Transformer and Data Collector, you can use heterogeneous origins and destinations to read from and write to a wide range of systems. Transformer for Snowflake pipelines process Snowflake data – all origins and destinations read from and write to Snowflake.

However, many concepts and behaviors remain exactly the same. For example, you use origin, processor, and destination stages to define processing in all pipeline types. You create jobs to run pipelines. You can use runtime parameters in all StreamSets pipelines.

Here are some highlights of the similarities and differences between Transformer for Snowflake and the other engines, Data Collector and Transformer:
Similarities
Since you design and run Transformer for Snowflake pipelines in Control Hub, the basic concepts described above remain the same.
Differences
Unlike Transformer and Data Collector:
  • Transformer for Snowflake is hosted on the StreamSets platform. As a result:
    • You do not need to set up an environment or deployment. You can simply go to the Pipelines view and create a pipeline.
    • You do not need to configure or maintain Snowflake engines.
    • When you create a pipeline or job, you don't need to associate it with a specific Snowflake engine.
  • Transformer for Snowflake includes stages based on Snowflake functionality, such as the Cube processor, which applies the Snowflake GROUP BY CUBE command (see the sketch after this list).
  • Transformer for Snowflake uses the terms "column" and "row" to align with Snowflake terminology. Transformer and Data Collector use the terms "field" and "record" to refer to the same concepts.
  • Like Data Collector, Transformer for Snowflake includes executor stages to perform tasks, such as sending an email notification. However, the executors differ in how they are triggered and where they can be placed:

    Transformer for Snowflake executors perform tasks using Snowflake integrations when triggered by data, after all pipeline writes complete. These executors can be placed anywhere in the data flow.

    Data Collector executors, in contrast, are triggered by special event records, which only certain stages generate. These executors should be placed downstream of event-generating stages.

  • You can monitor Transformer for Snowflake jobs as you would any other Control Hub job. However, the job summary for a Transformer for Snowflake job displays different information:
    • Input and output row count

      You cannot view an error row count, row throughput, or runtime statistics as you can for other Control Hub jobs.

    • Log messages
    • Snowflake queries run for the job
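
As referenced in the Cube processor item above, the following sketch shows the kind of GROUP BY CUBE query such a stage corresponds to. The table and column names are hypothetical, and the query is illustrative rather than the SQL the processor actually generates.

    import snowflake.connector

    # Illustrative only: a GROUP BY CUBE aggregation that returns subtotals for
    # every combination of the grouping columns, plus a grand total.
    CUBE_QUERY = """
    SELECT region, product, SUM(order_total) AS total_sales
    FROM analytics.public.orders
    GROUP BY CUBE (region, product)
    """

    conn = snowflake.connector.connect(
        account="<account>", user="<user>", password="<password>",
        warehouse="<warehouse>", database="ANALYTICS", schema="PUBLIC",
    )
    try:
        for region, product, total_sales in conn.cursor().execute(CUBE_QUERY):
            # A NULL region or product marks a subtotal or grand total row.
            print(region, product, total_sales)
    finally:
        conn.close()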