Comparing Snowflake and Other StreamSets Engines

For users already familiar with Control Hub and Data Collector or Transformer, here's how working with Transformer for Snowflake is similar, and how it differs.

Transformer for Snowflake pipelines are configured on the Control Hub canvas, just like Data Collector and Transformer pipelines. The difference lies in the available functionality within the pipelines and how the pipelines run.

As described in How It Works, Transformer for Snowflake does not perform pipeline processing itself, as Data Collector does. Instead, Transformer for Snowflake follows the Transformer model. Just as Transformer passes pipeline configuration to Spark for processing, Transformer for Snowflake generates a SQL query based on pipeline configuration and passes the query to Snowflake for processing. This structural similarity explains how Transformer for Snowflake got its name.
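As a rough illustration of the "generate a query, then hand it to Snowflake" model, the following Python sketch compiles a small list of stages into one SQL statement. The stage classes, their `to_sql` methods, and the `compile_pipeline` function are hypothetical names invented for this example. They are not part of any StreamSets API, and the SQL that Transformer for Snowflake actually generates may look quite different.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableOrigin:
    """Hypothetical origin stage: reads every column from one table."""
    table: str

    def to_sql(self, upstream: Optional[str] = None) -> str:
        # An origin has no upstream query; it starts the chain.
        return f"SELECT * FROM {self.table}"


@dataclass
class Filter:
    """Hypothetical processor stage: keeps rows that match a SQL condition."""
    condition: str

    def to_sql(self, upstream: str) -> str:
        # Wrap the upstream query as a subquery and filter its rows.
        return f"SELECT * FROM ({upstream}) WHERE {self.condition}"


@dataclass
class TableDestination:
    """Hypothetical destination stage: writes the result to a table."""
    table: str

    def to_sql(self, upstream: str) -> str:
        return f"INSERT INTO {self.table} {upstream}"


def compile_pipeline(stages) -> str:
    """Fold the stages, in order, into a single SQL statement."""
    sql = None
    for stage in stages:
        sql = stage.to_sql(sql)
    return sql


# Illustrative table names and condition only.
query = compile_pipeline([
    TableOrigin("orders"),
    Filter("amount > 100"),
    TableDestination("big_orders"),
])
print(query)
```

In this sketch no rows ever pass through the Python process. Only the final query string would be sent to Snowflake, which is the point of the comparison with Spark above.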

With Transformer and Data Collector, you can use heterogeneous origins and destinations to read from and write to a wide range of systems. Transformer for Snowflake pipelines process only Snowflake data: all origins and destinations read from and write to Snowflake.

However, many concepts and behaviors remain exactly the same. For example, you use origin, processor, and destination stages to define processing in all pipeline types. You create jobs to run pipelines. You can use runtime parameters in all StreamSets pipelines.

Here are some highlights of the similarities and differences between Transformer for Snowflake and the other engines, Transformer and Data Collector.

Since you design and run Transformer for Snowflake pipelines in Control Hub, some basic concepts remain the same:
  • Create pipelines in the pipeline canvas.
  • Preview pipelines to help with pipeline development. For more information, see the Control Hub documentation.
  • Use origin, processor, and destination stages to design the pipeline data flow.
  • Processors with the same names as Transformer and Data Collector stages generally behave as you expect at a high level, but might have subtle differences or additional features because the processing occurs in Snowflake.

    For example, Transformer for Snowflake supports Snowflake SQL for data processing. For details, see the documentation for the individual stage, such as the Filter processor.

  • As with Transformer, you use the StreamSets expression language in properties that are evaluated only once, before pipeline processing begins, such as runtime parameters in pipeline properties.

  • Create jobs to run pipelines.
  • Use Control Hub team-based features, such as version control and user management.

Unlike Transformer and Data Collector:
  • Transformer for Snowflake is hosted on the StreamSets platform. As a result:
    • You do not need to set up an environment or deployment. You can simply go to the Pipelines view and create a pipeline.
    • You do not need to configure or maintain Snowflake engines.
    • When you create a pipeline or job, you don't need to associate it with a specific Snowflake engine.
  • Transformer for Snowflake includes stages based on Snowflake functionality, such as the Cube processor, which applies the Snowflake GROUP BY CUBE clause.
  • Transformer for Snowflake uses the terms "column" and "row" to align with Snowflake terminology. Transformer and Data Collector use the terms "field" and "record" to refer to the same concepts.
  • You can monitor Snowflake jobs as you would any other Control Hub job. However, the Snowflake job summary displays a different set of information:
    • Input and output row count

      You cannot view an error row count, row throughput, or runtime statistics as you can for other Control Hub jobs.

    • Log messages
    • Snowflake queries run for the job
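
To make the Cube processor mentioned in the list above more concrete, the following Python sketch enumerates the grouping sets that a SQL GROUP BY CUBE over a list of columns aggregates by, and builds an illustrative query string. The function names, table name, and column names are assumptions made for this example. The query that the Cube processor actually generates is not shown in this document and may differ.

```python
from itertools import combinations


def cube_grouping_sets(columns):
    """Return every grouping set that GROUP BY CUBE (columns) covers.

    CUBE aggregates by every subset of the listed columns, from the
    full list down to the empty set (the grand total).
    """
    sets = []
    for size in range(len(columns), -1, -1):
        sets.extend(combinations(columns, size))
    return sets


def cube_query(table, columns, aggregate):
    """Build an illustrative GROUP BY CUBE query (hypothetical helper)."""
    cols = ", ".join(columns)
    return f"SELECT {cols}, {aggregate} FROM {table} GROUP BY CUBE ({cols})"


# Illustrative column and table names only.
for grouping_set in cube_grouping_sets(["region", "product"]):
    print(grouping_set)

print(cube_query("sales", ["region", "product"], "SUM(amount)"))
```

For two columns, the loop prints four grouping sets: both columns together, each column alone, and the empty set for the overall total.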