Hosted or Deployed Transformer for Snowflake Engine
Most organizations use the Transformer for Snowflake engine hosted and managed by IBM. Using the hosted engine is the easiest way to work with Transformer for Snowflake.
When needed, you can deploy a Transformer for Snowflake engine to your private network, which can be on-premises or on a protected cloud computing platform such as AWS. You might need to deploy a Transformer for Snowflake engine, rather than using the hosted engine, due to company policies or security requirements.
Deploying a Transformer for Snowflake engine requires that your organization have the appropriate account agreement. For more information about your account agreement, contact your IBM StreamSets account team.
Most functionality available with the hosted engine and the deployed engine is exactly the same. For example, you configure pipelines on the same canvas and then create and run jobs to run published pipelines. When you run a job, Transformer for Snowflake generates a SQL query based on your pipeline configuration and passes the query to Snowflake for execution. Since Snowflake performs the work, all data processing occurs within Snowflake. With both the hosted and deployed engine, your data never leaves Snowflake as a job runs a pipeline.
The difference lies in where the engine runs: either in the hosted public cloud service or in a private network that you manage.
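With either engine, a job turns the pipeline configuration into SQL that Snowflake executes. The following is a minimal sketch of that idea, not the product's actual query generator. The `Pipeline` class, its fields, the `generate_sql` function, and the table names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Pipeline:
    """Hypothetical, simplified pipeline: one source, one target, optional filter."""
    source_table: str
    target_table: str
    columns: List[str] = field(default_factory=list)
    filter_condition: Optional[str] = None


def generate_sql(pipeline: Pipeline) -> str:
    """Build a single INSERT ... SELECT statement from the pipeline description."""
    # An empty column list means "select every column".
    select_list = ", ".join(pipeline.columns) if pipeline.columns else "*"
    sql = (
        f"INSERT INTO {pipeline.target_table} "
        f"SELECT {select_list} FROM {pipeline.source_table}"
    )
    if pipeline.filter_condition:
        sql += f" WHERE {pipeline.filter_condition}"
    return sql


# Illustrative table names; they are assumptions, not part of the product.
pipeline = Pipeline(
    source_table="RAW.ORDERS",
    target_table="ANALYTICS.LARGE_ORDERS",
    columns=["ORDER_ID", "AMOUNT"],
    filter_condition="AMOUNT > 1000",
)
query = generate_sql(pipeline)
print(query)
```

The sketch only produces a query string. Under the model described above, the engine would send that string to Snowflake, and the rows themselves would stay in Snowflake.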
The following table summarizes the differences between the hosted and deployed Transformer for Snowflake engines:
Category | Hosted Engine | Deployed Engine |
---|---|---|
Control Hub environments and deployments | Not applicable. | Configure a Control Hub environment and deployment to deploy a Transformer for Snowflake engine to your private network. A deployment is the primary unit of tenancy in IBM StreamSets. When multiple groups use the same environment, you can restrict access to deployment resources by creating different deployments for each group in the environment and assigning the groups appropriate permissions. |
Engine management | IBM manages the Transformer for Snowflake engine for you. You cannot view any details of the hosted engine or perform any actions on it. The hosted engine is shared across organizations. Your data is accessible only by your organization. The logs for the hosted engine include activity from multiple organizations and are accessible only by IBM. | You manage the Transformer for Snowflake engine in your own private network. You can stop, start, and monitor the deployed engine and view its logs. The deployed engine is accessible only by your organization. The logs for the deployed engine include only activity for your organization and are accessible within your own infrastructure. |
Connection information | Default connection information, such as the warehouse, database, or schema to use, is securely stored in your IBM StreamSets account. You can override this information in individual pipelines as needed. | You must create one or more Snowflake connections to specify the connection information to use. Then, in pipelines, you select the appropriate Snowflake connection. You can override the following details in individual pipelines: role, warehouse, database, and schema. |
Authentication methods | You configure the authentication method in the pipeline properties. | You configure the authentication method in a Snowflake connection. |
Snowflake credentials | Enter the credentials directly in the Control Hub user interface (UI). Snowflake credentials are validated and securely stored in your IBM StreamSets account. You cannot view the existing Snowflake password or private key through the Control Hub UI. You can only view those values as you enter them. | As a best practice, configure the deployed engine to use the AWS credential store to securely retrieve Snowflake credentials from AWS Secrets Manager. |
Pipeline design | Design pipelines using the latest Transformer for Snowflake release. With each new release, all existing pipelines are automatically updated to use the latest features. | Design pipelines using the selected authoring Transformer for Snowflake version. The authoring engine version determines the stages and functionality that display in the pipeline canvas. To use features available in a newer release, you must upgrade the engine. |
Pipeline preview | When you preview data in a pipeline, Snowflake data passes through encrypted connections beyond your own network into Control Hub. You can optionally disable data preview for your organization if your company policies or best practices prohibit data from leaving your own network. | When you preview data in a pipeline, Snowflake data passes through encrypted connections beyond your own network into Control Hub. You can optionally change the default engine communication method from WebSocket tunneling to direct engine REST APIs if your company policies or best practices prohibit data from leaving your own network. |
High availability | IBM StreamSets automatically handles any failover scenarios for you. As such, failover properties are omitted from Transformer for Snowflake jobs. | Deploy multiple engines so that backup engines are available if a pipeline must fail over because of an unexpected engine shutdown. When you configure a Transformer for Snowflake job, you configure failover properties. |
Communication during a job run | When you run a job, Control Hub passes the query to your Snowflake account. | When you run a job, the Transformer for Snowflake engine deployed to your private network passes the query to your Snowflake account. For example, if you use AWS PrivateLink to directly connect your Snowflake account to an AWS VPC, you can deploy the Transformer for Snowflake engine to the same AWS VPC. This ensures that IBM StreamSets communications to your Snowflake account occur inside your own network. |
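
The connection behavior for a deployed engine, where a pipeline selects a Snowflake connection and may override the role, warehouse, database, and schema, can be pictured with a short sketch. This is an illustration only, not the product's implementation. The `resolve_connection` function, the dictionary keys, and the sample values are hypothetical, and rejecting other overrides with an error is an assumption about how to model "only these four details".

```python
from typing import Dict, Optional

# Details that the table above lists as overridable in individual pipelines.
OVERRIDABLE = ("role", "warehouse", "database", "schema")


def resolve_connection(
    connection: Dict[str, str],
    overrides: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Return the effective settings: connection values plus pipeline overrides."""
    resolved = dict(connection)  # copy so the shared connection is not modified
    for key, value in (overrides or {}).items():
        if key not in OVERRIDABLE:
            # Assumption: anything else must be changed in the connection itself.
            raise ValueError(f"{key!r} cannot be overridden in a pipeline")
        resolved[key] = value
    return resolved


# Hypothetical Snowflake connection shared by several pipelines.
connection = {
    "account": "example_account",
    "role": "ANALYST",
    "warehouse": "WH_SMALL",
    "database": "SALES",
    "schema": "PUBLIC",
}

# One pipeline overrides the warehouse and schema only.
effective = resolve_connection(
    connection, {"warehouse": "WH_LARGE", "schema": "STAGING"}
)
print(effective)
```

Copying the connection before applying overrides keeps the shared connection unchanged, so other pipelines that select it continue to see the original values.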
For more information about how specific features differ between the hosted and deployed engines, see the Transformer for Snowflake documentation.