Configuring a Pipeline

Configure a pipeline to define how data flows from source tables to target tables and how the data is processed along the way.

The steps to configure a pipeline differ depending on whether your organization uses the default hosted Transformer for Snowflake engine or deployed Transformer for Snowflake engines to run pipelines. Use the appropriate steps for your organization:

Configure a Pipeline (Hosted Engine)

If your organization uses the default hosted Transformer for Snowflake engine to run pipelines, use the following steps to configure a pipeline:
  1. In the Navigation panel, click Build > Pipelines, and then click the Add icon.
  2. In the Define Pipeline step of the pipeline wizard, enter a pipeline name and optional description.
  3. For Engine Type, select Transformer for Snowflake, and then click Next.
    Tip: To skip sharing the pipeline, click Save & Open in Canvas. You can share the pipeline with others at a later time.
  4. In the Share Pipeline step, optionally share the pipeline with other users or groups. In the Select Users and Groups column, type a user email address or a group name. Select users or groups from the list, and then click Add.
    The added users and groups display in the User / Group table.
    Modify permissions as needed. By default, each added user or group is granted both of the following permissions:
    • Read - Enables viewing the pipeline configuration details and pipeline version history, creating a job for the pipeline, and exporting the pipeline.
    • Write - Enables designing and publishing the pipeline, creating and removing tags for the pipeline, and deleting pipeline versions.
    For more information, see the Control Hub documentation.
  5. To configure the pipeline now, click Save & Open in Canvas.
    If you have not saved your Snowflake account URL or credentials to your StreamSets account, the Snowflake Credentials dialog box appears. Enter the following information to enable access to Snowflake. The information is validated, then stored in your StreamSets account for use with all of your subsequent connections.
    • Snowflake URL - Default Snowflake URL to use. For example:

      https://<yourcompany>.snowflakecomputing.com/

    • Username - Snowflake user name.
    • Authentication Method - Authentication method to use: password or private key.
    • Password - Password for the Snowflake account. Available when using password authentication.
    • Private Key - Private key for the Snowflake account. Enter a PKCS#1 or PKCS#8 private key and include the key delimiters. For example, when entering a PKCS#8 private key, include the -----BEGIN PRIVATE KEY----- and -----END PRIVATE KEY----- delimiters. Available when using private key authentication.

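    If you use private key authentication, the matching public key must already be registered on the Snowflake user. For reference only - the user name and key value below are placeholders, and your administrator may manage keys differently - the public key is registered with standard Snowflake SQL:

      -- Placeholder example: register the public key that matches the
      -- PKCS#8 private key entered above. Unlike the private key, the
      -- RSA_PUBLIC_KEY value excludes the BEGIN/END delimiters.
      ALTER USER pipeline_user SET RSA_PUBLIC_KEY = 'MIIBIjANBgkqh...';

      -- Verify that the key is registered.
      DESC USER pipeline_user;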
  6. In the canvas, on the General tab of the pipeline properties, configure the following properties, as needed:
    • Name - Displays the pipeline name specified in the pipeline creation wizard.
    • Description - Displays the description specified in the pipeline creation wizard.
    • Labels - Pipeline labels to help search and filter pipelines in the Pipelines view. For more information, see the Control Hub documentation.
  7. Configure the following additional hosted-engine properties on the General tab, as needed.

    If you have already created pipelines and configured Snowflake settings in your StreamSets account, the following properties might already be configured with default values. Update them as needed.

    If you have not saved Snowflake pipeline defaults to your StreamSets account, the Snowflake URL, role, warehouse, database, and schema that you specify below are also saved to your StreamSets account for use with subsequent pipelines.

    • Snowflake URL - Snowflake account URL. For example:

      https://<yourcompany>.snowflakecomputing.com

      If you have a Snowflake account URL defined as a Snowflake pipeline default or Snowflake credential, this property is already configured. Update the URL as needed.
    • Role - Snowflake role. Use to limit access to the Snowflake account. Running Snowflake pipelines generally requires the Read, Write, and Create Table permissions, as shown in the grant sketch after this list.

      If you have a role defined as a Snowflake pipeline default, that role is used in this property. Otherwise, the property is set to Public by default. Update the configuration as needed.

      To configure this property, select the role from the list, or click the Select Role icon to browse for one. Then, click Save.
    • Warehouse - Data warehouse to use.

      If you have a warehouse defined as a Snowflake pipeline default, this property is already configured. Update the warehouse as needed.

      To configure this property, select the warehouse from the list, or click the Select Warehouse icon to browse for one. Then, click Save.
    • Pipeline Working Schema - Location for temporarily storing data preview transient tables and for retrieving pipeline execution metrics from the Snowflake Information Schema. Transient tables are removed when you stop the preview.

      Specify the working schema in the following format: <database>.<schema>

      If you have a database and schema defined as Snowflake pipeline defaults, this property is already configured to use those values. Update the values as needed.

      To configure this property, select the database and schema to use, or click the Select Schema icon to explore your Snowflake account for the database and schema to use. Then, click Save.

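    The Read, Write, and Create Table permissions map to standard Snowflake privileges. As a sketch only - the role, user, warehouse, database, and schema names here are placeholders, and your security model may differ - an administrator might prepare a pipeline role and working schema like this:

      -- Placeholder names: sdc_role, sdc_wh, analytics, work, pipeline_user.
      CREATE ROLE IF NOT EXISTS sdc_role;
      GRANT USAGE ON WAREHOUSE sdc_wh TO ROLE sdc_role;
      GRANT USAGE ON DATABASE analytics TO ROLE sdc_role;
      GRANT USAGE ON SCHEMA analytics.work TO ROLE sdc_role;

      -- Read and Write on tables, plus Create Table so that the engine
      -- can create transient tables in the working schema.
      GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA analytics.work TO ROLE sdc_role;
      GRANT CREATE TABLE ON SCHEMA analytics.work TO ROLE sdc_role;

      -- Grant the role to the user that runs the pipeline.
      GRANT ROLE sdc_role TO USER pipeline_user;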
  8. On the Parameters tab, optionally define runtime parameters.
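    You can reference a runtime parameter in stage properties with the ${PARAM_NAME} notation and supply a different value each time you run a job. For example - the parameter and table names here are hypothetical - a SQL query configured in a stage might call parameters defined on this tab:

      -- Hypothetical query calling the SOURCE_SCHEMA and START_DATE
      -- runtime parameters defined on the Parameters tab.
      SELECT * FROM ${SOURCE_SCHEMA}.orders
      WHERE order_date >= '${START_DATE}'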
  9. On the Advanced tab, optionally specify inline UDFs to use in the pipeline.
    You do not need to define precompiled UDFs that are available in your Snowflake account.
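    For context - the function name and logic below are illustrative only - a precompiled UDF is one that already exists in your Snowflake account, created with standard Snowflake DDL such as:

      -- Illustrative UDF that already exists in the Snowflake account;
      -- the pipeline can call it without an inline definition.
      CREATE OR REPLACE FUNCTION analytics.work.normalize_amount(amount NUMBER, rate NUMBER)
      RETURNS NUMBER
      AS
      $$
        amount * rate
      $$;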
  10. In the pipeline canvas, add at least one origin to represent the data that the pipeline reads.
  11. Use as many processors as you need to process the data.
  12. Add executors to perform tasks, as needed.
  13. Add at least one destination to represent the data that the pipeline writes.
    You can use data preview during pipeline development to see how data changes as it moves through the pipeline. For more information, see the Control Hub documentation.
    When you are ready, you can publish the pipeline and create a job to run it.

Configure a Pipeline (Deployed Engine)

If your organization uses deployed Transformer for Snowflake engines to run pipelines, use the following steps to configure a pipeline.
Important: Before you can select an authoring engine, you must deploy and launch a Transformer for Snowflake engine. For more information, see Installation Overview.
  1. In the Navigation panel, click Build > Pipelines, and then click the Add icon.
  2. In the Define Pipeline step of the pipeline wizard, enter a pipeline name and optional description.
  3. For Engine Type, select Transformer for Snowflake, and then click Next.
  4. In the Configure Pipeline step, configure the Authoring Engine property as needed.

    The selected authoring engine determines the stages and functionality that display in the pipeline canvas.

    By default, Control Hub selects an accessible authoring engine that you have read permission on and that most recently reported to Control Hub.

    To select another engine, click Click here to select. In the Select an Authoring Engine window, select an accessible engine, and then click Save to return to the pipeline wizard.

    An accessible engine is an engine that is running, that can communicate with Control Hub, and that can be reached by the web browser. For more information and tips on troubleshooting inaccessible engines, see the Control Hub documentation.

  5. Click Save & Next.
    Tip: To skip sharing the pipeline, click Save & Open in Canvas. You can share the pipeline with others at a later time.
  6. In the Share Pipeline step, optionally share the pipeline with other users or groups. In the Select Users and Groups column, type a user email address or a group name. Select users or groups from the list, and then click Add.
    The added users and groups display in the User / Group table.
    Modify permissions as needed. By default, each added user or group is granted both of the following permissions:
    • Read - Enables viewing the pipeline configuration details and pipeline version history, creating a job for the pipeline, and exporting the pipeline.
    • Write - Enables designing and publishing the pipeline, creating and removing tags for the pipeline, and deleting pipeline versions.
    For more information, see the Control Hub documentation.
  7. To configure the pipeline now, click Save & Open in Canvas.
  8. In the canvas, on the General tab of the pipeline properties, configure the following properties, as needed:
    • Name - Displays the pipeline name specified in the pipeline creation wizard.
    • Description - Displays the description specified in the pipeline creation wizard.
    • Labels - Pipeline labels to help search and filter pipelines in the Pipelines view. For more information, see the Control Hub documentation.
  9. Configure the following additional deployed-engine properties on the General tab, as needed.
    • Connection - Connection that defines the information required to connect to Snowflake.

      Select an existing Snowflake connection. Or, to create a new connection, click the Add New Connection icon. To view and edit the details of the selected connection, click the Edit Connection icon.

      For more information about Snowflake connections, see the Control Hub documentation.
    • Override Role - Snowflake role. Use to limit access to the Snowflake account. Running Snowflake pipelines generally requires the Read, Write, and Create Table permissions. You can confirm the effective role and its privileges with the verification sketch after this list.

      The pipeline requires a role. If the selected Snowflake connection includes a role, you can optionally use this property to override that value. If the connection does not include a role, this property is required.

      To configure this property, select the role from the list, or click the Select Role icon to browse for one. Then, click Save.
    • Override Warehouse - Data warehouse to use.

      The pipeline requires a warehouse. If the selected Snowflake connection includes a warehouse, you can optionally use this property to override that value. If the connection does not include a warehouse, this property is required.

      To configure this property, select the warehouse from the list, or click the Select Warehouse icon to browse for one. Then, click Save.
    • Override Working Schema - Location for temporarily storing data preview transient tables and for retrieving pipeline execution metrics from the Snowflake Information Schema. Transient tables are removed when you stop the preview.

      The pipeline requires a working schema. If the selected Snowflake connection defines a database and schema, you can optionally use this property to override those values. If the connection does not specify a database and schema, this property is required.

      Specify the working schema in the following format: <database>.<schema>

      To configure this property, select the database and schema to use, or click the Select Schema icon to explore your Snowflake account for the database and schema to use. Then, click Save.
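    To sanity-check the values that the connection and override properties resolve to, you can run a quick verification in Snowflake as the same user. A minimal sketch, assuming the placeholder role sdc_role:

      -- Confirm the session context that the pipeline will use.
      USE ROLE sdc_role;
      SELECT CURRENT_ROLE(), CURRENT_WAREHOUSE(), CURRENT_DATABASE(), CURRENT_SCHEMA();

      -- Review the privileges granted to the role.
      SHOW GRANTS TO ROLE sdc_role;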
  10. On the Parameters tab, optionally define runtime parameters.
  11. On the Advanced tab, optionally specify inline UDFs to use in the pipeline.
    You do not need to define precompiled UDFs that are available in your Snowflake account.
  12. In the pipeline canvas, add at least one origin to represent the data that the pipeline reads.
  13. Use as many processors as you need to process the data.
  14. Add executors to perform tasks, as needed.
  15. Add at least one destination to represent the data that the pipeline writes.
    You can use data preview during pipeline development to see how data changes as it moves through the pipeline. For more information, see the Control Hub documentation.
    When you are ready, you can publish the pipeline and create a job to run it.