Configuring a Pipeline

Configure a pipeline to define how data flows from source tables to target tables and how the data is processed along the way.

  1. In the Navigation panel, click Build > Pipelines, and then click the Add icon.
  2. Enter a pipeline name and optional description.
  3. For Engine Type, select Transformer for Snowflake, and then click Next.
  4. Optionally, to share the pipeline with other users or groups, type a user email address or a group name in the Select Users and Groups column of the Share Pipeline section.
  5. Select users or groups from the list, and then click Add.

    The added users and groups display in the User / Group table.

  6. Modify permissions as needed. By default, each added user or group is granted both of the following permissions:
    • Read - Enables viewing the pipeline configuration details and pipeline version history, creating a job for the pipeline, and exporting the pipeline.
    • Write - Enables designing and publishing the pipeline, creating and removing tags for the pipeline, and deleting pipeline versions.

    For more information, see the Control Hub documentation.

  7. Click Save & Next, then click Open in Canvas.
    If you have not saved your Snowflake account URL or credentials to your StreamSets account, the Snowflake Credentials dialog box appears. Enter the required information to enable access to Snowflake. The information is validated and then stored in your StreamSets account for use with all subsequent connections.
  8. On the General tab of the pipeline properties, configure the following properties, as needed:

    If you have already created pipelines and configured Snowflake settings in your StreamSets account, the following properties might already be populated with default values. Update them as needed.

    If you have not saved Snowflake pipeline defaults to your StreamSets account, the Snowflake URL, role, warehouse, database, and schema that you specify below are also saved to your StreamSets account for use with subsequent pipelines.

    • Name - Displays the pipeline name specified in the pipeline creation wizard. Edit as needed.
    • Description - Displays the description specified in the pipeline creation wizard. Edit as needed.
    • Labels - Labels to help you search and filter pipelines in the Pipelines view. For more information, see the Control Hub documentation.
    • Snowflake URL - Snowflake account URL to use. For example:

      https://<yourcompany>.snowflakecomputing.com

      If you have a Snowflake account URL defined as a Snowflake pipeline default, this property is already configured. Update the URL as needed.
    • Role - Role to use. Roles limit access to the Snowflake account. The following permissions are generally required to run Snowflake pipelines:
      • Read
      • Write
      • Create Table

      If you have a role defined as a Snowflake pipeline default, this property is already configured. Update the role as needed.

      Enter the role name, or click the Select Role icon to explore your Snowflake account for the role to use, and then click Save.
    • Warehouse - Data warehouse to use.

      If you have a warehouse defined as a Snowflake pipeline default, this property is already configured. Update the warehouse as needed.

      Enter the warehouse name, or click the Select Warehouse icon to explore your Snowflake account for the warehouse to use, and then click Save.
    • Pipeline Working Schema - Location for temporarily storing the transient tables used during data preview and for retrieving pipeline execution metrics from the Snowflake Information Schema. Transient tables are removed when you stop the preview.

      Specify the pipeline working schema in the following format: <database>.<schema>

      If you have a database or schema defined as a Snowflake pipeline default, this property is already configured to use those values. Update the values as needed.

      Select the database and schema to use, or click the Select Schema icon to explore your Snowflake account for the database and schema to use, and then click Save.
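As a rough sanity check, the URL and working-schema formats described above can be expressed in a few lines of Python. This is a sketch only; the helper names are hypothetical and are not part of StreamSets or Snowflake.

```python
import re

# Hypothetical helpers illustrating the formats described above;
# they are not part of the StreamSets or Snowflake products.

def build_account_url(account: str) -> str:
    """Build a Snowflake account URL of the form
    https://<yourcompany>.snowflakecomputing.com"""
    return f"https://{account}.snowflakecomputing.com"

def is_valid_working_schema(value: str) -> bool:
    """Check that a pipeline working schema uses the
    <database>.<schema> format (unquoted identifiers only)."""
    return re.fullmatch(r"\w+\.\w+", value) is not None
```

For example, `build_account_url("acme")` yields `https://acme.snowflakecomputing.com`, and `ANALYTICS.PUBLIC` is a valid working-schema value while a bare schema name is not.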

  9. On the Parameters tab, optionally define runtime parameters.
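Runtime parameters are typically referenced in pipeline properties with `${name}`-style placeholders (an assumption here, based on general StreamSets conventions). Conceptually, the substitution behaves like Python's `string.Template`:

```python
from string import Template

# Sketch only: illustrates ${name}-style placeholder substitution
# conceptually. The parameter names and values are hypothetical.
params = {"WAREHOUSE": "COMPUTE_WH", "SCHEMA": "ANALYTICS.PUBLIC"}

# A property configured as ${WAREHOUSE} resolves to the parameter value
# supplied at runtime.
property_value = Template("${WAREHOUSE}").safe_substitute(params)
```

Defining a parameter once and referencing it from several stage properties keeps the pipeline reusable across environments.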
  10. On the Advanced tab, optionally specify inline UDFs to use in the pipeline.
    You do not need to define precompiled UDFs that are available in your Snowflake account.
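For context, a Snowflake inline UDF is defined with SQL text. A hypothetical example is shown below, held in a Python string purely for illustration; the function name, signature, and body are assumptions, not taken from the product documentation.

```python
# Hypothetical inline SQL UDF text, for illustration only.
INLINE_UDF_SQL = """\
CREATE OR REPLACE FUNCTION area_of_circle(radius FLOAT)
RETURNS FLOAT
AS
$$
    pi() * radius * radius
$$
"""
```

Precompiled UDFs already present in your Snowflake account, by contrast, can be used without any definition in the pipeline.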
  11. In the pipeline canvas, add at least one origin to represent the data that the pipeline reads.
  12. Use as many processors as you need to process the data.
  13. Add at least one destination to represent the data that the pipeline writes.
    You can use data preview during pipeline development to see how data changes as it moves through the pipeline. For more information, see the Control Hub documentation.
    When you are ready, you can publish the pipeline and create a job to run it.