Creating a Pipeline

Create a pipeline to define how data flows from origin to destination systems and how the data is processed along the way.

To create a pipeline, click Build > Pipelines in the Navigation panel, and then click the Create New Pipeline icon.

Then, complete the following steps in the pipeline wizard:
  1. Define the Pipeline
  2. Configure the Pipeline
  3. Share the Pipeline

Define the Pipeline

Define the pipeline essentials, including the pipeline name, the type of engine for the pipeline, and how to start building the pipeline.

Important: Before you can define a Data Collector or Transformer pipeline, you must deploy and launch an engine. For more information, see Deployments Overview.
  1. Enter the following information to define the pipeline:
    Property - Description

    Name - Name of the pipeline.

    Use a brief name that informs your team of the pipeline use case.

    Description - Optional description.

    Use the description to add details about the pipeline use case.

    Engine Type - Type of engine for the pipeline. Select the engine type to use for your pipeline use case:
    • Data Collector - Runs data ingestion pipelines that can read from and write to a large number of heterogeneous origins and destinations. Data Collector pipelines perform record-based data transformations in streaming, CDC, or batch modes.
    • Transformer - Runs data processing pipelines on Apache Spark. Because Transformer pipelines run on Spark deployed on a cluster, the pipelines can perform set-based transformations such as joins, aggregates, and sorts on the entire data set.
    • Transformer for Snowflake - Generates SQL queries based on your pipeline configuration and passes the queries to Snowflake for execution. Snowflake pipelines read from and write to Snowflake tables using Snowpark DataFrame-based processing.

    For more information, see Comparing StreamSets Pipelines and "Comparing Snowflake and Other StreamSets Engines" in the Transformer for Snowflake documentation.

    Start with - Select how you want to start building the pipeline:
    • Blank Pipeline - Use a blank canvas for pipeline development.
    • Sample Pipeline - Use an existing sample pipeline as the basis for pipeline development or to learn how you might develop a similar pipeline.

      Not available for Transformer for Snowflake pipelines.

  2. Click one of the following buttons:
    • Cancel - Cancels creating the pipeline and exits the wizard.
    • Next - Saves the pipeline definition and continues.
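
The rules in this step can be summarized in a short Python sketch. It is only an illustration of the choices the wizard presents, not Control Hub code: the PipelineDefinition class, the validate_definition function, and the string labels are hypothetical names chosen for this example, and treating an empty name as an error is an assumption.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical labels that mirror the wizard options; they are not
# Control Hub API identifiers.
ENGINE_TYPES = {"Data Collector", "Transformer", "Transformer for Snowflake"}
START_OPTIONS = {"Blank Pipeline", "Sample Pipeline"}


@dataclass
class PipelineDefinition:
    name: str
    engine_type: str
    start_with: str = "Blank Pipeline"
    description: str = ""  # the description is optional


def validate_definition(definition: PipelineDefinition) -> List[str]:
    """Return a list of problems; an empty list means the definition passes."""
    problems = []
    # Assumption: the wizard does not accept an empty pipeline name.
    if not definition.name.strip():
        problems.append("Name is required.")
    if definition.engine_type not in ENGINE_TYPES:
        problems.append("Unknown engine type: " + definition.engine_type)
    if definition.start_with not in START_OPTIONS:
        problems.append("Unknown start option: " + definition.start_with)
    # Sample pipelines are not offered for Transformer for Snowflake.
    if (definition.engine_type == "Transformer for Snowflake"
            and definition.start_with == "Sample Pipeline"):
        problems.append(
            "Sample pipelines are not available for "
            "Transformer for Snowflake pipelines."
        )
    return problems


ok = PipelineDefinition("orders_ingest", "Data Collector", "Sample Pipeline")
bad = PipelineDefinition("orders_sql", "Transformer for Snowflake", "Sample Pipeline")
print(validate_definition(ok))
print(validate_definition(bad))
```

The first call returns an empty list. The second returns a single message, because the sample option does not apply to Transformer for Snowflake pipelines.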

Configure the Pipeline

Configure the initial content for the pipeline and the authoring engine to use for designing Data Collector or Transformer pipelines.

Transformer for Snowflake pipelines do not require an authoring engine. As a result, the pipeline wizard skips this step for Transformer for Snowflake pipelines.
Important: Before you can select an authoring engine, you must deploy and launch a StreamSets engine. For more information, see Deployments Overview.
  1. If starting with a sample pipeline, click Click here to select.

    In the Select a Sample Pipeline window, select the sample to use, and then click Save to return to the pipeline wizard.

  2. Select the authoring engine to use for pipeline design.

    The selected authoring engine determines the stages and functionality that display in the pipeline canvas.

    By default, Control Hub selects an accessible authoring engine that you have read permission on and that has the most recent reported time. To select another engine, click Click here to select.

    In the Select an Authoring Engine window, select an accessible engine, and then click Save to return to the pipeline wizard.

    An accessible engine is an engine that is running, that can communicate with Control Hub, and that can be reached by the web browser. For more information and tips on troubleshooting inaccessible engines, see Accessible Engines.

  3. Click one of the following buttons:
    • Back - Returns to the previous step in the wizard.
    • Save & Next - Saves the pipeline configuration and continues.
    • Save & Open in Canvas - Saves the pipeline configuration and opens the pipeline in the canvas. You can share the pipeline with others at a later time.
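
The default authoring engine selection described in this step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not how Control Hub implements the selection: the Engine class and all of its field names are hypothetical, and "most recent reported time" is modeled as a plain timestamp.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Engine:
    # All field names are hypothetical and exist only for this sketch.
    engine_id: str
    running: bool
    reaches_control_hub: bool
    reachable_from_browser: bool
    user_can_read: bool
    last_reported: datetime


def is_accessible(engine: Engine) -> bool:
    """Accessible: running, communicating with Control Hub, reachable by the browser."""
    return (engine.running
            and engine.reaches_control_hub
            and engine.reachable_from_browser)


def needs_authoring_engine(engine_type: str) -> bool:
    """Transformer for Snowflake pipelines skip the authoring engine step."""
    return engine_type != "Transformer for Snowflake"


def default_authoring_engine(engines: List[Engine]) -> Optional[Engine]:
    """Pick the accessible, readable engine with the most recent reported time."""
    candidates = [e for e in engines if is_accessible(e) and e.user_can_read]
    # Returns None when no engine qualifies.
    return max(candidates, key=lambda e: e.last_reported, default=None)


engines = [
    Engine("sdc-1", True, True, True, True, datetime(2024, 1, 1, 9, 0)),
    # Reported most recently, but the browser cannot reach it.
    Engine("sdc-2", True, True, False, True, datetime(2024, 1, 1, 10, 0)),
    Engine("sdc-3", True, True, True, True, datetime(2024, 1, 1, 9, 30)),
]
chosen = default_authoring_engine(engines)
print(chosen.engine_id)  # sdc-3
```

The sketch does not check that the engine matches the pipeline engine type. It returns None when no engine qualifies, which corresponds to the case where you must first deploy and launch an engine.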

Share the Pipeline

By default, only you can see the pipeline. Share the pipeline with other users and groups to grant them access to it.

  1. In the Select Users and Groups field, type a user email address or a group name.
  2. Select users or groups from the list, and then click Add.

    The added users and groups display in the User / Group table.

  3. Modify permissions as needed. By default, each added user or group is granted both of the following permissions:
    • Read - View the pipeline configuration details and pipeline version history. Create a job for the pipeline. Export the pipeline.
    • Write - Design and publish the pipeline. Create and remove tags for the pipeline. Delete pipeline versions.

    For more information, see Pipeline Permissions.

  4. Click one of the following buttons:
    • Back - Returns to the previous step in the wizard.
    • Save & Open in Canvas - Saves the pipeline configuration and opens the pipeline in the canvas.
    • Save & Exit - Saves the pipeline permissions and exits the wizard, displaying the new pipeline in the Pipelines view.
    Tip: If you created the pipeline from a sample pipeline, click View Tutorial to complete the required prerequisites to preview or run the sample pipeline.
    For details about building pipelines in the canvas, including how to configure individual pipeline stages, see the appropriate engine documentation for the pipeline type: