Creating a Pipeline
Create a pipeline to define how data flows from origin to destination systems and how the data is processed along the way.
To create a pipeline, go to the Pipelines view from the Navigation panel, and then click the Create New Pipeline icon.
Define the Pipeline
Define the pipeline essentials, including the pipeline name, the type of engine for the pipeline, and how to start building the pipeline.
- Enter the following information to define the pipeline:
  - Name - Name of the pipeline. Use a brief name that informs your team of the pipeline use case.
  - Description - Optional description. Use the description to add details about the pipeline use case.
  - Engine Type - Type of engine for the pipeline. Select the engine type to use for your pipeline use case:
    - Data Collector - Runs data ingestion pipelines that can read from and write to a large number of heterogeneous origins and destinations. Data Collector pipelines perform record-based data transformations in streaming, CDC, or batch modes.
    - Transformer - Runs data processing pipelines on Apache Spark. Because Transformer pipelines run on Spark deployed on a cluster, the pipelines can perform set-based transformations such as joins, aggregates, and sorts on the entire data set.
    - Transformer for Snowflake - Generates SQL queries based on your pipeline configuration and passes the queries to Snowflake for execution. Snowflake pipelines read from and write to Snowflake tables using Snowpark DataFrame-based processing.
    For more information, see Comparing StreamSets Pipelines and "Comparing Snowflake and Other StreamSets Engines" in the Transformer for Snowflake documentation.
  - Start with - Select how you want to start building the pipeline:
    - Blank Pipeline - Use a blank canvas for pipeline development.
    - Sample Pipeline - Use an existing sample pipeline as the basis for pipeline development or to learn how you might develop a similar pipeline. Not available for Transformer for Snowflake pipelines.
- Click one of the following buttons:
  - Cancel - Cancels creating the pipeline and exits the wizard.
  - Next - Saves the pipeline definition and continues.
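The choices collected in this step can be pictured as a small data model. The following is a minimal sketch in Python, not Control Hub or SDK code: the names PipelineDefinition, EngineType, and StartWith are hypothetical, and the only rule it encodes is the one stated in this step, that sample pipelines are not available for Transformer for Snowflake pipelines.

```python
from dataclasses import dataclass
from enum import Enum


class EngineType(Enum):
    """Engine types offered in the Define the Pipeline step."""
    DATA_COLLECTOR = "Data Collector"
    TRANSFORMER = "Transformer"
    TRANSFORMER_FOR_SNOWFLAKE = "Transformer for Snowflake"


class StartWith(Enum):
    """Ways to start building the pipeline."""
    BLANK_PIPELINE = "Blank Pipeline"
    SAMPLE_PIPELINE = "Sample Pipeline"


@dataclass
class PipelineDefinition:
    """Hypothetical container for the wizard's first-step properties."""
    name: str
    engine_type: EngineType
    start_with: StartWith = StartWith.BLANK_PIPELINE
    description: str = ""  # the description is optional

    def validate(self) -> None:
        # Sample pipelines are not available for Transformer for Snowflake.
        if (self.engine_type is EngineType.TRANSFORMER_FOR_SNOWFLAKE
                and self.start_with is StartWith.SAMPLE_PIPELINE):
            raise ValueError(
                "Sample pipelines are not available for "
                "Transformer for Snowflake pipelines."
            )
```

For example, PipelineDefinition("orders-ingest", EngineType.DATA_COLLECTOR, StartWith.SAMPLE_PIPELINE).validate() passes, while the same call with EngineType.TRANSFORMER_FOR_SNOWFLAKE raises a ValueError.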
Configure the Pipeline
Configure the initial content for the pipeline and the authoring engine to use for designing pipelines.
- If starting with a sample pipeline, click Click here to select.
  In the Select a Sample Pipeline window, select the sample to use, and then click Save to return to the pipeline wizard.
- Select the authoring engine to use for pipeline design.
  The selected authoring engine determines the stages and functionality that display in the pipeline canvas.
  By default, Control Hub selects an accessible authoring engine that you have read permission on and that has the most recent reported time. To select another engine, click Click here to select.
  In the Select an Authoring Engine window, select an accessible engine, and then click Save to return to the pipeline wizard.
  An accessible engine is an engine that is running, that can communicate with Control Hub, and that can be reached by the web browser. For more information and tips on troubleshooting inaccessible engines, see Accessible Engines.
- Click one of the following buttons:
  - Back - Returns to the previous step in the wizard.
  - Save & Next - Saves the pipeline configuration and continues.
  - Save & Open in Canvas - Saves the pipeline configuration and opens the pipeline in the canvas. You can share the pipeline with others at a later time.
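The default authoring engine rule described in this step can be illustrated with a short sketch. This is only a model of the stated rule, not how Control Hub implements it: the Engine class, its field names, and default_authoring_engine are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass
class Engine:
    """Hypothetical view of an engine as seen by one user."""
    engine_id: str
    running: bool
    communicates_with_control_hub: bool
    reachable_from_browser: bool
    user_can_read: bool
    last_reported: datetime

    @property
    def accessible(self) -> bool:
        # Accessible: running, able to communicate with Control Hub,
        # and reachable by the web browser.
        return (self.running
                and self.communicates_with_control_hub
                and self.reachable_from_browser)


def default_authoring_engine(engines: Iterable[Engine]) -> Optional[Engine]:
    """Pick the accessible, readable engine with the most recent reported time."""
    candidates = [e for e in engines if e.accessible and e.user_can_read]
    # Returns None when no engine qualifies.
    return max(candidates, key=lambda e: e.last_reported, default=None)
```

In this model, an engine that reported most recently but cannot be reached by the browser is skipped in favor of the next most recent accessible engine.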
Share the Pipeline
By default, the pipeline can only be seen by you. Share the pipeline with other users and groups to grant them access to it.
- In the Select Users and Groups field, type a user email address or a group name.
- Select users or groups from the list, and then click Add.
  The added users and groups display in the User / Group table.
- Modify permissions as needed. By default, each added user or group is granted both of the following permissions:
  - Read - View the pipeline configuration details and pipeline version history. Create a job for the pipeline. Export the pipeline.
  - Write - Design and publish the pipeline. Create and remove tags for the pipeline. Delete pipeline versions.
  For more information, see Pipeline Permissions.
- Click one of the following buttons:
  - Back - Returns to the previous step in the wizard.
  - Save & Open in Canvas - Saves the pipeline configuration and opens the pipeline in the canvas.
  - Save & Exit - Saves the pipeline permissions and exits the wizard, displaying the new pipeline in the Pipelines view.
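The sharing defaults above can be modeled in a few lines. The sketch below is illustrative only and does not call Control Hub: Permission, share, allowed_actions, and the action labels are hypothetical names that paraphrase the Read and Write descriptions in this section.

```python
from enum import Flag, auto
from typing import Dict, Iterable, Set


class Permission(Flag):
    READ = auto()
    WRITE = auto()


# By default, each added user or group is granted both permissions.
DEFAULT_PERMISSIONS = Permission.READ | Permission.WRITE

# Action labels paraphrase the permission descriptions in this section.
ACTIONS: Dict[Permission, Set[str]] = {
    Permission.READ: {
        "view configuration and version history",
        "create job",
        "export pipeline",
    },
    Permission.WRITE: {
        "design and publish",
        "create and remove tags",
        "delete pipeline versions",
    },
}


def share(acl: Dict[str, Permission], subjects: Iterable[str],
          permissions: Permission = DEFAULT_PERMISSIONS) -> None:
    """Add users or groups to the pipeline's access list."""
    for subject in subjects:
        acl[subject] = permissions


def allowed_actions(granted: Permission) -> Set[str]:
    """Collect the actions covered by the granted permissions."""
    allowed: Set[str] = set()
    for permission, actions in ACTIONS.items():
        if permission in granted:
            allowed |= actions
    return allowed
```

Reducing a group to Permission.READ in this model leaves it able to view, export, and create jobs for the pipeline, but not to design, publish, tag, or delete versions.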
Tip: If you created the pipeline from a sample pipeline, click View Tutorial to complete the required prerequisites to preview or run the sample pipeline.
For details about building pipelines in the canvas, including how to configure individual pipeline stages, see the appropriate engine documentation for the pipeline type: