Creating a pipeline
Section Contents
Creating a pipeline#
Once the authentication step has been handled and you’ve successfully instantiated a streamsets.sdk.DataCollector
object, you’re now ready to build a pipeline.
Instantiating a Pipeline Builder#
The first step of creating a pipeline is to instantiate a streamsets.sdk.sdc_models.PipelineBuilder
instance.
This class handles the majority of the pipeline configuration on your behalf by building the initial JSON representation
of the pipeline, and configuring default values for essential properties (instead of requiring each to be set manually).
The streamsets.sdk.sdc_models.PipelineBuilder
instance can be created as follows:
pipeline_builder = sdc.get_pipeline_builder()
Adding Stages to the Pipeline Builder#
Now that the builder has been instantiated, you can get streamsets.sdk.sdc_models.Stage
instances from this
builder for use in the pipeline you’re creating. Adding stages to the pipeline can be done by calling
streamsets.sdk.sdc_models.PipelineBuilder.add_stage()
. See the API reference for this method for details on the
arguments it takes.
As shown in the first example, the simplest type of pipeline directs one origin into one
destination. For this example, you can do this with Dev Raw Data Source
origin and Trash
destination,
respectively:
dev_raw_data_source = pipeline_builder.add_stage('Dev Raw Data Source')
trash = pipeline_builder.add_stage('Trash')
Connecting the Stages#
With streamsets.sdk.sdc_models.Stage
instances in hand, you can connect them by using the >>
operator.
Once the stages are connected, you can build the streamsets.sdk.sdc_models.Pipeline
instance with the
streamsets.sdk.sdc_models.PipelineBuilder.build()
method:
dev_raw_data_source >> trash
pipeline = pipeline_builder.build('My first pipeline')
Add the Pipeline to Data Collector#
Finally, to add this pipeline to your Data Collector instance, pass it to the
streamsets.sdk.DataCollector.add_pipeline()
method:
sdc.add_pipeline(pipeline)