Importing and Exporting Pipelines#
The SDK also allows for importing and exporting pipelines programmatically, either between multiple SDC instances or within the same SDC instance.
Simple Data Collector to Data Collector Import-Export Operation#
To export a pipeline from a Data Collector instance, use the streamsets.sdk.DataCollector.export_pipeline() method, which returns a dict containing the JSON representation of the pipeline. See the API reference for this method for details on the arguments it accepts.
import json

pipeline_json = sdc.export_pipeline(pipeline=sdc.pipelines.get(title='pipeline name'),
                                    include_plain_text_credentials=True)

with open('./from_sdc_for_sdc.json', 'w') as f:
    json.dump(pipeline_json, f)
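Because the export is plain JSON, you can reload the saved file and compare it against the original dict to confirm the round trip is lossless. A minimal self-contained sketch, using a hypothetical stand-in dict in place of a real export:

```python
import json
import tempfile

# Hypothetical stand-in for the dict returned by export_pipeline();
# a real export holds the full pipeline configuration.
pipeline_json = {'pipelineConfig': {'title': 'pipeline name', 'stages': []}}

# Write the export to disk, then reload it to confirm nothing was lost.
with tempfile.NamedTemporaryFile(mode='w', suffix='.json', delete=False) as f:
    json.dump(pipeline_json, f)
    path = f.name

with open(path) as f:
    reloaded = json.load(f)

assert reloaded == pipeline_json  # JSON round trip is lossless
```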
You can import a pipeline from a JSON file into a Data Collector instance in two ways:
1. Import the JSON file into a streamsets.sdk.sdc_models.PipelineBuilder instance and add the built pipeline:
with open('./from_sdc_for_sdc.json', 'r') as input_file:
    pipeline_json = json.load(input_file)

sdc_pipeline_builder = sdc.get_pipeline_builder()
sdc_pipeline_builder.import_pipeline(pipeline=pipeline_json)
pipeline = sdc_pipeline_builder.build(title='built from imported json file from sdc')
sdc.add_pipeline(pipeline)
2. Directly import the dict object generated by the streamsets.sdk.DataCollector.export_pipeline() method:
pipeline = sdc.import_pipeline(pipeline=pipeline_json)
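Both import paths consume the same dict. If you mix the two in one script, a small helper (hypothetical, not part of the SDK) can normalize either a JSON file path or an already-loaded dict before the import call:

```python
import json
from pathlib import Path

def load_pipeline_json(source):
    """Return a pipeline dict from either a dict or a path to a JSON file.

    Hypothetical convenience helper; not part of the StreamSets SDK.
    """
    if isinstance(source, dict):
        return source
    return json.loads(Path(source).read_text())

# Usage with the SDK would then look like, e.g.:
#   pipeline = sdc.import_pipeline(pipeline=load_pipeline_json('./from_sdc_for_sdc.json'))
```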
Exporting pipelines from Data Collector for Control Hub#
To export a Data Collector pipeline for use in Control Hub, the optional argument include_library_definitions must be set to True:
pipeline_json = sdc.export_pipeline(pipeline=sdc.pipelines.get(title='pipeline name'),
                                    include_library_definitions=True,
                                    include_plain_text_credentials=True)
Exporting and Importing Multiple Pipelines at Once#
To export multiple pipelines from a Data Collector into a zip archive, use the streamsets.sdk.DataCollector.export_pipelines() method and pass in a list of streamsets.sdk.sdc_models.Pipeline objects:
# Returns a list of all pipelines on the given SDC instance
pipelines = sdc.pipelines
# Show the pipelines to be exported
pipelines
pipelines_zip_data = sdc.export_pipelines(pipelines, include_library_definitions=True)
with open('./sdc_exports_for_sch.zip', 'wb') as output_file:
    output_file.write(pipelines_zip_data)
Output:
[<Pipeline (id=apiprocesdff151fe-1f1b-42e3-8920-895de370d607, title=sample_one)>,
<Pipeline (id=httpclienbea409f3-7cd6-4001-96bc-f065eb255430, title=sample_two)>,
<Pipeline (id=schapi7774665d-6d90-4e79-ad97-303fefcf1822, title=sample_three)>,
<Pipeline (id=snapshot187a8311-ee25-4543-894e-ab0f0a73b255, title=sample_four)>]
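The bytes returned by export_pipelines() are a zip archive, so it can be inspected with the standard library before being imported elsewhere. A minimal sketch that builds and reads such an archive in memory (the entry names here are illustrative; the SDK's actual archive layout may differ):

```python
import io
import json
import zipfile

# Build an in-memory zip resembling a multi-pipeline export
# (entry names are illustrative, not the SDK's actual layout).
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    for title in ('sample_one', 'sample_two'):
        zf.writestr('{}.json'.format(title), json.dumps({'title': title}))
pipelines_zip_data = buf.getvalue()

# List the archive's contents without extracting it.
with zipfile.ZipFile(io.BytesIO(pipelines_zip_data)) as zf:
    names = zf.namelist()
```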
Similarly, you can import multiple pipelines into Data Collector by using the streamsets.sdk.DataCollector.import_pipelines_from_archive() method:
with open('./sdc_exports_for_sch.zip', 'rb') as input_file:
    pipelines_zip_data = input_file.read()

pipelines = sdc.import_pipelines_from_archive(pipelines_file=pipelines_zip_data)