Importing and Exporting Pipelines¶
The SDK also allows for importing and exporting pipelines programmatically, either between multiple SDC instances or within the same SDC instance.
Simple Data Collector to Data Collector Import-Export Operation¶
To export a pipeline from a Data Collector instance, use the streamsets.sdk.DataCollector.export_pipeline() method, which returns a dict containing the JSON representation of the pipeline. See the API reference for this method for details on the arguments it takes.
import json

pipeline_json = sdc.export_pipeline(pipeline=sdc.pipelines.get(title='pipeline name'),
                                    include_plain_text_credentials=True)

with open('./from_sdc_for_sdc.json', 'w') as f:
    json.dump(pipeline_json, f)
You can import a pipeline from a JSON file into a Data Collector instance in two ways:
1. Import the JSON file into a streamsets.sdk.sdc_models.PipelineBuilder instance and add the resulting pipeline:
with open('./from_sdc_for_sdc.json', 'r') as input_file:
    pipeline_json = json.load(input_file)
sdc_pipeline_builder = sdc.get_pipeline_builder()
sdc_pipeline_builder.import_pipeline(pipeline=pipeline_json)
pipeline = sdc_pipeline_builder.build(title='built from imported json file from sdc')
sdc.add_pipeline(pipeline)
2. Directly import the dict object that was generated by the streamsets.sdk.DataCollector.export_pipeline() method:
pipeline = sdc.import_pipeline(pipeline=pipeline_json)
Exporting pipelines from Data Collector for Control Hub¶
To export a Data Collector pipeline for use in Control Hub, the optional argument include_library_definitions must be set to True.
pipeline_json = sdc.export_pipeline(pipeline=sdc.pipelines.get(title='pipeline name'),
                                    include_library_definitions=True,
                                    include_plain_text_credentials=True)
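As in the earlier example, the exported dict can be written to a file for later import into Control Hub. A minimal, self-contained sketch; the placeholder dict below merely stands in for the real export returned by export_pipeline():

```python
import json

# Placeholder for the dict returned by sdc.export_pipeline(); a real export
# also carries full library definitions when include_library_definitions=True.
pipeline_json = {'pipelineConfig': {'title': 'pipeline name'},
                 'libraryDefinitions': {}}

# Write the export to disk so it can be imported into Control Hub later
with open('./from_sdc_for_sch.json', 'w') as f:
    json.dump(pipeline_json, f)

# Reading the file back yields the same dict
with open('./from_sdc_for_sch.json') as f:
    restored = json.load(f)
```

Because the export is plain JSON, it round-trips through the file unchanged and can be versioned or inspected like any other JSON document.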
Exporting and Importing multiple Pipelines at once¶
To export multiple pipelines from a Data Collector into a zip archive, use the streamsets.sdk.DataCollector.export_pipelines() method and pass in a list of streamsets.sdk.sdc_models.Pipeline objects:
# Returns a list of all pipelines on the given SDC instance
pipelines = sdc.pipelines
# Show the pipelines to be exported
pipelines
pipelines_zip_data = sdc.export_pipelines(pipelines, include_library_definitions=True)
with open('./sdc_exports_for_sch.zip', 'wb') as output_file:
    output_file.write(pipelines_zip_data)
Output:
[<Pipeline (id=apiprocesdff151fe-1f1b-42e3-8920-895de370d607, title=sample_one)>,
<Pipeline (id=httpclienbea409f3-7cd6-4001-96bc-f065eb255430, title=sample_two)>,
<Pipeline (id=schapi7774665d-6d90-4e79-ad97-303fefcf1822, title=sample_three)>,
<Pipeline (id=snapshot187a8311-ee25-4543-894e-ab0f0a73b255, title=sample_four)>]
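The bytes returned by export_pipelines() are a standard zip archive, so the stdlib zipfile module can inspect an export without extracting it. A sketch under that assumption; the archive built below is only a stand-in for real export data:

```python
import io
import json
import zipfile

# Stand-in for the bytes returned by sdc.export_pipelines(); the real
# archive holds one JSON file per exported pipeline.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    for title in ('sample_one', 'sample_two'):
        zf.writestr(title + '.json',
                    json.dumps({'pipelineConfig': {'title': title}}))
pipelines_zip_data = buf.getvalue()

# List the archive's members before writing it to disk or re-importing it
with zipfile.ZipFile(io.BytesIO(pipelines_zip_data)) as zf:
    member_names = sorted(zf.namelist())
```

This can be a convenient sanity check that an export file contains the pipelines you expect before handing it to another Data Collector instance.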
Similarly, you can import multiple pipelines into a Data Collector instance by using the streamsets.sdk.DataCollector.import_pipelines_from_archive() method.
with open('./sdc_exports_for_sch.zip', 'rb') as input_file:
    pipelines_zip_data = input_file.read()

pipelines = sdc.import_pipelines_from_archive(pipelines_file=pipelines_zip_data)