Tutorial: Topologies
This tutorial covers working with topologies in StreamSets Control Hub. A topology provides an interactive end-to-end view of data as it traverses multiple pipelines. In this tutorial, we create and publish an SDC RPC origin pipeline and an SDC RPC destination pipeline, create jobs for the pipelines, map the related jobs in a topology, and then measure the progress of the topology.
Although our topology tutorial provides a simple use case, keep in mind that Control Hub enables you to manage topologies for multiple complex pipelines. For example, you can group all data flow activities that serve the needs of one business function under a single topology.
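To get started with Control Hub topologies, we'll complete the following tasks: design and publish related pipelines, add jobs for the related pipelines, map the jobs in a topology, and measure and monitor the topology.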
Design and Publish Related Pipelines
A topology provides a view into multiple related pipelines. To create our related pipelines, we'll configure an SDC RPC origin pipeline that passes data to an SDC RPC destination pipeline.
Then, we'll publish both pipelines to indicate that our design is complete and the pipelines are ready to be added to jobs and run.
Now that we've designed and published the related pipelines, let's add jobs for them.
Add Jobs for the Related Pipelines
In a topology, you map jobs that contain related pipelines. Let's create jobs for the related pipelines that we just published.
Map Jobs in a Topology
We'll add a topology and then map the jobs that work together to create a complete data flow.
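To make the idea of mapping jobs concrete, here is a minimal conceptual sketch in Python. It is not the Control Hub API or the StreamSets SDK: the `Job` and `Topology` classes, the `connect` and `flow_order` methods, and the job names are all hypothetical, chosen only to show a topology as a set of jobs joined by directed links.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical stand-in for a job that runs one published pipeline."""
    name: str
    pipeline: str


@dataclass
class Topology:
    """Hypothetical model of a topology: jobs plus directed links between them."""
    name: str
    jobs: dict = field(default_factory=dict)
    links: list = field(default_factory=list)

    def add_job(self, job: Job) -> None:
        self.jobs[job.name] = job

    def connect(self, upstream: str, downstream: str) -> None:
        # Only jobs that are already part of the topology can be linked.
        for name in (upstream, downstream):
            if name not in self.jobs:
                raise KeyError(f"unknown job: {name}")
        self.links.append((upstream, downstream))

    def flow_order(self) -> list:
        """Return job names in data-flow order (Kahn's topological sort)."""
        incoming = {name: 0 for name in self.jobs}
        for _, downstream in self.links:
            incoming[downstream] += 1
        ready = deque(name for name, count in incoming.items() if count == 0)
        order = []
        while ready:
            current = ready.popleft()
            order.append(current)
            for upstream, downstream in self.links:
                if upstream == current:
                    incoming[downstream] -= 1
                    if incoming[downstream] == 0:
                        ready.append(downstream)
        if len(order) != len(self.jobs):
            raise ValueError("links form a cycle")
        return order


# Assumed job names for the two tutorial pipelines.
topology = Topology("Tutorial Topology")
topology.add_job(Job("Destination Job", "SDC RPC destination pipeline"))
topology.add_job(Job("Origin Job", "SDC RPC origin pipeline"))
topology.connect("Origin Job", "Destination Job")
print(topology.flow_order())
```

Even though the destination job is added first, `flow_order` lists the origin job ahead of it, because the link records which job feeds the other. That ordering is what lets a topology present the jobs as one end-to-end data flow.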
Measure and Monitor the Topology
You can measure and monitor the progress of all running pipelines included in a topology. Let's start the jobs and then measure the progress of the running pipelines from our topology view.
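As a rough illustration of what measuring a topology involves, the sketch below rolls per-job record counts up into topology-level totals. It is a conceptual example only, not how Control Hub computes its statistics: the `summarize_topology` function, the metric keys (`input`, `output`, `errors`), and the sample numbers are all made up for illustration.

```python
def summarize_topology(job_metrics):
    """Aggregate per-job record counts into topology-level totals.

    job_metrics maps a job name to a dict of counts with the assumed
    keys "input", "output", and "errors".
    """
    totals = {"input": 0, "output": 0, "errors": 0}
    for counts in job_metrics.values():
        for key in ("input", "output", "errors"):
            totals[key] += counts.get(key, 0)
    # Share of all input records that ended up as error records.
    processed = totals["input"]
    totals["error_rate"] = totals["errors"] / processed if processed else 0.0
    return totals


# Made-up sample counts for the two tutorial jobs.
sample_metrics = {
    "Origin Job": {"input": 1000, "output": 990, "errors": 10},
    "Destination Job": {"input": 990, "output": 990, "errors": 0},
}
summary = summarize_topology(sample_metrics)
print(summary["input"], summary["output"], summary["errors"])
```

In this sample, the two jobs together read 1990 records, wrote 1980, and produced 10 error records. Viewing the counts side by side makes it easy to see where in the flow records are dropped.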
That's the end of our Control Hub tutorial on topologies. Remember that our tutorial included only two simple jobs to introduce the concept of topologies. However, you can use Control Hub to manage and monitor topologies for multiple complex jobs.