Map Jobs in a Topology

Create a topology in Control Hub to map multiple related jobs into a single view. You can map all dataflow activities that serve the needs of one business function in a single topology.

You can add the following components to a topology canvas:

Jobs
Add jobs that belong to your organization. When you add a job to the canvas, Control Hub displays the job and all external systems connected to the job. Control Hub uses a rectangle to represent the job, and uses circles to represent the external systems, such as origin and destination systems.
Systems
Add any external system that your jobs connect to. When you add a system to the canvas, Control Hub displays a single circle to represent the external system type.
In most cases, you won't need to add a system to the canvas, because adding a job automatically adds its connecting systems. However, if you mistakenly delete a system from the canvas, you can add that system back and connect it to the appropriate jobs.

Example

Let's say that we created a job for a Data Collector pipeline that uses the HTTP Client origin to read Twitter social feeds, performs some calculations, and writes the processed data to an Amazon S3 destination. In addition, the pipeline is configured to write error records to another bucket in Amazon S3. When we add the job to the topology, Control Hub displays the job and external systems in the canvas as follows:

The canvas displays the following components for the added job:
Origin systems
Each job that includes a Data Collector pipeline has one origin system. A job that includes a Transformer pipeline can have one or more origin systems. In the image above, Twitter is the origin system.
If another pipeline is designed to write to the origin system, you can add the related job.
Job
The rectangle represents the job, with inputs and outputs. In the image above, Social Feeds Dataflows is the job. The input represents the origin for the pipeline. The outputs represent all destinations for the pipeline and the error handling system for the pipeline.
Destination systems
Each job has one or more destination systems. In the image above, the job has one destination system, Amazon S3.
If another pipeline is designed to read from one of the destination systems, you can add the related job. For example, in the topology above, you can add a related job containing a pipeline that reads from the Amazon S3 system and processes the data for further analysis.
Error system
Each job has one error system, based on how the pipeline is configured to handle errors. In the image above, Error Records is the error system because the pipeline is configured to write error records to Amazon S3.
You can delete error systems from the topology canvas if you do not want to measure the error records. However, for a complete view into the topology, retain the error systems so that you can measure and monitor the errors that the dataflows encounter.
If a pipeline is configured to write error records to another pipeline or destination system, you can add the related job that processes those error records. For example, in the topology above, you can add the related job containing the pipeline that reads the errors written to the Amazon S3 bucket, processes the error records, and sends them back into the main dataflow.
If a pipeline is configured to discard error records, Control Hub still adds a default Error Records - Discard system to the canvas. In this case, you won't have a related job that processes those error records.
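
The components described above can be pictured as a small data model. The following is a minimal sketch for illustration only. The class and field names are hypothetical and are not part of any Control Hub API. It assumes that a job's outputs are its destination systems plus exactly one error system, and that the error system falls back to Error Records - Discard when the pipeline discards error records.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class System:
    """An external system, drawn as a circle on the canvas (hypothetical model)."""
    name: str
    role: str  # "origin", "destination", or "error"


@dataclass
class JobNode:
    """A job, drawn as a rectangle with inputs and outputs (hypothetical model)."""
    name: str
    origins: List[System] = field(default_factory=list)
    destinations: List[System] = field(default_factory=list)
    # Assumed default: the system added when the pipeline discards error records.
    error_system: System = System("Error Records - Discard", "error")

    def inputs(self) -> List[System]:
        # Inputs represent the origin systems of the pipeline.
        return list(self.origins)

    def outputs(self) -> List[System]:
        # Outputs represent all destinations plus the error handling system.
        return list(self.destinations) + [self.error_system]


social_feeds = JobNode(
    name="Social Feeds Dataflows",
    origins=[System("Twitter", "origin")],
    destinations=[System("Amazon S3", "destination")],
    error_system=System("Error Records", "error"),
)

print([s.name for s in social_feeds.inputs()])   # ['Twitter']
print([s.name for s in social_feeds.outputs()])  # ['Amazon S3', 'Error Records']
```

Keeping the error system as a separate field, rather than as one more destination, mirrors the rule that each job has exactly one error system.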

Connect Multiple Jobs to a Single System

If you have multiple jobs that read from or write to a single system, you can connect the jobs to a single system in the topology canvas. When you monitor a system connected to multiple jobs, you can measure and monitor all the data passing into or out of the system from each of those jobs.

For example, let's say that you have two jobs that collect customer data from different source systems. Each job processes and writes the data to the same Kafka system for temporary storage. You add both jobs to the topology canvas, which by default adds two Kafka systems to the canvas:

However, the jobs write to the same Kafka system, so you'd like to visually represent that in the topology canvas. You delete the Kafka system from the Clickstream Processing job, and then connect that job to the remaining Kafka system, like so:
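
The delete-and-reconnect step can be sketched as an operation on a list of canvas connections. This is an illustrative model only, not how Control Hub stores topologies. The node IDs (`kafka-1`, `kafka-2`) and the job name Orders Processing are hypothetical, and each connection is assumed to be a simple (job, system node) pair.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (job name, system node ID), both hypothetical


def reconnect(edges: List[Edge], job: str, old_node: str, target_node: str) -> List[Edge]:
    """Delete the job's own system node and connect the job to target_node."""
    return [
        (j, target_node if (j == job and node == old_node) else node)
        for j, node in edges
    ]


def jobs_by_system(edges: List[Edge]) -> Dict[str, List[str]]:
    """Group jobs by the system node they connect to."""
    grouped = defaultdict(list)
    for job, node in edges:
        grouped[node].append(job)
    return dict(grouped)


# By default, each added job brings its own Kafka system node.
edges = [
    ("Orders Processing", "kafka-1"),
    ("Clickstream Processing", "kafka-2"),
]

merged = reconnect(edges, "Clickstream Processing", "kafka-2", "kafka-1")
print(jobs_by_system(merged))
# {'kafka-1': ['Orders Processing', 'Clickstream Processing']}
```

After the merge, one system node carries both jobs, which is what lets you measure all data passing into that system from each job.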

Create Topologies

After you create jobs for published pipelines, create a topology and map the related jobs and connecting systems in the topology.

To create a topology, click Monitor > Topologies in the Navigation panel, and then click the Create New Topology icon.

Then, complete the following steps in the topology wizard:
  1. Define the Topology
  2. Share the Topology
  3. Review and Open the Topology

Define the Topology

Define the topology essentials: the topology name and, optionally, a description.

  1. Enter the following information to define the topology:
    Property      Description
    Name          Name of the topology. Use a brief name that informs your team of the topology use case.
    Description   Optional description. Use the description to add details about the topology use case.

  2. Click one of the following buttons:
    • Cancel - Cancels creating the topology and exits the wizard.
    • Next - Saves the topology definition and continues.

Share the Topology

By default, only you can see the topology. Share the topology with other users and groups to grant them access to it.

  1. In the Select Users and Groups field, type a user email address or a group name.
  2. Select users or groups from the list, and then click Add.

    The added users and groups display in the User / Group table.

  3. Modify permissions as needed. By default, each added user or group is granted both of the following permissions:
    • Read - View the mapped systems and jobs in the topology canvas. View statistics and monitoring details for the topology. Also requires read access on all jobs and pipelines included in the topology.
    • Write - Delete topology versions. Edit the topology. Editing a topology also requires read access on all jobs and pipelines included in the topology.

    For more information, see Topology Permissions.

  4. Click one of the following buttons:
    • Back - Returns to the previous step in the wizard.
    • Save & Next - Saves the topology permissions and continues.
    • Save & Exit - Saves the topology permissions and exits the wizard, displaying the new topology in the Topologies view.
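
The way topology permissions combine with job and pipeline permissions can be sketched as a pair of checks. This is a minimal illustration under stated assumptions, not Control Hub's implementation. The object IDs and the dictionary layout are hypothetical, and the sketch covers only the rules listed above: viewing needs Read on the topology plus read access on every included job and pipeline, and editing needs Write on the topology plus that same read access.

```python
from typing import Dict, Iterable, Set

# Hypothetical structure: object ID -> permissions granted to one user.
Permissions = Dict[str, Set[str]]


def _can_read_all(perms: Permissions, object_ids: Iterable[str]) -> bool:
    """True if the user has read access on every included job and pipeline."""
    return all("read" in perms.get(obj, set()) for obj in object_ids)


def can_view_topology(perms: Permissions, topology_id: str, included_ids: Iterable[str]) -> bool:
    return "read" in perms.get(topology_id, set()) and _can_read_all(perms, included_ids)


def can_edit_topology(perms: Permissions, topology_id: str, included_ids: Iterable[str]) -> bool:
    return "write" in perms.get(topology_id, set()) and _can_read_all(perms, included_ids)


included = ["job:social-feeds", "pipeline:social-feeds"]
perms = {
    "topology:social": {"read", "write"},
    "job:social-feeds": {"read"},
}

# Read on the topology alone is not enough: the pipeline is not readable yet.
print(can_view_topology(perms, "topology:social", included))  # False

perms["pipeline:social-feeds"] = {"read"}
print(can_edit_topology(perms, "topology:social", included))  # True
```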

Review and Open the Topology

You've successfully finished creating the topology.

Click one of the following buttons:
  • Exit - Saves the topology and exits the wizard, displaying the new topology in the Topologies view.
  • Open in Canvas - Opens the topology in a blank canvas and versions the topology as v1-DRAFT.

    Map related jobs and systems in the topology as described in Mapping Jobs and Systems in a Topology.

Mapping Jobs and Systems in a Topology

Map jobs and systems in a topology that is in a draft state.

  1. In the Navigation panel, click Monitor > Topologies.
  2. Click the name of a topology in the Topologies view.
    You can edit topologies that are in a draft state. If you selected a published topology, click the Create Draft icon to create another draft version.

    Control Hub displays the topology canvas and versions the topology as <version>-DRAFT. Note the Add Job and Add System icons. You'll use these icons to map related jobs and systems in the topology:

  3. Click the Add Job icon, and then click Show All Jobs.
  4. Search for and then select the first job that you want to map in the topology.
    Note: You can perform a basic or advanced search for jobs, as described in Defining Search Conditions.

    Control Hub adds the job to the canvas. It uses a rectangle to represent the job and circles to represent the external systems:

  5. To map a related job that reads from a destination system of the first job, select the connecting destination system, click Add Job > Show All Jobs, and then search for and select the related job.
    Control Hub adds the related job, automatically connecting it to the selected destination system, as follows:

    Tip: If you add a related job without first selecting the connecting system, Control Hub adds a duplicate of the system and cannot connect the jobs. Select one of the duplicate systems and click the Delete icon to remove the duplicate from the topology canvas. Then, connect the related job to the system just as you connect stages in the pipeline canvas. Or, you can instruct Control Hub to automatically discover connecting systems.
  6. Add more related jobs as necessary.
    Tip: If you mistakenly delete a connecting system, you can add the system back to the topology canvas using the Add System icon. Then, you can connect the system to related jobs.
  7. Optionally add error handling jobs that process error records written to an error system.
  8. When the topology is complete, click the Publish Topology icon to commit this version of the topology.
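
The draft and publish cycle in these steps behaves like a small state machine. The sketch below is illustrative only and uses hypothetical names. It assumes that a published version drops the -DRAFT suffix and that creating another draft moves to the next version number. The text above confirms only the v1-DRAFT and <version>-DRAFT labels, so treat the numbering as an assumption.

```python
from dataclasses import dataclass


@dataclass
class TopologyVersion:
    """Hypothetical model of a topology's version state."""
    number: int = 1
    draft: bool = True  # a new topology opens in the canvas as v1-DRAFT

    @property
    def label(self) -> str:
        return f"v{self.number}-DRAFT" if self.draft else f"v{self.number}"

    def check_editable(self) -> None:
        # Only topologies in a draft state can be edited.
        if not self.draft:
            raise RuntimeError("Create a draft before editing a published topology")

    def publish(self) -> None:
        # Publishing commits the current draft version.
        self.check_editable()
        self.draft = False

    def create_draft(self) -> None:
        # Assumption: the new draft takes the next version number.
        if self.draft:
            raise RuntimeError("The topology is already in a draft state")
        self.number += 1
        self.draft = True


topology = TopologyVersion()
print(topology.label)  # v1-DRAFT
topology.publish()
print(topology.label)  # v1
topology.create_draft()
print(topology.label)  # v2-DRAFT
```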

Auto Discover Connecting Systems

Control Hub can automatically discover connecting systems for multiple jobs added to a topology. Control Hub suggests how you might want to connect the systems, which you can accept or reject.

After adding multiple jobs to the topology canvas, click the More icon above the canvas and then click Auto Discover Connections.

The Auto Discover Connections window includes the jobs you selected, with suggested options of how you might want to connect the systems. Notice the number of suggested options to connect the jobs in the bottom left corner of the window:

Click the back and next arrows to view all of the suggested options to connect the jobs. When you've decided on the option to use, display that option in the canvas, and then click Accept.

You cannot make any changes to the suggested options in the Auto Discover Connections window. However, once you accept an option, you can modify any of the connections or map additional jobs and systems in the topology canvas.
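
To give a feel for what a connection suggestion might look like, the following sketch pairs jobs that reference the same system type. How Control Hub actually discovers connections is not described here, so this matching rule is an assumption, and the job named S3 Analysis and the dictionary layout are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical input: job name -> system types it reads from and writes to.
Jobs = Dict[str, Dict[str, List[str]]]


def suggest_connections(jobs: Jobs) -> List[Tuple[str, str, str]]:
    """Suggest (job, job, system) triples for jobs that share a system type."""
    suggestions = []
    for (job_a, sys_a), (job_b, sys_b) in combinations(jobs.items(), 2):
        systems_a = set(sys_a["origins"]) | set(sys_a["destinations"])
        systems_b = set(sys_b["origins"]) | set(sys_b["destinations"])
        # Assumed rule: any system type used by both jobs is a candidate.
        for system in sorted(systems_a & systems_b):
            suggestions.append((job_a, job_b, system))
    return suggestions


jobs = {
    "Social Feeds Dataflows": {"origins": ["HTTP Client"], "destinations": ["Amazon S3"]},
    "S3 Analysis": {"origins": ["Amazon S3"], "destinations": ["Kafka"]},
}

print(suggest_connections(jobs))
# [('Social Feeds Dataflows', 'S3 Analysis', 'Amazon S3')]
```

As in the Auto Discover Connections window, the result is only a suggestion: you would still review each option and adjust the connections in the canvas after accepting one.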

Managing Jobs from a Topology

After you map jobs in a topology, you can perform most available job actions directly from the topology:
  • Start a specific job or start all jobs.
  • Monitor a job.
  • Acknowledge job errors.
  • Stop a specific job or stop all jobs.
  • Force stop a job.
  • Reset the origin for a job.
  • Synchronize a job.
  • Update a job to use the latest pipeline version.

For more information about each of these tasks, see Job Instances Overview.

Customizing System Icons

You can customize the icon for any system displayed in the topology canvas.

For example, let's say that you have a pipeline that uses the HTTP Client origin to read Twitter social feeds. You add a job for the pipeline to a topology. The HTTP Client icon in the topology canvas doesn’t indicate that the origin is a Twitter system:

You can import a custom icon for the HTTP Client system to visually indicate that this is a Twitter system, as follows:

  1. In the Navigation panel, click Monitor > Topologies.
  2. Click the name of the topology that you want to edit.
    Control Hub displays the topology in the canvas. You can edit topologies that are in a draft state. If you selected a published topology, click the Create Draft icon to create another draft version.
  3. Double-click the canvas or click the Open Detail Pane arrow to display the detail pane.
  4. Select a system in the topology canvas.
  5. In the detail pane, expand the name of the selected system.
  6. Click Upload New Icon.
  7. Select the icon and then click Open.
  8. Click as directed to update the icon in the canvas.