Getting Started with SDC Edge and Control Hub

To get started with SDC Edge and Control Hub, you download and install SDC Edge on the edge device where you want to run edge pipelines.

The Control Hub Pipeline Designer includes several edge sample pipelines that make it easy to design an edge pipeline. Use one of the samples to create the edge pipeline, and then simply create the corresponding Data Collector receiving pipeline. Add the receiving pipeline to a job, and run the job on an execution Data Collector. Then add the edge pipeline to a job, and run the job on an SDC Edge. Use the default topology to monitor the progress of both jobs.

In these instructions, we'll assume that you already have some execution Data Collectors registered with Control Hub. If not, you must first register an execution Data Collector with Control Hub.

We'll use the Directory Spooler to HTTP sample pipeline as an example. This sample pipeline uses a Directory origin to read a local text file on the edge device and writes the data in JSON format to an HTTP Client destination.

Step 1. Download and Install SDC Edge

Download and install the SDC Edge executable from Control Hub. Choose to enable Control Hub during the download to automatically register the SDC Edge with Control Hub.

  1. In the Navigation panel, click Execute > Edge Data Collectors.
  2. Click the Download icon.
  3. In the Download SDC Edge Executable window, select the operating system and architecture of the edge device.
  4. Select Enable Control Hub so that the downloaded SDC Edge is automatically registered with Control Hub at start up.
  5. Optionally configure the following properties:
    • Labels - Labels to assign to this SDC Edge. Use labels to group Edge Data Collectors registered with Control Hub. To assign multiple labels, enter a comma-separated list of labels. Default is "all", which you can use to run a job on all registered instances of SDC Edge. For more information, see Labels Overview.
    • Ping Frequency - How often, in milliseconds, SDC Edge notifies Control Hub that it is running.
    • Status Events Interval - How often, in milliseconds, SDC Edge informs Control Hub of the status of all pipelines running on this SDC Edge.
  6. Click Download.
    Control Hub downloads a tarball or ZIP file to your machine.
  7. Move the downloaded file to the edge device.
  8. Extract the downloaded file.
    For example, on Linux, change to the desired location on the edge device, typically /opt/local, and then run the following command to extract the tarball there:
    tar xf streamsets-datacollector-edge-<version>-<os>-<architecture>.tgz
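
If you prefer to script the extraction, the following is a minimal Python sketch of the same step for the tarball download only. It assumes Python 3 is available on the edge device. The function name extract_edge_archive is hypothetical, and you would pass the path of the file you downloaded and the target location, such as /opt/local.

```python
import tarfile
from pathlib import Path


def extract_edge_archive(archive_path, target_dir):
    """Extract a gzipped tarball into target_dir and return the target path."""
    target = Path(target_dir)
    # Create the target location if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        # Use the safer "data" extraction filter where this Python supports it.
        if hasattr(tarfile, "data_filter"):
            archive.extractall(target, filter="data")
        else:
            archive.extractall(target)
    return target
```

Unlike the tar command above, which extracts into the current directory, this sketch takes the target directory as an argument.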

Step 2. Start SDC Edge

On the edge device, run the following command from the SDC Edge home directory:
bin/edge

The SDC Edge is automatically registered to work with Control Hub at start up.

Step 3. Create an Edge Pipeline from a Sample Pipeline

Use one of the edge sample pipelines to create your edge sending pipeline.

  1. In the Navigation panel, click Pipeline Repository > Pipelines.
  2. Click the Create Pipeline icon.
  3. In the New Pipeline window, enter a name for the pipeline.
  4. Select Data Collector Edge and Sample Pipeline, and then click Next.
  5. Select the Directory Spooler to HTTP sample pipeline, and then click Next.
  6. Select an authoring Data Collector, and then click Create.
    The pipeline canvas displays the pipeline created from the sample. The pipeline is configured to run in edge execution mode.
  7. Select the Directory origin in the canvas.
  8. On the Files tab, note that the Files Directory property is configured to use a runtime parameter.
    Leave the runtime parameter in place. You'll specify the parameter value when you create a job for the pipeline.
  9. Configure any remaining properties on the Files tab that are required to read the local files on your edge device, such as the File Name Pattern.
  10. Click the Data Format tab for the Directory origin, and configure the data format of the local files to be read.
  11. Select the HTTP Client destination in the canvas.
    On the HTTP tab, note that the destination uses runtime parameters for the Resource URL and the application ID. You'll specify the parameter values when you create the job.
  12. To publish the pipeline, click the Check In icon.
  13. Enter a commit message, and then click Publish and Close.

Step 4. Create a Data Collector Receiving Pipeline

Edge sending pipelines work in tandem with Data Collector pipelines.

Create a corresponding Data Collector receiving pipeline that uses an HTTP Server origin to read data from the HTTP Client destination in the edge sending pipeline. Add any number of processors, executors, and destinations to the pipeline, as follows:

  1. In the Navigation panel, click Pipeline Repository > Pipelines.
  2. Click the Create Pipeline icon.
  3. In the New Pipeline window, enter a name for the pipeline.
  4. Select Data Collector and Blank Pipeline, and then click Next.
  5. Select an authoring Data Collector, and then click Create.
    An empty pipeline canvas displays.
  6. Click Create.
  7. On the General tab, select Standalone for the Execution Mode.
  8. On the Error Records tab, select Discard for the error record handling.
  9. On the Statistics tab, select Write to Control Hub Directly.
  10. Add the HTTP Server origin to read from the HTTP Client destination in the edge sending pipeline.
  11. On the HTTP tab, enter a unique port number for HTTP Listening Port and enter a unique application ID for Application ID.
    When you specify the parameter values for the edge job, you'll use these same values.
  12. On the Data Format tab, select JSON since the corresponding HTTP Client destination in the edge pipeline is configured with the JSON format.
  13. Add and configure any number of processors, executors, and destinations.
  14. Click the Check In icon.
  15. Enter a commit message, and then click Publish and Close.
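
Once the receiving job is running (see Step 5), one way to confirm that the HTTP Server origin accepts data is to send it a single test record. The following is a minimal sketch that uses only the Python standard library. It assumes that the origin expects the application ID in the X-SDC-APPLICATION-ID request header. The function names, host, port, and record contents are hypothetical placeholders.

```python
import json
import urllib.request


def build_test_request(host, port, app_id, record):
    """Build a POST request carrying one JSON record for the HTTP Server origin."""
    body = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the origin reads the application ID from this header.
            "X-SDC-APPLICATION-ID": app_id,
        },
    )


def send_test_record(host, port, app_id, record, timeout=10):
    """Send the record and return the HTTP status code of the response."""
    request = build_test_request(host, port, app_id, record)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

For example, you might call send_test_record with the host name of the machine that runs the receiving pipeline, together with the listening port and application ID that you entered on the HTTP tab. A rejected request raises urllib.error.HTTPError, which usually points to a mismatched application ID or port.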

Step 5. Create and Start Jobs for the Pipelines

Create and start a job for the Data Collector receiving pipeline on an execution Data Collector. Create and start a job for the edge pipeline on an SDC Edge.

  1. In the Navigation panel, click Pipeline Repository > Pipelines.
  2. Select both the Data Collector receiving pipeline and the edge pipeline, and then click the Create Job icon.
  3. Configure the job for the Data Collector receiving pipeline as follows:
    1. Enter a name for the job.
    2. Click under Data Collector Labels, and select the label assigned to your execution Data Collector.
    3. Keep the default values for the remaining properties.
  4. Click Next.
  5. Configure the job for the edge pipeline as follows:
    1. Enter a name for the job.
    2. Click under Data Collector Edge Labels, and select the label assigned to your registered SDC Edge.
    3. Under Runtime Parameters, click Get Default Parameters.
      Control Hub adds the pipeline runtime parameters with their default values:
      {
      	"httpUrl": "http://localhost:9999",
      	"sdcAppId": "sde",
      	"directoryPath": "/tmp/out/dir"
      }
    4. Modify the parameter values as follows:
      • httpUrl - Specify the host name of the edge device and the HTTP listening port number configured for the HTTP Server origin in the Data Collector receiving pipeline.
      • sdcAppId - Specify the application ID configured for the HTTP Server origin in the Data Collector receiving pipeline.
      • directoryPath - Specify the path to the directory on the edge device that contains the local files to read.
    5. Keep the default values for the remaining properties.
  6. Click Create.
  7. In the Jobs view, click the Start Job icon for the Data Collector receiving job.
    The Data Collector receiving pipeline must start before the edge sending pipeline.
  8. Click the Start Job icon for the edge job.

Step 6. Monitor the Jobs in a Topology

Add the Data Collector receiving and edge jobs to a topology, and then monitor the progress of both jobs from the single topology view.

  1. On the Jobs view, select both jobs.
  2. Click the More icon, and then click Create New Topology.
  3. Enter a name for the topology.
    The Create New Topology window includes suggested options for how you might want to connect the systems.
  4. While viewing the first option that connects the two jobs with an HTTP system, click Create New Topology.
    Control Hub displays the connected jobs in the topology canvas.
  5. Click the Open Detail Pane arrow to display the detail pane.
    Control Hub displays the record count diagram in the detail pane, which you can use to monitor and measure the performance of the connected jobs.