Validation

Control Hub performs two types of validation in the pipeline canvas:

Implicit validation
Implicit validation occurs by default as Control Hub saves your changes when you build pipelines and fragments. Implicit validation lists missing or incomplete configuration, such as an unconnected stage or a required property that has not been configured.
Errors found by implicit validation display in the Warnings list. Error icons display on stages with undefined required properties and on the canvas for pipeline issues.
Explicit validation
Explicit validation occurs when you click the Validate icon, run a preview, or start a draft run of the pipeline. Explicit validation becomes available when implicit validation passes. You can use explicit validation when you build pipelines.
Note: You cannot use explicit validation when building fragments. For tips on how to perform explicit validation for a fragment, see Explicit Validation.
Explicit validation is a semantic validation that checks all configured values for validity and verifies whether the pipeline can run as configured. For example, while implicit validation verifies that you entered a value for a URI, explicit validation tests the validity of the URI by connecting to the system.
Errors found by explicit validation display when you click the Validation Errors link. Error icons display on the stages that encounter the errors.
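The difference between the two validation types can be illustrated with a minimal sketch. This is not Control Hub code: the `REQUIRED` tuple, the function names, and the `probe` parameter are all hypothetical, chosen only to mirror the URI example above. The implicit check confirms that a value was entered, while the explicit check tries to reach the system that the value points to.

```python
import socket
from urllib.parse import urlparse

# Hypothetical required properties for an illustrative stage.
REQUIRED = ("uri",)


def implicit_validate(config):
    """Report required properties that are missing or empty."""
    return [
        f"required property '{name}' is not configured"
        for name in REQUIRED
        if not config.get(name)
    ]


def tcp_probe(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def explicit_validate(config, probe=tcp_probe):
    """Semantic check that runs only after the implicit check passes."""
    issues = implicit_validate(config)
    if issues:
        return issues
    parsed = urlparse(config["uri"])
    try:
        host, port = parsed.hostname, parsed.port
    except ValueError:
        # urlparse raises ValueError for a non-numeric or out-of-range port.
        host, port = None, None
    if host is None or port is None:
        return [f"uri '{config['uri']}' has no usable host and port"]
    if not probe(host, port):
        return [f"cannot connect to {host}:{port}"]
    return []
```

In this sketch, a stage with no `uri` fails the implicit check, while a stage whose `uri` is well formed but unreachable passes the implicit check and fails only the explicit one.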

Spark for Transformer Validation

When you click the Validate icon to perform an explicit validation of a Transformer pipeline, you choose the Spark libraries used to validate the pipeline:

Embedded Spark libraries
When you validate using embedded Spark libraries, the pipeline is validated without communicating with the Spark installation on the local Transformer machine or on the cluster.
Validation using the embedded Spark libraries typically completes quickly. However, the validation fails if the Transformer machine cannot access the external systems that the pipeline connects to.
Configured cluster manager
When you validate using the configured cluster manager, the pipeline is validated using the Spark cluster configured for the pipeline.
When you validate a local pipeline using the configured Spark cluster, Transformer launches a Spark application in the local Spark installation on the Transformer machine, and then performs the validation in the local Spark installation.
When you validate a cluster pipeline using the configured Spark cluster, Transformer launches a Spark application in the configured cluster, and then performs the validation on the Spark cluster.
When you use the configured cluster, Transformer uses the same validation as when you start the pipeline. However, using the configured cluster can cause the validation to take longer.

In most cases, you'll want to validate a pipeline using the configured Spark cluster so that Transformer uses the same validation as when you start the pipeline.
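As a quick summary of the options above, the following sketch maps the chosen Spark libraries and the pipeline type to the place where validation runs. The function name and the string values `"embedded"`, `"configured"`, `"local"`, and `"cluster"` are hypothetical labels used for illustration; they are not Transformer settings.

```python
def validation_location(spark_libraries, pipeline_type):
    """Describe where validation runs for a given choice (illustrative only)."""
    if pipeline_type not in ("local", "cluster"):
        raise ValueError(f"unknown pipeline type: {pipeline_type!r}")
    if spark_libraries == "embedded":
        # Embedded libraries never contact a Spark installation.
        return "embedded Spark libraries, no Spark installation contacted"
    if spark_libraries == "configured":
        if pipeline_type == "local":
            return "Spark application in the local Spark installation on the Transformer machine"
        return "Spark application in the configured cluster"
    raise ValueError(f"unknown Spark libraries choice: {spark_libraries!r}")
```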

Validating a Pipeline

Explicitly validate a pipeline to check all configured values for validity and to verify whether the pipeline can run as configured.

  1. With a pipeline open in the canvas, click the Validate icon.

    When validating a Transformer pipeline, choose the Spark libraries used to validate the pipeline.

    If the validation fails, the canvas displays a link to the validation errors and displays error icons on the stages that encountered the errors, as follows:

    [Image: Pipeline canvas displaying a link to the validation errors]

  2. Click the Validation Errors link to review all errors.
  3. Resolve each error, and then click the Validate icon again.
  4. If the validation fails due to a timeout error, click Configure Validation.
  5. In the Validation Configuration dialog box, increase the validation timeout value, and then click Validate.
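The timeout handling in the last two steps follows a simple pattern: validate, and if the attempt times out, raise the timeout and validate again. The sketch below shows that pattern in isolation. It assumes a hypothetical `validate` callable that takes a timeout value, returns a list of errors, and raises `TimeoutError` when the timeout elapses; the doubling factor and the attempt limit are also assumptions, not Control Hub defaults.

```python
def validate_with_timeout_retry(validate, timeout, factor=2, max_attempts=3):
    """Call validate(timeout), increasing the timeout after each timeout error.

    Returns (errors, timeout_used). The timeout unit is whatever the
    hypothetical validate callable expects.
    """
    for _ in range(max_attempts):
        try:
            return validate(timeout), timeout
        except TimeoutError:
            # Mirror the manual step: increase the timeout, then revalidate.
            timeout *= factor
    raise TimeoutError(f"validation still timing out after {max_attempts} attempts")
```

Errors other than a timeout are returned to the caller unchanged, because they must be resolved in the pipeline configuration rather than by waiting longer.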