Input and Output Schema for Stages

After running preview for a pipeline, you can view the input and output schema for each stage on the Schema tab in the pipeline properties panel. The schema includes each field path and data type.

Control Hub uses the schema extracted from the last data preview to list available field paths when you invoke expression completion for a stage property.

If you change the schema for a pipeline, for example by removing a field, renaming a field, or changing the data type of a field, you must run preview again so that the schema reflects the change.

In most cases, you can use expression completion to specify a field path as you configure stage properties. In some cases, however, you might copy a field path from the Schema tab instead.

For example, let’s say you are configuring a Spark SQL Query processor and you need to reference fields in the query. After running preview, you select the processor in the pipeline canvas, and then click the Schema tab in the pipeline properties panel. You click the Copy Field Path to Clipboard icon to copy the field path from the Schema tab, and then paste the field path into the Spark SQL query.

The following image displays a sample Schema tab with the time of the last data preview:

Note: To copy a field path from the Schema tab, use an authoring Transformer version 3.14.0 or later so that Control Hub correctly uses the required dot or bracket notation. For earlier Transformer versions, Control Hub incorrectly uses forward slashes in copied field paths. If you use earlier Transformer versions, replace the forward slashes with dots or brackets as appropriate, as described in Referencing Fields in Spark SQL Expressions.
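If you copy many field paths from an earlier Transformer version, a small script can do the slash replacement for you. The following Python sketch is illustrative only and is not part of Control Hub or Transformer. The function name slash_path_to_dot_notation is hypothetical, and the sketch assumes that copied paths look like /address/city, with any list indexes already written in brackets, as in /items[0]/name. It does not quote field names that contain special characters, so check the result against Referencing Fields in Spark SQL Expressions.

```python
def slash_path_to_dot_notation(path: str) -> str:
    """Convert a slash-delimited field path to dot notation.

    Hypothetical helper: assumes input such as '/address/city' or
    '/items[0]/name'. Bracketed indexes are kept unchanged.
    """
    # Split on slashes and drop empty segments, including the one
    # produced by the leading slash.
    segments = [segment for segment in path.strip().split("/") if segment]
    if not segments:
        raise ValueError(f"No field names found in path: {path!r}")
    # Join nested field names with dots.
    return ".".join(segments)


if __name__ == "__main__":
    for copied in ["/id", "/address/city", "/items[0]/name"]:
        print(copied, "->", slash_path_to_dot_notation(copied))
```

You would then paste the converted path, such as address.city, into the Spark SQL query in place of the copied path.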