Schema Inference

When processing data formats that include schemas with the data, such as Avro, ORC, and Parquet, Transformer origins use those schemas to process the data.

For all other data formats, Transformer origins infer the schema of the source data. As a best practice, verify that the schema is inferred as expected while you build the pipeline.

Previewing the pipeline is the easiest way to determine how the origin infers the schema.

Inferring the schema can require Transformer to perform a full read of the data before processing, to determine the correct data type for ambiguous fields. As a result, using a custom schema can improve performance.

Tip: When the origin infers the schema inaccurately, you can define a custom schema for the origin to use.