Custom Schemas

When reading delimited or JSON data, you can configure an origin to use a custom schema to process the data. By default, origins infer the schema from the data.

You might use a custom schema to specify data types for potentially ambiguous fields, such as BigInt for a field that might otherwise be inferred as Integer. To determine the correct data type for ambiguous fields, schema inference can require Transformer to perform a full read of the data before processing, so using a custom schema can also improve performance.
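To illustrate why inference can require a full read, the following minimal Python sketch mimics the decision between Integer and BigInt for one column. It is not Transformer code: the function name `infer_integer_type` and the 32-bit cutoff are assumptions made for illustration only.

```python
# Illustrative sketch only; not how Transformer implements inference.
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1

def infer_integer_type(values):
    """Return 'Integer' if every value fits in 32 bits, otherwise 'BigInt'."""
    for value in values:
        # In the worst case every value is inspected before the type is known.
        if not INT32_MIN <= int(value) <= INT32_MAX:
            return "BigInt"
    return "Integer"

column = ["17", "42", "3000000000"]
print(infer_integer_type(column[:2]))  # Integer
print(infer_integer_type(column))      # BigInt
```

A sample of the first rows suggests Integer, but only a scan of the whole column reveals that BigInt is needed. Declaring the type in a custom schema avoids that scan.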

You can use a custom schema to reorder fields in JSON data or to rename fields in delimited data. When processing delimited files or objects, you can use a custom schema to define field names and types when files or objects do not include a header row.

Use custom schemas with care. When the data contains fields that are not defined in the schema, the origin drops the fields from the record. When the schema contains fields that are not in the data, the origin includes the fields in the record and populates them with null values.
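The two behaviors described above can be sketched in a few lines of Python. This is a simplified model, not the origin's implementation: `apply_schema` is a hypothetical helper, the schema is reduced to an ordered list of field names, and Python's `None` stands in for a null value.

```python
# Illustrative sketch only; apply_schema is a hypothetical helper.
def apply_schema(record, schema_fields):
    """Keep only the fields named in the schema, in schema order.

    Fields missing from the record are added with None (null).
    Fields not named in the schema are dropped.
    """
    return {name: record.get(name) for name in schema_fields}

schema_fields = ["id", "name", "email"]
record = {"id": 1, "name": "Ada", "phone": "555-0100"}

print(apply_schema(record, schema_fields))
# {'id': 1, 'name': 'Ada', 'email': None}
```

The `phone` field is silently dropped and `email` appears as null, so a misspelled field name in a schema can cause data loss without raising an error.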

You can define a custom schema using the JSON or Data Definition Language (DDL) format. When you define a schema, you specify the name and data type for each field, as well as the field order. The way the custom schema is applied depends on the format of the data.
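As a rough illustration of the two formats, the Python sketch below builds the same two-field schema as a DDL string and as a JSON document. It assumes the schema follows Apache Spark conventions (comma-separated `name TYPE` pairs for DDL, a `struct` object with a `fields` array for JSON). The field names and types are hypothetical; check the origin documentation for the exact syntax it accepts.

```python
import json

# Hypothetical fields: (name, DDL type, JSON type), listed in the desired order.
fields = [("id", "BIGINT", "long"), ("name", "STRING", "string")]

# DDL format: comma-separated "name TYPE" pairs (Spark-style assumption).
ddl_schema = ", ".join(f"{name} {ddl_type}" for name, ddl_type, _ in fields)

# JSON format: a struct with one entry per field (Spark-style assumption).
json_schema = json.dumps(
    {
        "type": "struct",
        "fields": [
            {"name": name, "type": json_type, "nullable": True, "metadata": {}}
            for name, _, json_type in fields
        ],
    },
    indent=2,
)

print(ddl_schema)  # id BIGINT, name STRING
print(json_schema)
```

Both forms carry the same three pieces of information for each field: its name, its data type, and its position in the field order.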

When you define a custom schema, you also specify how the origin handles parsing errors.