Data Formats
The Amazon S3 origin processes data
differently based on the data format. The origin processes the following types of data:
- Avro
- Generates a record for every Avro record. Includes a
precision
andscale
field attribute for each Decimal field. - Delimited
- Generates a record for each delimited line.
- Excel
- Generates a record for every row in the file. Can process
.xls
or.xlsx
files.You can specify whether files include a header row and whether to ignore the header row. A header row must be the first row of a file. Vertical header columns are not recognized.
The origin cannot process Excel files with large numbers of rows. You can save such files as CSV files in Excel, and then use the origin to process with the delimited data format.
- JSON
- Generates a record for each JSON object. You can process JSON files that include multiple JSON objects or a single JSON array.
- Log
- Generates a record for every log line.
- Protobuf
- Generates a record for every protobuf message.
- SDC Record
- Generates a record for every record. Use to process records generated by a Data Collector pipeline using the SDC Record data format.
- Text
- Generates a record for each line of text or for each section of text based on a custom delimiter.
- Whole File
- Streams whole files from the origin system to the destination system. You can specify a transfer rate or use all available resources to perform the transfer.
- XML
- Generates records based on a user-defined delimiter element. Use an XML element directly under the root element or define a simplified XPath expression. If you do not define a delimiter element, the origin treats the XML file as a single record.