Data Formats Overview
Data formats, such as Avro, JSON, and log, are methods of encoding data that adhere to generally accepted specifications.
Stages often process data in similar ways, depending on the stage type and the data format being processed. For example, file-based origins such as Directory and SFTP/FTP/FTPS Client typically process a given data format the same way. Similarly, message-based destinations such as Kafka Producer and JMS Producer generally process a given data format the same way.
This chapter describes how stages generally process data formats. For details on how a particular stage processes each data format, see the "Data Formats" section of that stage's documentation.
For information about the data formats supported by each origin, processor, or destination, see Data Format Support.
File Compression Formats
Origins and processors that read files can read uncompressed files, compressed files, archives, and compressed archives.
Hadoop FS reads compressed files automatically. For other origins and processors that read files, you configure the compression format.
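This section does not say how Hadoop FS detects compression, so the following is only a minimal sketch of the general difference between automatic detection and a configured compression format. It assumes magic-byte sniffing as the detection technique and uses only Python standard-library codecs; the function names `detect_compression`, `read_auto`, and `read_configured` are hypothetical and are not part of any product API.

```python
import bz2
import gzip
import lzma
from typing import Optional

# Leading "magic" bytes for a few common single-file compression formats.
# Assumption: detection by content sniffing, chosen here for illustration only.
_MAGIC = [
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"\xfd7zXZ\x00", "xz"),
]

# Map each codec name to the standard-library function that decompresses it.
_DECOMPRESS = {
    "gzip": gzip.decompress,
    "bzip2": bz2.decompress,
    "xz": lzma.decompress,
}


def detect_compression(data: bytes) -> Optional[str]:
    """Return the detected codec name, or None if the bytes look uncompressed."""
    for magic, codec in _MAGIC:
        if data.startswith(magic):
            return codec
    return None


def read_auto(data: bytes) -> bytes:
    """Automatic mode: decompress when a known codec is detected, else pass through."""
    codec = detect_compression(data)
    return data if codec is None else _DECOMPRESS[codec](data)


def read_configured(data: bytes, codec: Optional[str]) -> bytes:
    """Configured mode: the caller names the codec up front (None means uncompressed)."""
    return data if codec is None else _DECOMPRESS[codec](data)
```

In automatic mode the reader inspects each file, while in configured mode a wrong setting (for example, naming `gzip` for a plain-text file) raises an error from the decompressor instead of silently passing the data through.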
Compression Format | Description |
---|---|
Uncompressed | Processes uncompressed files of the configured data format. |
Compressed | Processes files compressed by a supported compression format. |
Archive | Processes files archived by a supported archive format. |
Compressed Archive | Processes files in compressed archives created by supported compression and archive formats. |
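To make the four compression format options concrete, here is a minimal sketch of a reader that yields the files it finds according to the configured option. It is an illustration under stated assumptions, not how any stage is implemented: the option constants and the `read_entries` function are hypothetical, and the sketch only uses the codecs and archive types available in the Python standard library (gzip, bzip2, and xz for compression; tar and zip for archives).

```python
import bz2
import gzip
import io
import lzma
import tarfile
import zipfile
from typing import Iterator, Tuple

# Hypothetical option names mirroring the rows of the table above.
UNCOMPRESSED = "uncompressed"
COMPRESSED = "compressed"
ARCHIVE = "archive"
COMPRESSED_ARCHIVE = "compressed_archive"

# Single-file compression codecs, keyed by file suffix (an assumption for this sketch).
_DECOMPRESSORS = {
    ".gz": gzip.decompress,
    ".bz2": bz2.decompress,
    ".xz": lzma.decompress,
}


def read_entries(name: str, data: bytes, fmt: str) -> Iterator[Tuple[str, bytes]]:
    """Yield (entry_name, payload) pairs for one input file and a configured format."""
    if fmt == UNCOMPRESSED:
        # The file is already in the configured data format.
        yield name, data
    elif fmt == COMPRESSED:
        # One compressed file decompresses to one file.
        for suffix, decompress in _DECOMPRESSORS.items():
            if name.endswith(suffix):
                yield name[: -len(suffix)], decompress(data)
                return
        raise ValueError(f"unsupported compression for {name}")
    elif fmt in (ARCHIVE, COMPRESSED_ARCHIVE):
        # An archive can contain many files; yield each regular file in turn.
        if zipfile.is_zipfile(io.BytesIO(data)):
            with zipfile.ZipFile(io.BytesIO(data)) as zf:
                for info in zf.infolist():
                    if not info.is_dir():
                        yield info.filename, zf.read(info)
        else:
            # "r:*" lets tarfile open plain tar as well as gzip/bz2/xz-compressed tar.
            with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tf:
                for member in tf.getmembers():
                    if member.isfile():
                        yield member.name, tf.extractfile(member).read()
    else:
        raise ValueError(f"unknown compression format: {fmt}")
```

A caller would then parse each yielded payload with the configured data format, for example `for entry, payload in read_entries("logs.tar.gz", raw, COMPRESSED_ARCHIVE): ...`. For brevity, the sketch handles the Archive and Compressed Archive options with the same code path, because `tarfile` detects tar compression on its own.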