Whole File Data Format

You can use the whole file data format to transfer entire files of any type from an origin system to a destination system.

When transferring whole files, Data Collector streams file data from the origin system and writes the data to the destination system based on the directory and file name defined in the destination.
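As a rough illustration of this streaming behavior, the following Python sketch copies a file in fixed-size chunks to a target built from a destination directory and file name. It is not Data Collector code: the function name `transfer_whole_file` and the parameters `dest_dir`, `dest_name`, and `chunk_size` are assumptions made for the example.

```python
import shutil
from pathlib import Path


def transfer_whole_file(source: Path, dest_dir: Path, dest_name: str,
                        chunk_size: int = 64 * 1024) -> Path:
    """Stream `source` to dest_dir/dest_name without loading it into memory.

    Hypothetical helper: the directory and file name play the role of the
    values defined in the destination.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / dest_name
    with source.open("rb") as src, target.open("wb") as dst:
        # copyfileobj reads and writes chunk_size bytes at a time.
        shutil.copyfileobj(src, dst, chunk_size)
    return target
```

Because the copy works on byte chunks, the file type does not matter: the content is never parsed.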

Whole file records contain reference information for the file transfer. They do not contain data from within the files. While you can use several processors to read or modify the reference information, you cannot use typical processors, such as the Field Masker or the Field Replacer, to perform processing on file data. The only processor that substantially alters whole file data is the Whole File Transformer processor, which converts Avro files to Parquet. For more information, see Whole File Transformer.
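To make the distinction between reference information and file data concrete, here is a minimal Python sketch of a record that holds only metadata and a pointer to the file. The class `WholeFileRecord`, its fields `file_info` and `file_ref`, and the helper `make_record` are hypothetical names chosen for illustration, not the actual Data Collector record layout.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, BinaryIO, Dict


@dataclass
class WholeFileRecord:
    """Hypothetical record: reference information only, no file content."""
    file_info: Dict[str, Any]  # metadata such as file name and size
    file_ref: Path             # where the bytes live, not the bytes themselves

    def open_stream(self) -> BinaryIO:
        # File data is only reachable by opening a stream at transfer time.
        return self.file_ref.open("rb")


def make_record(path: Path) -> WholeFileRecord:
    """Build a record from a file on disk without reading its content."""
    size = path.stat().st_size
    return WholeFileRecord(
        file_info={"filename": path.name, "size": size},
        file_ref=path,
    )
```

In this sketch, a processing step could rename the file by editing `file_info["filename"]`, but changing a field has no effect on the bytes that `open_stream()` returns.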

By default, whole file transfers use available resources as needed. To limit the resources used to transfer whole file data, specify a transfer rate.
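One common way to cap a transfer rate is to pause after each chunk whenever the copy is running ahead of the allowed pace. The Python sketch below shows that idea under stated assumptions: the function `copy_with_rate_limit` and its parameter `max_bytes_per_sec` are invented for the example and do not describe how Data Collector enforces its rate setting.

```python
import time
from typing import BinaryIO, Optional


def copy_with_rate_limit(src: BinaryIO, dst: BinaryIO,
                         max_bytes_per_sec: Optional[float] = None,
                         chunk_size: int = 8192) -> int:
    """Copy src to dst, optionally throttled. Returns the bytes copied.

    With max_bytes_per_sec=None the copy runs unthrottled, which mirrors
    the default of using available resources as needed.
    """
    start = time.monotonic()
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
        if max_bytes_per_sec is not None:
            # Time the transfer should have taken so far at the allowed rate.
            expected = total / max_bytes_per_sec
            delay = expected - (time.monotonic() - start)
            if delay > 0:
                time.sleep(delay)
    return total
```

For example, `copy_with_rate_limit(src, dst, max_bytes_per_sec=1_000_000)` would hold the average throughput to roughly 1 MB per second, while omitting the argument leaves the copy unthrottled.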

When the origin system provides checksum metadata, you can configure the origin to verify the checksum. Destinations that generate events can include checksum information in event records.
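Checksum verification generally means recomputing a digest over the file and comparing it with the value the origin system supplied. The Python sketch below illustrates this with the standard `hashlib` module; the function name `verify_checksum` and the default of SHA-256 are assumptions for the example, and the algorithms that a given origin supports may differ.

```python
import hashlib
from pathlib import Path


def verify_checksum(path: Path, expected_hex: str,
                    algorithm: str = "sha256",
                    chunk_size: int = 64 * 1024) -> bool:
    """Return True if the file's digest matches the expected hex string."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as f:
        # Hash chunk by chunk so large files never sit fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()
```

The computed digest could also be attached to an event after the write, similar to the checksum information that event-generating destinations include in event records.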

Most destinations allow you to define access permissions for written files. By default, written files use the default permissions of the destination system.
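The following Python sketch shows the general idea of optional permissions on a POSIX file system: when no permissions are given, the written file keeps the system default, and otherwise an octal string such as `"640"` is applied. The function `write_file` and its `permissions` parameter are hypothetical, and the sketch does not reflect the permission formats that individual destinations accept.

```python
import os
import stat
from pathlib import Path
from typing import Optional


def write_file(path: Path, data: bytes,
               permissions: Optional[str] = None) -> int:
    """Write data to path and return the resulting permission bits.

    permissions is an octal string such as "640". When it is None, the
    file keeps the default permissions of the system (set by the umask).
    """
    path.write_bytes(data)
    if permissions is not None:
        os.chmod(path, int(permissions, 8))
    # S_IMODE strips the file-type bits and leaves only the permissions.
    return stat.S_IMODE(path.stat().st_mode)
```

Calling `write_file(path, data, "640")` on a POSIX system yields a file that is readable and writable by the owner, readable by the group, and inaccessible to others.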

Note: During data preview, a whole file pipeline displays a single record instead of the number of records configured for the preview batch size.

For a list of origins and destinations that process this data format, see Data Format Support.