Directory

The Directory origin reads data from files in a local directory. The origin can use multiple threads to enable the parallel processing of files.

The files to be processed must be stored in a local directory on the Data Collector machine, must all share a file name pattern, and must be fully written. To read files from a remote server, use the SFTP/FTP/FTPS Client origin. To read data from an active file that is still being written to, use the File Tail origin.

Note: The Directory origin can read files from network-attached storage (NAS) systems. However, be aware that issues can occur when reading high volumes of files with some implementations of the network file system (NFS) protocol.

When you configure the Directory origin, you define the directory to use, read order, file name pattern, file name pattern mode, and the first file to process. You can use glob patterns or regular expressions to define the file name pattern that you want to use.
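As a rough illustration of the two file name pattern modes, the following Python sketch matches the same set of file names against a glob pattern and against a regular expression with the same intent. The file names and patterns are hypothetical, and the sketch uses Python's standard `fnmatch` and `re` modules only to show the syntax; the origin's own matching may differ in details.

```python
import fnmatch
import re

# Hypothetical file names found in the source directory.
file_names = [
    "sales-2024-01.csv",
    "sales-2024-02.csv",
    "sales-2024-02.csv.tmp",
    "readme.txt",
]

# Glob mode: "*" matches any run of characters.
glob_pattern = "sales-*.csv"
glob_matches = [n for n in file_names if fnmatch.fnmatchcase(n, glob_pattern)]

# Regular expression mode: the same intent written as a regex.
# The dot before "csv" is escaped so it matches a literal period.
regex_pattern = re.compile(r"sales-.*\.csv")
regex_matches = [n for n in file_names if regex_pattern.fullmatch(n)]

print(glob_matches)
print(regex_matches)
```

Both patterns select only the two `.csv` files here. The `.tmp` file is excluded because each pattern must match the whole file name.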

When using the Last Modified Timestamp read order, you can configure the origin to read from subdirectories. To use multiple threads for processing, specify the number of threads to use.
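The effect of the read order can be sketched in Python. This is an illustration under stated assumptions, not the origin's implementation: the helper names are hypothetical, the name-based ordering is included only as an assumed alternative for comparison, and breaking timestamp ties by file name is also an assumption.

```python
import os
import tempfile
from pathlib import Path

def order_by_name(paths):
    # Assumed alternative read order: sort by file name.
    return sorted(paths, key=lambda p: p.name)

def order_by_mtime(paths):
    # Last-modified ordering; ties broken by name (an assumption).
    return sorted(paths, key=lambda p: (p.stat().st_mtime, p.name))

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()

    # Hypothetical files with explicit modification times (epoch seconds).
    base = 1_700_000_000
    samples = {"b.log": base + 100, "a.log": base + 300, "sub/c.log": base + 200}
    for rel, mtime in samples.items():
        path = root / rel
        path.write_text("data")
        os.utime(path, (mtime, mtime))

    # Without subdirectories, only top-level files are candidates.
    top_level = list(root.glob("*.log"))
    # With subdirectories, nested files are candidates as well.
    with_subdirs = list(root.rglob("*.log"))

    by_name = [p.name for p in order_by_name(top_level)]
    by_mtime = [p.relative_to(root).as_posix() for p in order_by_mtime(with_subdirs)]

print(by_name)
print(by_mtime)
```

In this sketch, the timestamp ordering places `sub/c.log` between the two top-level files because its modification time falls between theirs.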

You can also enable reading compressed files or files in a late arriving directory. After processing a file, the Directory origin can keep, archive, or delete the file.

When the pipeline stops, the Directory origin notes where it stops reading. When the pipeline starts again, the origin continues processing from where it stopped by default. You can reset the origin to process all requested files.
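The resume behavior can be pictured with a simplified model. The following Python sketch assumes an offset made of a file name and the index of the next line to read; the function name `process`, the offset shape, and the in-memory "files" are all hypothetical and do not describe how the origin actually stores its position.

```python
def process(files, offset=None, max_records=None):
    """Read lines from files in order, optionally resuming from an offset.

    files: list of (file_name, lines) tuples, already in read order.
    offset: (file_name, next_line_index), or None to start from the beginning.
    Returns (records, new_offset).
    """
    records = []
    start_name, start_line = offset if offset else (None, 0)
    skipping = start_name is not None
    new_offset = offset

    for name, lines in files:
        first = 0
        if skipping:
            if name != start_name:
                continue  # file was fully processed before the stop
            skipping = False
            first = start_line
        for i in range(first, len(lines)):
            if max_records is not None and len(records) >= max_records:
                return records, (name, i)  # simulate the pipeline stopping
            records.append(lines[i])
        new_offset = (name, len(lines))

    return records, new_offset

files = [("a.txt", ["a1", "a2"]), ("b.txt", ["b1", "b2"])]

# First run stops after three records.
first_run, saved = process(files, max_records=3)
# Restart continues from the saved position.
second_run, _ = process(files, offset=saved)
# Resetting the origin corresponds to discarding the saved position.
after_reset, _ = process(files, offset=None)

print(first_run, saved)
print(second_run)
print(after_reset)
```

In this model, restarting with the saved offset yields only the unread record, while discarding the offset reprocesses every file.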

Note: The origin processes files based on file names and locations. Having files with the same name in the same location can cause the origin to skip reading the duplicate files.

The origin generates record header attributes that enable you to use the origins of a record in pipeline processing.

The origin can generate events for an event stream. For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.