Directory Path

When you configure the File origin, you specify the directory path to use. The origin reads all files in the specified directory and its subdirectories. You can use glob patterns in the directory path to specify a set of directories to read from.
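
As a rough illustration of how a glob pattern in the directory path can select several directories, the following Python sketch matches candidate directories against a pattern. The directory names and the pattern are hypothetical, and Python's PurePosixPath.match is only a stand-in here: the glob syntax that the origin accepts may differ.

```python
from pathlib import PurePosixPath


def matching_directories(pattern, directories):
    """Return the directories whose full path matches the glob pattern."""
    # With an absolute pattern, PurePosixPath.match compares the whole path,
    # and "*" matches within a single path segment only.
    return [d for d in directories if PurePosixPath(d).match(pattern)]


# Hypothetical directories, not taken from the product documentation.
candidates = [
    "/user/hadoop/files/2023",
    "/user/hadoop/files/2024",
    "/user/hadoop/archive/2024",
]
selected = matching_directories("/user/hadoop/files/*", candidates)
```

In this sketch, the pattern selects the two directories under /user/hadoop/files and skips the one under /user/hadoop/archive.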

In each batch, the origin reads any files added to the directory path since the last batch completed.
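
The per-batch behavior can be pictured with a minimal Python sketch. It assumes, purely for illustration, that new files are detected by comparing the current directory listing with the set of files already read. The way the origin actually tracks files is not described here, and the function name and file paths are hypothetical.

```python
def files_for_next_batch(listing, already_read):
    """Return files in the listing that earlier batches have not read."""
    return sorted(set(listing) - set(already_read))


already_read = set()

# Hypothetical listing of the directory when the first batch starts.
first_listing = ["/data/a.csv", "/data/b.csv"]
batch_1 = files_for_next_batch(first_listing, already_read)
already_read.update(batch_1)

# One file was added after the first batch completed.
second_listing = ["/data/a.csv", "/data/b.csv", "/data/c.csv"]
batch_2 = files_for_next_batch(second_listing, already_read)
already_read.update(batch_2)
```

Under this assumption, the first batch picks up both existing files and the second batch picks up only the newly added file.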

The format of the directory path depends on the file system that you want to read from:

HDFS
To read files in HDFS, use the following format for the directory path:
hdfs://<authority>/<path>
For example, to read from the /user/hadoop/files directory on an HDFS cluster whose authority is nameservice, enter the following path:
hdfs://nameservice/user/hadoop/files
Local file system
To read files in a local file system, use the following format for the directory path:
file:///<directory>
For example, to read from the /Users/transformer/source directory on the local file system, enter the following path. The authority is empty, so the path begins with three slashes:
file:///Users/transformer/source
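
To see how the two example paths break down into scheme, authority, and path, the following Python sketch splits them with the standard library's urlsplit function. The helper name describe_directory_path is hypothetical and is not part of the product. The sketch only shows the URI structure, not how the origin itself parses the path.

```python
from urllib.parse import urlsplit


def describe_directory_path(directory_path):
    """Split a directory path URI into its scheme, authority, and path."""
    parts = urlsplit(directory_path)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,
        "path": parts.path,
    }


# The two example paths from this section.
hdfs_parts = describe_directory_path("hdfs://nameservice/user/hadoop/files")
local_parts = describe_directory_path("file:///Users/transformer/source")
```

For the HDFS path, the authority is nameservice and the path is /user/hadoop/files. For the local path, the authority is empty and the path is /Users/transformer/source.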