MapR FS Standalone

Supported pipeline types:
  • Data Collector

The MapR FS Standalone origin reads files in MapR FS. The origin can use multiple threads to enable the parallel processing of files. The files to be processed must all share a file name pattern and be fully written. Use the MapR FS Standalone origin only in pipelines configured for standalone execution mode. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.
Tip: Data Collector provides several MapR origins to address different needs. For a quick comparison chart to help you choose the right one, see Comparing MapR Origins.

When you configure the MapR FS Standalone origin, you define the directory to use, the read order, the file name pattern, the file name pattern mode, and the first file to process. You can define the file name pattern using either glob syntax or a regular expression.
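
For example, in glob mode the pattern uses shell-style wildcards, while in regex mode it is a full regular expression applied to the file name. The following sketch uses Python's fnmatch and re modules to show the difference; the file names and patterns are purely illustrative and are not defaults of the origin.

    import fnmatch
    import re

    file_names = ["orders_2024-01-01.json", "orders_2024-01-02.json", "orders.tmp"]

    # Glob mode: shell-style wildcards such as * and ?.
    glob_pattern = "orders_*.json"
    glob_matches = [name for name in file_names if fnmatch.fnmatch(name, glob_pattern)]

    # Regex mode: a full regular expression applied to the file name.
    regex_pattern = r"orders_\d{4}-\d{2}-\d{2}\.json"
    regex_matches = [name for name in file_names if re.fullmatch(regex_pattern, name)]

    print(glob_matches)   # ['orders_2024-01-01.json', 'orders_2024-01-02.json']
    print(regex_matches)  # ['orders_2024-01-01.json', 'orders_2024-01-02.json']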

When using the last-modified timestamp read order, you can configure the origin to read from subdirectories. To use multiple threads for processing, specify the number of threads to use. You can also enable reading compressed files. After processing a file, the MapR FS Standalone origin can keep, archive, or delete the file.
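
As a rough, standalone illustration of last-modified read order, multithreaded processing, and the archive post-processing option, the Python sketch below sorts files by modification time, processes them on a small thread pool, and moves each finished file to an archive directory. The paths, thread count, and processing step are assumptions for the example and do not reflect Data Collector's internal implementation.

    import os
    import shutil
    from concurrent.futures import ThreadPoolExecutor

    SOURCE_DIR = "/mapr/cluster/incoming"   # hypothetical MapR FS mount path
    ARCHIVE_DIR = "/mapr/cluster/archive"   # hypothetical archive directory
    NUM_THREADS = 4                         # analogous to the origin's thread count

    def list_files_by_mtime(directory):
        """Return file paths sorted by last-modified timestamp, oldest first."""
        paths = [os.path.join(directory, name) for name in os.listdir(directory)]
        paths = [p for p in paths if os.path.isfile(p)]
        return sorted(paths, key=os.path.getmtime)

    def process_and_archive(path):
        """Placeholder for record processing, then archive the file."""
        with open(path, "rb") as handle:
            _data = handle.read()           # real processing would parse records here
        shutil.move(path, os.path.join(ARCHIVE_DIR, os.path.basename(path)))

    if __name__ == "__main__":
        with ThreadPoolExecutor(max_workers=NUM_THREADS) as pool:
            list(pool.map(process_and_archive, list_files_by_mtime(SOURCE_DIR)))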

When the pipeline stops, the MapR FS Standalone origin notes where it stops reading. When the pipeline starts again, the origin continues processing from where it stopped by default. You can reset the origin to process all requested files.

The origin generates record header attributes that enable you to use information about the origins of a record in pipeline processing.
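
As an illustration of how such attributes might be used downstream, the sketch below models a record as a plain Python dictionary and routes it by its source file name. The attribute names 'file' and 'filename' are assumptions for the example; check the headers the origin actually generates in your pipeline.

    # Hypothetical record with header attributes describing its source file.
    record = {
        "header": {
            "file": "/mapr/cluster/incoming/orders_2024-01-01.json",  # assumed attribute name
            "filename": "orders_2024-01-01.json",                     # assumed attribute name
        },
        "value": {"order_id": 1001, "amount": 25.0},
    }

    # Route the record based on the name of the file it originated from.
    if record["header"]["filename"].startswith("orders_"):
        destination = "orders_stream"
    else:
        destination = "other_stream"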

When necessary, you can enable Kerberos authentication. You can also specify a Hadoop user to impersonate, define a Hadoop configuration file directory, and add Hadoop configuration properties as needed.
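
As a point of reference, impersonation in Hadoop typically relies on proxy user settings on the cluster side in addition to the origin's configuration. The sketch below lists Hadoop properties commonly involved; the values and the proxy user name sdc are assumptions for an example environment.

    # Hadoop configuration properties commonly involved when enabling Kerberos
    # and impersonation; values and the proxy user name "sdc" are examples only.
    hadoop_properties = {
        "hadoop.security.authentication": "kerberos",
        "hadoop.security.authorization": "true",
        "hadoop.proxyuser.sdc.hosts": "*",    # hosts allowed to impersonate other users
        "hadoop.proxyuser.sdc.groups": "*",   # groups whose members can be impersonated
    }

    # Render the properties as core-site.xml style entries for reference.
    for name, value in hadoop_properties.items():
        print(f"<property><name>{name}</name><value>{value}</value></property>")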

The origin can generate events for an event stream. For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.
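
As a sketch of how an event stream might be consumed, the example below iterates over hypothetical event records and reacts to a no-more-data event. The event type names and fields are assumptions for illustration; see the origin's documentation for the events it actually generates.

    # Hypothetical event records emitted on the origin's event stream.
    events = [
        {"type": "new-file", "filepath": "/mapr/cluster/incoming/orders_2024-01-01.json"},
        {"type": "finished-file", "filepath": "/mapr/cluster/incoming/orders_2024-01-01.json"},
        {"type": "no-more-data", "record-count": 5000},
    ]

    # A dataflow trigger could, for instance, act once all available files are processed.
    for event in events:
        if event["type"] == "no-more-data":
            print("All available files processed; a trigger could stop the pipeline here.")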