MapR FS

Supported pipeline types:
  • Data Collector

The MapR FS destination writes files to MapR FS. You can write the data to MapR as flat files or Hadoop sequence files. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.

When you configure a MapR FS destination, you can define a directory template and time basis to determine the output directories that the destination creates and the files where records are written.
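For example, a directory template typically combines a base path with time-based functions from the Data Collector expression language (the base path below is hypothetical):

```text
/mapr/my.cluster.com/sdc/output/${YYYY()}-${MM()}-${DD()}-${hh()}
```

With a time basis of ${time:now()}, records are grouped into directories by processing time; a time basis such as ${record:value('/timestamp')} groups them by a timestamp field in each record instead.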

As part of the Drift Synchronization Solution for Hive, you can alternatively use record header attributes to perform record-based writes. You can write records to the specified directory, use the defined Avro schema, and roll files based on record header attributes. For more information, see Record Header Attributes for Record-Based Writes.
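As a sketch, record-based writes rely on header attributes along these lines (the attribute names are those used by the Drift Synchronization Solution; the values shown are purely illustrative):

```text
targetDirectory   /user/hive/warehouse/orders/dt=2019-04-01    directory to write the record to
avroSchema        the Avro schema to use for the output file
roll              when present, triggers a roll of the current file
```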

You can define a file prefix and suffix, the data time zone, and properties that determine when the destination closes a file. You can also specify how long records can continue to be written to a directory and what the destination does with records that arrive after that window, known as late records.
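For illustration, the file-closure and late-record properties might be configured along these lines (the values are examples, not defaults to rely on):

```text
Max File Size (MB)       512             close a file once it reaches this size
Max Records in File      1000000         close a file after this many records
Idle Timeout             ${1 * HOURS}    close a file that receives no new records
Late Record Time Limit   ${1 * HOURS}    records older than this are treated as late
```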

The destination can generate events for an event stream. For more information about the event framework, see Dataflow Triggers Overview.

When necessary, you can enable Kerberos authentication. You can also specify a Hadoop user to impersonate, define a Hadoop configuration file directory, and add Hadoop configuration properties as needed.
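Additional Hadoop configuration properties are entered as simple name-value pairs; for example, a standard HDFS client property (shown here only as an illustration) could be added as:

```text
dfs.client.read.shortcircuit = true
```

Properties added this way take precedence over the same properties defined in the Hadoop configuration files.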

You can use Gzip, Bzip2, Snappy, LZ4, and other compression formats to write output files.

Before you use any MapR stage in a pipeline, you must perform additional steps to enable Data Collector to process MapR data. For more information, see MapR Prerequisites in the Data Collector documentation.