MapReduce
The MapReduce executor starts a MapReduce job in HDFS or MapR FS each time it receives an event record. Use the MapReduce executor as part of an event stream. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.
You can use the MapReduce executor to start a custom job, such as a validation job that compares the number of records in files. To run a custom job, you either configure the job in the executor or specify a predefined configuration object. You can also use the MapReduce executor to start a predefined job. The MapReduce executor includes two predefined jobs: one that converts Avro files to ORC files, and one that converts Avro files to Parquet files.
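For orientation, a custom job is typically packaged as a Java class that builds a Hadoop MapReduce job from the configuration the executor passes in. The sketch below is illustrative only: the package, class name, and the Callable/Configurable contract are assumptions rather than the documented Data Collector interface, and the "input.path" and "output.path" properties are hypothetical job configuration keys.

    // Illustrative sketch only. Assumes a creator class that implements
    // Callable<Job> and Configurable (an assumption, not the documented
    // Data Collector contract) and builds a standard Hadoop MapReduce job.
    package com.example.mapreduce;  // hypothetical package

    import java.util.concurrent.Callable;

    import org.apache.hadoop.conf.Configurable;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class RecordCountValidationJobCreator implements Callable<Job>, Configurable {

      private Configuration conf;

      @Override
      public void setConf(Configuration conf) {
        this.conf = conf;  // merged executor and cluster configuration
      }

      @Override
      public Configuration getConf() {
        return conf;
      }

      @Override
      public Job call() throws Exception {
        // "input.path" and "output.path" are hypothetical properties set in the
        // executor's job configuration; a real creator would read whatever keys
        // the pipeline defines.
        Job job = Job.getInstance(conf, "record-count-validation");
        job.setJarByClass(RecordCountValidationJobCreator.class);
        job.setMapperClass(Mapper.class);          // identity mapper for brevity
        job.setNumReduceTasks(0);                  // map-only job
        job.setOutputKeyClass(LongWritable.class); // TextInputFormat key type
        job.setOutputValueClass(Text.class);       // TextInputFormat value type
        FileInputFormat.addInputPath(job, new Path(conf.get("input.path")));
        FileOutputFormat.setOutputPath(job, new Path(conf.get("output.path")));
        return job;  // the executor submits the returned job
      }
    }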
You can use the executor in any logical way, such as running MapReduce jobs after the Hadoop FS or MapR FS destination closes files. For example, you can use the Avro to ORC job to convert Avro files to ORC files after a MapR FS destination closes a file. Or, you might use the Avro to Parquet job to convert Avro files to Parquet after the Hadoop FS destination closes a file as part of the Drift Synchronization Solution for Hive.
When you configure the MapReduce executor, you specify connection information and job details. For predefined jobs, you specify Avro conversion details, such as the input and output file location, as well as ORC- or Parquet-specific details. For other types of jobs, you specify a job creator or configuration object, and the job configuration properties to use.
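As a rough sketch of what job configuration properties look like, the example below sets two standard MapReduce keys alongside two hypothetical keys ("input.path" and "output.path") that a custom job creator, such as the one sketched above, might read. It illustrates the general key/value form only, not the executor's internal behavior.

    // Illustrative sketch: job configuration properties are plain Hadoop
    // key/value pairs. "mapreduce.job.queuename" and "mapreduce.map.memory.mb"
    // are standard MapReduce keys; "input.path" and "output.path" are
    // hypothetical keys used by the custom creator sketched above.
    import org.apache.hadoop.conf.Configuration;

    public class JobConfigurationSketch {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("mapreduce.job.queuename", "etl");    // YARN queue to submit to
        conf.set("mapreduce.map.memory.mb", "2048");   // per-mapper container memory
        conf.set("input.path", "/output/avro");        // hypothetical custom key
        conf.set("output.path", "/output/validated");  // hypothetical custom key
        // A job creator receives this Configuration through setConf() and reads
        // the custom keys when it builds the job.
        System.out.println(conf.get("mapreduce.job.queuename"));
      }
    }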
When necessary, you can enable Kerberos authentication and specify a MapReduce user. You can also use MapReduce configuration files and add other MapReduce configuration properties as needed.
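The sketch below shows the underlying Hadoop mechanism for those two options: loading standard MapReduce configuration files and logging in with Kerberos from a keytab. Data Collector performs this for you when you enable Kerberos in the executor; the file locations, principal, and keytab path shown are assumptions for illustration.

    // Illustrative sketch: loading MapReduce configuration files and logging in
    // with Kerberos via the Hadoop client API. Paths, principal, and keytab are
    // hypothetical; this is not the executor's implementation.
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KerberosLoginSketch {
      public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Pick up cluster settings from the usual configuration files.
        conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
        conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
        conf.addResource(new Path("/etc/hadoop/conf/yarn-site.xml"));
        conf.addResource(new Path("/etc/hadoop/conf/mapred-site.xml"));

        // Enable Kerberos authentication and log in from a keytab.
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        UserGroupInformation.loginUserFromKeytab(
            "sdc/host.example.com@EXAMPLE.COM",   // hypothetical principal
            "/etc/security/keytabs/sdc.keytab");  // hypothetical keytab path
      }
    }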
You can also configure the executor to generate events for another event stream. For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.
For a solution that describes how to use the MapReduce executor, see Converting Data to the Parquet Data Format.