MapR FS (deprecated)

Data Collector

The MapR FS origin reads files from MapR FS. Use this origin only in pipelines configured for cluster batch pipeline execution mode. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.

Important: This stage is deprecated along with cluster pipelines, and may be removed in a future release. You can use StreamSets Transformer instead. For more information, see the Transformer documentation.

Data Collector provides several MapR origins to address different needs. For a quick comparison chart to help you choose the right one, see Comparing MapR Origins.

When you configure the MapR FS origin, you specify the input path and data format for the data to be read. You can configure the origin to read from all subdirectories and to generate a single record for records that include multiple objects.

The origin reads compressed data based on file extension for all Hadoop-supported compression codecs.
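
To illustrate the general idea of choosing a decompressor from the file extension, here is a minimal Python sketch. It is not Data Collector or Hadoop code: the `OPENERS` mapping and the `open_by_extension` and `read_lines` helpers are hypothetical names, and only two extensions (`.gz` and `.bz2`) are covered, using the Python standard library.

```python
import bz2
import gzip
import os

# Hypothetical mapping for illustration: file extension -> opener function.
# Hadoop resolves codecs through its own codec classes; this only mimics the idea.
OPENERS = {
    ".gz": gzip.open,
    ".bz2": bz2.open,
}


def open_by_extension(path):
    """Open a file as text, decompressing it when the extension is recognized."""
    _, ext = os.path.splitext(path)
    # Fall back to the built-in open() for unrecognized extensions.
    opener = OPENERS.get(ext.lower(), open)
    return opener(path, "rt", encoding="utf-8")


def read_lines(path):
    """Return the lines of a file, with trailing newlines removed."""
    with open_by_extension(path) as handle:
        return [line.rstrip("\n") for line in handle]
```

In this sketch a file named `events.json.gz` is read through `gzip.open`, while a file with an unrecognized extension is read as plain text.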

When necessary, you can enable Kerberos authentication. You can also specify a Hadoop user to impersonate, define a Hadoop configuration file directory, and add Hadoop configuration properties as needed.

The MapR FS origin generates record header attributes that enable you to use information about the origin of a record in pipeline processing.

Before you use any MapR stage in a pipeline, you must perform additional steps to enable Data Collector to process MapR data. For more information, see MapR Prerequisites in the Data Collector documentation.