Hive Streaming (deprecated)

Data Collector

The Hive Streaming destination writes data to Hive tables stored in the ORC (Optimized Row Columnar) file format. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.

Important: This stage is deprecated and may be removed in a future release.

Before you use the destination, verify that your Hadoop implementation supports Hive Streaming.

When you configure Hive Streaming, you specify the Hive metastore and a bucketed table stored in the ORC file format. You define the location of the Hive and Hadoop configuration files, and you can optionally specify any additional properties that your environment requires. By default, the destination creates new partitions as needed.
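To make the table requirement concrete, the following sketch builds a HiveQL statement for a bucketed table stored as ORC. The function name, table name, and columns are hypothetical. The `transactional` table property is an assumption based on what Hive streaming ingest generally expects, not something this page states, so verify it against your Hive version.

```python
def build_bucketed_orc_ddl(table, columns, bucket_column, num_buckets,
                           partition_columns=None):
    """Return a HiveQL CREATE TABLE statement for a bucketed ORC table.

    columns and partition_columns are lists of (name, hive_type) tuples.
    All names here are illustrative, not part of Data Collector.
    """
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    ddl = f"CREATE TABLE {table} ({cols})"
    if partition_columns:
        parts = ", ".join(f"{name} {hive_type}"
                          for name, hive_type in partition_columns)
        ddl += f" PARTITIONED BY ({parts})"
    # Bucketing and ORC storage are the requirements named in the text above.
    ddl += f" CLUSTERED BY ({bucket_column}) INTO {num_buckets} BUCKETS"
    ddl += " STORED AS ORC"
    # Assumption: Hive streaming ingest typically needs a transactional table.
    ddl += " TBLPROPERTIES ('transactional'='true')"
    return ddl


if __name__ == "__main__":
    print(build_bucketed_orc_ddl(
        "web_logs",
        [("id", "INT"), ("msg", "STRING")],
        bucket_column="id",
        num_buckets=4,
        partition_columns=[("dt", "STRING")],
    ))
```

You would run the generated statement in Hive before starting the pipeline, then point the destination at the resulting table.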

Hive Streaming writes data to the table based on matching field names. You can define custom field mappings that override the default field mappings.
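The mapping behavior can be pictured with a small sketch. This is not the destination's implementation. The function name and the shape of `custom_mappings` (column name to record field name) are assumptions made for illustration only.

```python
def map_record_to_columns(record, table_columns, custom_mappings=None):
    """Pick a value for each table column from a record.

    By default a column takes the value of the record field with the same
    name. An entry in custom_mappings (column name -> field name) overrides
    that default for the given column. Columns with no matching field are
    left out of the result.
    """
    custom_mappings = custom_mappings or {}
    row = {}
    for column in table_columns:
        # Use the custom mapping if one exists, else match by name.
        field = custom_mappings.get(column, column)
        if field in record:
            row[column] = record[field]
    return row


if __name__ == "__main__":
    record = {"id": 7, "message": "hello", "extra": True}
    columns = ["id", "msg"]
    # "id" matches by name; "msg" is filled from the "message" field.
    print(map_record_to_columns(record, columns, {"msg": "message"}))
```

In this sketch, record fields that map to no column (such as `extra`) are ignored. How the actual destination treats unmatched fields is not stated here.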

Before you use the Hive Streaming destination with the MapR library in a pipeline, you must perform additional steps to enable Data Collector to process MapR data. For more information, see MapR Prerequisites in the Data Collector documentation.