Hive Streaming (deprecated)
Supported pipeline types:
- Data Collector
Before you use the destination, verify that your Hadoop implementation supports Hive Streaming.
When configuring Hive Streaming, you specify the Hive metastore and a bucketed table stored in the ORC file format. You define the location of the Hive and Hadoop configuration files and optionally specify additional required properties. By default, the destination creates new partitions as needed.
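For example, a table that satisfies these requirements might be created with DDL like the following. This is a minimal sketch, not taken from the Data Collector documentation: the table name, columns, bucket count, and partition column (web_events, event_date, and so on) are illustrative, and it assumes that Hive transactions are enabled on the cluster.

    -- Illustrative only: a partitioned, bucketed table stored as ORC with
    -- transactions enabled, the shape of table that Hive Streaming writes to.
    CREATE TABLE web_events (
      event_id STRING,
      user_id  STRING,
      payload  STRING
    )
    PARTITIONED BY (event_date STRING)
    CLUSTERED BY (user_id) INTO 8 BUCKETS
    STORED AS ORC
    TBLPROPERTIES ('transactional' = 'true');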
Hive Streaming writes data to table columns based on matching field names. You can define custom field mappings that override the default field mappings.
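For example, by default a record field named /user_name is written to a table column named user_name; a custom mapping is needed only when the names differ. The field paths and column names below are hypothetical:

    /user_name   ->  user_name        (default: matched by name)
    /user/email  ->  contact_email    (custom mapping overrides the default)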
Before you use the Hive Streaming destination with the MapR library in a pipeline, you must perform additional steps to enable Data Collector to process MapR data. For more information, see MapR Prerequisites.
Hive Properties and Configuration Files
- Configuration files - The following configuration files are required for the Hive Streaming destination:
  - core-site.xml
  - hdfs-site.xml
  - hive-site.xml
- Individual properties - You can configure individual Hive properties in the destination. To add a Hive property, specify the exact property name and the value. The destination does not validate the property names or values.
  Note: Individual properties override properties defined in the configuration files.
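As an illustration of the form these pairs take, the names below are standard Hive properties, the values are examples only, and which properties you actually need depends on your cluster:

    hive.exec.dynamic.partition.mode = nonstrict
    hive.metastore.client.socket.timeout = 300

In the destination, you enter only the name and the value; the layout above is just for readability.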
Configuring a Hive Streaming Destination
- In the Properties panel, on the General tab, configure the following properties:
  - Name - Stage name.
  - Description - Optional description.
  - Stage Library - Library version that you want to use.
  - Required Fields - Fields that must include data for the record to be passed into the stage.
    Tip: You might include fields that the stage uses.
    Records that do not include all required fields are processed based on the error handling configured for the pipeline.
  - Preconditions - Conditions that must evaluate to TRUE to allow a record to enter the stage for processing. Click Add to create additional preconditions.
    Records that do not meet all preconditions are processed based on the error handling configured for the stage.
  - On Record Error - Error record handling for the stage:
    - Discard - Discards the record.
    - Send to Error - Sends the record to the pipeline for error handling.
    - Stop Pipeline - Stops the pipeline.
- On the Hive tab, configure the following properties:
  - Hive Metastore Thrift URL - Thrift URI for the Hive metastore. Use the following format: thrift://<host>:<port>. The port number is typically 9083.
  - Schema - Hive schema.
  - Table - Bucketed Hive table stored as an ORC file.
  - Hive Configuration Directory - Absolute path to the directory containing the Hive and Hadoop configuration files. For a Cloudera Manager installation, enter hive-conf. The destination uses the following configuration files:
    - core-site.xml
    - hdfs-site.xml
    - hive-site.xml
    Note: Properties in the configuration files are overridden by individual properties defined in this destination.
  - Field to Column Mapping - Use to override the default field to column mappings. By default, fields are written to columns of the same name.
  - Create Partitions - Automatically creates partitions when needed. Used for partitioned tables only. If you clear this option, see the partition sketch after these steps.
- On the Advanced tab, optionally configure the following properties:
  - Transaction Batch Size - The number of transactions to request in a batch for each partition in the table. For more information, see the Hive documentation. Default is 1000 transactions.
  - Buffer Limit (KB) - Maximum size of the record to be written to the destination. Increase the size to accommodate larger records. Records that exceed the limit are handled based on the error handling configured for the stage.
  - Hive Configuration - Additional Hive properties to use. Using simple or bulk edit mode, click the Add icon and define the property name and value. Use the property names and values as expected by Hive; see the example name/value pairs under Hive Properties and Configuration Files above.
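If you disable Create Partitions for a partitioned table, the destination does not create partitions itself, so each partition presumably must already exist before data is written to it. A hedged sketch of pre-creating a partition manually, reusing the hypothetical web_events table from the example above:

    -- Illustrative: pre-create a partition when the destination is not
    -- configured to create partitions automatically.
    ALTER TABLE web_events ADD IF NOT EXISTS PARTITION (event_date = '2017-01-01');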