Hive Metastore
The Hive Metastore destination uses metadata records generated by the Hive Metadata processor to create and update Hive tables. This enables the Hadoop FS and MapR FS destinations to write drifting Avro or Parquet data to HDFS or MapR FS.
The Hive Metastore destination compares information in metadata records with Hive tables, and then creates or updates the tables as needed. For example, when the Hive Metadata processor encounters a record that requires a new Hive table, it passes a metadata record to the Hive Metastore destination and the destination creates the table.
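To illustrate the flow, the sketch below shows how a new-table metadata record might be translated into Hive DDL. The record structure, field names, and helper function here are simplified assumptions for illustration only, not the actual metadata record format generated by the Hive Metadata processor.

```python
# Illustrative sketch only: the field names in this metadata record are
# simplified assumptions, not the actual record format produced by the
# Hive Metadata processor.
metadata_record = {
    "type": "new-table",            # hypothetical: request to create a table
    "database": "Sales",
    "table": "WebOrders",
    "columns": [("order_id", "BIGINT"), ("total", "DECIMAL(10,2)")],
    "partitions": [("dt", "STRING")],
}

def to_create_table_ddl(rec):
    """Build the kind of CREATE TABLE statement a destination might issue."""
    # Hive lowercases identifiers, so Sales.WebOrders becomes sales.weborders.
    db = rec["database"].lower()
    table = rec["table"].lower()
    cols = ", ".join(f"{name.lower()} {dtype}" for name, dtype in rec["columns"])
    parts = ", ".join(f"{name.lower()} {dtype}" for name, dtype in rec["partitions"])
    return (
        f"CREATE TABLE IF NOT EXISTS {db}.{table} ({cols}) "
        f"PARTITIONED BY ({parts}) STORED AS AVRO"
    )

print(to_create_table_ddl(metadata_record))
# CREATE TABLE IF NOT EXISTS sales.weborders (order_id BIGINT,
#   total DECIMAL(10,2)) PARTITIONED BY (dt STRING) STORED AS AVRO
```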
Hive table names, column names, and partition names are created in lowercase. Names that include uppercase letters are converted to lowercase in Hive.
Note that the Hive Metastore destination does not process data. It processes only metadata records generated by the Hive Metadata processor and must be downstream from the processor's metadata output stream.
When you configure the Hive Metastore destination, you define the connection information for Hive, specify the location of the Hive and Hadoop configuration files, and optionally add any additional Hive properties that you need. You can also enable Kerberos authentication, set a maximum cache size for the destination, determine how new tables are created and stored, and configure custom record header attributes.
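As a rough picture of the settings involved, the dictionary below sketches the kinds of options described above. The key names, values, and expression syntax are illustrative assumptions; the actual property names and defaults are defined in the stage's configuration dialog.

```python
# Illustrative only: key names and values are assumptions, not the
# destination's actual property names.
hive_metastore_config = {
    "hive_jdbc_url": "jdbc:hive2://hive-server:10000/default",  # connection info
    "hive_conf_dir": "/etc/hive/conf",        # location of hive-site.xml, etc.
    "additional_properties": {                # optional extra Hive properties
        "hive.exec.dynamic.partition.mode": "nonstrict",
    },
    "kerberos_auth": False,                   # enable Kerberos if required
    "max_cache_size_entries": 5000,           # cap on cached table metadata
    "stored_as_avro": True,                   # how new tables are stored
    "header_attribute_expressions": {         # custom record header attributes
        "source_pipeline": "${pipeline:title()}",  # hypothetical expression
    },
}
```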
The destination can also generate events for an event stream. For more information about the event framework, see Dataflow Triggers Overview.
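For a sense of what those events might look like to a downstream stage, here is a hedged sketch. The event type names and field layout below are illustrative assumptions; consult the stage documentation for the actual event record format.

```python
# Illustrative only: event type names and field layout are assumptions.
sample_events = [
    {"sdc.event.type": "new-table",
     "table": "sales.weborders"},
    {"sdc.event.type": "new-partition",
     "table": "sales.weborders", "partition": "dt=2024-01-15"},
]

for event in sample_events:
    # A downstream stage could route records on the event type attribute.
    if event["sdc.event.type"] == "new-table":
        print(f"table created: {event['table']}")
    elif event["sdc.event.type"] == "new-partition":
        print(f"partition added: {event['partition']} in {event['table']}")
```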
For more information about the Drift Synchronization Solution for Hive, including case studies for processing Avro and Parquet data, see Drift Synchronization Solution for Hive. A step-by-step tutorial is also available on GitHub.