Hive

The Hive destination writes files of a specified file format to a Hive table. Hive is a data warehouse system built on top of Hadoop Distributed File System (HDFS), and Apache Spark can read from and write to its tables. Hive stores table data as files on HDFS.

By default, the destination writes to Hive using connection information stored in Hive configuration files on the Transformer machine. You can alternatively specify the location of an external Hive Metastore where the configuration information is stored.

The destination can write to a new or existing Hive table. If the table doesn't exist, the destination creates the table. The destination can create a managed internal table or an external table.
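To illustrate the difference between the two table types, the following sketch builds the kind of HiveQL statement that creates each one. It is not the destination's actual implementation. The helper name build_create_table, its parameters, and the sample schema, table, and path are all hypothetical; only the HiveQL clauses (CREATE [EXTERNAL] TABLE, PARTITIONED BY, STORED AS, LOCATION) are standard Hive syntax.

```python
def build_create_table(schema, table, columns, file_format="PARQUET",
                       partition_by=(), external_location=None):
    """Return a HiveQL CREATE TABLE statement (hypothetical helper).

    columns is a list of (name, hive_type) pairs. When external_location
    is given, the statement creates an external table at that path;
    otherwise it creates a managed table.
    """
    partition_by = list(partition_by)
    types = dict(columns)
    # HiveQL declares partition columns in PARTITIONED BY,
    # not in the main column list.
    data_cols = [(n, t) for n, t in columns if n not in partition_by]
    part_cols = [(n, types[n]) for n in partition_by]

    kind = "EXTERNAL TABLE" if external_location else "TABLE"
    col_list = ", ".join(f"{n} {t}" for n, t in data_cols)
    parts = [f"CREATE {kind} IF NOT EXISTS {schema}.{table} ({col_list})"]
    if part_cols:
        part_list = ", ".join(f"{n} {t}" for n, t in part_cols)
        parts.append(f"PARTITIONED BY ({part_list})")
    parts.append(f"STORED AS {file_format}")
    if external_location:
        parts.append(f"LOCATION '{external_location}'")
    return " ".join(parts)


# Example values are made up for illustration.
columns = [("id", "BIGINT"), ("amount", "DOUBLE"), ("country", "STRING")]
managed_ddl = build_create_table("sales", "orders", columns,
                                 partition_by=["country"])
external_ddl = build_create_table("sales", "orders", columns,
                                  partition_by=["country"],
                                  external_location="hdfs:///data/orders")
```

With a managed table, Hive owns the underlying files and removes them when the table is dropped. With an external table, dropping the table removes only the metadata and leaves the files at the given location.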

If the table exists, the destination can append data to the table, overwrite all existing data, or overwrite related partitions in the table.
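The three behaviors can be compared with a small in-memory model. This is a sketch only, not how the destination is implemented. It represents a table as a dictionary from partition key to rows, and it assumes that "related partitions" means the partitions that receive rows in the incoming batch. The function name write and the mode names are hypothetical.

```python
def write(table, rows, partition_by, mode):
    """Return a new table after writing rows in the given mode.

    table maps a partition key (tuple of partition column values)
    to the list of rows stored in that partition.
    """
    if mode not in ("append", "overwrite_all", "overwrite_partitions"):
        raise ValueError(f"unknown write mode: {mode}")

    # Group the incoming rows by their partition key.
    incoming = {}
    for row in rows:
        key = tuple(row[c] for c in partition_by)
        incoming.setdefault(key, []).append(row)

    if mode == "overwrite_all":
        # All existing data is discarded, in every partition.
        return incoming

    result = {k: list(v) for k, v in table.items()}
    for key, new_rows in incoming.items():
        if mode == "append":
            result.setdefault(key, []).extend(new_rows)
        else:
            # overwrite_partitions: replace only partitions that receive data.
            result[key] = list(new_rows)
    return result


# Made-up sample data: two existing partitions, one incoming row.
existing = {
    ("US",): [{"id": 1, "country": "US"}],
    ("DE",): [{"id": 2, "country": "DE"}],
}
batch = [{"id": 3, "country": "US"}]

appended = write(existing, batch, ["country"], "append")
replaced_all = write(existing, batch, ["country"], "overwrite_all")
replaced_parts = write(existing, batch, ["country"], "overwrite_partitions")
```

In this model, appending keeps rows 1, 2, and 3. Overwriting all data keeps only row 3. Overwriting related partitions replaces the US partition with row 3 and leaves the DE partition untouched.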

When you configure the Hive destination, you specify the schema and table to write to. You configure the file format of the data and the write mode to use. You can also specify the table columns by which to partition the data.

You can enable data drift handling, which allows the destination to automatically compensate for new or missing data in pipeline records. When needed, you can specify URIs for an external Hive Metastore where configuration information is stored. You can also specify the type of table to create if the table does not exist.
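One plausible way to picture data drift handling is sketched below. It assumes that a new field in a record becomes a new table column and that a field missing from a record is written as a null value. The exact behavior of the destination may differ, and the function name reconcile is hypothetical.

```python
def reconcile(table_columns, record):
    """Align one record with the table columns (illustrative assumption).

    Returns the possibly extended column list and a row that has a
    value for every column, using None where the record has no field.
    """
    columns = list(table_columns)
    # New data: fields the table has not seen yet become new columns.
    for field in record:
        if field not in columns:
            columns.append(field)
    # Missing data: columns absent from the record are filled with None.
    row = {c: record.get(c) for c in columns}
    return columns, row


# Made-up example: the record lacks "name" and introduces "email".
columns, row = reconcile(["id", "name"], {"id": 1, "email": "a@example.com"})
```

Here the column list grows to include email, and the row carries None for the missing name field.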