Hive Query

The Hive Query executor connects to Hive or Impala and performs one or more user-defined Hive or Impala queries each time it receives an event record. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.

Use the Hive Query executor as part of an event stream to perform event-driven queries in Hive or Impala. You can use the executor in any logical way, such as running Hive or Impala queries after the Hive Metadata destination updates the Hive metastore, or after the Hadoop FS or MapR FS destination closes files.

For example, you can use the Hive Query executor to run the Impala Invalidate Metadata query as part of the Drift Synchronization Solution for Hive, or to configure table properties for newly created tables.
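The two uses above can be illustrated with a minimal Python sketch that builds the statements for one event record. This is not the executor's own query syntax: the helper name build_queries, the event field name "table", and the sample table and property values are all hypothetical. Only the INVALIDATE METADATA and ALTER TABLE ... SET TBLPROPERTIES statement forms are standard Impala and Hive SQL.

```python
def build_queries(event, table_properties=None):
    """Return the statements to run for one event record (hypothetical helper)."""
    # Assumption: the event record carries the table name in a "table" field.
    table = event["table"]

    # Impala statement that refreshes cached metadata for the table.
    queries = [f"INVALIDATE METADATA {table}"]

    # Optionally set table properties, for example on a newly created table.
    if table_properties:
        props = ", ".join(f"'{k}'='{v}'" for k, v in table_properties.items())
        queries.append(f"ALTER TABLE {table} SET TBLPROPERTIES ({props})")
    return queries


if __name__ == "__main__":
    # Hypothetical event record and property values, for illustration only.
    event = {"table": "sales.orders"}
    for query in build_queries(event, {"comment": "created by pipeline"}):
        print(query)
```

In a real pipeline the queries are entered in the executor configuration rather than built in Python; the sketch only shows how the statement text depends on the table named in the event record.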

When using the Hive Query executor with Impala, you can use the default driver included with Data Collector, or you can install an Impala JDBC driver.

Note: The Hive Query executor waits for each query to complete before continuing with the next query for the same event record. It also waits for all queries to complete before starting the queries for the next event record. Depending on the speed of the pipeline and the complexity of the queries, the wait for query completion can slow pipeline performance.

When you configure the Hive Query executor, you configure JDBC connection information for Hive and can optionally specify additional HDFS configuration properties. You specify the queries that you want to run and indicate whether to run the remaining queries after a query fails.
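The query-ordering and failure behavior described above can be modeled with a short Python sketch. It is an illustration of the semantics, not the executor's implementation: the function run_queries_for_event, the stop_on_failure flag, and the fake_execute stand-in for a blocking JDBC call are assumptions made for this example.

```python
def run_queries_for_event(queries, execute, stop_on_failure=True):
    """Run queries one at a time and return (succeeded, failed) lists.

    `execute` is any callable that blocks until the query completes and
    raises an exception when the query fails (hypothetical stand-in for
    a JDBC call).
    """
    succeeded, failed = [], []
    for query in queries:
        try:
            execute(query)  # the next query starts only after this returns
            succeeded.append(query)
        except Exception:
            failed.append(query)
            if stop_on_failure:
                break  # skip the remaining queries for this event record
    return succeeded, failed


def fake_execute(query):
    """Pretend executor for the demo: fails any query containing 'bad'."""
    if "bad" in query:
        raise RuntimeError(f"query failed: {query}")


if __name__ == "__main__":
    queries = ["query one", "bad query", "query three"]
    print(run_queries_for_event(queries, fake_execute, stop_on_failure=True))
    print(run_queries_for_event(queries, fake_execute, stop_on_failure=False))
```

Because each call blocks, the total time spent on one event record is the sum of its query times, which is why slow queries can hold back the event records that follow.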

You can also configure the executor to generate events for another event stream. For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.

For a solution that describes how to use the Hive Query executor, see Automating Impala Metadata Updates for Drift Synchronization for Hive.