Hive

Available when using an authoring Data Collector version 5.0.0 or later.

To create a Hive connection, one of the following stage libraries must be installed on the selected authoring Data Collector:
  • Cloudera CDP, streamsets-datacollector-cdp_<version>-lib
  • MapR with MEP, streamsets-datacollector-mapr_<version>-mep<version>-lib

For a description of the Hive connection properties, see Hive Connection Properties.

After you create a Hive connection, you can use the connection in the following stages:

Engine Stages
Data Collector 5.0.0 or later
  • Hive Metadata processor
  • Hive Metastore destination
  • Hive Query executor

Hive Connection Properties

When creating a Hive connection, configure the following properties on the Hive tab:
Hive Property Description
JDBC URL

JDBC URL for Hive. For details about specifying the URL, see our StreamSets Community post.

You can optionally include the user name and password in the JDBC URL. If the password contains special characters, you must URL-encode (percent-encode) them. Otherwise, errors occur when you validate or run the pipeline. For example, if your JDBC URL looks like this:
jdbc:hive2://sunnyvale:12345/default;user=admin;password=a#b!c$e
URL-encode your password so that your JDBC URL looks like this:
jdbc:hive2://sunnyvale:12345/default;user=admin;password=a%23b%21c%24e
Tip: To secure sensitive information, you can use credential stores or runtime resources.
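
The percent-encoding step described for the JDBC URL can be scripted. The following is a minimal Python sketch and is not part of the product: the helper name encode_jdbc_password is hypothetical, and the sketch assumes that every character outside the unreserved set (letters, digits, and "_.-~") should be encoded, including the ";" separator.

```python
from urllib.parse import quote


def encode_jdbc_password(password: str) -> str:
    """Percent-encode a password for use in a JDBC URL (hypothetical helper)."""
    # safe="" also encodes "/", which quote() leaves unencoded by default.
    return quote(password, safe="")


# Host, port, and credentials are the sample values from the example above.
password = encode_jdbc_password("a#b!c$e")
url = f"jdbc:hive2://sunnyvale:12345/default;user=admin;password={password}"
print(url)
```

The printed URL can then be pasted into the JDBC URL property. Encoding only the password, rather than the whole URL, keeps the ";" and "=" delimiters of the URL intact.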
JDBC Driver Name

The fully qualified JDBC driver name.

Before using an Impala JDBC driver for the Hive Query executor, install the driver as an external library for the stage library used by the executor. For more information, see Installing the Impala Driver in the Data Collector documentation.

Use Credentials

Enables entering credentials in properties. Use when you do not include credentials in the JDBC URL.
Note: To impersonate the current user in connections to Hive, you can configure Data Collector to impersonate the user automatically, without specifying credentials in the pipeline. See Configuring Data Collector in the Data Collector documentation.
Username

User name for the JDBC connection.

The user account must have the correct permissions or privileges in the database.

Password

Password for the JDBC user name.
Tip: To secure sensitive information, you can use credential stores or runtime resources.
Additional JDBC Configuration Properties

Additional JDBC configuration properties to pass to the JDBC driver.

Using simple or bulk edit mode, click Add to add additional properties and define the property name and value. Use the property names and values as expected by the JDBC driver.

Hadoop Configuration Directory

Absolute path to the directory containing the following Hive and Hadoop configuration files:

  • core-site.xml
  • hdfs-site.xml
  • hive-site.xml

Note: Properties in the configuration files are overridden by individual properties defined in the Additional Hadoop Configuration property.
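
Before you enter the directory path, you might confirm that it is absolute and contains the three files listed for this property. The following Python sketch is one way to do that on the Data Collector machine. The helper name missing_config_files is hypothetical, and the demonstration uses a temporary directory rather than a real Hadoop installation.

```python
import os
import tempfile

REQUIRED_FILES = ("core-site.xml", "hdfs-site.xml", "hive-site.xml")


def missing_config_files(conf_dir: str) -> list:
    """Return the required files that are absent from conf_dir (hypothetical helper)."""
    if not os.path.isabs(conf_dir):
        raise ValueError(f"not an absolute path: {conf_dir}")
    return [name for name in REQUIRED_FILES
            if not os.path.isfile(os.path.join(conf_dir, name))]


# Demonstration against a temporary directory that holds only two of the files.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("core-site.xml", "hive-site.xml"):
        open(os.path.join(demo_dir, name), "w").close()
    missing = missing_config_files(demo_dir)
    print(missing)
```

An empty list means that all three files are present in the directory.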
Additional Hadoop Configuration

Additional HDFS and Hive configuration properties to use.

Using simple or bulk edit mode, click Add to add additional properties and define the property name and value. Use the property names and values as expected by HDFS and Hive.
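
The precedence between the configuration files and the Additional Hadoop Configuration property can be illustrated with a short Python sketch. It only models the override rule and is not how Data Collector loads configuration. The metastore host names are made-up sample values, and parse_site_xml is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def parse_site_xml(xml_text: str) -> dict:
    """Read name/value pairs from Hadoop-style configuration XML."""
    root = ET.fromstring(xml_text)
    return {prop.findtext("name"): prop.findtext("value")
            for prop in root.iter("property")}


# Illustrative file content; the host names are sample data.
hive_site = """
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-a:9083</value>
  </property>
</configuration>
"""

from_files = parse_site_xml(hive_site)
additional = {"hive.metastore.uris": "thrift://metastore-b:9083"}

# Later entries win, so the additional properties override the file values.
effective = {**from_files, **additional}
print(effective["hive.metastore.uris"])
```

In this sketch, the value defined as an additional property replaces the value read from hive-site.xml, which mirrors the override behavior described in the note for the Hadoop Configuration Directory property.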