Hive
Available when using an authoring Data Collector version 5.0.0 or later.
- Cloudera CDP: streamsets-datacollector-cdp_<version>-lib
- MapR with MEP: streamsets-datacollector-mapr_<version>-mep<version>-lib
For a description of the Hive connection properties, see Hive Connection Properties.
After you create a Hive connection, you can use the connection in the following stages:
Engine | Stages |
---|---|
Data Collector 5.0.0 or later |
|
Hive Connection Properties
Hive Property | Description |
---|---|
JDBC URL | JDBC URL for Hive. For details about specifying the URL, see our StreamSets Community post. You can optionally include the user name and password in the JDBC URL. If the password contains special characters, you must URL-encode (percent-encode) them. Otherwise, errors occur when you validate or run the pipeline. For example, a # character in the password must be written as %23. Tip: To secure sensitive information, you can use credential stores or runtime resources. |
JDBC Driver Name | The fully-qualified JDBC driver name. Before using an Impala JDBC driver for the Hive Query executor, install the driver as an external library for the stage library used by the executor. For more information, see Installing the Impala Driver in the Data Collector documentation. |
Use Credentials | Enables entering credentials in properties. Use when you do not include credentials in the JDBC URL. Note: To impersonate the current user in connections to Hive, you can edit the Data Collector configuration properties so that Data Collector impersonates the user automatically, without credentials specified in the pipeline. See Configuring Data Collector in the Data Collector documentation. |
Username | User name for the JDBC connection. The user account must have the correct permissions or privileges in the database. |
Password | Password for the JDBC user name. Tip: To secure sensitive information, you can use credential stores or runtime resources. |
Additional JDBC Configuration Properties | Additional JDBC configuration properties to pass to the JDBC driver. Using simple or bulk edit mode, click Add to add a property, then define the property name and value. Use the property names and values as expected by the JDBC driver. |
Hadoop Configuration Directory | Absolute path to the directory containing the Hive and Hadoop configuration files. Note: Properties in the configuration files are overridden by individual properties defined in the Additional Hadoop Configuration property. |
Additional Hadoop Configuration | Additional Hadoop configuration properties to use. Using simple or bulk edit mode, click Add to add a property, then define the property name and value. Use the property names and values as expected by HDFS and Hive. |
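
The URL-encoding requirement described for the JDBC URL property can be illustrated with a short Python sketch. It is not part of Data Collector. The host, database, user name, and password are hypothetical, and the `;user=...;password=...` layout is an assumption about how credentials are embedded, so follow the URL format that your driver expects.

```python
from urllib.parse import quote


def encode_password(password: str) -> str:
    """Percent-encode a password for use inside a JDBC URL."""
    # safe="" makes quote() encode every reserved character, including "/".
    return quote(password, safe="")


# Hypothetical values, for illustration only.
raw_password = "pa#ss/w;rd"

# Assumed URL layout; adjust it to the format your JDBC driver expects.
jdbc_url = (
    "jdbc:hive2://hive.example.com:10000/default"
    ";user=etl_user;password=" + encode_password(raw_password)
)

print(jdbc_url)
```

Only the password is encoded. Encoding the whole URL would also escape the separators that the driver relies on.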
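
The override rule noted for the Hadoop Configuration Directory property can be pictured as a simple merge in which individually defined properties win over values read from the configuration files. The following Python sketch only illustrates that precedence. It does not show how Data Collector loads configuration, and the function names and sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical file content in the standard Hadoop <configuration> format.
SAMPLE_SITE_XML = """
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-a.example.com:9083</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>
"""


def parse_hadoop_xml(xml_text: str) -> dict:
    """Read name/value pairs from a Hadoop-style configuration document."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): prop.findtext("value")
        for prop in root.findall("property")
    }


def effective_config(file_properties: dict, additional_properties: dict) -> dict:
    """Merge properties so that individually defined values take precedence."""
    merged = dict(file_properties)
    merged.update(additional_properties)  # later values override file values
    return merged


file_props = parse_hadoop_xml(SAMPLE_SITE_XML)
# Hypothetical value entered in Additional Hadoop Configuration.
additional = {"hive.metastore.uris": "thrift://metastore-b.example.com:9083"}

config = effective_config(file_props, additional)
print(config["hive.metastore.uris"])
```

Properties that appear only in the files, such as `fs.defaultFS` in this sample, pass through unchanged.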