The HBase destination writes data to an HBase cluster. The destination can write data to HBase as text, binary data, or JSON strings. You can define the data format for each column written to HBase.
When you configure the HBase destination, you specify the HBase configuration properties, including the ZooKeeper Quorum, parent znode, and table name. You specify the row key for the table, and then map fields from the pipeline to HBase columns.
When necessary, you can enable Kerberos authentication and specify an HBase user. You can also configure a time basis and add other HBase configuration properties.
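As a rough illustration of the connection settings named above, the following sketch shows how a ZooKeeper quorum and parent znode could translate into standard HBase client properties. The helper name `build_hbase_client_properties` is hypothetical and not part of Data Collector. The defaults `/hbase` and `2181` are the usual HBase defaults and may differ in your cluster.

```python
def build_hbase_client_properties(zookeeper_quorum, parent_znode="/hbase", client_port=2181):
    """Turn destination-style settings into HBase client property names.

    Hypothetical helper for illustration only; Data Collector builds its
    own configuration internally.
    """
    # The quorum is a comma-separated list of ZooKeeper hosts.
    hosts = [h.strip() for h in zookeeper_quorum.split(",") if h.strip()]
    if not hosts:
        raise ValueError("ZooKeeper quorum must list at least one host")
    if not parent_znode.startswith("/"):
        raise ValueError("parent znode must be an absolute path such as /hbase")
    return {
        "hbase.zookeeper.quorum": ",".join(hosts),
        "hbase.zookeeper.property.clientPort": str(client_port),
        "zookeeper.znode.parent": parent_znode,
    }


# Example host names are placeholders.
props = build_hbase_client_properties("zk1.example.com, zk2.example.com")
```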
When you configure the HBase destination, you map fields from records to HBase columns.
You can map fields to columns in the following ways:
<column-family>:<qualifier>
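The mapping can be pictured with a small sketch that splits a `<column-family>:<qualifier>` string and encodes a field value as text, binary, or JSON. This is an assumption-laden illustration, not the destination's implementation: the function names, the storage type labels `TEXT`, `BINARY`, and `JSON`, and the choice to encode integers as 8-byte big-endian values are all hypothetical.

```python
import json
import struct


def parse_column(column):
    """Split 'family:qualifier' into its two parts."""
    family, sep, qualifier = column.partition(":")
    if not sep or not family or not qualifier:
        raise ValueError(f"expected <column-family>:<qualifier>, got {column!r}")
    return family, qualifier


def encode_value(value, storage_type):
    """Encode a field value to bytes for one of three illustrative formats."""
    if storage_type == "TEXT":
        return str(value).encode("utf-8")
    if storage_type == "JSON":
        return json.dumps(value, sort_keys=True).encode("utf-8")
    if storage_type == "BINARY":
        if isinstance(value, (bytes, bytearray)):
            return bytes(value)
        if isinstance(value, int):
            # Assumption: integers are stored as 8-byte big-endian longs.
            return struct.pack(">q", value)
        raise TypeError(f"cannot store {type(value).__name__} as binary")
    raise ValueError(f"unknown storage type {storage_type!r}")


def build_cells(record, row_key_field, mappings):
    """Return (row_key, {(family, qualifier): bytes}) for one record."""
    row_key = str(record[row_key_field]).encode("utf-8")
    cells = {}
    for field, column, storage_type in mappings:
        cells[parse_column(column)] = encode_value(record[field], storage_type)
    return row_key, cells


record = {"id": "user-42", "name": "Ada", "visits": 7, "tags": {"beta": True}}
mappings = [
    ("name", "cf:name", "TEXT"),
    ("visits", "cf:visits", "BINARY"),
    ("tags", "cf:tags", "JSON"),
]
row_key, cells = build_cells(record, "id", mappings)
```

Keeping the storage type next to each mapping mirrors the idea that the data format is chosen per column rather than per table.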
You can use Kerberos authentication to connect to HBase. When you use Kerberos authentication, Data Collector uses the Kerberos principal and keytab to connect to HBase. Without Kerberos, Data Collector connects as the user account that started it.
The Kerberos principal and keytab are defined in the Data Collector configuration file, $SDC_CONF/sdc.properties. To use Kerberos authentication, configure all Kerberos properties in the Data Collector configuration file.
For more information about enabling Kerberos authentication for Data Collector, see Kerberos Authentication.
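Before starting a pipeline, it can help to confirm that the Kerberos properties are all present in the configuration file. The sketch below parses properties-style text and reports missing keys. The property names `kerberos.client.enabled`, `kerberos.client.principal`, and `kerberos.client.keytab` are assumed here; check them against your own `sdc.properties`. The sample text and helper names are hypothetical.

```python
# Assumed property names; verify against your sdc.properties.
REQUIRED_KERBEROS_KEYS = (
    "kerberos.client.enabled",
    "kerberos.client.principal",
    "kerberos.client.keytab",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_kerberos_keys(props):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KERBEROS_KEYS if not props.get(key)]


# Hypothetical excerpt with the keytab left out.
sample = """
# Kerberos settings
kerberos.client.enabled=true
kerberos.client.principal=sdc/_HOST@EXAMPLE.COM
"""
missing = missing_kerberos_keys(parse_properties(sample))
```

In a real check you would read the text from `$SDC_CONF/sdc.properties` instead of the inline sample.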
Data Collector can use either the currently logged in Data Collector user or a user configured in the destination to write to HBase.
You can set a Data Collector configuration property that requires using the currently logged in Data Collector user. When this property is not set, you can specify a user in the destination. For more information about Hadoop impersonation and the Data Collector property, see Hadoop Impersonation Mode.
Note that the destination uses a different user account to connect to HBase. By default, Data Collector uses the user account that started it to connect to external systems. When using Kerberos, Data Collector uses the Kerberos principal.
For more information, see the HBase documentation.
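The precedence among these users can be summarized in a short sketch. The function and parameter names are hypothetical, and the fallback to the connecting account when no user is configured is an assumption rather than documented behavior.

```python
def resolve_hbase_user(default_user, logged_in_user, configured_user=None,
                       require_current_user=False):
    """Pick the user for writes to HBase.

    default_user: the account Data Collector connects as (the account that
        started it, or the Kerberos principal when Kerberos is enabled).
    logged_in_user: the currently logged in Data Collector user.
    configured_user: the HBase user set in the destination, if any.
    require_current_user: True when the Data Collector property that forces
        use of the logged in user is set.
    """
    if require_current_user:
        # The property takes priority over any user set in the destination.
        return logged_in_user
    if configured_user:
        return configured_user
    # Assumption: with nothing configured, writes use the connecting account.
    return default_user


writer = resolve_hbase_user("sdc", "alice", configured_user="etl_writer")
```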
The time basis determines the timestamp value added for each column written to HBase.
You can use the following times as the time basis:
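As a minimal sketch of how a time basis might resolve to a cell timestamp, the example below assumes two hypothetical modes, `PROCESSING_TIME` and `RECORD_TIME`, and a hypothetical `time_field` parameter naming the record field that carries the time. HBase cell timestamps are long values, conventionally milliseconds since the epoch, so both modes return an integer in that unit.

```python
import time
from datetime import datetime, timezone


def resolve_cell_timestamp(record, time_basis, time_field=None, now_ms=None):
    """Return a timestamp in epoch milliseconds for the cells of one record.

    The mode names and parameters are illustrative assumptions, not
    Data Collector settings.
    """
    if time_basis == "PROCESSING_TIME":
        # Use the wall-clock time at which the record is handled.
        return now_ms if now_ms is not None else int(time.time() * 1000)
    if time_basis == "RECORD_TIME":
        value = record[time_field]
        if isinstance(value, datetime):
            return int(value.timestamp() * 1000)
        # Otherwise assume the field already holds epoch milliseconds.
        return int(value)
    raise ValueError(f"unknown time basis {time_basis!r}")


record = {"id": "user-42", "event_time": datetime(2024, 1, 1, tzinfo=timezone.utc)}
ts = resolve_cell_timestamp(record, "RECORD_TIME", time_field="event_time")
```

Taking the timestamp from the record keeps cell versions aligned with when the event happened, while processing time reflects when the pipeline wrote it.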
You can configure the HBase destination to use individual HDFS properties or HDFS configuration files:
Configure an HBase destination to write data to HBase.