Kudu
When you configure the Kudu destination, you specify the connection information for one or more Kudu masters, define the table to use, and optionally define field mappings. By default, the destination writes field data to columns with matching names. You can also enable Kerberos authentication.
The Kudu destination can use CRUD operations defined in the sdc.operation.type record header attribute to write data. You can define a default operation for records without the header attribute or value. You can also configure how to handle records with unsupported operations.
For information about Data Collector change data processing and a list of CDC-enabled origins, see Processing Changed Data.
If the destination receives a change data capture log from some origin systems, you must select the format of the change log.
You can configure the external consistency mode, operation timeouts, and the maximum number of worker threads to use.
You can also use a connection to configure the destination.
Define the CRUD Operation
The Kudu destination can insert, update, delete, or upsert data. The destination writes the records based on the CRUD operation defined in a CRUD operation header attribute or in operation-related stage properties.
You define the CRUD operation in the following ways:
- CRUD record header attribute
- You can define the CRUD operation in a CRUD operation record header attribute. The destination looks for the CRUD operation to use in the sdc.operation.type record header attribute.
- Operation stage properties
- You define a default operation in the destination properties. The destination uses the default operation when the sdc.operation.type record header attribute is not set.
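The lookup order described above can be sketched in a few lines of Python. This is an illustrative model only, not the destination's actual implementation. The numeric codes (1 = INSERT, 2 = DELETE, 3 = UPDATE, 4 = UPSERT) are assumed from the general Data Collector CRUD header convention and should be checked against your Data Collector version. The function name, the `default_op` parameter, and the `on_unsupported` values are hypothetical names chosen for this sketch.

```python
# Assumed mapping of sdc.operation.type codes to the operations Kudu supports.
# Verify these codes against the Data Collector documentation for your version.
OPERATION_CODES = {1: "INSERT", 2: "DELETE", 3: "UPDATE", 4: "UPSERT"}


def resolve_operation(header, default_op="INSERT", on_unsupported="discard"):
    """Pick the CRUD operation for one record (hypothetical helper).

    header         -- dict of record header attributes
    default_op     -- operation used when the header attribute is missing or empty
    on_unsupported -- "discard", "use_default", or "error" (illustrative names)
    """
    raw = header.get("sdc.operation.type")

    # No attribute or no value: fall back to the default operation.
    if raw is None or str(raw).strip() == "":
        return default_op

    # The attribute is present: try to interpret it as a numeric code.
    try:
        code = int(raw)
    except (TypeError, ValueError):
        code = None

    operation = OPERATION_CODES.get(code)
    if operation is not None:
        return operation

    # The attribute is set, but not to an operation this sketch recognizes.
    if on_unsupported == "use_default":
        return default_op
    if on_unsupported == "discard":
        return None  # the caller drops the record
    raise ValueError(f"Unsupported operation in sdc.operation.type: {raw!r}")
```

For example, `resolve_operation({"sdc.operation.type": "4"})` selects an upsert, while `resolve_operation({})` falls back to the default insert.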
Kudu Data Types
The Kudu destination converts Data Collector data types to the following compatible Kudu data types:
Data Collector Data Type | Kudu Data Type |
---|---|
Boolean | Bool |
Byte | Int8 |
Byte Array | Binary |
Decimal | Decimal. Available in Kudu version 1.7 and later. If using an earlier version of Kudu, configure your pipeline to convert the Decimal data type to a different Kudu data type. |
Double | Double |
Float | Float |
Integer | Int32 |
Long | Int64 or Unixtime_micros. The destination determines the data type to use based on the mapped Kudu column. The Data Collector Long data type stores millisecond values. The Kudu Unixtime_micros data type stores microsecond values. When converting to the Unixtime_micros data type, the destination multiplies the field value by 1,000 to convert the value to microseconds. |
Short | Int16 |
String | String |
The destination cannot convert the following Data Collector data types. Convert them to a compatible type earlier in the pipeline:
- Character
- Date
- Datetime
- List
- List-Map
- Map
- Time
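The Long row in the table above involves a unit conversion. A minimal sketch of that arithmetic follows, assuming the mapped Kudu column type is available as a lowercase string. The function name and the type strings are hypothetical and chosen only for illustration.

```python
def long_to_kudu(value_ms, kudu_column_type):
    """Convert a Data Collector Long to the value written to Kudu (illustrative).

    value_ms         -- Long field value; treated as milliseconds for timestamps
    kudu_column_type -- "int64" or "unixtime_micros" (hypothetical type labels)
    """
    if kudu_column_type == "unixtime_micros":
        # Milliseconds to microseconds: multiply by 1,000.
        return value_ms * 1000
    if kudu_column_type == "int64":
        # Plain 64-bit integer column: the value is written unchanged.
        return value_ms
    raise ValueError(f"Long cannot be mapped to Kudu type {kudu_column_type!r}")
```

A Long of 1700000000000 (milliseconds) mapped to a Unixtime_micros column is written as 1700000000000000 (microseconds). The same value mapped to an Int64 column is written as is.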
Kerberos Authentication
You can use Kerberos authentication to connect to a Kudu cluster. When you use Kerberos authentication, Data Collector uses the Kerberos principal and keytab to connect to Kudu. By default, Data Collector connects using the user account that started it.
The Kerberos principal and keytab are defined in the Data Collector configuration file, $SDC_CONF/sdc.properties. To use Kerberos authentication, configure all Kerberos properties in the Data Collector configuration file.
For more information about enabling Kerberos authentication for Data Collector, see Kerberos Authentication in the Data Collector documentation.
Configuring a Kudu Destination
Configure a Kudu destination to write to a Kudu cluster.