Kudu

The Kudu destination writes data to a Kudu table. You can also use the destination to write to a Kudu table created by Impala.

The destination writes record fields to table columns by matching names. The Kudu destination can insert data into the table or upsert it.
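The difference between the two write operations can be sketched with a plain Python dictionary standing in for a table keyed by primary key. This is only an illustration and not the destination's implementation: `write_rows`, the `id` key column, and the sample records are hypothetical, and rejecting a duplicate key on insert is an assumption about behavior that this page does not describe.

```python
from typing import Dict, List

Row = Dict[str, object]


def write_rows(table: Dict[object, Row], records: List[Row],
               key_column: str, operation: str) -> None:
    """Apply records to a dict that stands in for a table keyed by primary key."""
    for record in records:
        key = record[key_column]
        if operation == "insert":
            # Assumption: a row whose key already exists is rejected.
            if key in table:
                raise KeyError(f"row with key {key!r} already exists")
            table[key] = dict(record)
        elif operation == "upsert":
            # New keys are inserted; existing rows are updated field by field.
            table.setdefault(key, {}).update(record)
        else:
            raise ValueError(f"unknown write operation: {operation!r}")


table: Dict[object, Row] = {}
write_rows(table, [{"id": 1, "name": "alpha"}], "id", "insert")
write_rows(table, [{"id": 1, "name": "beta"}, {"id": 2, "name": "gamma"}], "id", "upsert")
```

After the upsert, the row with key 1 carries the name `beta` and a new row with key 2 exists. Field names in each record double as column names, mirroring the match-by-name behavior described above.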

When you configure the Kudu destination, you specify the connection information for one or more Kudu primary nodes, also known as Kudu masters. You also configure the table to write to and the write operation to use. When needed, you can specify a maximum batch size for the destination.

You can also use a connection to configure the destination.

Note: Due to a Kudu limitation on Spark, pipeline validation does not validate Kudu stage configuration.

Configuring a Kudu Destination

Configure a Kudu destination to write to a Kudu table.

On the Properties panel, on the General tab, configure the following properties:


General Property	Description
Name	Stage name.
Description	Optional description.

On the Kudu tab, configure the following properties:


Kudu Property	Description
Connection	Connection that defines the information required to connect to an external system. To connect to an external system, you can select a connection that contains the details, or you can directly enter the details in the pipeline. When you select a connection, Control Hub hides other properties so that you cannot directly enter connection details in the pipeline.
Kudu Primary Nodes (6.1 and later) or Kudu Masters (6.0)	Comma-separated list of Kudu primary nodes used to access the Kudu table. For each Kudu primary node, specify the host and port in the following format: `<host>:<port>`
Kudu Table	Name of the table to write to. To write to a Kudu table created by Impala, use the following format: `impala::default.<table name>`
Write Operation	Operation to perform when writing to Kudu: Insert - Inserts all data into the table. Upsert - Inserts new data into the table and updates existing data.
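The value formats in the table above can be checked before you enter them in the pipeline. The following is a minimal sketch, not part of the product: the helper names `parse_primary_nodes` and `impala_table_name` and the example host names are hypothetical.

```python
from typing import List, Tuple


def parse_primary_nodes(value: str) -> List[Tuple[str, int]]:
    """Split a comma-separated list of <host>:<port> entries into (host, port) pairs."""
    nodes = []
    for entry in value.split(","):
        entry = entry.strip()
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected <host>:<port>, got {entry!r}")
        nodes.append((host, int(port)))
    return nodes


def impala_table_name(table: str) -> str:
    """Build the name used for a Kudu table created by Impala."""
    return f"impala::default.{table}"


# Hypothetical host names, for illustration only.
nodes = parse_primary_nodes("kudu-1.example.com:7051, kudu-2.example.com:7051")
table_name = impala_table_name("orders")
```

Here `nodes` holds two `(host, port)` pairs and `table_name` is `impala::default.orders`. An entry without a port, such as `kudu-1.example.com`, raises `ValueError`.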

On the Advanced tab, optionally configure the following properties:


Advanced Property	Description
Write Batch Size	Maximum number of records to write to Kudu in a batch. `-1` uses the batch size configured for the Spark cluster.
Maximum Number of Worker Threads	Maximum number of threads to use to perform processing for the stage. Default is the Kudu default, which is twice the number of available cores on each processing node in the Spark cluster. Use this property to limit the number of threads that can be used. To use the Kudu default, set to 0.
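The two special values in the advanced properties, `-1` for the batch size and `0` for the worker threads, can be read as fallbacks. The sketch below shows how the effective values might be resolved under that reading. The function names and the sample cluster figures are hypothetical, and the actual resolution happens inside the destination and Kudu.

```python
def effective_batch_size(configured: int, cluster_batch_size: int) -> int:
    """Return the batch size to use; -1 falls back to the Spark cluster setting."""
    if configured == -1:
        return cluster_batch_size
    if configured < 1:
        raise ValueError("batch size must be -1 or a positive number")
    return configured


def effective_worker_threads(configured: int, cores_per_node: int) -> int:
    """Return the thread limit; 0 falls back to twice the available cores per node."""
    if configured == 0:
        return 2 * cores_per_node
    if configured < 0:
        raise ValueError("worker threads must be 0 or a positive number")
    return configured


# Hypothetical cluster: batch size 1000, 8 cores per processing node.
batch = effective_batch_size(-1, cluster_batch_size=1000)
threads = effective_worker_threads(0, cores_per_node=8)
```

With these sample figures, `batch` resolves to 1000 and `threads` to 16. Setting an explicit positive value for either property overrides the fallback.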