Kudu
Available when using an authoring Data Collector version 4.0.0 or later.
To create a Kudu connection, the Cloudera CDP stage library, streamsets-datacollector-cdp_<version>-lib, must be installed on the selected authoring Data Collector.
For a description of the Kudu connection properties, see Kudu Connection Properties.
Engine | Stages |
---|---|
Data Collector 4.0.0 or later | |
Transformer 4.0.0 or later | |
Kudu Connection Properties
Kudu Property | Description |
---|---|
Kudu Masters | Comma-separated list of Kudu masters used to access the Kudu table. For each Kudu master, specify the host and port. |
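As an illustration of the Kudu Masters value, the following minimal Python sketch splits a comma-separated list into host and port pairs. It assumes each entry uses a `host:port` form, which is not spelled out above, and the function name `parse_kudu_masters` and the example host names are hypothetical. It is not part of Data Collector or Transformer.

```python
def parse_kudu_masters(value):
    """Split a comma-separated Kudu Masters value into (host, port) pairs.

    Assumption: each entry is written as host:port.
    """
    masters = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas or surrounding whitespace
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        masters.append((host, int(port)))
    if not masters:
        raise ValueError("at least one Kudu master is required")
    return masters


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    print(parse_kudu_masters("kudu-1.example.com:7051, kudu-2.example.com:7051"))
```

A check like this can help catch a missing port or a stray separator before the value is pasted into the connection.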
Optionally, configure the following properties on the Advanced tab.
Advanced Property | Description |
---|---|
Maximum Number of Worker Threads | Maximum number of threads to use to perform processing for the stage. Default is the Kudu default, which is twice the number of available cores on the processing machine. For a Data Collector pipeline, the processing machine is the Data Collector machine. For a Transformer pipeline, the processing machine is each node in the Spark cluster. Use this property to limit the number of threads that can be used. To use the Kudu default, set the property to 0. |
Operation Timeout (milliseconds) | Number of milliseconds to allow for operations such as writes or lookups. Default is 10000, or 10 seconds. Note: Used in Data Collector pipelines only. |
Admin Operation Timeout (milliseconds) | Number of milliseconds to allow for admin-type operations, such as opening a table or getting a table schema. Default is 30000, or 30 seconds. Note: Used in Data Collector pipelines only. |
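The rule for Maximum Number of Worker Threads can be sketched in a few lines of Python. This is only an illustration of the documented behavior (0 means twice the number of available cores, any other value is the limit), not the Kudu client's actual code. The function name `effective_worker_threads` and its parameters are hypothetical.

```python
import os


def effective_worker_threads(configured, available_cores=None):
    """Return the thread limit implied by the configured property value.

    configured: value of Maximum Number of Worker Threads (0 = Kudu default).
    available_cores: cores on the processing machine; detected if omitted.
    """
    if configured < 0:
        raise ValueError("the thread limit cannot be negative")
    if configured == 0:
        if available_cores is None:
            # os.cpu_count() can return None, so fall back to a single core.
            available_cores = os.cpu_count() or 1
        return 2 * available_cores  # documented default: twice the cores
    return configured


if __name__ == "__main__":
    print(effective_worker_threads(0, available_cores=8))
    print(effective_worker_threads(4, available_cores=8))
```

For a Transformer pipeline the same rule applies per node in the Spark cluster, so the core count would be that of each node rather than a single machine.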