Kafka Cluster Requirements
Cluster mode pipelines that read from a Kafka cluster have the following
requirements:
Component | Requirement |
---|---|
Spark Streaming for cluster streaming modes | Spark version 2.1 or later |
Apache Kafka | Spark Streaming on YARN requires a Cloudera or Hortonworks distribution of an Apache Kafka cluster version 0.10.0.0 or later. |
Note: By default, a Cloudera CDH cluster sets the Kafka-Spark integration version to 0.9. However, Data Collector cluster streaming pipelines require version 0.10 of the Kafka-Spark integration. As a result, the SPARK_KAFKA_VERSION environment variable is set to 0.10 by default in the Data Collector environment configuration file (sdc.env.sh or sdcd.env.sh). Do not change the value of this environment variable.
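You could check these requirements before you deploy a cluster streaming pipeline. The following is a minimal pre-flight sketch, not part of Data Collector. The names `check_requirements`, `meets_minimum`, and `spark_kafka_version` are hypothetical. The sketch makes three assumptions:

- Version strings are plain dotted numbers such as `2.3.0` or `0.10.2.1`.
- The environment file contains a literal assignment such as `export SPARK_KAFKA_VERSION=0.10`. A shell default such as `${SPARK_KAFKA_VERSION:-0.10}` is not resolved.
- You supply the Spark and Kafka versions yourself.

The sketch does not verify that the Kafka cluster is a Cloudera or Hortonworks distribution.

```python
from __future__ import annotations

import re
from typing import Optional

# Minimum versions from the requirements table.
MIN_SPARK = (2, 1)
MIN_KAFKA = (0, 10, 0, 0)
REQUIRED_SPARK_KAFKA_VERSION = "0.10"

# Assumption: a literal assignment such as `export SPARK_KAFKA_VERSION=0.10`,
# optionally quoted. Commented-out lines (starting with #) do not match.
_ENV_LINE = re.compile(r'^\s*(?:export\s+)?SPARK_KAFKA_VERSION=["\']?([^"\'\s]+)["\']?')


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted numeric version such as '0.10.0.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(actual: str, minimum: tuple[int, ...]) -> bool:
    """Return True if `actual` is at least `minimum`."""
    a = parse_version(actual)
    # Pad both tuples with zeros so that '2.1' compares equal to '2.1.0'.
    width = max(len(a), len(minimum))
    a = a + (0,) * (width - len(a))
    m = minimum + (0,) * (width - len(minimum))
    return a >= m


def spark_kafka_version(env_text: str) -> Optional[str]:
    """Return the last SPARK_KAFKA_VERSION value assigned in env_text, if any."""
    value = None
    for line in env_text.splitlines():
        match = _ENV_LINE.match(line)
        if match:
            value = match.group(1)
    return value


def check_requirements(spark_version: str, kafka_version: str, env_text: str) -> list[str]:
    """Return a list of unmet requirements; an empty list means all checks passed."""
    problems = []
    if not meets_minimum(spark_version, MIN_SPARK):
        problems.append(f"Spark {spark_version} is older than 2.1")
    if not meets_minimum(kafka_version, MIN_KAFKA):
        problems.append(f"Kafka {kafka_version} is older than 0.10.0.0")
    if spark_kafka_version(env_text) != REQUIRED_SPARK_KAFKA_VERSION:
        problems.append("SPARK_KAFKA_VERSION is not set to 0.10")
    return problems


if __name__ == "__main__":
    sample_env = "export SPARK_KAFKA_VERSION=0.10\n"
    print(check_requirements("2.3.0", "0.10.2.1", sample_env))  # → []
```

In practice, you would read `sample_env` from sdc.env.sh or sdcd.env.sh instead of using a literal string. The env-file check only confirms that the variable still holds its default value of 0.10. It never rewrites the file.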