Kafka Cluster Requirements

Cluster mode pipelines that read from a Kafka cluster have the following requirements:


Component	Requirement
Spark Streaming for cluster streaming modes	Spark version 2.1 or later
Apache Kafka	Spark Streaming on YARN requires a Cloudera or Hortonworks distribution of Apache Kafka, version 0.10.0.0 or later.

Note: By default, a Cloudera CDH cluster sets the Kafka-Spark integration version to 0.9. However, Data Collector cluster streaming pipelines require version 0.10 of the Kafka-Spark integration. For this reason, the SPARK_KAFKA_VERSION environment variable is set to 0.10 by default in the Data Collector environment configuration file, sdc.env.sh or sdcd.env.sh. Do not change the value of this environment variable.
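
The requirements above can be checked with a short script before starting a cluster streaming pipeline. The following is a minimal sketch, not part of Data Collector: the function names (parse_version, meets_minimum, check_cluster_requirements) are hypothetical, and it assumes that you supply the Spark and Kafka version strings yourself and that they are plain dotted numbers such as 2.1.0 or 0.10.0.0. Versions are compared numerically because a plain string comparison would rank 0.9 above 0.10.

```python
import os


def parse_version(text):
    """Turn a dotted version such as '0.10.0.0' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(actual, minimum):
    """Return True if 'actual' is numerically at least 'minimum'."""
    a = parse_version(actual)
    m = parse_version(minimum)
    # Pad the shorter tuple with zeros so that 2.1 and 2.1.0 compare as equal.
    width = max(len(a), len(m))
    a += (0,) * (width - len(a))
    m += (0,) * (width - len(m))
    return a >= m


def check_cluster_requirements(spark_version, kafka_version, env=None):
    """Return a list of problems; an empty list means all checks passed.

    Hypothetical helper: the caller supplies the version strings, and 'env'
    defaults to the current process environment.
    """
    env = os.environ if env is None else env
    problems = []
    if not meets_minimum(spark_version, "2.1"):
        problems.append("Spark must be version 2.1 or later")
    if not meets_minimum(kafka_version, "0.10.0.0"):
        problems.append("Kafka must be version 0.10.0.0 or later")
    if env.get("SPARK_KAFKA_VERSION") != "0.10":
        problems.append("SPARK_KAFKA_VERSION must be set to 0.10")
    return problems


if __name__ == "__main__":
    # Example values only; replace them with the versions on your cluster.
    for problem in check_cluster_requirements("2.1.0", "0.10.0.0"):
        print(problem)
```

The script only reports problems. It does not modify SPARK_KAFKA_VERSION, in keeping with the note above.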