Kafka Cluster Requirements
Cluster mode pipelines that read from a Kafka cluster
have the following requirements:
Component | Requirement |
---|---|
Spark Streaming for cluster streaming modes | Spark version 2.1 or later |
Apache Kafka | Spark Streaming on YARN requires a Cloudera or Hortonworks distribution of an Apache Kafka cluster version 0.10.0.0 or later. Spark Streaming on Mesos requires Apache Kafka on Apache Mesos. |
Note: By default, a Cloudera CDH cluster sets the Kafka-Spark integration version to 0.9. However, Data Collector cluster streaming pipelines require version 0.10 of the Kafka-Spark integration. As a result, the SPARK_KAFKA_VERSION environment variable is set to 0.10 by default in the Data Collector environment configuration file, sdc.env.sh or sdcd.env.sh. Do not change the value of this environment variable.
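
The requirements above can be checked with a short script before a cluster pipeline is started. The following is a minimal sketch, not part of Data Collector: the function names `parse_version` and `check_requirements` are hypothetical, and it assumes that you supply the Spark and Kafka versions yourself as plain dotted numeric strings (a vendor suffix such as a distribution tag would need to be stripped first). Versions are compared as tuples of integers because a plain string comparison would incorrectly rank 0.9 above 0.10.

```python
import os


def parse_version(text):
    """Convert a dotted version such as '0.10.0.0' to a 4-part integer tuple.

    Missing parts are padded with zeros, so '0.10' becomes (0, 10, 0, 0).
    Assumes the string contains only digits and dots.
    """
    parts = [int(part) for part in text.split(".")]
    return tuple((parts + [0, 0, 0, 0])[:4])


def check_requirements(spark_version, kafka_version, env=None):
    """Return a list of problems; an empty list means all checks passed.

    `env` defaults to the current process environment. Pass a dict to
    check values read from elsewhere (hypothetical usage).
    """
    env = os.environ if env is None else env
    problems = []

    # Spark Streaming for cluster streaming modes: Spark 2.1 or later.
    if parse_version(spark_version) < parse_version("2.1"):
        problems.append("Spark %s is older than 2.1" % spark_version)

    # Apache Kafka: version 0.10.0.0 or later.
    if parse_version(kafka_version) < parse_version("0.10.0.0"):
        problems.append("Kafka %s is older than 0.10.0.0" % kafka_version)

    # Kafka-Spark integration: SPARK_KAFKA_VERSION must remain 0.10.
    integration = env.get("SPARK_KAFKA_VERSION")
    if integration != "0.10":
        problems.append(
            "SPARK_KAFKA_VERSION is %r, expected '0.10'" % integration
        )

    return problems
```

For example, calling `check_requirements("2.1.0", "0.10.0.0")` from a process that inherits the Data Collector environment returns an empty list when the default SPARK_KAFKA_VERSION value has been left unchanged.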