Kafka
The Kafka destination writes data to a Kafka cluster. The destination supports Apache Kafka 0.10 and later. When using a Cloudera distribution of Apache Kafka, use CDH Kafka 3.0 or later.
The destination writes each record as a Kafka message to the specified topic. The Kafka cluster determines the number of partitions that the destination uses to write the data.
When you configure the Kafka destination, you specify the Kafka brokers that the destination connects to, the Kafka topic to write to, and the data format to use. You can configure the destination to connect securely to Kafka and to pass values to Kafka as message keys. You can also specify additional Kafka configuration properties to pass to Kafka.
You can also use a connection to configure the destination.
Generated Messages
Each Kafka message contains two parts: an optional key and a required value. The Kafka destination generates a null value for the message key and writes the record data to the message value.
For example, let's say that a batch contains the following data:
order_id | customer_id | amount |
---|---|---|
1075623 | 2 | 34.56 |
1076645 | 47 | 234.67 |
1050945 | 342 | 126.05 |
The destination generates the following Kafka messages:
Key | Value |
---|---|
null | {"order_id":1075623,"customer_id":2,"amount":34.56} |
null | {"order_id":1076645,"customer_id":47,"amount":234.67} |
null | {"order_id":1050945,"customer_id":342,"amount":126.05} |
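To make the key/value behavior concrete, the following Python sketch (a hypothetical helper, not part of the destination itself) builds the (key, value) pairs that the destination would produce for that batch when the JSON data format is used:

```python
import json

# Sample batch of records, matching the table above.
batch = [
    {"order_id": 1075623, "customer_id": 2, "amount": 34.56},
    {"order_id": 1076645, "customer_id": 47, "amount": 234.67},
    {"order_id": 1050945, "customer_id": 342, "amount": 126.05},
]

def to_kafka_messages(records):
    """Build (key, value) pairs as the destination would:
    a null key, and the record serialized as the message value."""
    return [(None, json.dumps(r, separators=(",", ":"))) for r in records]

for key, value in to_kafka_messages(batch):
    print(key, value)
```

Each message carries a null key, so the Kafka cluster, rather than the key, determines how records are distributed across partitions.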
Kafka Security
You can configure the destination to connect securely to Kafka through SSL/TLS, SASL, or both. For more information about the methods and details on how to configure each method, see Security in Kafka Stages.
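For example, a SASL over SSL/TLS connection might require additional Kafka configuration properties along these lines. The paths, mechanism, and placeholder password below are illustrative only; the exact properties depend on how your cluster's security is set up:

```properties
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
ssl.truststore.location=/path/to/client.truststore.jks
ssl.truststore.password=<truststore-password>
```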
Data Formats
The Kafka destination writes records based on the specified data format.
- Avro
- The destination writes records based on the Avro schema. Note: To use the Avro data format, Apache Spark version 2.4 or later must be installed on the Transformer machine and on each node in the cluster. You can use one of the following methods to specify the location of the Avro schema definition:
- In Pipeline Configuration - Use the schema defined in the stage properties. Optionally, you can configure the destination to register the specified schema with Confluent Schema Registry, specifying the registry URL and a schema subject.
- Confluent Schema Registry - Retrieve the schema from Confluent Schema Registry. Confluent Schema Registry is a distributed storage layer for Avro schemas. You specify the URL to Confluent Schema Registry and whether to look up the schema by the schema ID or subject.
You can also compress data with an Avro-supported compression codec.
- Delimited
- The destination writes a delimited message for every record. You can specify a custom delimiter, quote, and escape character to use in the data.
- JSON
- The destination writes a JSON line message for every record. For more information, see the JSON Lines website.
- Text
- The destination writes a message with a single String field for every record. When you configure the destination, you select the field to use.
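As a rough illustration of the delimited data format, the following Python sketch (using the standard `csv` module, not the destination itself) shows how one record from the earlier example might be rendered as a single delimited line. The pipe delimiter and double-quote character here are illustrative choices standing in for the stage's custom delimiter, quote, and escape settings:

```python
import csv
import io

record = {"order_id": 1075623, "customer_id": 2, "amount": 34.56}

# Render a single record as one delimited message, using a pipe
# delimiter and a double-quote quote character.
buf = io.StringIO()
writer = csv.writer(buf, delimiter="|", quotechar='"')
writer.writerow([record["order_id"], record["customer_id"], record["amount"]])
print(buf.getvalue().strip())  # 1075623|2|34.56
```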
Configuring a Kafka Destination
Configure a Kafka destination to write data to a Kafka cluster.