MapR DB CDC

Data Collector

The MapR DB CDC origin reads changed data from MapR DB that has been written to MapR Streams. The origin can use multiple threads to enable parallel processing of data. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.

You might use this origin to perform database replication. For example, first run a separate pipeline with the MapR DB JSON origin to read the existing data, then start a pipeline with the MapR DB CDC origin to process subsequent changes.

When you configure a MapR DB CDC origin, you configure the MapR Streams consumer group name and topics to process, and the number of threads to use. You can specify additional MapR Streams and supported Kafka configuration properties as needed.
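The settings described above can be pictured as a small data structure. The following Python sketch is illustrative only: the class name, field names, consumer group, and topic path are hypothetical and are not part of any Data Collector API. The `auto.offset.reset` entry is a standard Kafka consumer property, used here as an example of an additional configuration property.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CdcOriginConfig:
    """Hypothetical model of the origin settings; not a Data Collector API."""

    consumer_group: str                 # MapR Streams consumer group name
    topics: List[str]                   # topics to process
    num_threads: int = 1                # number of threads to use
    extra_properties: Dict[str, str] = field(default_factory=dict)

    def validate(self) -> None:
        """Raise ValueError if a required setting is missing or invalid."""
        if not self.consumer_group:
            raise ValueError("consumer group name is required")
        if not self.topics:
            raise ValueError("at least one topic is required")
        if self.num_threads < 1:
            raise ValueError("number of threads must be at least 1")


# Example values are made up for illustration.
config = CdcOriginConfig(
    consumer_group="replication-group",
    topics=["/streams/changelog:orders"],
    num_threads=4,
    extra_properties={"auto.offset.reset": "earliest"},
)
config.validate()
```

In the actual product, these values are entered in the origin's configuration properties rather than in code.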

The MapR DB CDC origin includes the CRUD operation type in a record header attribute so generated records can be easily processed by CRUD-enabled destinations. For an overview of Data Collector changed data processing and a list of CRUD-enabled destinations, see Processing Changed Data.
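To illustrate how a CRUD-enabled destination might act on that header attribute, here is a minimal Python sketch. It assumes the attribute is named `sdc.operation.type` and carries the numeric codes 1 (INSERT), 2 (DELETE), and 3 (UPDATE); confirm the name and codes in Processing Changed Data before relying on them. The `apply_change` function and the in-memory `replica` dictionary are hypothetical stand-ins for a real destination.

```python
from typing import Any, Dict

# Assumed attribute name and operation codes; verify against
# the Processing Changed Data documentation.
OPERATION_ATTRIBUTE = "sdc.operation.type"
INSERT, DELETE, UPDATE = 1, 2, 3


def apply_change(
    table: Dict[str, Dict[str, Any]],
    headers: Dict[str, str],
    key: str,
    value: Dict[str, Any],
) -> None:
    """Apply one changed-data record to an in-memory table keyed by row ID."""
    # Header attribute values are treated as strings, so convert first.
    operation = int(headers[OPERATION_ATTRIBUTE])
    if operation == INSERT:
        table[key] = dict(value)
    elif operation == UPDATE:
        table.setdefault(key, {}).update(value)
    elif operation == DELETE:
        table.pop(key, None)
    else:
        raise ValueError(f"unsupported operation code: {operation}")


# Hypothetical stream of changes applied to an empty replica.
replica: Dict[str, Dict[str, Any]] = {}
apply_change(replica, {OPERATION_ATTRIBUTE: "1"}, "row1", {"name": "a", "qty": 1})
apply_change(replica, {OPERATION_ATTRIBUTE: "3"}, "row1", {"qty": 2})
apply_change(replica, {OPERATION_ATTRIBUTE: "1"}, "row2", {"name": "b"})
apply_change(replica, {OPERATION_ATTRIBUTE: "2"}, "row2", {})
```

After these four changes, `replica` holds only `row1`, with the updated quantity.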

Tip: Data Collector provides several MapR origins to address different needs. For a quick comparison chart to help you choose the right one, see Comparing MapR Origins.

Before you use any MapR stage in a pipeline, you must perform additional steps to enable Data Collector to process MapR data. For more information, see MapR Prerequisites in the Data Collector documentation.