You can use CDC-enabled
origins and CRUD-enabled destinations in pipelines together or individually. Here are
some typical use cases:
- CDC-enabled origin with CRUD-enabled destinations
- You can use a CDC-enabled origin and a CRUD-enabled destination to easily
process changed records and write them to a destination system.
- For example, say you want to write CDC data from Microsoft SQL Server to
  Kudu. To do this, you use the CDC-enabled SQL Server CDC Client origin
  to read data from a Microsoft SQL Server change capture table. The
  origin places the CRUD operation type in the sdc.operation.type
  header attribute, in this case: 1 for INSERT, 2 for DELETE, and 3 for UPDATE.
  You configure the pipeline to write to the CRUD-enabled Kudu destination.
  In the Kudu destination, you can specify a default operation for any
  record with no value set in the sdc.operation.type attribute, and you
  can configure error handling for invalid values. You set the default to
  INSERT and configure the destination to use this default for invalid
  values. In the sdc.operation.type attribute, the Kudu destination
  supports 1 for INSERT, 2 for DELETE, 3 for UPDATE, and 4 for UPSERT.
  When you run the pipeline, the SQL Server CDC Client origin determines
  the CRUD operation type for each record and writes it to the
  sdc.operation.type record header attribute. The Kudu destination then
  uses the operation in the sdc.operation.type attribute to determine how
  to apply each record to the Kudu table. Any record with no value in the
  sdc.operation.type attribute, such as a record created by the pipeline,
  is treated as an INSERT record, and any record with an invalid value is
  handled the same way.
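  Because sdc.operation.type is a record header attribute, any stage that
  accepts expressions can read it with the record:attribute function. As a
  minimal sketch, a stage precondition that passes only inserts and updates
  might look like this (note that the attribute value is a string):

      ${record:attribute('sdc.operation.type') != '2'}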
- CDC-enabled origin to non-CRUD destinations
- If you need to write changed data to a destination system without a
  CRUD-enabled destination, you can use an Expression Evaluator or a
  scripting processor to move the CRUD operation information from the
  sdc.operation.type header attribute to a field, so the information is
  retained in the record.
  For example, say you want to read from Oracle LogMiner redo logs and
  write the records to Hive tables with all of the CDC information in
  record fields. To do this, you use the Oracle CDC Client origin to read
  the redo logs, then add an Expression Evaluator to pull the CRUD
  information from the sdc.operation.type header attribute into a record
  field. The Oracle CDC Client origin also writes additional CDC
  information, such as the table name and the SCN, to oracle.cdc header
  attributes, so you can use expressions to pull that information into the
  record as well. Then you can use the Hadoop FS destination to write the
  enhanced records to Hive.
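  As a rough sketch, the Expression Evaluator field expressions for this
  example might look like the following, where the output field names are
  illustrative and oracle.cdc.table and oracle.cdc.scn are among the
  oracle.cdc header attributes written by the origin:

      Output Field    Field Expression
      /operation      ${record:attribute('sdc.operation.type')}
      /table          ${record:attribute('oracle.cdc.table')}
      /scn            ${record:attribute('oracle.cdc.scn')}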
- Non-CDC origin to CRUD destinations
- When reading data from a non-CDC origin, you can use an Expression
  Evaluator or a scripting processor to define the sdc.operation.type
  header attribute.
- For example, say you want to read from a transactional database table and
  keep a dimension table in sync with the changes. You use the JDBC Query
  Consumer to read the source table and a JDBC Lookup processor to check the
  dimension table for the primary key value of each record. The output of
  the lookup processor tells you whether the dimension table already
  contains a matching record. Using an Expression Evaluator, you set the
  sdc.operation.type record header attribute: 3 to update records that have
  a match, and 1 to insert records that do not (see the sketch below).
- When you pass the records to the JDBC Producer destination, the
  destination uses the operation in the sdc.operation.type header attribute
  to determine how to write the records to the dimension table.
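  As a rough sketch, the Expression Evaluator header attribute expression
  for this example might look like the following. It assumes the JDBC
  Lookup processor writes the matched primary key to an illustrative
  /lookup_id field that remains null when the dimension table has no match:

      Header Attribute      Header Attribute Expression
      sdc.operation.type    ${record:value('/lookup_id') == null ? 3 : 1}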