Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Towards Transactional Buffering of CDC Events @...

Towards Transactional Buffering of CDC Events @ Flink Forward 2025 Barcelona Spain

Abstract:

Data pipelines built on top of change data capture (CDC) are gaining ever more traction and power many different real-time applications these days. The standard way CDC solutions operate is to propagate captured data changes as separate events, which are typically consumed one by one and as is by downstream systems.

In this talk, we are taking a deep dive to explore CDC pipelines for transactional systems to understand how the direct consumption of individually published CDC events impacts data consistency at the sink side of data flows. In particular, we'll learn why the lack of transactional boundaries in change event streams may well lead to temporarily inconsistent state‚ such as partial updates from multi-table transactions‚ that never existed in the source database.

A promising solution to mitigate this issue is aggregating CDC events based on their original transactional context. To demonstrate the practical aspects of this approach, we'll go through a concrete end-to-end example showing:

* how to configure Debezium to enrich captured change events from a relational database with transaction-related metadata
* an experimental Apache Flink stream processing job to buffer CDC events based on transactional boundaries
* a bespoke downstream consumer to atomically apply transactional CDC event buffers into a target system

If you have ever wondered how to tackle the often-neglected problem of temporarily inconsistent state when consuming change event streams originating from relational databases, this session is for you!

Recording: pending...

Avatar for Hans-Peter Grahsl

Hans-Peter Grahsl

October 15, 2025
Tweet

More Decks by Hans-Peter Grahsl

Other Decks in Programming

Transcript

  1. DB Transactions and CDC START TRANSACTION; INSERT INTO inventory.customers VALUES

    (default, 'Issac', 'Fletcher', '[email protected]'); SET @customer_id = LAST_INSERT_ID(); INSERT INTO inventory.addresses VALUES (default, @customer_id, '1234 Nowhere Street', 'Great City', 'SomeState', '12345', 'LIVING'); COMMIT; @hpgrahsl (.bsky.social) | Flink Forward 2025 | Oct 15th | Barcelona
  2. We need to know... 1. when the transaction starts and

    ends respectively @hpgrahsl (.bsky.social) | Flink Forward 2025 | Oct 15th | Barcelona
  3. We need to know... 1. when the transaction starts and

    ends respectively 2. how many change events per involved table belong to the transaction @hpgrahsl (.bsky.social) | Flink Forward 2025 | Oct 15th | Barcelona
  4. We need to know... 1. when the transaction starts and

    ends respectively 2. how many change events per involved table belong to the transaction 3. to which transaction a change event refers to @hpgrahsl (.bsky.social) | Flink Forward 2025 | Oct 15th | Barcelona
  5. ! Challenges with TX-aware Buffering in general: → transactions vary

    in size and shape → metadata and CDC events published separately → N+1 topics to correlate @hpgrahsl (.bsky.social) | Flink Forward 2025 | Oct 15th | Barcelona
  6. ! Challenges with TX-aware Buffering in case of distributed processing:

    → requires TX ID-based (re)partitioning → CDC consumption happens non-deterministically → original transaction order must be retain @hpgrahsl (.bsky.social) | Flink Forward 2025 | Oct 15th | Barcelona
  7. ! Consuming and Applying TX Buffers 1. disassemble the TX

    buffer contents 2. start a transaction 3. apply all INSERT / UPDATE / DELETE statements 4. commit the transaction @hpgrahsl (.bsky.social) | Flink Forward 2025 | Oct 15th | Barcelona