Hans-Peter Grahsl
May 21, 2025

Towards Transactional Buffering of Change Data Capture Events @ Current 2025 London UK

Abstract

Data pipelines built on top of change data capture (CDC) are gaining ever more traction and power many different real-time applications these days. The standard way CDC solutions operate is to propagate captured data changes as separate events, which are typically consumed one by one and as is by downstream systems. In this talk, we are taking a deep dive to explore CDC pipelines for transactional systems to understand how the direct consumption of individually published CDC events impacts data consistency at the sink side of data flows. In particular, we'll learn why the lack of transactional boundaries in change event streams may well lead to temporarily inconsistent state--such as partial updates from multi-table transactions--that never existed in the source database. A promising solution to mitigate this issue is aggregating CDC events based on their original transactional context. To demonstrate the practical aspects of this approach, we'll go through a concrete end-to-end example showing:

* how to configure Debezium to enrich captured change events from a relational database with transaction-related metadata
* an experimental Apache Flink stream processing job to buffer CDC events based on transactional boundaries
* a bespoke downstream consumer to atomically apply transactional CDC event buffers into a target system

If you have ever wondered how to tackle the often-neglected problem of temporarily inconsistent state when consuming change event streams originating from relational databases, this session is for you!
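
The first two steps above can be sketched in a few lines of Python. This is a minimal illustration, not the Flink job from the talk. `provide.transaction.metadata` is a documented Debezium connector option, and the `transaction` block (`id`, `total_order`) and the END marker fields (`status`, `id`, `data_collections`, `event_count`) follow Debezium's documented transaction metadata format. The `TxBuffer` class and the flattened `table` field on change events are assumptions made for brevity; real Debezium events carry the table name inside their `source` block.

```python
from collections import defaultdict

# Connector properties (fragment only). A real connector needs many more
# settings (database host, credentials, topic prefix, ...), omitted here.
CONNECTOR_CONFIG_FRAGMENT = {
    "provide.transaction.metadata": "true",
}


class TxBuffer:
    """Hypothetical buffer: holds change events per transaction until the
    per-table counts announced by the END marker have all arrived."""

    def __init__(self):
        self._events = defaultdict(list)  # tx id -> buffered change events
        self._expected = {}               # tx id -> {table: expected count}

    def on_marker(self, marker):
        """Handle a BEGIN/END event from the transaction metadata topic."""
        if marker["status"] != "END":
            return None  # BEGIN carries nothing we need for this sketch
        self._expected[marker["id"]] = {
            c["data_collection"]: c["event_count"]
            for c in marker["data_collections"]
        }
        return self._emit_if_complete(marker["id"])

    def on_change(self, event):
        """Handle a change event enriched with a `transaction` block."""
        tx_id = event["transaction"]["id"]
        self._events[tx_id].append(event)
        return self._emit_if_complete(tx_id)

    def _emit_if_complete(self, tx_id):
        expected = self._expected.get(tx_id)
        if expected is None:
            return None  # END marker not seen yet
        seen = defaultdict(int)
        for e in self._events[tx_id]:
            seen[e["table"]] += 1  # `table` is an assumed, flattened field
        if dict(seen) != expected:
            return None  # some change events are still in flight
        del self._expected[tx_id]
        events = self._events.pop(tx_id)
        # Restore the order in which the source transaction made its changes.
        return sorted(events, key=lambda e: e["transaction"]["total_order"])
```

Because the marker topic and the table topics are consumed independently, the END marker may arrive before the last change event. The buffer therefore checks for completeness on every incoming event rather than only when the marker arrives.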

Recording: pending...

Transcript

  1. DB Transactions and CDC

    START TRANSACTION;
    INSERT INTO inventory.customers VALUES
      (default, 'Issac', 'Fletcher', '[email protected]');
    SET @customer_id = LAST_INSERT_ID();
    INSERT INTO inventory.addresses VALUES
      (default, @customer_id, '1234 Nowhere Street', 'Great City', 'SomeState', '12345', 'LIVING');
    COMMIT;

    @hpgrahsl (.bsky.social) | decodable.co | Current 2025 | May 21st | London UK
  2. We need to know...

    1. when the transaction starts and ends respectively
  3. We need to know...

    1. when the transaction starts and ends respectively
    2. how many change events per involved table belong to the transaction
  4. We need to know...

    1. when the transaction starts and ends respectively
    2. how many change events per involved table belong to the transaction
    3. which transaction a change event belongs to
  5. Challenges with TX-aware Buffering

    → transactions vary in size / shape
    → metadata and CDC events published separately
    → in general N+1 topics to correlate
  6. Challenges with TX-aware Buffering

    → requires TX ID-based (re)partitioning
    → CDC consumption happens non-deterministically
    → original transaction order must be retained in case of distributed processing
  7. Consuming and Applying TX Buffers

    1. disassemble the TX buffer contents
    2. start a transaction
    3. apply all INSERT / UPDATE / DELETE statements
    4. commit the transaction
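
The four steps for consuming a TX buffer can be sketched as follows, using Python's built-in `sqlite3` module as a stand-in target system. This is an illustration under stated assumptions, not the consumer shown in the talk. The function `apply_tx_buffer`, the event shape (`table`, `op`, `before`, `after`), the unqualified table names, and the primary key column `id` are all hypothetical. The `op` codes `c`, `u`, and `d` mirror Debezium's create, update, and delete codes.

```python
import sqlite3


def apply_tx_buffer(conn, events):
    """Apply every change event of one transaction buffer atomically.

    Assumes each event has `table`, `op` and `before`/`after` row images,
    and that every table has a primary key column named `id`.
    Table and column names are interpolated into SQL, so they must come
    from a trusted source.
    """
    # `with conn` commits if the block succeeds and rolls back on an exception.
    with conn:
        for e in events:
            table, op = e["table"], e["op"]
            if op == "c":
                row = e["after"]
                cols = ", ".join(row)
                marks = ", ".join("?" for _ in row)
                conn.execute(
                    f"INSERT INTO {table} ({cols}) VALUES ({marks})",
                    list(row.values()),
                )
            elif op == "u":
                row = e["after"]
                sets = ", ".join(f"{c} = ?" for c in row)
                conn.execute(
                    f"UPDATE {table} SET {sets} WHERE id = ?",
                    [*row.values(), row["id"]],
                )
            elif op == "d":
                conn.execute(
                    f"DELETE FROM {table} WHERE id = ?",
                    [e["before"]["id"]],
                )
            else:
                raise ValueError(f"unsupported op: {op!r}")
```

If any statement fails, the whole buffer is rolled back, so the target never exposes a customer without its address or vice versa. A production consumer would additionally need idempotent writes or offset tracking to cope with redelivered buffers, which this sketch leaves out.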