Apache Kafka for Microservices

Apache Kafka is one of the most useful OSS options for a message hub in microservices, and it can easily connect to various data sources.
This connectivity can be applied to implement the Transactional Outbox and Strangler Fig patterns.

Kiminori Kurihara

October 26, 2021

Transcript

  1. © Hitachi, Ltd. 2021. All rights reserved. Apache Kafka for Microservices

    2021.10.29, OSS Tech Talk. Kiminori Kurihara, Hitachi Ltd., R&D Group
  2. Summary

    Kafka connectivity
    ➢ Apache Kafka can easily connect to various data sources.
    ➢ Consistency between Kafka events and DB updates is useful for microservices.

    Kafka Connector: Kafka can connect to various data sources.
    • Low code: a Kafka connector can be built from a provided image and config files.
    • Source & Sink connectors: a Source connector brings data into Kafka and a Sink connector delivers data from Kafka, for data integration.
    • Various data sources: self-developed services, managed services, DBs and so on.

    Change Data Capture (CDC) with Kafka and Debezium, used in microservices
    • Development: Transactional Outbox pattern, one of the methods for achieving distributed transactions.
    • Migration: Strangler Fig pattern, synchronizing the DB for update and the DB for query (CQRS pattern).
  3. Agenda

    1. Background
    2. Apache Kafka
        1. Apache Kafka Overview
        2. High scalability
        3. Fault tolerance
        4. Loosely coupled
        5. Connectivity
    3. Distributed transaction in microservices
        1. Database per service
        2. Problem: Consistency
        3. Solution: Distributed transaction
            1. Saga pattern
            2. TCC pattern
        4. Saga vs. TCC
    4. Using Kafka in microservices
        1. Problem in Saga pattern
        2. Transactional Outbox pattern
        3. Debezium + Kafka
        4. Transactional Outbox pattern with Debezium
            1. Sample
        5. Other use case
            1. Strangler Fig pattern
    5. Conclusion
  5. 1. Background

    ➢ Why can Apache Kafka be adopted as the hub of a microservices architecture?

    Microservices architecture
    • Overview: divide the system into smaller units called services and couple them loosely.
    • Merit: improve business agility by allowing systems to change flexibly.
    • Hub: there are many examples of microservices using Apache Kafka as a hub. Why?

    https://www.confluent.jp/what-is-apache-kafka/
  7. 2-1. Apache Kafka Overview

    ➢ Apache Kafka is one of the best-known OSS distributed event streaming platforms.

    Used by thousands of companies for
    • high-performance data pipelines
    • streaming analytics
    • data integration
    • mission-critical applications

    1. High scalability
    • Kafka can easily handle dozens of nodes and distributes processing by "partitions".

    2. Fault tolerance
    • Each message is replicated to several brokers (as partition replicas).
    • Messages can be sent with several delivery semantics.

    3. Loosely coupled
    • Services are loosely coupled by using Kafka as a message hub.
    • Kafka has a rebalancing function for service configuration changes.

    4. Connectivity
    • The Kafka ecosystem provides low-code integration functions to connect to external components.
  8. 2-2. High scalability

    ➢ Kafka distributes partitions across multiple brokers to hold messages for high scalability.

    1. A Kafka "topic" consists of several partitions.
    2. The partitions of a topic are distributed across several broker nodes.
    3. Messages are processed per partition, so throughput scales with the number of partitions.

    Figure (Kafka messaging): producers produce messages to a topic; consumers subscribe to messages from the topic.
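
The partitioning idea on this slide can be illustrated with a minimal sketch. It assumes a keyed message and uses CRC32 as a stand-in hash (Kafka's Java client hashes keys with murmur2 instead); the function names, broker names, and the round-robin placement are hypothetical and only show why messages with the same key land in the same partition while the load spreads over brokers.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index (stand-in hash, not Kafka's murmur2)."""
    return zlib.crc32(key) % num_partitions


def assign_partitions_to_brokers(num_partitions: int, brokers: list[str]) -> dict[int, str]:
    """Spread partitions over brokers round-robin (illustrative placement only)."""
    return {p: brokers[p % len(brokers)] for p in range(num_partitions)}


if __name__ == "__main__":
    placement = assign_partitions_to_brokers(6, ["broker-1", "broker-2", "broker-3"])
    for key in [b"order-1", b"order-2", b"order-1"]:
        partition = choose_partition(key, 6)
        print(key, "-> partition", partition, "on", placement[partition])
```

Because the partition depends only on the key, ordering per key is preserved, and adding partitions and brokers increases the number of independent lanes.
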
  9. 2-3. Fault tolerance

    ➢ Each message is replicated to several brokers (as partition replicas).
    ➢ Messages can be sent with several delivery semantics.

    Message store
    • One message is replicated to several brokers to guard against message loss.

    Types of message semantics
    • At most once: a message is delivered to consumers at most once. -> The message can be lost.
    • At least once: a message is delivered to consumers at least once. -> The message can be delivered many times.
    • Exactly once: a message is delivered to consumers exactly once.

    Attention) Exactly once is guaranteed only inside Kafka.
    - A producer can produce the same message many times.
    - A consumer may be unable to send an ACK to Kafka after processing the message (e.g. network disconnection).
    Idempotent design is therefore necessary.
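
The need for idempotent design can be shown with a small sketch. It assumes every message carries a unique `id` field and that the consumer keeps the set of processed IDs in memory; both are illustrative assumptions, and a real service would persist the IDs together with its business data.

```python
class IdempotentConsumer:
    """Applies each message at most once, even if it is delivered repeatedly."""

    def __init__(self) -> None:
        self.processed_ids: set[str] = set()  # hypothetical in-memory dedup store
        self.balance = 0

    def handle(self, message: dict) -> bool:
        """Return True if the message was applied, False if it was a duplicate."""
        message_id = message["id"]
        if message_id in self.processed_ids:
            return False  # redelivery: skip the side effect
        self.balance += message["amount"]
        self.processed_ids.add(message_id)
        return True


if __name__ == "__main__":
    consumer = IdempotentConsumer()
    deliveries = [
        {"id": "m1", "amount": 100},
        {"id": "m1", "amount": 100},  # duplicate from an at-least-once delivery
        {"id": "m2", "amount": 50},
    ]
    for delivery in deliveries:
        consumer.handle(delivery)
    print(consumer.balance)
```
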
  10. 2-4. Loosely coupled

    ➢ Services are loosely coupled by using Kafka as a message hub.
    ➢ Kafka has a rebalancing function for configuration changes.

    Loose coupling between services (new service)
    • Service configuration changes don't influence other services.
    • Without a messaging hub, connected services must change whenever a new service is added.

    Loose coupling between service and Kafka (rebalancing)
    • Kafka automatically adapts to configuration changes inside a service.
  11. 2-5. Apache Kafka - Connectivity

    ➢ The Kafka ecosystem provides low-code integration functions to connect to external components (Kafka Connect).

    Figure: DBs, services and managed services -> Source Connector (worker 1, worker 2, worker 3) -> Kafka -> Sink Connector (worker 1, worker 2, worker 3) -> DBs, services and managed services.

    • A provided image and a config file are needed to build a connector.
    • Image files are mainly provided by Confluent or on GitHub.
  13. 3-1. Database per service

    ➢ In a microservices architecture, each service typically has its own database for loose coupling.

    Monolith (Component 1 to Component 4 share one DB)
    • There is one DB.
    • All components can access it.

    Microservices (Service 1 with DB 1, Service 2 with DB 2, Service 3 with DB 3)
    • Each service has its own DB.
    • Each DB can be accessed only by the relevant service.
  14. 3-2. Problem: consistency

    ➢ When multiple DBs are updated, a consistent state of the data is not maintained.

    Sequence (Service 1 with DB 1, Service 2 with DB 2)
    1. Request
    2. Update & Commit
    3. Request
    4. Update
    5. Failure occurs
    6. Rollback
    ⇒ The commit at step #2 can't be rolled back.

    Problem: multiple DBs cannot be committed at the same time.
  15. 3-3. Solution: Distributed transaction

    ➢ A typical solution to this issue is a distributed transaction, such as the Saga or TCC pattern.

    Saga pattern
    • The update is committed in each service individually.
    • On failure, a compensating transaction cancels the committed data.

    TCC (try-confirm-cancel) pattern
    • Two phases across services: (i) commit OK/NG check, (ii) commit request after all services respond OK.
  16. 3-3-1. Saga pattern

    ➢ Solution 1. Saga pattern: send a compensating transaction when a failure occurs.

    Sequence (Service 1 with DB 1, Service 2 with DB 2)
    1. Request
    2. Update & Commit
    3. Request
    4. Update
    5. Failure occurs
    6. Rollback
    7. Compensating transaction request
    8. Commit for canceling step #2

    Solution: a compensating transaction sent on failure cancels the committed data.
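
The Saga sequence on this slide can be sketched as follows. The class and method names are hypothetical, the two "databases" are plain in-memory containers, and the failure in Service 2 is simulated with a flag; the step numbers in the comments follow the slide.

```python
class ServiceFailure(Exception):
    """Raised when a service cannot complete its local update."""


class Service1:
    def __init__(self) -> None:
        self.db1: dict[str, int] = {}

    def update_and_commit(self, order_id: str, amount: int) -> None:
        self.db1[order_id] = amount  # step 2: local commit, visible immediately

    def compensate(self, order_id: str) -> None:
        self.db1.pop(order_id, None)  # step 8: commit that cancels step 2


class Service2:
    def __init__(self, fail: bool = False) -> None:
        self.fail = fail
        self.db2: set[str] = set()

    def update(self, order_id: str) -> None:
        if self.fail:
            raise ServiceFailure(order_id)  # steps 5-6: failure, local rollback
        self.db2.add(order_id)  # step 4: local update


def run_saga(service1: Service1, service2: Service2, order_id: str, amount: int) -> bool:
    """Return True if both services committed, False if the saga was compensated."""
    service1.update_and_commit(order_id, amount)
    try:
        service2.update(order_id)  # step 3: request to Service 2
    except ServiceFailure:
        service1.compensate(order_id)  # step 7: compensating transaction request
        return False
    return True
```

Between step 2 and step 8 the data in `db1` is committed although the overall operation fails, which is the temporary inconsistency the Saga pattern accepts.
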
  17. 3-3-2. TCC pattern – normal case

    ➢ Solution 2. TCC pattern: 2-phase commit across services.

    Sequence (Service 1 with DB 1, Service 2 with DB 2)
    1. Request
    2. Commit Preparation
    3. Request
    4. Commit Preparation
    5. Commit OK
    6. Commit
    7. Commit Direction
    8. Commit

    Solution: two phases across services, (i) commit OK/NG check and (ii) confirm/cancel request.
  18. 3-3-2. TCC pattern – failure case

    ➢ Solution 2. TCC pattern: 2-phase commit across services.

    Sequence (Service 1 with DB 1, Service 2 with DB 2)
    1. Request
    2. Commit Preparation
    3. Request
    4. Commit Preparation
    5. Failure occurs
    6. Rollback
    7. Commit NG
    8. Cancel Commit Preparation

    Solution: two phases across services, (i) commit OK/NG check and (ii) confirm/cancel request.
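
The normal and failure cases of TCC can be sketched together. This is a minimal in-memory illustration under stated assumptions: `Participant`, `try_reserve`, `confirm`, `cancel`, and `run_tcc` are hypothetical names, and a participant that answers "Commit NG" is simulated with a flag.

```python
class Participant:
    """One service taking part in a try-confirm-cancel transaction."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.pending = None    # temporary data held after commit preparation
        self.committed = None  # data visible after confirm

    def try_reserve(self, value) -> bool:
        """Phase (i): commit preparation; answer OK (True) or NG (False)."""
        if not self.can_commit:
            return False
        self.pending = value
        return True

    def confirm(self) -> None:
        """Phase (ii), normal case: turn the temporary data into committed data."""
        self.committed = self.pending
        self.pending = None

    def cancel(self) -> None:
        """Phase (ii), failure case: discard the temporary data."""
        self.pending = None


def run_tcc(participants: list[Participant], value) -> bool:
    prepared = []
    for participant in participants:
        if participant.try_reserve(value):
            prepared.append(participant)
        else:
            for done in prepared:
                done.cancel()  # cancel commit preparation everywhere
            return False
    for participant in participants:
        participant.confirm()  # commit only after every participant answered OK
    return True
```

Only temporary data exists until every participant answers OK, which is why TCC keeps consistency at the cost of always running two phases.
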
  19. 3-4. Saga vs. TCC

    ➢ We judged Saga to be superior for large systems from the viewpoints of throughput and loose coupling.

    | | Saga | TCC |
    |---|---|---|
    | Process outline | Compensating transaction | 2-phase commit |
    | Throughput | Good: 1-phase process in the normal case | Not good: always a 2-phase process |
    | Loose coupling | Good: independent commit at each service | Not good: commit after checks at all other services |
    | Consistency | Not good: an inconsistent state can exist | Good: only temporary data can exist |
    | Problem | For early resolution of the inconsistent state, (i) the data commit and (ii) the requests to other services in each service must be done without conflict. | The larger the system, the greater the performance problem. |
  21. 4-1. Problem in Saga pattern

    ➢ Within an individual service, (i) the data commit and (ii) the requests to other services can be temporarily inconsistent.

    Sequence (Service 1 with DB 1, Service 2, Kafka)
    1. Request
    2. Update & Commit
    3. Request
    After step 2, the system state becomes temporarily inconsistent.

    Steps 2 & 3 must be processed without conflict.
    • If 2 fails => 3 must not be processed.
    • If 2 succeeds => 3 must be processed.

    Problem & Solution: steps 2 & 3 must be done without conflict. => Kafka is suitable for the solution.
  22. 4-2. Transactional outbox pattern

    ➢ One implementation method for the Saga pattern is the transactional outbox pattern.

    Updating the DB and sending requests to other services are processed without conflict by using an outbox table.

    Transactional outbox pattern: sequence (Service 1, RDB with business data and outbox table, Message relay, Other services)
    1. Service 1 updates the business data table (CUD).
    2. Service 1 inserts into the outbox table (C) and commits.
    3. The message relay reads the outbox table.
    4. The message relay sends the outbox table data to other services (request).

    Is there any tool?
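
The four steps of the transactional outbox pattern can be sketched with Python's built-in `sqlite3` module. The table layout, the `sent` flag, and the function names are assumptions made for this illustration; the `send` callback stands in for whatever transport delivers the request to other services.

```python
import json
import sqlite3
import uuid


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE outbox (id TEXT PRIMARY KEY, aggregatetype TEXT, "
        "aggregateid TEXT, type TEXT, payload TEXT, sent INTEGER DEFAULT 0)"
    )


def place_order(conn: sqlite3.Connection, order_id: str) -> None:
    """Steps 1-2: write business data and the outbox row in one transaction."""
    with conn:  # commits on success, rolls back both inserts on any exception
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, ?)", (order_id, "CREATED")
        )
        conn.execute(
            "INSERT INTO outbox (id, aggregatetype, aggregateid, type, payload) "
            "VALUES (?, ?, ?, ?, ?)",
            (str(uuid.uuid4()), "order", order_id, "Request",
             json.dumps({"order_id": order_id})),
        )


def relay_once(conn: sqlite3.Connection, send) -> int:
    """Steps 3-4: read unsent outbox rows and hand them to `send`."""
    rows = conn.execute(
        "SELECT id, aggregatetype, payload FROM outbox WHERE sent = 0 ORDER BY rowid"
    ).fetchall()
    for row_id, aggregatetype, payload in rows:
        send(aggregatetype, payload)
        # If the relay crashes before this update, the row is sent again later,
        # so delivery is at least once and receivers should be idempotent.
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

Because the outbox row is committed together with the business data, the request exists if and only if the update succeeded, which is the "without conflict" requirement from the previous slide.
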
  23. 4-3. Debezium + Kafka

    • Debezium is an OSS distributed platform for Change Data Capture (CDC).
    • Debezium provides a Kafka connector -> "Debezium connector".
    • The Debezium connector converts an "update of the RDB" into an "event to Kafka".

    Figure: Debezium connector system
    https://access.redhat.com/documentation/ja-jp/red_hat_integration/2020-q2/html-single/debezium_user_guide/index
  24. 4-4. Transactional outbox pattern with Debezium

    ➢ Debezium provides the Kafka connector for the message relay of the transactional outbox pattern, and a library for services.

    Transactional outbox pattern with Debezium connector: sequence (Service 1 with the debezium-quarkus-outbox library, RDB with business data and outbox table, Debezium Connector, Kafka, Other services)
    1. Service 1 updates its DB (business data).
    2. The library inserts the update information into the outbox table.
    3. 1 & 2 are committed at the same time.
    4. The Debezium connector detects the change in the outbox table.
    5. The Debezium connector sends the outbox table data to Kafka.

    Figure labels: 1. CUD, 2. C, 4. Produce, 5. Consume.
  25. 4-4-1. Sample

    ➢ A sample of the transactional outbox pattern is provided on GitHub (E-commerce).
    https://github.com/debezium/debezium-examples/tree/master/saga
  26. 4-4-1-1. Sample - outbox table

    ➢ The outbox table has metadata and a payload about the update.
    ➢ One of the metadata columns indicates the Kafka topic to send to.

    | # | Column | Content | Example |
    |---|---|---|---|
    | 1 | ID | ID of outbox table | f9a78388-883a-45d7-b4f6-7d47f8c89187 |
    | 2 | AGGREGATETYPE | Process name used for indicating the Kafka topic (defined by business logic) | credit-approval |
    | 3 | AGGREGATEID | ID of process event (defined by business logic) | 57974d15-a2f5-49a7-b18a-71c636193bdd |
    | 4 | TYPE | Request type in process (defined by business logic) | Request |
    | 5 | PAYLOAD | Updating information about business data | (omitted) |
    | 6 | TIMESTAMP | Timestamp of request | 2021-08-16 07:12:42.605371 |
    | 7 | TRACINGSPANCONTEXT | ID for distributed tracing (Jaeger) | uber-trace-id=76d506f0a064b52d\:a7c822eda5e59186\:76d506f0a064b52d\:1 |
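
How an outbox row could be turned into a Kafka record can be sketched as below. The record layout and the function name are hypothetical. The default topic template follows the `outbox.event.<aggregatetype>` naming that Debezium's outbox event router documents as its default, to the best of my understanding; the sample on GitHub may configure a different template.

```python
from dataclasses import dataclass


@dataclass
class OutboxRow:
    id: str
    aggregatetype: str
    aggregateid: str
    type: str
    payload: str


def to_kafka_record(row: OutboxRow,
                    topic_template: str = "outbox.event.{aggregatetype}") -> dict:
    """Derive topic, key and value from the outbox metadata (illustrative layout)."""
    return {
        "topic": topic_template.format(aggregatetype=row.aggregatetype),
        "key": row.aggregateid,  # keeps events of one process in one partition
        "headers": {"id": row.id, "type": row.type},
        "value": row.payload,
    }


if __name__ == "__main__":
    row = OutboxRow(
        id="f9a78388-883a-45d7-b4f6-7d47f8c89187",
        aggregatetype="credit-approval",
        aggregateid="57974d15-a2f5-49a7-b18a-71c636193bdd",
        type="Request",
        payload="{}",  # payload is omitted on the slide
    )
    print(to_kafka_record(row)["topic"])
```
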
  27. 4-4-1-2. Sample – compensation transaction

    ➢ When the order service receives an error response from one service, it sends a compensating transaction to the other.

    Communication in the compensating transaction case (Order service, Kafka, Customer service, Payment service)
    1. requests
    2. request
    3. request
    4. success
    5. error
    6. responses
    7. compensating transaction request
    8. compensating transaction request

    The library sends compensating transactions to the services other than the one where the error occurred.
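
The selection rule on this slide can be expressed as a small function. This is only a sketch: the function name, the service names, and the `"success"` / `"error"` status strings are assumptions, not the sample's actual API.

```python
def compensation_targets(responses: dict[str, str]) -> list[str]:
    """Return the services that should receive a compensating transaction request.

    `responses` maps a service name to "success" or "error". If any service
    reported an error, every other service is compensated.
    """
    failed = [service for service, status in responses.items() if status == "error"]
    if not failed:
        return []  # nothing to undo when every step succeeded
    return [service for service in responses if service not in failed]


if __name__ == "__main__":
    print(compensation_targets({"customer": "success", "payment": "error"}))
```
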
  28. 4-5. Other use case

    ➢ The Debezium connector can be used to loosely couple a monolithic system.

    Debezium connector system: sequence (Service 1, RDB with data of Service 1 and transaction log, Debezium Connector, Kafka, Other services)
    1. CUD
    2. Replica
    3. Produce
    4. Consume

    The Debezium connector receives the whole transaction log from the RDB by being set up as a replica of the RDB.
    … in migration.
  29. 4-5-1. Other use case – strangler fig pattern

    ➢ Strangler Fig pattern: functions in the legacy system are gradually split into services.

    CDC-based Strangler Fig pattern. Example: separating update and query processes in a monolith.

    Figure: the legacy monolith (Component 1 to Component 4, CRUD on one RDB) compared with the split system, where a Proxy routes Update (CUD) and Query (R) requests, and the RDB for update is synchronized to the data source for query via Debezium Connector, Kafka and a connector. The two sides are loosely coupled (the DBs are constructed by the CQRS pattern).
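
Keeping the query-side data source in sync from change events can be sketched as follows. The event shape uses the `op`, `before`, and `after` fields of a Debezium change event envelope (`c` create, `u` update, `d` delete, `r` snapshot read); the `id` primary key column and the in-memory dict standing in for the query DB are assumptions for this illustration.

```python
def apply_change(read_model: dict, event: dict) -> None:
    """Apply one change event from the update-side RDB to the query-side store."""
    op = event["op"]
    if op in ("c", "u", "r"):
        row = event["after"]
        read_model[row["id"]] = row  # upsert, so replaying an event is harmless
    elif op == "d":
        read_model.pop(event["before"]["id"], None)
    else:
        raise ValueError(f"unknown operation: {op}")


if __name__ == "__main__":
    query_db: dict = {}
    events = [
        {"op": "c", "before": None, "after": {"id": 1, "status": "CREATED"}},
        {"op": "u", "before": {"id": 1, "status": "CREATED"},
         "after": {"id": 1, "status": "PAID"}},
        {"op": "c", "before": None, "after": {"id": 2, "status": "CREATED"}},
        {"op": "d", "before": {"id": 2, "status": "CREATED"}, "after": None},
    ]
    for event in events:
        apply_change(query_db, event)
    print(query_db)
```

Since the query side only consumes events, the legacy update path does not need to know about it, which is what allows the gradual split.
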
  31. 5. Conclusion

    ➢ Apache Kafka with Debezium is useful for microservices architecture not only in the development phase but also in the migration phase.

    Kafka characteristics: suitable for a microservices message hub
    1. High scalability
    2. Fault tolerance
    3. Loosely coupled
    4. Connectivity (especially important)

    • Transactional Outbox pattern: (1) updating data and (2) requests to other services become consistent.
    • Strangler Fig pattern: functions in the legacy system are gradually split into services.
  32. Trademark

    • Apache Hadoop®, Hadoop, Apache Kafka®, Kafka, Apache ZooKeeper™, ZooKeeper, and associated open-source project names are trademarks of the Apache Software Foundation.
    • Debezium name and logo are trademarks of Red Hat.
    • Elasticsearch is a trademark of Elasticsearch BV, registered in the U.S.
    • MySQL and Oracle are registered trademarks of Oracle and/or its affiliates.
    • Twitter is a registered trademark of Twitter, Inc.
    • All other company names, product names, service names, and other proper nouns mentioned herein are trademarks or registered trademarks of their respective companies.
    • TM and ® marks are not indicated in the text and figures in this presentation.