Benefits of Kafka
• Producing and consuming are loosely coupled
  ◦ Multiple consumers can receive a produced message, unlike a traditional MQ (point-to-point communication)
• Scalability
• Fault tolerance and data durability
• Independence from infrastructure vendors

Kafka HTTP Proxy
• Abstraction of the process of producing messages to a Kafka topic
  ◦ Kafka producer client implementation
  ◦ Business logic specific to our company
• On the producer side, messages can be produced from an HTTP client instead of a Kafka client
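
To make the last bullet concrete, here is a minimal sketch of producing through an HTTP client. The base URL, the `/topics/{topic}/messages` path, the JSON payload shape, and the topic name are all assumptions made for illustration; the slides do not specify the proxy's actual API.

```python
import json
import urllib.request

# Hypothetical proxy address; the real endpoint is not given in the slides.
PROXY_BASE_URL = "http://kafka-http-proxy.example.internal"


def build_produce_request(topic: str, key: str, value: dict) -> urllib.request.Request:
    """Build an HTTP POST that asks the proxy to produce one message."""
    # Assumed payload shape: a JSON object with "key" and "value" fields.
    body = json.dumps({"key": key, "value": value}).encode("utf-8")
    return urllib.request.Request(
        url=f"{PROXY_BASE_URL}/topics/{topic}/messages",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_produce_request("invoice-events", "invoice-42", {"status": "issued"})

# Sending is a single call; it is commented out because the endpoint is hypothetical.
# with urllib.request.urlopen(request, timeout=5) as response:
#     print(response.status)
```

The product team only needs an HTTP library here; no Kafka client, serializer, or broker configuration appears on the producer side.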

Kafka HTTP Proxy – Cost and Benefits
Benefits
• Reduced learning curve
• Reduced development cost
Cost
• Lower performance: producer batching cannot be used

Kafka HTTP Proxy – Functional Requirements (5)
• Error handling
  ◦ Synchronously return an error when producing a message fails
  ◦ Convert Kafka errors to HTTP status codes
  ◦ Useful for ensuring data consistency on the product side
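
The error conversion above could look roughly like the following sketch. The error names are taken from the Kafka protocol's error codes, but the choice of status code for each one is an assumption for illustration, not the mapping the proxy actually uses.

```python
from http import HTTPStatus
from typing import Optional

# Illustrative mapping only; the real proxy's choices are not given in the slides.
KAFKA_ERROR_TO_HTTP = {
    "UNKNOWN_TOPIC_OR_PARTITION": HTTPStatus.NOT_FOUND,
    "MESSAGE_TOO_LARGE": HTTPStatus.REQUEST_ENTITY_TOO_LARGE,
    "TOPIC_AUTHORIZATION_FAILED": HTTPStatus.FORBIDDEN,
    "REQUEST_TIMED_OUT": HTTPStatus.GATEWAY_TIMEOUT,
    "NOT_ENOUGH_REPLICAS": HTTPStatus.SERVICE_UNAVAILABLE,
}


def to_http_status(kafka_error: Optional[str]) -> HTTPStatus:
    """Translate the outcome of a synchronous produce call into an HTTP status."""
    if kafka_error is None:
        # The broker acknowledged the message.
        return HTTPStatus.OK
    # Unknown errors fall back to 500 so that a failure is never reported as success.
    return KAFKA_ERROR_TO_HTTP.get(kafka_error, HTTPStatus.INTERNAL_SERVER_ERROR)
```

Because the status is returned in the same HTTP response, the caller can decide immediately whether to roll back or retry its own write, which is what makes the synchronous error useful for data consistency.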

Requirements
• High availability (redundancy)
  ◦ Never introduce a single point of failure
  ◦ Multi-AZ
• Scalability and performance
  ◦ Load balancing
  ◦ Auto scaling
• Monitoring and logging

Monitoring a Kafka Consumer
Amazon MSK metrics + Datadog
1. max_offset_lag: the maximum offset lag (the number of waiting messages) across all partitions in a topic
2. bytes_out_per_sec: the number of bytes consumed per second in a topic
3. Produce-to-consume latency
4. Consume-to-processing-complete latency
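
As a rough illustration of metrics 1, 3, and 4, the sketch below computes them from plain numbers. It assumes lag is defined per partition as the log end offset minus the consumer's committed offset; the function names and sample offsets are made up for the example, and in practice the values come from Amazon MSK and Datadog rather than from application code.

```python
from typing import Dict


def max_offset_lag(end_offsets: Dict[int, int], committed: Dict[int, int]) -> int:
    """Largest per-partition lag (log end offset minus committed offset) in a topic."""
    lags = [
        # A partition with no committed offset is treated as fully unconsumed (assumption).
        max(end - committed.get(partition, 0), 0)
        for partition, end in end_offsets.items()
    ]
    return max(lags, default=0)


def latency_ms(earlier_ms: int, later_ms: int) -> int:
    """Elapsed milliseconds between two timestamps, e.g. produce -> consume."""
    return later_ms - earlier_ms


# Made-up offsets for a three-partition topic.
end_offsets = {0: 120, 1: 300, 2: 45}
committed = {0: 118, 1: 250, 2: 45}
print(max_offset_lag(end_offsets, committed))  # 50 (partition 1 is furthest behind)
```

The same `latency_ms` helper covers both latency metrics: pass the record timestamp and the consume time for metric 3, or the consume time and the completion time for metric 4.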

How to Address a “Poison Pill”
• Poison pill = “a record that has been produced to a Kafka topic and always fails when consumed, no matter how many times it is attempted.” – Confluent.io

How to Address a “Poison Pill”
• The consumer application stores the invalid message in the dead letter queue (hereafter DLQ)
• The consumer application notifies the product team that a message has been sent to the DLQ
• The product team checks the DLQ and retries manually
Note: DLQ messages cannot be retried automatically
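
The flow above can be sketched with in-memory stand-ins. `Record`, `DeadLetterQueue`, `consume`, and the `notify` callback are hypothetical names introduced for this example; a real consumer would read from Kafka and write to an actual DLQ rather than to a Python list.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Record:
    offset: int
    value: bytes


@dataclass
class DeadLetterQueue:
    """In-memory stand-in for the real DLQ."""
    records: List[Record] = field(default_factory=list)

    def put(self, record: Record) -> None:
        self.records.append(record)


def consume(
    records: Iterable[Record],
    handler: Callable[[Record], None],
    dlq: DeadLetterQueue,
    notify: Callable[[str], None],
) -> int:
    """Process records in order; park failures in the DLQ instead of retrying forever."""
    processed = 0
    for record in records:
        try:
            handler(record)
            processed += 1
        except Exception as exc:
            dlq.put(record)  # store the invalid message
            notify(f"offset {record.offset} sent to DLQ: {exc}")  # tell the product team
    return processed


def handle(record: Record) -> None:
    json.loads(record.value)  # raises on a malformed payload


notifications: List[str] = []
dlq = DeadLetterQueue()
batch = [Record(0, b'{"ok": true}'), Record(1, b"not json"), Record(2, b'{"ok": true}')]
processed = consume(batch, handle, dlq, notifications.append)
```

The malformed record at offset 1 ends up in the DLQ with one notification, while offsets 0 and 2 are processed normally, so the poison pill does not block the messages behind it.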

1. How can we guarantee that a consumer properly retries a failed message without blocking the processing of subsequent messages?
2. Not yet in production use; there may be performance issues
3. What is the right level of abstraction to present to the product teams?

Why We Adopt Kafka
• See "Benefits of Kafka" in the first half of the document
• "The Design Guidelines for Platform Services" of Money Forward Cloud
  ◦ Guideline: For platforms used by multiple products, use Kafka for event streaming.
  ◦ Reason: To reduce the burden on product development teams by standardizing the event streaming service.