Kafka, the hard parts

This talk summarizes many of the lessons I've learned building systems on Kafka.

Chris Keathley

January 10, 2019

Transcript

  1. Let's talk about… Kafka Terminology, Maintaining Order, Errors, Distributed Systems and the joys of functional programming, Data Validation, Finding Errors, Monitoring, Capacity Planning, #hottakes
  9. Delivery: 0 <= 1 <= n. At most once (0 or 1 deliveries), at least once (1 to n deliveries), exactly once (impossible-ish).
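
The difference between at-most-once and at-least-once delivery comes down to when a consumer records its progress relative to doing the work. The sketch below simulates this with an in-memory list standing in for a partition. The names `consume`, `commit_first`, and `crash_at` are hypothetical and are not part of any Kafka client.

```python
def consume(log, commit_first, crash_at):
    """Simulate a consumer that crashes once, then restarts from its committed offset.

    log          -- list of messages standing in for a partition
    commit_first -- True: commit the offset before processing (at most once)
                    False: commit after processing (at least once)
    crash_at     -- offset of the message during which the first run crashes
    Returns the list of messages that were actually processed.
    """
    processed = []
    committed = 0      # next offset to read after a restart
    crashed = False

    while committed < len(log):
        offset = committed
        if commit_first:
            committed = offset + 1
            if offset == crash_at and not crashed:
                crashed = True   # crash after the commit, before the work
                continue         # restart: the message at `offset` is lost
            processed.append(log[offset])
        else:
            processed.append(log[offset])
            if offset == crash_at and not crashed:
                crashed = True   # crash after the work, before the commit
                continue         # restart: the message at `offset` is redelivered
            committed = offset + 1
    return processed
```

With `commit_first=True` the message being handled during the crash is dropped. With `commit_first=False` it is processed twice, which is why at-least-once consumers need idempotent handlers.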
  10. You

  11. Idempotence: …the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application.
  12. Sending Emails: messages (email id: 1, email id: 2, email id: 3) go to send_email, which sends them over SMTP. What do we do if this fails?
  13. Sending Emails: send_email receives email id: 1. If we see this message again, move it to an audit topic.
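
Sending an email cannot be undone, so one way to read the slide above is: remember each message id before sending, and route any repeat to an audit topic instead of sending again. The following is a minimal sketch under that reading. `EmailConsumer`, its in-memory `seen_ids` set, and the list used as an audit topic are all hypothetical stand-ins; a real system would need durable storage and a real producer.

```python
class EmailConsumer:
    """Sends each email at most once; repeated ids go to an audit topic."""

    def __init__(self, send_email):
        self.send_email = send_email  # callable that performs the actual send
        self.seen_ids = set()         # assumption: durable storage in practice
        self.audit_topic = []         # stand-in for a Kafka audit topic

    def handle(self, message):
        email_id = message["id"]
        if email_id in self.seen_ids:
            # Redelivery: do not send again, record it for later review.
            self.audit_topic.append(message)
            return "audited"
        # Record the id before sending, so that a redelivery after a failed
        # or unacknowledged send is audited rather than sent twice.
        self.seen_ids.add(email_id)
        self.send_email(message)
        return "sent"
```

The trade-off is deliberate: a send that fails after the id is recorded is never retried automatically, but nobody receives the same email twice, and the audit topic shows which messages need a human decision.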
  16. Data payloads: { msg_id: String, type: String, data: { user_id: Integer, msg: String } } None of this tells you anything useful about your data.
  17. Data payloads: { msg_id: String, type: String, data: { user_id: Integer, msg: String } } What do we do when these things change?
  18. Data payloads: { msg_id: String, type: String, data: { user_id: String, msg: String } } What do we do when these things change?
  19. Data payloads: { msg_id: String, type: String, data: { user_id: String, msg: String } } Let's just use versions!
  20. Data payloads: { msg_id: String, type: String, data: { user_id: String, msg: String } } Let's just use versions! (spoiler: this isn't great)
  21. Data payloads: { msg_id: String, type: String, data: { user_id: String, msg: String }, meta: { version: 2 } }
  22. Data Versions: messages v1, v1, v1, v1, v2 arrive at the Consumer. This consumer needs to understand both versions.
  23. Data Versions: messages v1, v1, v1, v1, v2 arrive at the Consumer. This team needs to know to make these changes.
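
To make the cost of version fields concrete, here is a minimal sketch of a consumer that has to understand both payload versions. Treating a missing `meta.version` as version 1 and normalising `user_id` to a string are assumptions made for illustration, and `normalize_comment` is a hypothetical name.

```python
def normalize_comment(payload):
    """Return (user_id as str, msg) for either payload version."""
    # Assumption: payloads without a meta.version field are version 1.
    version = payload.get("meta", {}).get("version", 1)
    data = payload["data"]
    if version == 1:
        # v1: user_id is an integer
        return str(data["user_id"]), data["msg"]
    if version == 2:
        # v2: user_id is already a string
        return data["user_id"], data["msg"]
    # Every new version needs another branch in every consumer.
    raise ValueError(f"unsupported payload version: {version}")
```

Each consuming team has to ship a branch like this before the producer starts emitting the new version, which is the coordination problem the slides point at.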
  24. Data payloads: { msg_id: String, type: String, data: { user_id: Integer, msg: String } } What are these?
  26. Data payloads
      UUID = string? & re_matches?(/^[0-9A-F]{8}-[0-9A-F]{4}-4[0-9A-F]{3}-[89AB][0-9A-F]{3}-[0-9A-F]{12}$/i)
      CommentCreated = schema {
        req :msg_id, UUID
        req :type, lit("comment.created")
        req :data, schema {
          req :user_id, integer? | UUID
          req :msg, string?
        }
      }
  27. Data payloads
      json = {type: "comment.created", msg: "Hello world"}
      Norm.decode(CommentEvent, json) => {:ok, data}
      Norm.decode(CommentEvent, {}) => {:error, errors}
      Norm.explain(CommentEvent, {}) =>
        "In :msg_id, val: {} fails spec: required
         In :type, val: {} fails spec: required
         In :data, val: {} fails spec: required"
  28. Norm is extensible
      CommentEvent = schema {
        req :type, lit("comment.created")
        req :msg, string?
      }
      json = { type: "comment.created", msg: "Hello world", data: { msg: "Hello world" } }
      Norm.decode(CommentEvent, json) => {:ok, data}
  29. Norm is extensible
      CommentEvent = schema {
        req :type, lit("comment.created")
        req :msg, string?
      }
      json = { type: "comment.created", msg: "Hello world", data: { msg: "Hello world" } }
      Norm.decode(CommentEvent, json) => {:ok, data}
      This will still get passed through (the extra data key is not in the schema).
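
The schema idea on these slides can be approximated in a few lines of plain Python. This is not the Norm API. The `decode`, `explain`, and `lit` helpers below are hypothetical stand-ins that mimic the behaviour the slides describe: required keys are checked against predicates, and keys the schema does not mention are passed through untouched.

```python
def is_string(value):
    return isinstance(value, str)

def lit(expected):
    """Predicate that accepts exactly one literal value."""
    return lambda value: value == expected

# Hypothetical schema: required key -> predicate
COMMENT_EVENT = {
    "type": lit("comment.created"),
    "msg": is_string,
}

def explain(schema, data):
    """Return one human-readable error per missing or invalid required key."""
    errors = []
    for key, predicate in schema.items():
        if key not in data:
            errors.append(f"In :{key}, val: {data!r} fails spec: required")
        elif not predicate(data[key]):
            errors.append(f"In :{key}, val: {data[key]!r} fails spec")
    return errors

def decode(schema, data):
    """Return ("ok", data) or ("error", errors); extra keys are kept."""
    errors = explain(schema, data)
    if errors:
        return ("error", errors)
    return ("ok", data)
```

Because only the keys named in the schema are inspected, a producer can add fields without breaking this consumer, while a missing or mistyped required field is rejected with an explanation.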
  32. Property based testing: messages id: 1, id: 2, id: 3, id: 1 flow through the Consumer into the Database. Information should end up here (in the database).
  33. Property based testing: messages id: 1, id: 2, id: 3, id: 1 flow through the Consumer into the Database. Some combination of these messages causes a failure.
  34. Property based testing: the failing case is reduced to id: 1, id: 1. Looks like we aren't handling duplicates correctly.
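
The duplicate-handling bug described above can be hunted with randomly generated message sequences. The sketch below is a hand-rolled randomized check using only the standard library `random` module; a dedicated property-testing library would additionally shrink the failing input down to something like id: 1, id: 1. Both consumers and `find_counterexample` are hypothetical names for illustration.

```python
import random

def naive_consumer(messages):
    """Appends every message to the database, duplicates included."""
    database = []
    for message in messages:
        database.append(message["id"])
    return database

def idempotent_consumer(messages):
    """Stores each id once, no matter how often it is delivered."""
    database = []
    seen = set()
    for message in messages:
        if message["id"] not in seen:
            seen.add(message["id"])
            database.append(message["id"])
    return database

def find_counterexample(consumer, trials=200, seed=0):
    """Property: the database holds each delivered id exactly once.

    Returns the first generated message list that breaks the property,
    or None if every trial passes.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        # A small id range makes duplicate deliveries likely.
        messages = [{"id": rng.randint(1, 3)} for _ in range(rng.randint(0, 6))]
        database = consumer(messages)
        expected = {m["id"] for m in messages}
        if len(database) != len(set(database)) or set(database) != expected:
            return messages
    return None
```

Running `find_counterexample(naive_consumer)` returns a message list containing a repeated id, while `find_counterexample(idempotent_consumer)` returns `None`.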
  39. Calculating partitions: partitions < 100 x brokers x replication factor

    source: https://www.confluent.io/blog/how-choose-number-topics-partitions-kafka-cluster
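
As a worked example of the rule of thumb above, a hypothetical cluster with 3 brokers and a replication factor of 3 gives a bound of 100 x 3 x 3 = 900 partitions. The helper names below are made up for illustration.

```python
def max_partitions(brokers, replication_factor, per_broker_limit=100):
    """Upper bound from the rule of thumb: 100 x brokers x replication factor."""
    if brokers < 1 or replication_factor < 1:
        raise ValueError("brokers and replication_factor must be positive")
    return per_broker_limit * brokers * replication_factor

def within_budget(partitions, brokers, replication_factor):
    """True if the planned partition count stays under the bound."""
    return partitions < max_partitions(brokers, replication_factor)
```

For the cluster above, `within_budget(500, 3, 3)` is true and `within_budget(900, 3, 3)` is false, since the rule asks for strictly fewer partitions than the bound.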