
Monitoring time in a distributed database: a play in three acts


Monitoring time is tricky given its fluid nature. Doing so across distributed database hosts is trickier. Latency, probe intervals, and clock synchronization all affect the metrics, and taking action based on those metrics makes matters even more complex. How does one measure time? What is the baseline? What accuracy and tradeoffs can we expect? Can we use time itself to affect the outcome? At GitHub, we monitor time in our database topologies for throttling and consistent-read purposes. We present our use case and our findings.

Shlomi Noach

May 14, 2019



Transcript

  1. Monitoring time in distributed
    databases: a play in three acts
    Shlomi Noach
    GitHub
    StatsCraft 2019


  2. Agenda
    TL;DR: time adventures and
    mishaps


    Throttling
    Consistent reads
    And all that follows


  3. About me
    @github/database-infrastructure
    Author of orchestrator, gh-ost, freno, ccql
    and other open source tools.
    Blog at http://openark.org

    github.com/shlomi-noach

    @ShlomiNoach


  4. GitHub

    Built for developers
    Largest open source hosting
    100M+ repositories

    36M+ developers

    1B+ contributions
    Largest supplier of octocat T-Shirts and stickers


  5. Prelude


  6. Asynchronous replication
    Single writer node
    Asynchronous replicas
    Multi layered
    Scale reads across replicas

  7. Replication lag
    Desired behavior: smallest possible lag
    • Consistent reads (aka read your own writes)
    • Faster/lossless/less lossy failovers

  8. Replication lag

  9. Replication lag

  10. Measuring lag via heartbeat
    Inject heartbeat on master
    Read replicated value on replica, compare with time now()

  11. Inject and read
    Heartbeat generated locally on writer node
    Inject on the writer node; read & compare on each replica

  12. Heartbeat
    create table heartbeat (
      anchor int unsigned not null,
      ts timestamp(6),
      primary key (anchor)
    );

  13. Heartbeat: inject on master
    create table heartbeat (
      anchor int unsigned not null,
      ts timestamp(6),
      primary key (anchor)
    );
    replace into heartbeat values (
      1, now(6)
    );

  14. Heartbeat: read on replica
    create table heartbeat (
      anchor int unsigned not null,
      ts timestamp(6),
      primary key (anchor)
    );
    select
      unix_timestamp(now(6)) - unix_timestamp(ts) as lag
    from
      heartbeat
    where
      anchor = 1
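To make the mechanism concrete, here is a minimal sketch in Go (the language of freno, orchestrator and gh-ost) of both halves: an injector that keeps writing the master's clock into the heartbeat row, and a reader that compares the replicated value against the replica's clock. The DSNs, the surrounding program and the 50ms injection interval (matching the deck's timeline ticks) are illustrative assumptions, not GitHub's production code.

    package main

    import (
    	"database/sql"
    	"log"
    	"time"

    	_ "github.com/go-sql-driver/mysql"
    )

    // injectHeartbeat keeps overwriting the single heartbeat row with the
    // master's current timestamp, once per interval (50ms in this deck).
    func injectHeartbeat(master *sql.DB, interval time.Duration) {
    	for range time.Tick(interval) {
    		if _, err := master.Exec(`replace into heartbeat values (1, now(6))`); err != nil {
    			log.Printf("heartbeat inject: %v", err)
    		}
    	}
    }

    // readLag reads the replicated heartbeat on a replica and compares it
    // with the replica's own clock, yielding lag in seconds.
    func readLag(replica *sql.DB) (lag float64, err error) {
    	err = replica.QueryRow(
    		`select unix_timestamp(now(6)) - unix_timestamp(ts) from heartbeat where anchor = 1`,
    	).Scan(&lag)
    	return lag, err
    }

    func main() {
    	// Hypothetical DSNs; error handling trimmed for brevity.
    	master, _ := sql.Open("mysql", "app:secret@tcp(master:3306)/meta")
    	replica, _ := sql.Open("mysql", "app:secret@tcp(replica:3306)/meta")

    	go injectHeartbeat(master, 50*time.Millisecond)
    	for range time.Tick(time.Second) {
    		if lag, err := readLag(replica); err == nil {
    			log.Printf("replication lag: %.3fs", lag)
    		}
    	}
    }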

  15. Replication lag: graphing

  16. Act I


  17. Objective: throttling


  18. Throttling
    Break large writes into small tasks
    Allow writes to take place if lag is low
    Hold off writes when lag is high
    Threshold: 1sec

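A sketch of what the write side looks like when it cooperates with such a throttler, in Go. Only the 1sec threshold comes from the slide; chunk, checkLag and the 100ms back-off are hypothetical stand-ins.

    package main

    import (
    	"fmt"
    	"time"
    )

    // chunk is a stand-in for one small unit of a large write operation.
    type chunk struct{ id int }

    func (c chunk) apply() error {
    	fmt.Println("writing chunk", c.id)
    	return nil
    }

    // throttledWrite applies chunks one by one, but only while replication
    // lag stays under the threshold; otherwise it holds off and re-checks.
    func throttledWrite(chunks []chunk, checkLag func() (time.Duration, error)) error {
    	const threshold = time.Second // the 1sec threshold from the slide
    	for _, c := range chunks {
    		for {
    			lag, err := checkLag()
    			if err == nil && lag < threshold {
    				break // replicas are keeping up: safe to write this chunk
    			}
    			time.Sleep(100 * time.Millisecond) // hold off, then ask again
    		}
    		if err := c.apply(); err != nil {
    			return err
    		}
    	}
    	return nil
    }

    func main() {
    	fakeLag := func() (time.Duration, error) { return 7 * time.Millisecond, nil }
    	_ = throttledWrite([]chunk{{1}, {2}, {3}}, fakeLag)
    }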

  19. Heartbeat injection
    (timeline: 15:07:00.000 … 15:07:00.950, 50ms ticks)
    Injected on master: 15:07:00.000

  20. Heartbeat injection: applied on replica
    Injected on master: 15:07:00.000
    Applied on replica: 15:07:00.004

  21. Heartbeat injection: read by app
    Injected on master: 15:07:00.000
    Applied on replica: 15:07:00.004
    Read by app: 15:07:00.007
    Measured lag: 0.007

  22. Heartbeat injection: delayed app read
    Injected on master: 15:07:00.000
    Applied on replica: 15:07:00.004
    Read by app: 15:07:00.047
    Measured lag: 0.047

  23. Heartbeat injection: delayed apply
    Injected on master: 15:07:00.000
    Applied on replica: 15:07:00.044
    Read by app: 15:07:00.047
    Measured lag: 0.047

  24. Heartbeat injection: granularity
    +50ms


  25. Act II


  26. Practical constraints


  27. Lag monitor service
    freno to monitor replication lag:
    • Polls all replicas at 50ms interval
    • Aggregates data per cluster at 25ms interval
    • https://githubengineering.com/mitigating-replication-lag-and-reducing-read-load-with-freno/
    • https://github.com/github/freno
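The general shape of such a service, sketched in Go. This is not freno's actual code; the per-host map and the choice to aggregate to the worst (maximum) lag per cluster are assumptions made for illustration.

    package main

    import (
    	"sync"
    	"time"
    )

    // lagStore holds the latest per-replica readings and one aggregated
    // value that clients (apps, throttlers) read instead of the replicas.
    type lagStore struct {
    	mu      sync.Mutex
    	perHost map[string]float64
    	cluster float64
    }

    // poll samples every replica at a 50ms interval.
    func (s *lagStore) poll(replicas map[string]func() float64) {
    	for range time.Tick(50 * time.Millisecond) {
    		for host, readLag := range replicas {
    			lag := readLag() // e.g. the heartbeat query shown earlier
    			s.mu.Lock()
    			s.perHost[host] = lag
    			s.mu.Unlock()
    		}
    	}
    }

    // aggregate folds the per-host readings into a single per-cluster
    // value at a 25ms interval.
    func (s *lagStore) aggregate() {
    	for range time.Tick(25 * time.Millisecond) {
    		s.mu.Lock()
    		worst := 0.0
    		for _, lag := range s.perHost {
    			if lag > worst {
    				worst = lag
    			}
    		}
    		s.cluster = worst
    		s.mu.Unlock()
    	}
    }

    func main() {
    	s := &lagStore{perHost: map[string]float64{}}
    	stub := func() float64 { return 0.007 } // stand-in for a real replica probe
    	go s.poll(map[string]func() float64{"replica-1": stub, "replica-2": stub})
    	go s.aggregate()
    	select {} // keep the service running
    }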

  28. Heartbeat injection
    (timeline: 15:07:00.000 … 15:07:00.950, 50ms ticks)
    Injected on master: 15:07:00.000

  29. Heartbeat injection: applied on replica
    Injected on master: 15:07:00.000
    Applied on replica: 15:07:00.004

  30. Heartbeat injection: read by freno
    Injected on master: 15:07:00.000
    Applied on replica: 15:07:00.004
    Read by freno: 15:07:00.007
    Measured lag: 0.007

  31. Heartbeat injection: read by app
    Injected on master: 15:07:00.000
    Applied on replica: 15:07:00.004
    Read by freno: 15:07:00.007 (lag 0.007)
    App asks freno: 15:07:00.009

  32. Heartbeat injection: delayed app read
    Injected on master: 15:07:00.000
    Applied on replica: 15:07:00.004
    Read by freno: 15:07:00.007 (lag 0.007)
    App asks freno: 15:07:00.048

  33. Delayed app read, broken replica
    Injected on master: 15:07:00.000
    Applied on replica: 15:07:00.004
    Read by freno: 15:07:00.007 (lag 0.007)
    App asks freno: 15:07:00.048 — the replica has since broken, and the 0.007 reading is stale

  34. Heartbeat injection with freno: granularity
    ±50ms


  35. Actual safety margins:
    50ms freno sampling interval
    25ms freno aggregation interval
    Allow additional 25ms for “extra complications”
    Total 100ms


  36. Throttling: granularity is not important

  37. Granularity is important


  38. Objective: consistent reads


  39. Consistent reads, aka read-your-own-writes
    A classic problem of distributed databases
    Write on the master; expect to read that data back from a replica

  40. Consistent read checks
    App asks freno:
    “I made a write 350ms ago. Are all replicas up to date?”
    Client auto-requires 100ms error margin
    We compare replication lag with 250ms
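The decision behind that answer, sketched in Go. The 350ms/100ms/250ms figures come straight from the slide; the function shape is an assumption.

    package main

    import (
    	"fmt"
    	"time"
    )

    // upToDate answers: "a write happened sinceWrite ago; given the cluster's
    // reported lag, is it already visible on all replicas?" A fixed error
    // margin absorbs probe intervals, aggregation delay and other fuzz.
    func upToDate(sinceWrite, reportedLag time.Duration) bool {
    	const errorMargin = 100 * time.Millisecond
    	// The write was 350ms ago, the margin eats 100ms of that,
    	// so the reported lag must be under the remaining 250ms.
    	return reportedLag < sinceWrite-errorMargin
    }

    func main() {
    	fmt.Println(upToDate(350*time.Millisecond, 7*time.Millisecond))   // true: replicas caught up
    	fmt.Println(upToDate(350*time.Millisecond, 300*time.Millisecond)) // false: read from the master
    }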

  41. Everything is terrible
    100ms is where interesting stuff happens, and it’s within our error margin.

  42. The metrics dilemma
    Can’t we just reduce the interval?


  43. Act III


  44. Beyond our control

  45. Latency


  46. High latency networks
    Minimal lag

  47. Latency: consistent reads
    App close to writer node, far from replica
    Write to the nearby master; check lag against the distant replica

  48. Latency: consistent reads
    App close to writer node, far from replica
    Write to the nearby master; check lag against the distant replica

  49. Skewed clocks


  50. Heartbeat injection
    (timeline: 15:07:00.000 … 15:07:00.950, 50ms ticks)
    Injected on master: 15:07:00.000

  51. Heartbeat injection: applied on skewed replica
    Injected on master: 15:07:00.000
    Applied on replica at 15:07:00.004, but the replica’s clock reads 15:06:59.994

  52. Heartbeat injection: read by app
    Injected on master: 15:07:00.000
    Applied on replica at 15:07:00.004 (replica clock: 15:06:59.994)
    Read by app: 15:07:00.007
    Measured lag: -0.003

  53. Heartbeat injection on skewed master
    Injected on master; the master’s clock (and thus the heartbeat value) reads 15:07:00.025

  54. Heartbeat injection: applied on skewed replica
    Heartbeat value (skewed master clock): 15:07:00.025
    Applied on replica: 15:07:00.004

  55. Heartbeat injection: read by app
    Heartbeat value (skewed master clock): 15:07:00.025
    Applied on replica: 15:07:00.004
    Read by app: 15:07:00.007
    Measured lag: -0.018
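What the two skewed scenarios have in common, written out (the symbols are mine, the numbers are the slides'): the replica effectively computes

    measured lag = (t_read + replica clock offset) − (t_inject + master clock offset)
                 = elapsed time since injection + (replica offset − master offset)

    skewed replica: 0.007 + (−0.010 − 0)  = −0.003
    skewed master:  0.007 + (0 − 0.025)   = −0.018

Any offset between the two clocks lands directly in the measured lag, which is why even tens of milliseconds of skew matter at these granularities.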

  56. Timer skew


  57. GC


  58. VM


  59. Granularity limitation


  60. Everything is still terrible

  61. Atomic clocks


  62. Clock synchronization: verification


  63. A late mitigation


  64. An untimely postlude:
    Can we do without clocks?

  65. Consensus protocols

  66. Lamport timestamps

  67. MySQL: GTID
    Each transaction generates a GTID:
    00020192-1111-1111-1111-111111111111:830541
    Each server keeps track of gtid_executed: all transactions ever executed:
    00020192-1111-1111-1111-111111111111:1-830541
    SELECT GTID_SUBSET(
      '00020192-1111-1111-1111-111111111111:830541',
      @@gtid_executed
    );
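A sketch, in Go to match the earlier examples, of how GTIDs take the clock out of the read-your-own-writes check: remember what the master had executed right after the write, then read only from a replica whose own gtid_executed already contains that set. Capturing @@global.gtid_executed is a deliberately coarse stand-in for tracking the transaction's own GTID, and the table/column names are hypothetical.

    package gtidreads

    import (
    	"database/sql"
    	"fmt"
    )

    // writeThenRead writes on the master, remembers what the master had
    // executed at that point, and only reads from the replica once the
    // replica has applied at least that much. No clocks involved.
    func writeThenRead(master, replica *sql.DB) error {
    	if _, err := master.Exec(`insert into t (c) values (1)`); err != nil { // hypothetical write
    		return err
    	}

    	// Everything the master has executed so far, our write included.
    	var gtidSet string
    	if err := master.QueryRow(`select @@global.gtid_executed`).Scan(&gtidSet); err != nil {
    		return err
    	}

    	// Has the replica applied that set yet? GTID_SUBSET returns 1 if so.
    	var applied int
    	if err := replica.QueryRow(
    		`select gtid_subset(?, @@global.gtid_executed)`, gtidSet,
    	).Scan(&applied); err != nil {
    		return err
    	}
    	if applied != 1 {
    		return fmt.Errorf("replica not caught up; read from the master or try another replica")
    	}

    	// Safe: the replica provably contains our write.
    	var n int
    	return replica.QueryRow(`select count(*) from t`).Scan(&n) // hypothetical read
    }

MySQL 5.7+ also offers WAIT_FOR_EXECUTED_GTID_SET() to block until a replica catches up, rather than polling GTID_SUBSET().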

  68. And yet the search for time metrics endures…

  69. Questions?
    github.com/shlomi-noach
    @ShlomiNoach
    Thank you!
