My Mom told me that Git doesn't scale

Vicent Martí
November 08, 2012

Transcript

  1. View Slide

  2. Thanks
    for being here

    View Slide

  3. View Slide

  4. github

    View Slide

  5. github
    Git hosting:
    No longer a pain in the ass

    View Slide

  6. github
    Git hosting:
    No longer a pain in the ass
    for you.
    Not for us.
    Because, goddamnit,
    if I ever find the guy who invented
    this thing I’m going to hang him from

    View Slide

  7. Let’s host some
    Git repos!
    src/
      file.c
      file.h
    README.md
    COPYING.md
    .git
    Bare Repository
      HEAD
      index
      objects
      refs
    git-daemon

    View Slide

  8. OK, now about
    the web...
    grit: Ruby - Git interface

    View Slide

  9. OK, now about
    the web...
    grit: Ruby - Git interface
    Bare Repo
    Bare Repo
    Bare Repo

    View Slide

  10. 1VM
    grit

    storage
    rails app

    View Slide

  11. nVM

    storage

    View Slide

  12. nVM

    storage
    (GFS)

    View Slide

  13. Rails was making us slow.

    View Slide

  14. View Slide

  15. Literally.

    View Slide

  16. Time to move to
    Real Hardware

    View Slide

  17. fileservers
    frontends

    db

    View Slide

  18. fileservers
    frontends

    db
    ?????????

    View Slide

  19. smoke

    View Slide

  20. View Slide

  21. bert
    (binary Erlang term)

    View Slide

  22. bert
    (binary Erlang term)
    ernie
    (not an acronym)

    View Slide


  23. chimney
    (Redis)
    frontend
    fileserver
    smoke
    grit
    ernie
    grit

    View Slide

  24. Horizontal Scaling
    Vertical Scaling

    View Slide

  25. Horizontal Scaling: problums wit them gigabits
    Vertical Scaling: problums wit them gigahurtz

    View Slide

  26. them
    gigabits

    View Slide

  27. bummer
    x4180 =
    A LOT.

    View Slide

  28. NetShard

         ...
    alternate
    network

    View Slide

  29. NetShard

         ... 
    alternate
    network

    View Slide

  30. them
    gigahurtz

    View Slide

  31. bottleneck:
    grit

    View Slide

  32. bottleneck: grit
    solution: shell out to git

    View Slide

  33. bottleneck: shell out to git

    View Slide

  34. bottleneck: shell out to git
    solution: shell out to git

    View Slide

  35. bottleneck: shell out to git
    solution: shell out to git properly

    View Slide

  36. posix_spawn

    View Slide

  37. posix_spawn
    Seriously.
    < 1ms

    View Slide

  38. posix_spawn
    Seriously.
    < 1ms
    The issue is not in
    “shelling out”,
    the issue is in the
    spawned process.

    View Slide
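
A rough way to try the claim on slide 38 yourself. This is a minimal sketch, assuming Python 3.8+ on a Unix system, where `os.posix_spawnp` wraps `posix_spawn(3)`. The helper name `spawn_capture` is made up for illustration and is not from the talk. It times only the spawn call, so the cost of whatever the child does afterwards stays separate. The "< 1ms" figure is the talk's number; what you measure depends on your machine.

```python
import os
import sys
import time


def spawn_capture(argv):
    """Run argv via posix_spawn; return (exit_code, stdout_bytes, spawn_seconds)."""
    read_fd, write_fd = os.pipe()
    # In the child, make the pipe's write end its stdout (fd 1).
    file_actions = [(os.POSIX_SPAWN_DUP2, write_fd, 1)]

    start = time.perf_counter()
    pid = os.posix_spawnp(argv[0], argv, os.environ, file_actions=file_actions)
    spawn_seconds = time.perf_counter() - start  # cost of the spawn call only

    os.close(write_fd)  # the parent keeps only the read end
    with os.fdopen(read_fd, "rb") as pipe:
        output = pipe.read()  # returns at EOF, once the child closes its stdout

    _, status = os.waitpid(pid, 0)
    exit_code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
    return exit_code, output, spawn_seconds


if __name__ == "__main__":
    code, out, seconds = spawn_capture([sys.executable, "-c", "print('hello')"])
    print(code, out, f"spawn took {seconds * 1000:.2f} ms")
```

Swapping the argument list for something like `["git", "rev-parse", "HEAD"]` (run inside a repository, with Git installed) would be the "shell out to git" case. Any slowness left over after that belongs to the spawned process, which is the slide's point.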

  39. View Slide

  40. GUISE

    View Slide

  41. GUISE
    GUISE

    View Slide

  42. GUISE
    GUISE
    GUISE

    View Slide

  43. GUISE
    GUISE
    GUISE
    ...what?

    View Slide

  44. View Slide

  45. Why don’t we take

    View Slide

  46. Why don’t we take
    the Git binary...

    View Slide

  47. Why don’t we take
    the Git binary...
    yeah?

    View Slide

  48. Why don’t we take
    the Git binary...
    yeah? and compile it as

    View Slide

  49. Why don’t we take
    the Git binary...
    yeah? and compile it as
    a library

    View Slide

  50. Why don’t we take
    the Git binary...
    yeah? and compile it as
    a library
    oh... go on...

    View Slide

  51. Why don’t we take
    the Git binary...
    yeah? and compile it as
    a library
    oh... go on...
    and link that into

    View Slide

  52. Why don’t we take
    the Git binary...
    yeah? and compile it as
    a library
    oh... go on...
    and link that into
    our server

    View Slide

  53. Scientific
    Graph™

    View Slide

  54. Memory Usage
    Time
    Scientific
    Graph™

    View Slide

  55. Memory Usage
    Time
    Scientific
    Graph™

    View Slide

  56. Memory Usage
    Time
    Scientific
    Graph™

    View Slide

  57. Memory Usage
    Time
    Scientific
    Graph™

    View Slide

  58. Memory Usage
    Time
    Scientific
    Graph™

    View Slide

  59. Memory Usage
    Time
    Scientific
    Graph™

    View Slide

  60. View Slide

  61. Well, we didn’t think about

    View Slide

  62. Well, we didn’t think about
    freeing memory, but...

    View Slide

  63. Well, we didn’t think about
    freeing memory, but...
    THIS IS THE KIND
    OF PROBLEM
    WE COULD SOLVE
    WITH CGI

    View Slide

  64. Well, we didn’t think about
    freeing memory, but...
    THIS IS THE KIND
    OF PROBLEM
    WE COULD SOLVE
    WITH CGI
    IN 1995

    View Slide

  65. Scientific
    Graph™

    View Slide

  66. Memory Usage
    Time
    Scientific
    Graph™

    View Slide

  67. Memory Usage
    Time
    Scientific
    Graph™

    View Slide

  68. Memory Usage
    Time
    Scientific
    Graph™

    View Slide

  69. Memory Usage
    Time
    Scientific
    Graph™

    View Slide

  70. Memory Usage
    Time
    Scientific
    Graph™

    View Slide

  71. Memory Usage
    Time
    Scientific
    Graph™

    View Slide

  72. Memory Usage
    Time
    Scientific
    Graph™

    View Slide

  73. What do you mean
    the server died?

    View Slide

  74. die("BUG: non-INDEX attr direction in a bare repo");
    What do you mean
    the server died?

    View Slide

  75. die("BUG: non-INDEX attr direction in a bare repo");
    die("a bad revision is needed");
    What do you mean
    the server died?

    View Slide

  76. die("BUG: non-INDEX attr direction in a bare repo");
    die("a bad revision is needed");
    die("'%s' is not a valid branch name.", name);
    What do you mean
    the server died?

    View Slide

  77. die("BUG: non-INDEX attr direction in a bare repo");
    die("a bad revision is needed");
    die("'%s' is not a valid branch name.", name);
    die("Empty patch. Aborted.");
    What do you mean
    the server died?

    View Slide

  78. die("BUG: non-INDEX attr direction in a bare repo");
    die("a bad revision is needed");
    die("'%s' is not a valid branch name.", name);
    die("Empty patch. Aborted.");
    die("unable to read index file");
    What do you mean
    the server died?

    View Slide
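
Why those `die()` calls matter once Git is linked into a server: `die()` ends the whole process (stock Git exits with status 128), so a library call that hits one takes the server down with it. The sketch below is an illustration, not GitHub's code. It uses a throwaway Python child as a stand-in for a failing Git process, and `run_isolated` is a hypothetical helper. When the failing code runs in its own process, the caller just sees an exit code.

```python
import subprocess
import sys

# Stand-in for a Git process that hits die(): print a message, exit with 128.
# The message comes from the slide; the child script itself is hypothetical.
FAILING_CHILD = (
    "import sys; "
    "sys.stderr.write('fatal: a bad revision is needed\\n'); "
    "sys.exit(128)"
)


def run_isolated(child_code):
    """Run child_code in a separate interpreter; return (exit_code, stderr_text)."""
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr.strip()


if __name__ == "__main__":
    print(run_isolated(FAILING_CHILD))
    print("the server is still alive")  # the parent process keeps running
```

Had the same `sys.exit(128)` run inside the server process, the second `print` would never be reached.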

  79. libgit

    View Slide

  80. libgit2
    the “2” means this one
    frees memory

    View Slide

  81. libgit2
    the “2” means this one
    frees memory
    NOT ENOUGH
    ABSTRACT
    FACTORIES

    View Slide

  82. JGit
    the “J” means this one
    is in Java
    ...not our thing.

    View Slide

  83. Java
    a brief timeline
    1995: New companies don’t use Java because it’s not like Unix
    1997: New companies use Java because it’s new and shiny
    2005: New companies don’t use Java because it’s ooooooold
    2011: New companies use the JVM because

    View Slide

  84. Java
    a brief timeline
    1995: New companies don’t use Java because it’s not like Unix
    1997: New companies use Java because it’s new and shiny
    2005: New companies don’t use Java because it’s ooooooold
    2011: New companies use the JVM because
    github

    View Slide

  85. If you think you understand
    the JVM, you are either:

    View Slide

  86. If you think you understand
    the JVM, you are either:
    a) Very smart

    View Slide

  87. If you think you understand
    the JVM, you are either:
    a) Very smart
    b) Very wrong

    View Slide

  88. If you think you understand
    the JVM, you are either:
    a) Very smart
    b) Very wrong

    View Slide

  89. Some people think
    that github is a
    Rails shop
    or even a
    Ruby shop.

    View Slide

  90. Some people think
    that github is a
    Rails shop
    or even a
    Ruby shop.
    github is a
    Unix shop
    and everything else is
    just a detail.

    View Slide

  91. So,
    libgit2

    View Slide

  92. libgit2
    a brief timeline
    Shawn Pearce
    The Past

    View Slide

  93. libgit2
    a brief timeline
    Shawn Pearce
    myself
    The Past

    View Slide

  94. libgit2
    a brief timeline
    Shawn Pearce
    myself
    myself (about to have a mental breakdown)
    The Past

    View Slide

  95. libgit2
    a brief timeline
    Shawn Pearce
    myself
    myself (about to have a mental breakdown)
    myself (having a mental breakdown)
    The Past

    View Slide

  96. libgit2
    a brief timeline
    Shawn Pearce
    myself
    myself (about to have a mental breakdown)
    myself (having a mental breakdown)
    myself (reaching Git nirvana)
    The Past

    View Slide

  97. libgit2
    a brief timeline
    Shawn Pearce
    myself
    myself (about to have a mental breakdown)
    myself (having a mental breakdown)
    myself (reaching Git nirvana)
    The Past
    real contributors:
    Russell Belfer
    Carlos Martín
    Michael Schubert
    Ben Straub

    View Slide

  98. libgit2
    a brief timeline
    ?

    View Slide

  99. libgit2
    a brief timeline
    ?

    View Slide

  100. libgit2
    a brief timeline
    ?
    1.0 release

    View Slide

  101. libgit2

    View Slide

  102. libgit2

    View Slide

  103. Good Heavens,
    just look at the
    time.
    It’s NoSQL o’clock
    NoSQL
    NoSQL NoSQL
    NoSQL
    NoSQL NoSQL
    NoSQL
    NoSQL

    View Slide

  104. View Slide

  105. ...do you even

    View Slide

  106. ...do you even
    mongo?

    View Slide

  107. The Magic of
    Key-Value Stores
    If you wish upon a star,
    and have a pure heart...

    View Slide

  108. The Magic of
    Key-Value Stores
    If you wish upon a star,
    and have a pure heart...
    Anything can be
    a Key-Value store!

    View Slide

  109. The Magic of
    Key-Value Stores

    id  name           state  lat
    13  San Francisco  CA     24
    24  Phoenix        AZ     33
    7   Denver         CO     40
    8   Caribou        ME     47
    2   Los Angeles    CA     22

    SELECT * FROM CITIES
    WHERE name = 'San Francisco'

    View Slide

  110. Git is queried like
    a Key-Value Store
    But it is not a
    Key-Value Store
    git show f3c896c1949476e85abc0d75bb2143656a9580a6

    View Slide
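
The "queried like a Key-Value Store" half of slide 110 can be sketched in a few lines. Git derives an object's ID from its type and content (the SHA-1 of a short header plus the bytes), so the key is a function of the value. The `ObjectStore` class below is a toy in-memory stand-in with made-up names. It is not how Git lays objects out on disk.

```python
import hashlib


def git_object_id(obj_type, payload):
    """Return the SHA-1 object ID Git assigns to an object of this type and content."""
    header = f"{obj_type} {len(payload)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()


class ObjectStore:
    """Toy content-addressed store: the key is derived from the value."""

    def __init__(self):
        self._objects = {}

    def put(self, obj_type, payload):
        oid = git_object_id(obj_type, payload)
        self._objects[oid] = (obj_type, payload)
        return oid

    def get(self, oid):
        return self._objects[oid]


if __name__ == "__main__":
    store = ObjectStore()
    oid = store.put("blob", b"")
    print(oid)  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, Git's empty blob
    print(store.get(oid))
```

A single `get` is all `git show <sha>` needs to find one object. The "not a Key-Value Store" half is that most interesting questions need many such lookups, each one depending on the result of the last.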

  111. a brief introduction
    to the Git data model

    View Slide

  112. View Slide

  113. View Slide

  114. src/
      file.c
      file.h
    README.md
    COPYING.md

    View Slide

  115. src/
      file.c
      file.h
    README.md
    COPYING.md

    tree
      src/        -> tree
        file.c    -> blob
        file.h    -> blob
      README.md   -> blob
      COPYING.md  -> blob

    View Slide

  116. commit
    parent
    tree T
    metadata

    View Slide

  117. commit
    T
    commit
    T
    commit
    T
    commit
    T
    commit
    T
    commit
    T
    Behold,
    a graph.

    View Slide
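
Slides 115 to 117 in code form, as a deliberately simplified sketch. Blobs hold bytes, trees map names to blobs or other trees, and commits point at one tree and a parent. Everything here is an assumption made for illustration: the class and function names are hypothetical, objects reference each other directly instead of by hash, and each commit has at most one parent (real Git commits can have several).

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Blob:
    data: bytes


@dataclass(frozen=True)
class Tree:
    # (name, Blob or Tree) pairs, in listing order
    entries: Tuple[Tuple[str, Union["Blob", "Tree"]], ...]


@dataclass(frozen=True)
class Commit:
    tree: Tree
    parent: Optional["Commit"]
    metadata: str


def list_paths(tree, prefix=""):
    """Flatten a tree into file paths by recursing into sub-trees."""
    paths = []
    for name, obj in tree.entries:
        if isinstance(obj, Tree):
            paths.extend(list_paths(obj, prefix + name + "/"))
        else:
            paths.append(prefix + name)
    return paths


def history(commit):
    """Follow parent links back to the root; this chain is the graph."""
    log = []
    while commit is not None:
        log.append(commit.metadata)
        commit = commit.parent
    return log


if __name__ == "__main__":
    src = Tree((("file.c", Blob(b"int x;")), ("file.h", Blob(b"extern int x;"))))
    root = Tree((("src", src), ("README.md", Blob(b"hi")), ("COPYING.md", Blob(b"MIT"))))
    first = Commit(root, None, "first commit")
    second = Commit(root, first, "second commit")
    print(list_paths(root))
    print(history(second))
```

Both commits here share the same `root` tree object, which mirrors how unchanged trees and blobs are reused between commits.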

  118. View Slide

  119. Well that was easy.

    View Slide

  120. View Slide

  121. master
    Oh
    god
    kill
    me

    View Slide

  122. Little known
    torture
    methods:

    View Slide

  123. View Slide

  124. warning:
    the rabbit
    hole is
    pretty deep

    View Slide

  125. View Slide

  126. warning:
    git totally
    wasn’t designed
    for this

    View Slide

  127. Git doesn’t give
    a #!%$ about CAP

    View Slide

  128. View Slide

  129. Number of hops on a complex query: 1,000,000

    View Slide

  130. Number of hops on a complex query: 1,000,000
    Required hops for a successful query: 1,000,000

    View Slide

  131. Number of hops on a complex query: 1,000,000
    Required hops for a successful query: 1,000,000
    Replica count to ensure 100% availability: a metric shitton

    View Slide
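
A back-of-the-envelope version of slides 129 to 131. The sketch assumes a straight-line history with hypothetical commit IDs, counts one object lookup per commit visited, and then asks how likely a query is to finish if every hop must succeed. The independence of per-hop failures is an assumption added here for the arithmetic. The talk does not give a failure model.

```python
def build_linear_history(length):
    """Return {commit: [parents]} for a straight chain c0 <- c1 <- ... (made-up IDs)."""
    parents = {"c0": []}
    for i in range(1, length):
        parents[f"c{i}"] = [f"c{i - 1}"]
    return parents


def hops_to_reach(parents, start, target):
    """Walk parent links from start; return lookups made before finding target, or None."""
    hops = 0
    seen = set()
    pending = [start]
    while pending:
        commit = pending.pop()
        if commit == target:
            return hops
        if commit in seen:
            continue
        seen.add(commit)
        hops += 1  # reading this commit's parents is one lookup
        pending.extend(parents[commit])
    return None


def query_success_probability(per_hop_availability, hops):
    """Chance that every hop succeeds, ASSUMING hops fail independently."""
    return per_hop_availability ** hops


if __name__ == "__main__":
    graph = build_linear_history(1000)
    print(hops_to_reach(graph, "c999", "c0"))
    print(query_success_probability(0.999999, 1_000_000))
```

Under that assumption, "six nines" per hop over a million hops comes out to roughly a 37% chance that the whole query succeeds. That is one way to read why the replica count needed for full availability is so large.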

  132. We could fix it.

    View Slide

  133. We could fix it.
    But we won’t.

    View Slide

  134. GitRPC

    View Slide

  135. GitRPC
    Less.

    View Slide

  136. GitRPC server (Ruby)
    Rugged (Ruby)
    libgit2 (C)

    View Slide


  137. chimney
    (Redis)
    frontend
    fileserver
    smoke
    grit
    ernie
    grit
    GitRPC GitRPC

    View Slide


  138. chimney
    (Redis)
    frontend fileserver
    GitRPC GitRPC
    server
    client

    View Slide

  139. New
    serialization
    protocol
    Banana
    Pack
    MessagePack
    + more

    View Slide

  140. New
    serialization
    protocol
    Banana
    Pack
    MessagePack
    + more
    mochilo

    View Slide

  141. New
    serialization
    protocol
    Banana
    Pack
    MessagePack
    + more
    mochilo
    Banana Phone

    View Slide

  142. evolutionary
    (disappointing?)

    View Slide

  143. Summary:

    View Slide

  144. Ruby
    C
    Unix
    Unix
    Unix
    Unix
    Unix
    Unix
    Unix
    Unix
    Boring
    Boring
    Boring
    Boring
    Boring
    Summary:

    View Slide

  145. Use the
    most reliable
    tools you know.

    View Slide

  146. Challenge yourself to build
    the simplest thing.
    Not because it’s easy,
    but because it works.

    View Slide

  147. Innovate
    where it really matters.

    View Slide

  148. create a
    revolutionary
    product
    not a
    revolutionary
    backend.

    View Slide

  149. View Slide

  150. Q: Does Git scale?

    View Slide

  151. Q: Does Git scale?
    A: Who cares?

    View Slide

  152. View Slide