Aggressively Probing Ruby Projects

So we built an employee-driven, geographically-distributed, multi-client, HTML5-based, API-centric, bathroom-enabled, buzzword-embracing music server for the GitHub offices. It's been a fun project to explore company culture, CSS frameworks, JavaScript methodologies, native clients, outside contributions, and to discover who gets really angry when Garth Brooks starts singing on the speakers. It's one of those projects that ended up far more nutty than the original idea. Come steal some ideas for your own projects.

Zach Holman

April 19, 2012

Transcript

  1. AGGRESSIVELY
    PROBING RUBY PROJECTS

  2. SO
    I WROTE
    CODE THAT
    DESCRIBES RUBY

  3. INDEX +
    ANALYZE

  4. INDEX +
    ANALYZE

  5. 1 REPOSITORY
    2 PROBE ANALYSIS
    3 REPORTING

  6. 1 REPOSITORY
    git clone

  7. 2 PROBE ANALYSIS
    “Probe”
    a class that looks for
    something in your
    project

  8. 3 REPORTING
    run a report
    on what we discovered
    about your project

  9. REPOSITORY
    PROBES
    REPORTS

  10. REPOSITORY
    PROBES
    REPORTS
    Simple, right?

  11. REPOSITORY (15,698)
    PROBES
    REPORTS

  12. REPOSITORY (15,698)
    PROBES (17 FILES, 41 TESTS)
    REPORTS

  13. 10 REPO SLICES
    REPOSITORY (15,698)
    PROBES (17 FILES, 41 TESTS)
    REPORTS

  14. REPOSITORY SLICES
    - Split all SHAs into ten slices
    - Probe history at each slice

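One way to read "split all SHAs into ten slices" is to pick ten evenly spaced commits from the history and probe the project at each of them. The sketch below assumes the history is an oldest-first array of SHAs. `slice_points` is a hypothetical helper for illustration, not Hopper's actual code.

```ruby
# Hypothetical helper: pick up to `count` evenly spaced SHAs from a history.
def slice_points(shas, count = 10)
  # Short histories are probed at every commit.
  return shas.dup if shas.size <= count

  # Split the history into `count` equal-width chunks and take the last
  # commit of each, so the final slice is always the newest commit.
  (1..count).map do |i|
    index = (i * shas.size / count.to_f).ceil - 1
    shas[index]
  end
end

history = (1..95).map { |n| format("sha%03d", n) } # stand-in for real SHAs
points = slice_points(history)
puts points.inspect
```

Taking the last commit of each chunk means the newest state of the project is always among the probed slices.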

  15. DO THE MATH
    80,000,000,000,000 metrics
    (or something)

  16. GOAL:
    Index projects, see how they
    change over time, and compare
    them against all indexed projects

  17. NOT A STATISTICIAN
    ...but this is
    OPEN SOURCE
    (so if you are one, fix it)

  18. GitHub-only
    API v3
    Newer Projects
    Popular Projects
    Only Public Code

  19. This is a new project.
    It’s for fun.
    Take it lightly (for now).

  20. OTHERWISE,
    everything else in this
    is 100% ACCURATE and
    won’t generate an
    INTERNET FLAMEWAR
    maybe probably.

  21. github.com/holman/hopper

  22. HOPPER
    SINATRA
    REDIS
    RESQUE
    HEROKU
    1.9
    LIBGIT2

  23. SINATRA
    Simple UI
    PJAX
    Minimal Frontend
    Mustache
    SCSS + CoffeeScript

  24. REDIS
    Primarily schemaless
    Dynamic probes
    Easy bootstrapping
    Redis To Go

  25. HEROKU
    Easy deployment
    Cedar stack is rad, yo

  26. RESQUE
    Independent workers
    Millions of jobs

  27. 1.9
    AST parsing with Ripper
    Stop hurting Ruby. Use 1.9.

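To illustrate what AST parsing with Ripper can look like, here is a minimal sketch that counts method definitions by walking the S-expression returned by `Ripper.sexp` from Ruby's standard library. `count_definitions` is a hypothetical helper, not taken from Hopper, and the heredoc syntax assumes a current Ruby rather than 1.9.

```ruby
require "ripper"

# Count method definitions in a Ruby source string.
#   :def  nodes are plain definitions     (def foo)
#   :defs nodes are singleton definitions (def self.foo)
def count_definitions(source)
  counts = Hash.new(0)
  walk = lambda do |node|
    next unless node.is_a?(Array)
    counts[node.first] += 1 if node.first == :def || node.first == :defs
    node.each { |child| walk.call(child) }
  end
  # Ripper.sexp returns nil for unparsable source; the walker ignores it.
  walk.call(Ripper.sexp(source))
  counts
end

source = <<~RUBY
  class Greeter
    def self.build
      new
    end

    def hello
      "hi"
    end

    def bye
      "bye"
    end
  end
RUBY

counts = count_definitions(source)
puts counts.inspect
```

Because the parser does the work, definitions are counted from the syntax tree rather than by matching `def` with a regular expression, so strings and comments that contain the word are not counted.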

  28. LIBGIT2
    libgit2: linkable Git library
    rugged: a gem for libgit2
    fast, easy Git access

  29. WRITING NEW PROBES
    SHOULD BE TRIVIAL

  30. APP/PROBES/#{PROBE}.RB
    class Loc < Probe
      exposes :lines

      def lines
        repository.files.map do |file|
          repository.read(file).to_s.lines.count
        end.sum
      end
    end

  31. APP/PROBES/#{PROBE}.RB
    class Loc < Probe
      exposes :lines

      def lines
        repository.files.map do |file|
          repository.read(file).to_s.lines.count
        end.sum
      end
    end

  32. APP/PROBES/#{PROBE}.RB
    class Loc < Probe
      exposes :lines # Report on these methods

      def lines
        repository.files.map do |file|
          repository.read(file).to_s.lines.count
        end.sum
      end
    end

  33. APP/PROBES/#{PROBE}.RB
    class Loc < Probe
      exposes :lines

      def lines
        repository.files.map do |file|
          repository.read(file).to_s.lines.count
        end.sum
      end
    end

  34. APP/PROBES/#{PROBE}.RB
    class Loc < Probe
      exposes :lines

      def lines
        repository.files.map do |file|
          repository.read(file).to_s.lines.count
        end.sum
      end
    end

  35. APP/PROBES/#{PROBE}.RB
    class Loc < Probe
      exposes :lines

      def lines
        repository.files.map do |file|
          repository.read(file).to_s.lines.count
        end.sum
      end
    end

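The slides show the `Loc` probe but not the `Probe` base class behind it. The sketch below is one guess at how an `exposes` class macro could work: it records method names, and a `report` method calls each one. The `Probe` internals, `report`, and `FakeRepository` are all assumptions made for illustration. Only the `Loc` class comes from the slides, and `Array#sum` needs Ruby 2.4 or later.

```ruby
# Hypothetical base class: `exposes` records which methods a report should call.
class Probe
  def self.exposes(*names)
    exposed.concat(names)
  end

  # Per-subclass list of exposed method names.
  def self.exposed
    @exposed ||= []
  end

  attr_reader :repository

  def initialize(repository)
    @repository = repository
  end

  # Run every exposed method and collect the results by name.
  def report
    self.class.exposed.each_with_object({}) do |name, results|
      results[name] = public_send(name)
    end
  end
end

# Stand-in repository backed by a Hash of path => contents.
class FakeRepository
  def initialize(files)
    @files = files
  end

  def files
    @files.keys
  end

  def read(path)
    @files[path]
  end
end

# The probe from the slides, unchanged.
class Loc < Probe
  exposes :lines

  def lines
    repository.files.map do |file|
      repository.read(file).to_s.lines.count
    end.sum
  end
end

repo = FakeRepository.new("a.rb" => "puts 1\nputs 2\n", "b.rb" => "puts 3\n")
result = Loc.new(repo).report
puts result.inspect
```

With a base class along these lines, a new probe needs only a subclass, an `exposes` line, and one method per metric.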

  36. APP/MODELS/REPOSITORY.RB
    class Repository
      # Method bodies are not shown on the slide.
      def lines; end
      def read(file); end
      def revisions; end
      def commit_messages; end
    end
    COMMON HELPER METHODS
    CAN BE ABSTRACTED TO SVN, HG, ETC.

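The claim that these helpers can be abstracted to other version control systems can be sketched as a small interface: probes call only a few methods, and each backend implements them. Everything below is an assumption for illustration, including the `files` method (which appears in the `Loc` probe but not on this slide) and the `MemoryRepository` toy backend. It is not Hopper's actual `Repository` class.

```ruby
# Hypothetical interface: probes only call these methods, so any backend
# that implements them can be probed.
class Repository
  def files
    raise NotImplementedError, "#{self.class} must implement #files"
  end

  def read(_path)
    raise NotImplementedError, "#{self.class} must implement #read"
  end

  def commit_messages
    raise NotImplementedError, "#{self.class} must implement #commit_messages"
  end

  # Shared helper built only on the interface above.
  def lines
    files.sum { |path| read(path).to_s.lines.count }
  end
end

# Toy backend holding everything in memory; a Git, SVN, or Mercurial
# backend would override the same three methods.
class MemoryRepository < Repository
  def initialize(files, messages)
    @files = files
    @messages = messages
  end

  def files
    @files.keys
  end

  def read(path)
    @files.fetch(path)
  end

  def commit_messages
    @messages
  end
end

memory = MemoryRepository.new({ "lib/a.rb" => "one\ntwo\n" }, ["Initial commit"])
puts memory.lines
```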

  37. A QUICK ASIDE: RUGGED IS COOL
    repo = Rugged::Repository.new(path)
    repo.lookup('2cc3e9a').message
    # => "Run an resque worker\n"

  38. A QUICK ASIDE: RUGGED IS COOL
    walker = Rugged::Walker.new(repo)
    walker.push('2cc3e9a')
    walker.map(&:oid)
    # => [an, array, of, shas]

  39. A QUICK ASIDE: RUGGED IS COOL
    - Faster than shelling out (naturally)
    - Write or stage new commits
    - Multi-platform, permissive license,
      bindings for all major languages
    - Faster than any other Git library
    Ruby: gem install rugged

  40. REMEMBER YOUR SCHOOLING:
    [1, 2, 3, 4, 4]
    MEAN: 2.8
    MEDIAN: 3
    MODE: 4

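The three averages for the slide's sample list can be checked with a few lines of Ruby. The helper names are illustrative, and the `mode` helper assumes a single most frequent value.

```ruby
# Arithmetic mean: sum divided by count.
def mean(values)
  values.sum.to_f / values.size
end

# Median: middle value of the sorted list, or the average of the two
# middle values when the list has an even length.
def median(values)
  sorted = values.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Mode: the value that occurs most often.
def mode(values)
  values.group_by { |value| value }.max_by { |_value, group| group.size }.first
end

data = [1, 2, 3, 4, 4]
puts mean(data)   # 2.8
puts median(data) # 3
puts mode(data)   # 4
```

The mean is pulled toward outliers while the median is not, which is why the gap between the two carries so much of the argument in the statistics slides.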

  41. PROJECTS
    15,698
    most-forked Ruby projects on GitHub

  42. SCIENCE #1
    MOST
    PROJECTS
    ARE LONELY

  43. CONTRIBUTORS
    3.77 (MEAN)

  44. CONTRIBUTORS
    2 (MEDIAN)

  45. Popular projects get a
    disproportionate amount of help.

  46. FOLLOWERS
    22 (MEDIAN)

  47. FORKS
    5 (MEDIAN)
    five forks for two contributors
    means three inactive or ignored forks

  48. Again, this doesn’t take into
    account the bottom 90%, either.

  49. Open source is a
    long, long, lonely tail.

  50. SCIENCE #2
    OFFENSIVE
    RUBY CODE
    IS OFFENSIVE

  51. SWEAR WORDS
    0.50
    PER PROJECT

  52. PERHAPS MORE OFFENSIVELY,
    DEFINE_METHOD()S
    4.27
    PER PROJECT
    (and 30.1 SEND()s, but that’s harder to measure)

  53. QUESTION:
    \t or SPACES ?

  54. ANSWER:
    YOU ARE A
    HORRIBLE
    PERSON IF
    YOU HARD TAB

  55. ANSWER:
    LUCKILY ONLY 8.4% OF
    PROJECTS PREDOMINANTLY
    HARD TAB

  56. mean trailing spaces: 531.5
    median trailing spaces: 31
    IT IS A HORROR

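The slides do not show how hard tabs or trailing spaces were counted. One plausible approach, sketched here purely as an assumption, is to count lines rather than characters: lines that end in whitespace, and lines whose indentation starts with a tab or a space. `whitespace_stats` and `predominantly_tabs?` are hypothetical names.

```ruby
# Hypothetical counter: tallies lines with trailing whitespace and lines
# whose indentation starts with a tab or a space.
def whitespace_stats(source)
  stats = { trailing: 0, tab_indented: 0, space_indented: 0 }
  source.each_line do |line|
    text = line.chomp
    stats[:trailing] += 1 if text =~ /[ \t]+\z/
    stats[:tab_indented] += 1 if text.start_with?("\t")
    stats[:space_indented] += 1 if text.start_with?(" ")
  end
  stats
end

# Treat source as predominantly hard-tabbed when tab-indented lines
# outnumber space-indented ones.
def predominantly_tabs?(stats)
  stats[:tab_indented] > stats[:space_indented]
end

sample = "def a\n\tputs 1 \nend\n  x = 1\n"
stats = whitespace_stats(sample)
puts stats.inspect
puts predominantly_tabs?(stats)
```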

  57. 98.7%
    of the top Rails projects
    AVOID SEMICOLONS
    ...in their JavaScript;

  58. just kidding omg stop talking about semicolons
    they’re ;;;;;;;ing boring

  59. SCIENCE #3
    THE
    WORK

  60. TOTAL LINES: 17,316
    LINES OF RUBY CODE: 4,572
    COMMENT LINES: 761
    (mean)

  61. TOTAL LINES: 1,132
    LINES OF RUBY CODE: 563
    COMMENT LINES: 63
    (median)

  62. FROM THIS, WE CAN SEE:
    Popular projects tend to
    have more non-Ruby code
    Inline documentation is sparse (11%)

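Both conclusions can be checked against the mean and median figures from the two preceding slides. The sketch below just does the arithmetic. Reading the 11% as the median comment-to-Ruby ratio is an assumption, since the slide does not say which figures it was derived from.

```ruby
# Figures copied from the mean and median slides.
mean_stats   = { total: 17_316, ruby: 4_572, comments: 761 }
median_stats = { total: 1_132,  ruby: 563,   comments: 63 }

def percent(part, whole)
  (100.0 * part / whole).round(1)
end

# Share of all lines that are Ruby. The mean is dominated by large,
# popular projects; the median reflects a typical project.
ruby_share_mean   = percent(mean_stats[:ruby], mean_stats[:total])
ruby_share_median = percent(median_stats[:ruby], median_stats[:total])

# Comment lines relative to Ruby lines.
comment_ratio_mean   = percent(mean_stats[:comments], mean_stats[:ruby])
comment_ratio_median = percent(median_stats[:comments], median_stats[:ruby])

puts "Ruby share:    mean #{ruby_share_mean}%, median #{ruby_share_median}%"
puts "Comment ratio: mean #{comment_ratio_mean}%, median #{comment_ratio_median}%"
```

Ruby makes up roughly a quarter of lines on the mean figures but about half on the median figures, which is consistent with larger projects carrying more non-Ruby code. The median comment ratio comes out near the 11% quoted on the slide.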

  63. BRANCHES
    1.96
    remote branches per project
    median: 1

  64. TOTAL COMMITS
    417.0 MEAN
    110.0 MEDIAN

  65. SCIENCE #4
    THE
    RUBY
    ECOSYSTEM

  66. RAKE
    78% of projects
    have a Rakefile

  67. BUNDLER
    31% of projects
    have a Gemfile

  68. BUNDLER
    15% of projects
    have a Gemfile.lock

  69. GEMS
    51% of projects
    have a .gemspec

  70. CONTAINERS
                MEAN  MEDIAN
    CLASSES       55       8
    MODULES       44       6
    *(includes redefinitions)

  71. METHODS
                MEAN  MEDIAN
    CLASS         20       2
    INSTANCE     266      28

  72. SCIENCE #5
    THE
    PAPERWORK

  73. LICENSES
    47.1% of projects don’t have a license
    This is worrisome

  74. LICENSES
    Across all projects,
    44% chose MIT as their license
    2.2% Apache
    1.1% GPL
    0.5% LGPL

  75. ARBITRARY
    COMPARISONS

  76. MULTIPLE
    LANGUAGES

  77. LANGUAGE
    COMPARISONS

  78. MORE D3.JS
    VISUALIZATIONS

  79. GITHUB.COM/HOLMAN/HOPPER
    CODESTAT.US

  80. @HOLMAN
    ZACH HOLMAN
    ZACHHOLMAN.COM/TALKS
