GitHub Flavored Ruby

Someone once told me that software development is a constant battle against complexity. Over the past three years we've built several large systems at GitHub, and if anything, that saying is an understatement. Things like tight coupling, insufficient testing or documentation, lack of versioning discipline, and underspecified design documents can easily lead you down a path of ruin. In this talk I'll cover several of the techniques we use at GitHub to defend against complexity in our Ruby systems, including modularization, Readme Driven Development, Rakegem, TomDoc, and Semantic Versioning.

Tom Preston-Werner

November 12, 2011
Transcript

  1. @mojombo you should follow me and read my blog tom.preston-werner.com

    You can find me on Twitter and GitHub as mojombo, and read my blog at http://tom.preston-werner.com.
  2. RakeGem Readme Driven Development TomDoc Semantic Versioning Relentless Modularization Today

    I’m going to talk about five ideas that we use at GitHub to streamline how we approach building Rubygems.
  3. 0 5,000 10,000 15,000 20,000 Lines of code Tinything Bigshit

    Think of a small project you’ve done. Maybe it has 1000 lines of code. It’s a pleasure to work with. Easy to maintain. You love working on it. Now think of a huge monolithic project you’ve been part of. I’m betting you’d give anything to stay away from that code.
  4. FUUUUUUUUUUUUUUUU How the hell does this even happen!? Large code

    has a tendency to become messy and tightly coupled. Sometimes you don’t even realize this is happening. Without a weapon to fight this trend, you’ll end up spending your days untangling slinkies instead of clapping like an idiot while they slink down the stairs.
  5. “How do I decide what to modularize?” Sometimes it

    can be tricky to decide what to modularize and when something should be extracted. There’s a simple heuristic I like to use. Modularize...
  6. EVERYTHINGGGGGGGGG EVERYTHINGGGGGGGGGGGG. If you find yourself wondering if you should

    modularize something or not, just remember this baby staring into your soul and you’ll do the right thing.
  7. github.com grit smoke chimney bertrpc proxymachine ernie failbot gerve resque

    RockQueue jekyll nodeload albino markup camo gollum heaven stratocaster amen haystack hubot services help.github.com jobs At GitHub we embrace modularization in a big way. We continually extract pieces of the main GitHub.com Rails app into their own components. Then we give them funny names.
  8. A neat trick that I use to approach modularity is

    by remembering my good childhood friend Mr. Rogers. He liked to make believe, and so do I.
  9. Make Believe Open Source Libraries I make believe that whatever

    I’m working on is going to be open sourced. This forces me to use proper abstractions and prevents me from coupling the code too tightly with the main app.
  10. Make Believe Open Source Libraries Even better is if you

    really DO open source your libraries and components. We’re huge fans of this at GitHub. We try to open source anything that does not represent core business value.
  11. MODULARIZE TO PREVENT PAIN KEY CONCEPT Small projects are easy

    and enjoyable to write and maintain. Big projects are hard and suck to maintain. Save yourself some pain and modularize like you mean it!
  12. [waterfall] In 1970 Winston Royce wrote a paper about project

    management. In it he outlined a methodology called Waterfall Design. Even though he wrote about this system as an example of what NOT to do, enterprises and government ignored that and started using it anyway. =/ Overspecifying requirements is a disaster. We’ve all embraced Agile techniques to escape its tyranny.
  13. [cowboy] I don’t follow anyone’s rules... ...not even my own

    But in reaction to Waterfall, we’re tempted to go too far in the other direction and become cowboy coders. This is just as bad.
  14. A perfect implementation of the wrong specification is useless. Either

    way you can end up with the wrong specification. A perfect implementation of the wrong specification is useless.
  15. IS THERE A MIDDLE GROUND? there must be a middle

    ground, right? Surely there must be some solution that lies between these two extremes. Something that’s not OVER specified or UNDER specified.
  16. WRITE YOUR README FIRST There’s already a document that we

    write that contains the information we need to understand a project and how it works. It’s called the README. What if we wrote our READMEs first? We could think through the problem domain enough to prevent big mistakes, but still leave ourselves with enough flexibility to end up with a correct implementation.
  17. Readme.md Spec.md When I first started doing this, it was

    amazing. But it can be confusing if you have an empty repository with just a README file and no implementation. I’ve solved this problem by renaming README to SPEC during the initial phase. Then I move parts of the SPEC into the README as I implement features, thereby keeping the code and the docs in sync.
  18. google://readme driven development I’ve written a blog post that further

    explains this idea. It’s on my weblog. Just search for “readme driven development” and it’ll be the first result.
  19. USE RDD TO SPECIFY THE RIGHT PRODUCT KEY CONCEPT RDD

    helps you build better software by having you write down your thoughts before you start coding, and keeps you from locking in the wrong specification by writing too much.
  20. Rakegem is a Minecraft plugin I created that totally makes

    it easy to harvest Rubies from a standard grass block. It’s really great when... Naw, I’m just kidding.
  21. RAKE-BASED GEM BUILDER and deployer, doccer, tester, and manifester Rakegem

    is a flexible, customizable Rake-based gem builder, and more.
  22. github.com/ mojombo/ rakegem If you want to follow along, load

    up this URL. You’ll see just how simple it really is.
  23. NO DEPENDENCIES like, for real. no gems involved*. * except

    yours, duh Rakegem has NO DEPENDENCIES whatsoever.
  24. HAND-ROLLED GEMSPEC + SIMPLE RAKE TASKS RubyGems already has a

    great system for specifying everything about how the gem works. It’s called the gemspec. Rakegem gives you a template gemspec that’s easy to fill out and doesn’t involve any magic. It combines that with a simple Rakefile that handles all the build and release dynamics for you.
  25. GEMSPEC Here’s what the gemspec template looks like. It provides

    a lot of guidance about how to write your gemspec so you don’t have to dig through mountains of documentation.
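A hand-rolled gemspec is plain Ruby. The following is a minimal sketch, not the Rakegem template itself: the gem name `scoped` and version `0.1.0` are borrowed from the `rake -T` slide, while the summary, author, and file list are placeholder assumptions.

```ruby
# Minimal hand-written gemspec sketch (assumed values, not the Rakegem template).
require "rubygems"

# Hypothetical helper that builds the specification object.
def example_spec
  Gem::Specification.new do |s|
    s.name          = "scoped"                        # name seen in the rake -T slide
    s.version       = "0.1.0"
    s.summary       = "Placeholder summary."          # assumption
    s.authors       = ["Example Author"]              # assumption
    s.files         = ["README.md", "lib/scoped.rb"]  # assumption: listed by hand
    s.require_paths = ["lib"]
  end
end

spec = example_spec
puts "#{spec.name}-#{spec.version}.gem"
```

Because the spec is an ordinary Ruby object, a Rakefile can read the name and version from it instead of repeating them.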
  26. RAKEFILE The Rakefile can be copied directly to your project

    without modification. Everything it needs it can get from the gemspec.
  27.

          $ rake -T
          rake build         # Build scoped-0.1.0.gem into the pkg directory
          rake clobber_rdoc  # Remove rdoc products
          rake console       # Open an irb session preloaded with this library
          rake coverage      # Generate RCov test coverage and open in your browser
          rake gemspec       # Generate scoped.gemspec
          rake rdoc          # Build the rdoc HTML Files
          rake release       # Create tag v0.1.0 and build and push scoped-0.1.0.gem to Rubygems
          rake rerdoc        # Force a rebuild of the RDOC files
          rake test          # Run tests
          rake validate      # Validate scoped.gemspec

    It adds Rake tasks for all your normal needs: building the gem and docs, running tests, and doing releases.
  28. RAKEGEM — CUSTOMIZATION The beauty of this system is that

    it’s infinitely customizable. Since the entire system is embedded in your project as simple code, you can change anything you want to get the perfect workflow.
  29. RAKEFILE Here’s what the release task looks like. I like

    to use a version number that looks like “vX.Y.Z”, but maybe you don’t. To change how Rakegem works, just change that line of code!
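To illustrate why one line controls the tag format, here is a hedged sketch of the kind of logic a release task runs. It is not Rakegem's actual Rakefile: `release_tag` and `release_commands` are hypothetical helpers, and the exact sequence of git and gem commands is an assumption.

```ruby
# Hypothetical helper: the single place where the tag format lives.
# Change this method to switch from "vX.Y.Z" to another scheme.
def release_tag(version)
  "v#{version}"
end

# Hypothetical helper: the shell commands a release task might run, in order.
def release_commands(name, version)
  tag = release_tag(version)
  [
    "git tag #{tag}",
    "git push origin #{tag}",
    "gem push pkg/#{name}-#{version}.gem"
  ]
end

puts release_commands("scoped", "0.1.0")
```

Since the code lives in your own project, dropping the `v` prefix is an edit to `release_tag` rather than a configuration option you have to hunt for.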
  30. STOP FIGHTING YOUR GEM BUILDING SYSTEM KEY CONCEPT Your gem

    management system should be simple and customizable. Rakegem gives you the ultimate power and freedom to get things done without any hassle.
  31. FOUR LEVELS of documentation Line Code API Book I’ve identified

    four levels of code documentation. Line-level docs explain tricky lines of code within methods. Code-level docs describe how methods or classes work. API-level docs are for end users of your library. Book-level docs provide a long-format overview suitable for beginners.
  32. WHY DOCUMENT CODE? what does it do? is it considered

    public? what params are expected? what types are the params? what are valid options? how do I use the damn thing? what type is the return? There are a lot of things we ask ourselves when looking at new code. Ruby is especially difficult to unravel because of its flexibility. If we don’t write down what we’re thinking when we write code, that information is easily lost to the ghosts of time.
  33. PAST TOM AND FUTURE TOM I’d like to introduce you

    to Past Tom. He’s been looking out for me for a long time. Four years ago he was writing TomDoc that I still read today. Every time I’m coding now, I think about Future Tom. If I write good docs, I know he’ll look back at me from the future and give me two big thumbs up, because I’ve saved him a ton of time and stress.
  34.

          class Gollum
            class Wiki
              #
              #
              #
              def exist?
                # ...
              end
            end
          end

    Here’s some code. If all we have is the method signature, it’s hard to tell what’s going on. Even something simple like what type it returns requires reading the code.
  35.

          class Gollum
            class Wiki
              # Public: Check whether the wiki's git repo exists on the filesystem.
              #
              # Returns true if the repo exists, and false if it does not.
              def exist?
                # ...
              end
            end
          end

    what does it do? is it considered public? what type is the return?

    With just a few short words, we can solve a lot of problems and make sure that future developers who work with this code don’t change it in unpredictable ways.
  36.

          class Gollum
            class Wiki
              #
              #
              #
              #
              #
              #
              #
              #
              #
              #
              #
              #
              #
              #
              #
              #
              #
              #
              def write_page(name, format, data, commit = {})
                # ...
              end
            end
          end

    Maybe you think that’s too trivial and reading the code would be fine. Ok, how about this example. Not so simple now, is it? We can get some idea of what the method does, and even though the argument names are good, there is no visibility into specifics about either. As coders, we rely on specifics to write good code.
  37.

          class Gollum
            class Wiki
              # Public: Write a new version of a page to the Gollum repo root.
              #
              # name   - The String name of the page.
              # format - The Symbol format of the page.
              # data   - The new String contents of the page.
              # commit - The commit Hash details:
              #          :message   - The String commit message.
              #          :name      - The String author full name.
              #          :email     - The String email address.
              #          :parent    - Optional Grit::Commit parent to this update.
              #          :tree      - Optional String SHA1 of the tree to create the
              #                       index from.
              #          :committer - Optional Gollum::Committer instance. If provided,
              #                       assume that this operation is part of a batch of
              #                       updates and the commit happens later.
              #
              # Returns the String SHA1 of the newly written version, or the
              # Gollum::Committer instance if this is part of a batch update.
              def write_page(name, format, data, commit = {})
                # ...
              end
            end
          end

    what params are expected? what types are the params? what are valid options?

    With a little bit of extra work we can illuminate what this method does and make it obvious how to use it without having to dig through long method chains and a ton of code.
  38.

          class Gollum
            class Page
              #
              #
              #
              #
              #
              #
              #
              #
              #
              #
              def self.cname(name)
                # ...
              end
            end
          end

    One last example. Here’s a simple method. The name was obvious to me when I wrote it, but two years later, it’s a different story.
  39.

          class Gollum
            class Page
              # Convert a human page name into a canonical page name.
              #
              # name - The String human page name.
              #
              # Examples
              #
              #   Page.cname("Bilbo Baggins")
              #   # => 'Bilbo-Baggins'
              #
              # Returns the String canonical name.
              def self.cname(name)
                # ...
              end
            end
          end

    how do I use the damn thing?

    With just a few short lines of TomDoc, I’ve ensured that every developer that sees this code for the rest of time will understand and be able to use this method in the proper fashion. That’s a pretty big benefit for a few minutes of effort!
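The slide elides the method body. As a runnable sketch of the same TomDoc comment attached to working code, the version below uses a standalone `Page` class and assumes that canonicalizing only means replacing spaces with hyphens. That body is a guess for illustration, not Gollum's actual implementation.

```ruby
# Hypothetical stand-in for the class on the slide.
class Page
  # Convert a human page name into a canonical page name.
  #
  # name - The String human page name.
  #
  # Examples
  #
  #   Page.cname("Bilbo Baggins")
  #   # => 'Bilbo-Baggins'
  #
  # Returns the String canonical name.
  def self.cname(name)
    # Assumption: spaces are the only characters that need replacing.
    name.gsub(" ", "-")
  end
end

puts Page.cname("Bilbo Baggins")
```

The Examples section doubles as a small, human-readable test case: the documented call and result can be checked directly against the code.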
  40. The TomDoc specification is designed to be as simple as

    possible. You should be able to read the spec once and know how to write TomDoc without referring back to it very often. Code docs should be optimized for humans. We are the ones reading and writing them.
  41. This is Eric Hodel. He likes hats. He also likes

    TomDoc, and he just happens to be the maintainer of RDoc.
  42. RDOC 3.10 WILL SUPPORT TOMDOC He’s added TomDoc support to

    the latest versions of RDoc and if you install 3.10 or later, you can convert your TomDoc’d code to nice HTML output without any extra tools!
  43. CODE DOCUMENTATION IS FOR HUMANS KEY CONCEPT Stop optimizing your

    docs for machines, and start writing them for Future You. TomDoc is easy to write, easy to read, and saves everyone a boatload of time.
  44. DEPENDENCY HELL Version Lock Version Promiscuity There’s a dread place

    in software development called dependency hell. It’s where you end up when you have version requirements that are either overly specific or so broad that incompatible versions can sneak in and screw up your system.
  45. semver.org You can find the Semantic Versioning spec at this

    URL. It’s very short and easy to follow.
  46. PUBLIC API Remember TomDoc? The hardest part of implementing SemVer

    is defining a public API for your project. Without a public API that tells people what classes/methods/etc they can and cannot use, it is impossible to tell users how those things change over time. Remember TomDoc? If you use TomDoc and the Public/Internal/Deprecated designators, you can easily define your public API without a lot of extra work. So do that.
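As a sketch of how the Public/Internal/Deprecated designators can mark out an API surface, consider the class below. `Thermostat` and its methods are hypothetical and exist only for illustration; only the designator prefixes come from TomDoc.

```ruby
# Hypothetical class used only to show the TomDoc designators.
class Thermostat
  # Public: Initialize a Thermostat.
  #
  # target - The Integer target temperature in degrees Celsius.
  def initialize(target)
    @target = target
  end

  # Public: Decide whether the heater should run.
  #
  # current - The Integer current temperature in degrees Celsius.
  #
  # Returns true if heating is needed, false otherwise.
  def heat?(current)
    delta(current) > 0
  end

  # Deprecated: Decide whether the heater should run. Use #heat? instead.
  #
  # current - The Integer current temperature in degrees Celsius.
  #
  # Returns true if heating is needed, false otherwise.
  def needs_heat(current)
    heat?(current)
  end

  # Internal: Compute the gap between target and current temperature.
  #
  # current - The Integer current temperature in degrees Celsius.
  #
  # Returns the Integer difference.
  def delta(current)
    @target - current
  end
end

puts Thermostat.new(20).heat?(18)
```

Under this labeling, only changes to the methods marked Public would count as public API changes for versioning purposes; `delta` can change freely.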
  47. 2.4.3 major minor patch In SemVer, there are three numbers

    that comprise the version number. Major, minor, and patch.
  48. MAJOR backwards incompatible big changes The major version number must

    be incremented anytime the public API changes in a backwards incompatible way. If you’re a responsible software developer, you don’t want this to happen very often. Maintaining backwards compatibility is a big part of not screwing over your users.
  49. MINOR backwards compatible new functionality big internal changes may contain

    bug fixes The minor version must be incremented when new functionality is added to the public API. These changes must always be backwards compatible.
  50. PATCH backwards compatible bug fixes only The patch version must

    be incremented if bugs are fixed to bring the code back into line with the documentation. These must always be backwards compatible and must not change the public API in any way.
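The three increment rules can be sketched as one small function. `bump` is a hypothetical helper, not part of any library. Resetting the lower numbers to zero on a major or minor bump follows the SemVer spec rather than anything stated on the slides.

```ruby
# Hypothetical helper: return the next version String for a kind of change.
def bump(version, change)
  major, minor, patch = version.split(".").map(&:to_i)

  case change
  when :major  # backwards-incompatible change to the public API
    "#{major + 1}.0.0"
  when :minor  # new, backwards-compatible functionality
    "#{major}.#{minor + 1}.0"
  when :patch  # backwards-compatible bug fix only
    "#{major}.#{minor}.#{patch + 1}"
  else
    raise ArgumentError, "unknown change type: #{change.inspect}"
  end
end

puts bump("2.4.3", :major)
puts bump("2.4.3", :minor)
puts bump("2.4.3", :patch)
```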
  51. gem "gollum", "~> 2.4" BUNDLER If you follow these rules,

    you can avoid dependency hell in your project by using Bundler’s pessimistic version constraint operator. This rule means that any version >= 2.4.0 and < 3.0.0 will satisfy the requirement.
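The range implied by `~> 2.4` can be checked with the `Gem::Requirement` and `Gem::Version` classes that ship with RubyGems. This is a small sketch: `satisfies?` is a hypothetical wrapper, and the sample version numbers are arbitrary.

```ruby
require "rubygems"

# Hypothetical wrapper around the RubyGems requirement check.
def satisfies?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

%w[2.3.9 2.4.0 2.9.1 3.0.0].each do |version|
  result = satisfies?("~> 2.4", version)
  puts "#{version}: #{result}"
end
```

For comparison, an exact constraint such as `= 2.4.3` rejects 2.4.4 (version lock), while an open-ended `>= 2.4` accepts 3.0.0 (version promiscuity).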
  52. If you’re worried about large version numbers, you can relax.

    They’re numbers. It’s not like they’re going to run out.
  53. USE VERSION NUMBERS TO CONVEY MEANING KEY CONCEPT Why bother

    with three-part version numbers if you’re not going to convey consistent meaning with them? You may as well just use a single incrementing number if that’s the case. If you follow SemVer you can save yourself from dependency hell.
  54. Are you wasting time because of too much or too

    little planning? README DRIVEN DEVELOPMENT