Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
How to Build a GitHub
Search
Zach Holman
August 05, 2012
Programming
146
170k
How to Build a GitHub
Learn about the growth patterns and the architecture behind github.com.
Zach Holman
August 05, 2012
Tweet
Share
More Decks by Zach Holman
See All by Zach Holman
Firing People
holman
40
6.4k
Even More Emoji Abuse 🚧🚨
holman
17
10k
Move Fast and Break Nothing
holman
68
180k
The Talk on Talks
holman
66
33k
How GitHub (no longer) Works
holman
312
140k
More Git and GitHub Secrets
holman
182
110k
Keeping People
holman
64
62k
If Only I Knew This Shit in College
holman
98
100k
GitHub: Behind the Feature
holman
41
15k
Other Decks in Programming
See All in Programming
AIレシート読み取り機能をRuby on Rails on AWSで実現するLLMにまつわるアレコレ / AI-based receipt reading function powered by LLM on Ruby on Rails on AWS
moznion
3
120
Запуск 1С:УХ в крупном энтерпрайзе: мечта и реальность ПМа
lamodatech
0
940
AHC041解説
terryu16
0
310
menu基盤チームによるGoogle Cloudの活用事例~Application Integration, Cloud Tasks編~
yoshifumi_ishikura
0
200
『改訂新版 良いコード/悪いコードで学ぶ設計入門』活用方法−爆速でスキルアップする!効果的な学習アプローチ / effective-learning-of-good-code
minodriven
28
3.9k
2025.01.17_Sansan × DMM.swift
riofujimon
2
500
QA環境で誰でも自由自在に現在時刻を操って検証できるようにした話
kalibora
1
140
Fibonacci Function Gallery - Part 2
philipschwarz
PRO
0
210
責務を分離するための例外設計 - PHPカンファレンス 2024
kajitack
9
2.3k
快速入門可觀測性
blueswen
0
490
iOS開発におけるCopilot For XcodeとCode Completion / copilot for xcode
fuyan777
1
1.5k
watsonx.ai Dojo #6 継続的なAIアプリ開発と展開
oniak3ibm
PRO
0
160
Featured
See All Featured
A better future with KSS
kneath
238
17k
Bootstrapping a Software Product
garrettdimon
PRO
305
110k
Reflections from 52 weeks, 52 projects
jeffersonlam
348
20k
How to Think Like a Performance Engineer
csswizardry
22
1.3k
Dealing with People You Can't Stand - Big Design 2015
cassininazir
365
25k
Documentation Writing (for coders)
carmenintech
67
4.5k
Automating Front-end Workflow
addyosmani
1366
200k
Building a Modern Day E-commerce SEO Strategy
aleyda
38
7k
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
356
29k
4 Signs Your Business is Dying
shpigford
182
22k
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
251
21k
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
7
570
Transcript
githu H O W t B U I L
D GITHUB a
githu
6.5MM REPOSITORIES LARGEST GIT HOST 1.9MM USERS SINCE 2008
6.5MM REPOSITORIES LARGEST GIT HOST 1.9MM USERS SINCE 2008 SVN
HOST
gh gh gh gh gh
gh
gh gh gh gh gh
gh SHOW YOU OUR CARDS going t
MAGIC BULLET there i n
FOUR STAGES OF GROWTH happiness the EVERYTHING automate
NO FORKING HOLMAN @ LOST YO QUIT READING THIS SHIT
ho DID WE GIT HERE
1809: PERL INVENTED
1814: COMPUTERS INVENTED
1814-2004: ANARCHY AND CHAOS AND ZOMG EVERYONE’S DYING
2005: VERSION CONTROL INVENTED git
2007: githu GLOBAL PEACE AND HAPPINESS ACHIEVED
...or something like that
PRESTON-WERNER TOM GRIT O C TOBER 9, 2 0 07
git via ruby
GRIT git via ruby github’s interface to git object-oriented, read/write
open source
repo = Grit::Repo.new('/tmp/repository') grit repo.commits
grit shelling out to git is expensive grit reimplements portions
of git in ruby native packfile and git object support 2x-100x speedup on low-level operations
grit slowly reimplement grit for speed allows for incremental improvements
LED TO GITHUB grit O C TOBER 19, 2 0
07
TODAY ADDING 2TB A MONTH 22 FILESERVER PAIRS 23TB OF
REPO DATA
GITHUB GROWTH THE FOUR STAGES of
LOCAL NETWORKED NET-SHARD GITRPC FOUR STAGES OF GROWTH GITHUB:
LOCAL NETWORKED NET-SHARD GITRPC FOUR STAGES OF GROWTH GITHUB: 2008
2009 2010 2012
LOCAL NETWORKED NET-SHARD GITRPC FOUR STAGES OF GROWTH GITHUB:
JAN 2008 DEC 2008 FOUR STAGES OF GROWTH GITHUB: 42,000
USERS
JAN 2008 DEC 2008 FOUR STAGES OF GROWTH GITHUB: 80,000
REPOSITORIES
LOCAL MULTI-VM SHARED GFS MOUNT
LOCAL MULTI-VM WEB FRONTENDS BACKGROUND WORKERS
LOCAL MULTI-VM SIMPLE ARCHITECTURE HORIZONTALLY SCALABLE-ish
LOCAL SHARED GFS MOUNT SHARED MOUNT ON EACH VM SIMILAR
PRODUCTION + DEVELOPMENT ACCESS ALLOWED LOCAL ACCESS VIA GRIT
SIMPLE APPROACH, COMMON GIT INTERFACE, QUICK TO BUILD AND SHIP
LOCAL
LOCAL NETWORKED FOUR STAGES OF GROWTH GITHUB: NET-SHARD GITRPC
2008 2009 2010 FOUR STAGES OF GROWTH GITHUB: 166,000 USERS
2008 2009 2010 FOUR STAGES OF GROWTH GITHUB: 484,000 REPOSITORIES
the problem: is slow GFS performance degraded as repos added
the problem: i/o-bound we’re read/write to disk needs to be
fast
THE PLAN NETWORKED HARDWARE MOVE DATACENTERS
NETWORKED HARDWARE bare metal servers 16 machines 6x RAM machine
roles solid datacenter got dat cloud
NETWORKED FRONTENDS FILESERVERS AUX DB LAUNCH: SERVER PAIRS
NETWORKED GRIT IS LOCAL NEEDS TO BE NETWORKED
NETWORKED smoke service is run on each fs; facilitates disk
access chimney routes the smoke, stores routing table in redis stub local grit calls, retain API usage, but send over network
NETWORKED server pairs offer failover via DRBD real servers, real
big RAM allocations
NETWORKED LATENCY networked routing adds 2-10ms per request optimize for
the roundtrip smoke contains smarter server-side logic
NETWORKED LATENCY smoke has custom git extension commands git-distinct-commits returns
commits only contained on a given branch calls to git-show-refs and git-rev-list run all calls server-side in one roundtrip
NETWORKED HORIZONTALLY-SCALABLE, LATENCY- CONSIDERATE, API-COMPATIBLE WITH GRIT
LOCAL FOUR STAGES OF GROWTH GITHUB: NET-SHARD GITRPC NETWORKED
2008 2009 2010 2011 FOUR STAGES OF GROWTH GITHUB: 510,000
USERS
2008 2009 2010 2011 FOUR STAGES OF GROWTH GITHUB: 1.3MM
REPOSITORIES
the problem: duplication data each fork is a full project
history
duplication data i create a repo you fork my
repo fs5:/data/repositories/6/nw/6b/de/92/1/1.git fs7:/data/repositories/4/na/3b/dr/72/2/2.git
duplication data 1,000 commits 1,001 commits 10MB 10MB 20MB
total disk }
duplication data 1,000 commits 1 commit 1KB 10MB 10MB
total disk }GOAL:
duplication data 75 MB repo 3.5k forks x ~250
GB x 2 fs pairs + offsite backups
NET-SHARD shard by repository network (“forks”)
NET-SHARD network.git 1.git 2.git 3.git 4.git CONTAINS DELTA }CONTAINS ALL
REFS ›
NET-SHARD network.git GIT ALTERNATES store git object data externally to
repository we fetch refs into your fork, transparently
NET-SHARD network.git PRIVACY potential leaking of refs cross-network net-shard enabled
on all-public and all-private repository networks only
NET-SHARD network.git DISK halves disk usage increase disk and kernel
cache hits
NET-SHARD network.git MIGRATION gradually transitioned repos to network.git effectively feature-flagged
by repo
NET-SHARD SAVE DISK, IMPROVE PERFORMANCE
LOCAL FOUR STAGES OF GROWTH GITHUB: GITRPC NETWORKED NET-SHARD
2008 2009 2010 2011 2012 FOUR STAGES OF GROWTH GITHUB:
1.2MM USERS
2008 2009 2010 2011 2012 AUGUST FOUR STAGES OF GROWTH
GITHUB: 1.9MM USERS
2008 2009 2010 2011 2012 FOUR STAGES OF GROWTH GITHUB:
3.4MM REPOSITORIES
2008 2009 2010 2011 2012 AUGUST FOUR STAGES OF GROWTH
GITHUB: 6.5MM REPOSITORIES
the problem: GRIT git via ruby
the problem: local, ruby-based grit ended up in a high-traffic
distributed system
the problem: inelegant code spread out everywhere
GITRPC network-oriented library for git access GitRPC
GITRPC open source fastest git implementation (C) github-sponsored project bindings
for all major languages used in our mac, windows clients
GITRPC rugged (RUBY) libgit2 (C) gitrpc (RUBY)
GITRPC like smoke, gitrpc aims to reduce latency by reducing
roundtrips LATENCY
GITRPC operations cached on library level CACHING yank out tons
of app-level cache logic
GITRPC the move to gitrpc started this summer and will
take months MIGRATION gradually replace smoke and grit; avoids a risky deploy
FAST AND STABLE NETWORKED GIT ACCESS GITRPC
LOCAL NETWORKED NET-SHARD GITRPC FOUR STAGES OF GROWTH GITHUB:
identify WHAT’S BROKEN
sma CHANGES, FAST DEVELOPMENT
realCODE BEATS IMAGINARY CODE
EVERYTHING automate automate automate automate automate AUTOMATE automate automate automate
automate automate automate
m . manage LOL DEVELOPERS SOFTWARE
DEVELOPMENT
m . manage DEADLINES MEETINGS PRIORITIES ESTIMATES
m . manage DEADLINES MEETINGS PRIORITIES ESTIMATES
EVERYONE i A MANAGER
AUTOMATE AWAY PAIN DEPLOYMENT RECOVERY DEVELOPMENT
DEVELOPMENT automate
DEVELOPMENT > ./do-work RUN THIS IN EACH PROJECT: ...AND YOU’RE
DONE! loljk
DEVELOPMENT YOU CAN AUTOMATE THE PAIN OF DEVELOPMENT
SETUP DEVELOPMENT the
SETUP DEVELOPMENT the ONE-LINER INSTALLS ALL GITHUB DEVELOPMENT DEPENDENCIES
30 min SETUP DEVELOPMENT the CLEAN MACHINE TO FULL
DEVELOPMENT ENVIRONMENT
SETUP DEVELOPMENT the NEW EMPLOYEES SHIP THEIR FIRST WEEK
SETUP DEVELOPMENT the PUPPET HANDLES ALL DEPENDENCIES
DEPLOYMENT automate
DEPLOYMENT REAL BROGRAMMERS DEPLOY WITH NO FEAR SO FUCK THAT
DEPLOYMENT DEPLOYS SHOULD BE CAUTIOUS, COMMONPLACE, AND AUTOMATED
DEPLOYMENT GITHUB DEPLOYS 20-40 TIMES A DAY
DEPLOYMENT PUSH BRANCH DEPLOY BRANCH EVERYWHERE · MACHINE CLASS ·
SPECIFIC SERVERS HUBOT RUNS TESTS IN ABOUT 200 SECONDS USUALLY OPEN A PULL REQUEST
DEPLOYMENT DEPLOY LOCKING CAN’T DEPLOY IF A BRANCH IS DEPLOYED
AUTODEPLOYS PUSHED TO MASTER WITH GREEN TESTS? DEPLOY.
DEPLOYMENT STAFF-ONLY FEATURE FLAGS LIMITS EXPOSURE · REAL-WORLD · AVOIDS
MERGES
RECOVERY automate
RECOVERY SOMETHING WILL ALWAYS BREAK
RECOVERY HUBOT IS A SYSADMIN
RECOVERY HUBOT LOAD HUBOT QUERIES HUBOT CONNS SERVER LOAD RUNNING
DB QUERIES ALL OPEN CONNECTIONS
RECOVERY HUBOT RESTORE <REPO> HUBOT PUSH-LOG <REPO> HUBOT GH-EACH <HOST>
<COMMAND> RESTORE A REPO FROM BACKUPS SEE RECENT PUSH LOGS TO A REPO RUN COMMAND ON SPECIFIC HOSTS
HIGH-LEVEL OVERVIEW IN MINUTES SPEND MORE TIME FIXING AND LESS
TIME INVESTIGATING RECOVERY
— happiness the — — — —
EMPLOYEES HAVE QUIT YEARS 5 EMPLOYEES 108 ZERO
1-2 MONTHS HIRE 1-3 MONTHS RAMP-UP 2 WEEKS LEAVE
LOSING AN EMPLOYEE CAN SET YOU BACK HALF A YEAR
remove ANY REASON TO LEAVE — — — — —
— — — — — — — — — — — —
TDD✓ PAIR PROGRAMMING ✓ BDD ✓ TEST-FIRST ✓ DESIGN-FIRST ✓
(just kidding) EMACS x NONE OF THESE ✓
WE CARE ABOUT THE WORK YOU DO, NOT ABOUT HOW
YOU DO IT
LOCATION HOURS DIRECTION
LOCATION HOURS DIRECTION GITHUB EMPLOYEES WORK REMOTELY
⅔
LOCATION HOURS DIRECTION FAMILY RELOCATION, TRAVEL FREEDOM
LOCATION HOURS DIRECTION CHOOSE YOUR SCHEDULE CHOOSE
YOUR VACATIONS FRESH, CREATIVE EMPLOYEES
LOCATION HOURS DIRECTION YOU HACK ON THINGS
THAT INTEREST YOU REDUCES BURNOUT
flexible LOCATION HOURS DIRECTION BE TOWARDS WORK/LIFE
githu
basica y, MOVE FAST = SMALL CHANGES
basica y, BE STABLE = DEPLOY CONSTANTLY
basica y, HAPPY COMPANY = HAPPY EMPLOYEES
thank
NO FORKING HOLMAN @ LOST YO QUIT READING THIS SHIT
ZACHHOLMAN.COM/TALKS