Testing Rails at Scale
Emil Stolarsky
May 04, 2016
Transcript
Testing Rails at Scale by @EmilStolarsky
2
3 Shopify • 243,000+ shops • $14B+ total GMV • 300M+ unique visits/month • 1000+ employees
4 CI Systems
5 Scheduler Compute
6 Scheduler Compute
7 Scheduler Compute
8 Managed Provider
9 Managed Provider • Multi-tenant • Closed system • Examples: CircleCI, Codeship, Hosted TravisCI
10 Unmanaged Provider
11 Unmanaged Provider • Self-hosted • Open system • Examples: Jenkins, TravisCI, Strider
12 Daily CI Stats • 50,000+ containers booted for testing • 700 builds • 42,000+ tests per build • 5 min build time
13 Shopify using a Hosted Provider • 20+ minute build times • Flakiness from resource starvation • Expensive
14 A NEW HOPE
15 Beginning of a Journey • Bring build times under 5 minutes • Restore confidence in our CI • Maintain current budget
16
17 [Diagram: Code push, Webhooks, Agent Instructions, c4.8xlarge]
18 Compute Cluster • 5.4 TB memory • 3240 CPU cores • Fleet size at peak: 90 • Instance type: c4.8xlarge
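The totals on slide 18 line up with a fleet of 90 c4.8xlarge instances. The check below assumes the AWS-published c4.8xlarge figures of 36 vCPUs and 60 GiB of memory per instance; those per-instance numbers are not on the slide.

```python
# Per-instance figures are an assumption taken from the published
# c4.8xlarge specification, not from the slides.
VCPUS_PER_INSTANCE = 36
MEMORY_GIB_PER_INSTANCE = 60

FLEET_SIZE_AT_PEAK = 90  # from slide 18

total_vcpus = FLEET_SIZE_AT_PEAK * VCPUS_PER_INSTANCE
total_memory_gib = FLEET_SIZE_AT_PEAK * MEMORY_GIB_PER_INSTANCE

print(total_vcpus)       # 3240
print(total_memory_gib)  # 5400, i.e. roughly 5.4 TB
```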
19 Instances • AWS Hosted • Managed with Chef • Memory bound • IO Optimizations
20 SCROOGE
21 Auto Scaling with Scrooge [Diagram: Capacity Requirements, Scrooge, Boot/Shutdown Nodes, c4.8xlarge]
22 Optimizing Cost • AWS-specific optimizations • Improve utilization • Not one size fits all
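The slides name Scrooge's inputs and outputs (capacity requirements in, boot/shutdown decisions out) but not its internals. The following is only a sketch of one way such a loop could size a fleet. `ScalingPolicy`, `desired_nodes`, `scaling_action`, and every default except the peak fleet size of 90 are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # Hypothetical settings; the slides do not give Scrooge's configuration.
    agents_per_node: int = 10
    min_nodes: int = 2
    max_nodes: int = 90  # peak fleet size from slide 18


def desired_nodes(pending_jobs: int, running_jobs: int, policy: ScalingPolicy) -> int:
    """Return the node count needed to run all pending and running jobs."""
    needed = math.ceil((pending_jobs + running_jobs) / policy.agents_per_node)
    return max(policy.min_nodes, min(policy.max_nodes, needed))


def scaling_action(current_nodes: int, target_nodes: int) -> str:
    """Turn the gap between current and target size into a boot/shutdown decision."""
    if target_nodes > current_nodes:
        return f"boot {target_nodes - current_nodes}"
    if target_nodes < current_nodes:
        return f"shutdown {current_nodes - target_nodes}"
    return "hold"


policy = ScalingPolicy()
target = desired_nodes(pending_jobs=95, running_jobs=0, policy=policy)
print(target)                     # 10
print(scaling_action(5, target))  # boot 5
```

Clamping to a minimum keeps a few warm nodes for the first push of the day, and the maximum caps spend. Both are illustrative design choices, not something the deck states.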
23 Graphing Productivity [Graph: Active Buildkite Agents]
24 Graphing Productivity [Graph: Active Buildkite Agents; annotations: ?, ?, ?]
25 Graphing Productivity [Graph: Active Buildkite Agents; annotations: ?, ?, Lunch rush #1]
26 Graphing Productivity [Graph: Active Buildkite Agents; annotations: ?, Commit + Push, Lunch rush #1]
27 Graphing Productivity [Graph: Active Buildkite Agents; annotations: Lunch rush #2, Commit + Push, Lunch rush #1]
28 Docker • Boot speedup • Test isolation • Distribution
29 Building Containers with Locutus • Implements custom Docker build API • Single EC2 machine • Forced debt repayment
30 Test Distribution • Tests allocated based on container index • Ruby tests and browser tests are run on separate containers • Outliers inflated build times
31 Artifacts • Artifacts are uploaded to S3 by Buildkite Agents • Events are logged to Kafka & StatsD • Data tools are used to identify flaky tests
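The deck does not say how the data tools decide that a test is flaky. One common heuristic, used here purely as an illustration, is to flag any test that both passed and failed on the same revision. The `ResultEvent` schema and `find_flaky_tests` are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class ResultEvent(NamedTuple):
    # Hypothetical event shape; the real Kafka payload is not shown in the slides.
    revision: str
    test_name: str
    passed: bool


def find_flaky_tests(events: Iterable[ResultEvent]) -> set[str]:
    """Return tests with mixed outcomes on a single, unchanged revision."""
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for event in events:
        outcomes[(event.revision, event.test_name)].add(event.passed)
    return {test for (_, test), seen in outcomes.items() if seen == {True, False}}


events = [
    ResultEvent("abc123", "CartTest#test_total", True),
    ResultEvent("abc123", "CartTest#test_total", False),  # same code, different result
    ResultEvent("abc123", "OrderTest#test_refund", False),
    ResultEvent("def456", "OrderTest#test_refund", True),  # fixed by a new revision
]
print(find_flaky_tests(events))  # {'CartTest#test_total'}
```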
32 [Diagram: Capacity Requirements, Scrooge, Boot/Shutdown Nodes, Agent Instructions, Webhooks, Pull Containers, Pull Revision]
33 DOCKER STRIKES BACK
34 Rebel base is under Attack • Shipping second provider brought confusion • Locutus capacity issues • Test times were still high
35 Battling Confusion • Botched rollout • Instability further eroded developer confidence
36 Clustering Locutus • Make it linearly scalable • Keep it stateless(-ish)
37 Locutus Diagram [Diagram: Coordinator, Worker Pool of several Workers, Cache Ring, Docker Registry, Container push, New containers]
38 Test Distribution v2 • Loads all tests into Redis • Containers pull work off queue • No more container specialization
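A shared queue lets whichever container is idle take the next test, so no container is tied to a fixed slice. The simulation below uses an in-memory `deque` as a stand-in for the Redis queue, so it runs without a server. The function name and the durations are hypothetical, and the slide does not say which Redis data structure was used.

```python
import heapq
from collections import deque


def simulate_queue_build(durations: list[float], container_count: int) -> float:
    """Simulate containers pulling tests off one shared queue; return the build time."""
    queue = deque(durations)  # stand-in for a list held in Redis
    # Each heap entry is the time at which one container becomes idle.
    free_at = [0.0] * container_count
    heapq.heapify(free_at)
    finish = 0.0
    while queue:
        start = heapq.heappop(free_at)   # the next container to go idle
        done = start + queue.popleft()   # it pulls the next test off the queue
        finish = max(finish, done)
        heapq.heappush(free_at, done)
    return finish


# Hypothetical durations in seconds, with one slow test at the front.
print(simulate_queue_build([90, 10, 10, 10, 10, 10], container_count=2))  # 90.0
```

With these numbers a fixed every-other-test split would give one container 90 + 10 + 10 = 110 seconds of work, while the queue finishes in 90 because the second container absorbs all of the short tests.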
39 [Diagram: Capacity Requirements, Scrooge, Boot/Shutdown Nodes, Agent Instructions, Webhooks, Pull Containers, Code push webhook]
40 RETURN OF THE STABLE BUILD
41 Docker • No one tests starting 10,000s of containers/day • Instability further eroded developer confidence • Every new version of Docker had major bugs
42 Handling Infrastructure Failures • At non-trivial scale, you’re guaranteed failures • Swallow infrastructure failures, never test failures • We still see 100+ container failures a day
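"Swallow infrastructure failures, never test failures" can be read as: retry a job when the container or node breaks, but report a genuine test result straight away. The sketch below makes that distinction with a hypothetical `InfrastructureError` exception and a hypothetical `run_job` helper. How Shopify actually classified failures is not described in the slides.

```python
from typing import Callable


class InfrastructureError(Exception):
    """Hypothetical marker: the container or node failed, not the tests."""


def run_job(job: Callable[[], bool], max_attempts: int = 3) -> bool:
    """Run a test job, retrying only when the infrastructure fails.

    `job` returns True if the tests passed and False if they failed.
    A False result is returned immediately and is never retried.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except InfrastructureError:
            if attempt == max_attempts:
                raise  # out of retries: surface the infrastructure problem
    raise ValueError("max_attempts must be at least 1")
```

Re-raising after the last attempt keeps a persistent infrastructure problem visible to operators instead of reporting it as a red build.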
43 Treating Servers as Pets 1. Wait for reports of build issues to stream in 2. Flag node as in maintenance 3. Manually take node out of rotation 4. SSH into the node and follow playbook steps to clean up disk
44 Treating Servers as Cattle 1. Auto-detect the failures 2. Node removes itself from rotation 3. Node runs script to clean up disk
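The three "cattle" steps can be sketched as a small self-healing routine that a node runs on itself. The disk check uses the standard library. The rotation and cleanup hooks are hypothetical callables, since the deck does not describe the real scripts, and rejoining the rotation after a successful cleanup is an added assumption.

```python
import shutil
from typing import Callable


def disk_is_full(path: str = "/", threshold: float = 0.9) -> bool:
    """Step 1: detect the failure condition (here, a nearly full disk)."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total >= threshold


def self_heal(
    is_unhealthy: Callable[[], bool],
    leave_rotation: Callable[[], None],
    clean_up: Callable[[], None],
    join_rotation: Callable[[], None],
) -> bool:
    """Return True if the node had to heal itself, False if it was already healthy."""
    if not is_unhealthy():
        return False
    leave_rotation()  # Step 2: node removes itself from rotation
    clean_up()        # Step 3: node runs the cleanup script
    if not is_unhealthy():
        join_rotation()  # assumption: a recovered node goes back into service
    return True
```

A node could call `self_heal(disk_is_full, ...)` from a periodic job, with the other three arguments wired to whatever the scheduler and cleanup tooling provide.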
45 I love the internet.
46
47
48 Test Distribution v3 • Containers record the tests they ran • Allow flaky tests to be rerun • Ensure no tests are lost
49 [Diagram: Capacity Requirements, Scrooge, Boot/Shutdown Nodes, Agent Instructions, Webhooks, Pull Containers, Code push webhook]
50 CONCLUSION
51 Don’t build your own CI • Build times <10 minutes • Small application
52 Build your own CI • Build times >15 minutes • Monolithic Application • Parallelization Limits
53 Lessons Learned • Commit 100% • Beware of Rabbit holes • Pets vs. Cattle
54 Thanks! Follow me on Twitter @EmilStolarsky
55 Credits • Image of shipping containers: https://goo.gl/bXCn1X, https://goo.gl/cDDnYy • Images of Google DCs: https://goo.gl/UHVRc • Image of bank vault: https://goo.gl/fFN5EJ • Locutus: http://goo.gl/UyoJxx • Warehouse: https://goo.gl/5DiiR1 • Egyptian Temple: https://goo.gl/GjbLcq • Star Wars: http://goo.gl/474wYG • Sinking container ship: http://goo.gl/U7rdR8, http://goo.gl/wlzlrm • Cats: http://goo.gl/9p2JXo, https://goo.gl/Ylhl60 • Cattle: http://goo.gl/IBdXmx • Star Wars: http://goo.gl/LatPEj • Creative Commons License: https://goo.gl/sZ7V7x