Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Small Data: Storage For The Rest Of Us
Search
Andrew Godwin
May 26, 2015
Programming
1
540
Small Data: Storage For The Rest Of Us
A talk I gave at PyWaw Summit 2015.
Andrew Godwin
May 26, 2015
Tweet
Share
More Decks by Andrew Godwin
See All by Andrew Godwin
Reconciling Everything
andrewgodwin
1
260
Django Through The Years
andrewgodwin
0
160
Writing Maintainable Software At Scale
andrewgodwin
0
390
A Newcomer's Guide To Airflow's Architecture
andrewgodwin
0
310
Async, Python, and the Future
andrewgodwin
2
610
How To Break Django: With Async
andrewgodwin
1
660
Taking Django's ORM Async
andrewgodwin
0
670
The Long Road To Asynchrony
andrewgodwin
0
590
The Scientist & The Engineer
andrewgodwin
1
690
Other Decks in Programming
See All in Programming
見えないメモリを観測する: PHP 8.4 `pg_result_memory_size()` とSQL結果のメモリ管理
kentaroutakeda
0
450
「とりあえず動く」コードはよい、「読みやすい」コードはもっとよい / Code that 'just works' is good, but code that is 'readable' is even better.
mkmk884
3
610
MCP with Cloudflare Workers
yusukebe
2
220
暇に任せてProxmoxコンソール 作ってみました
karugamo
2
720
フロントエンドのディレクトリ構成どうしてる? Feature-Sliced Design 導入体験談
osakatechlab
8
4.1k
コンテナをたくさん詰め込んだシステムとランタイムの変化
makihiro
1
140
Cloudflare MCP ServerでClaude Desktop からWeb APIを構築
kutakutat
1
560
わたしの星のままで一番星になる ~ 出産を機にSIerからEC事業会社に転職した話 ~
kimura_m_29
0
180
Go の GC の不得意な部分を克服したい
taiyow
3
800
rails statsで大解剖 🔍 “B/43流” のRailsの育て方を歴史とともに振り返ります
shoheimitani
2
950
선언형 UI에서의 상태관리
l2hyunwoo
0
180
技術的負債と向き合うカイゼン活動を1年続けて分かった "持続可能" なプロダクト開発
yuichiro_serita
0
120
Featured
See All Featured
YesSQL, Process and Tooling at Scale
rocio
169
14k
The Language of Interfaces
destraynor
154
24k
Building an army of robots
kneath
302
44k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
132
33k
Practical Tips for Bootstrapping Information Extraction Pipelines
honnibal
PRO
10
810
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
251
21k
The World Runs on Bad Software
bkeepers
PRO
66
11k
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
33
2k
Rebuilding a faster, lazier Slack
samanthasiow
79
8.7k
Java REST API Framework Comparison - PWX 2021
mraible
28
8.3k
Design and Strategy: How to Deal with People Who Don’t "Get" Design
morganepeng
127
18k
How to Think Like a Performance Engineer
csswizardry
22
1.2k
Transcript
Andrew Godwin @andrewgodwin SMALL DATA STORAGE FOR THE REST OF
US
Andrew Godwin Hi, I'm Django Core Developer Senior Engineer at
Far too many hobbies
BIG DATA What does it mean?
BIG DATA What does it mean? What is 'big'?
1,000 rows? 1,000,000 rows? 1,000,000,000 rows? 1,000,000,000,000 rows?
Scalable designs are a tradeoff: NOW LATER vs
Small company? Agency? Focus on ease of change, not scalability
You don't need to scale from day one But always
leave yourself scaling points
Rapid development Continuous deployment Hardware choice Scaling 'breakpoints'
Rapid development It's all about schema change overhead
Explicit Schema ID int Name text Weight uint 1 2
3 Alice Bob Charles 76 84 65 Implicit Schema { "id": 342, "name": "David", "weight": 44, }
Silent Failure { "id": 342, "name": "David", "weight": 74, }
{ "id": 342, "name": "Ellie", "weight": "85kg", } { "id": 342, "nom": "Frankie", "weight": 77, } { "id": 342, "name": "Frankie", "weight": -67, }
Continuous deployment It's 11pm. Do you know where your locks
are?
Add NULL and backfill 1-to-1 relation and backfill DBMS-supported type
changes
Hardware choice ZOMG RUN IT ON THE CLOUD
VMs are TERRIBLE at IO Up to 10x slowdown, even
with VT-d.
Memory is king Your database loves it. Don't let other
apps steal it.
Adding more power goes far Especially with PostgreSQL or read-only
replicas
Scaling Breakpoints
Sharding point Datasets paritioned by primary key
Vertical split Entirely unrelated tables
Denormalisation It's not free!
Consistency leeway Can you take inconsistent views?
Load Shapes
Read-heavy Write-heavy Large size
Read-heavy Write-heavy Large size Wikipedia TV show website Minecraft Forums
Amazon Glacier Eventbrite Logging
Read-heavy Write-heavy Large size Offline storage Append formats In-memory cache
/ flat files Many indexes Fewer indexes
Extremes
Extreme Reads Heavy Replication Extreme Writes Sacrifice ordering or consistency
Extreme Size Sacrifice query time
Extreme Longevity Flash in cold storage Extreme Survivability Rad-hardened Flash
Extreme Auditability True append only storage
SSDs Magnetic Tape Hard Drives Consumer Flash CDs/DVDs Long-life Flash
Metal-Carbon DVDs 3-6 months 5-10 years 3-5 years 100+ years Approximate time to bit flip, unpowered at room temperature
Big Data isn't one thing It depends on type, size,
complexity, throughput, latency...
Focus on the current problems Future problems don't matter if
you never get there
Efficiency and iterating fast matters The smaller you are, the
more time is worth
Good architecture affects product You're not writing a system in
a vacuum
Thanks. Andrew Godwin @andrewgodwin