Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Small Data: Storage For The Rest Of Us
Search
Andrew Godwin
May 26, 2015
Programming
630
1
Share
Small Data: Storage For The Rest Of Us
A talk I gave at PyWaw Summit 2015.
Andrew Godwin
May 26, 2015
More Decks by Andrew Godwin
See All by Andrew Godwin
Reconciling Everything
andrewgodwin
1
380
Django Through The Years
andrewgodwin
0
300
Writing Maintainable Software At Scale
andrewgodwin
0
510
A Newcomer's Guide To Airflow's Architecture
andrewgodwin
0
400
Async, Python, and the Future
andrewgodwin
2
720
How To Break Django: With Async
andrewgodwin
1
790
Taking Django's ORM Async
andrewgodwin
0
780
The Long Road To Asynchrony
andrewgodwin
0
750
The Scientist & The Engineer
andrewgodwin
1
820
Other Decks in Programming
See All in Programming
Redox OS でのネームスペース管理と chroot の実現
isanethen
0
500
Kubernetesでセルフホストが簡単なNewSQLを求めて / Seeking a NewSQL Database That's Simple to Self-Host on Kubernetes
nnaka2992
0
190
20260320登壇資料
pharct
0
150
Cyrius ーLinux非依存にコンテナをネイティブ実行する専用OSー
n4mlz
0
270
実践ハーネスエンジニアリング #MOSHTech
kajitack
7
5.4k
Strategy for Finding a Problem for OSS: With Real Examples
kibitan
0
130
Codexに役割を持たせる 他のAIエージェントと組み合わせる実務Tips
o8n
4
1.5k
AWS re:Invent 2025の少し振り返り + DevOps AgentとBacklogを連携させてみた
satoshi256kbyte
1
110
Symfony + NelmioApiDocBundle を使った スキーマ駆動開発 / Schema Driven Development with NelmioApiDocBundle
okashoi
0
260
LM Linkで(非力な!)ノートPCでローカルLLM
seosoft
0
330
AIコードレビューの導入・運用と AI駆動開発における「AI4QA」の取り組みについて
hagevvashi
0
590
AI時代のシステム設計:ドメインモデルで変更しやすさを守る設計戦略
masuda220
PRO
6
1.2k
Featured
See All Featured
Lessons Learnt from Crawling 1000+ Websites
charlesmeaden
PRO
1
1.2k
Highjacked: Video Game Concept Design
rkendrick25
PRO
1
340
Ten Tips & Tricks for a 🌱 transition
stuffmc
0
95
Building Adaptive Systems
keathley
44
3k
Dominate Local Search Results - an insider guide to GBP, reviews, and Local SEO
greggifford
PRO
0
130
[RailsConf 2023] Rails as a piece of cake
palkan
59
6.4k
Jamie Indigo - Trashchat’s Guide to Black Boxes: Technical SEO Tactics for LLMs
techseoconnect
PRO
0
92
The Hidden Cost of Media on the Web [PixelPalooza 2025]
tammyeverts
2
260
AI in Enterprises - Java and Open Source to the Rescue
ivargrimstad
0
1.2k
Public Speaking Without Barfing On Your Shoes - THAT 2023
reverentgeek
1
350
Designing Experiences People Love
moore
143
24k
brightonSEO & MeasureFest 2025 - Christian Goodrich - Winning strategies for Black Friday CRO & PPC
cargoodrich
3
140
Transcript
Andrew Godwin @andrewgodwin SMALL DATA STORAGE FOR THE REST OF
US
Andrew Godwin Hi, I'm Django Core Developer Senior Engineer at
Far too many hobbies
BIG DATA What does it mean?
BIG DATA What does it mean? What is 'big'?
1,000 rows? 1,000,000 rows? 1,000,000,000 rows? 1,000,000,000,000 rows?
Scalable designs are a tradeoff: NOW LATER vs
Small company? Agency? Focus on ease of change, not scalability
You don't need to scale from day one But always
leave yourself scaling points
Rapid development Continuous deployment Hardware choice Scaling 'breakpoints'
Rapid development It's all about schema change overhead
Explicit Schema ID int Name text Weight uint 1 2
3 Alice Bob Charles 76 84 65 Implicit Schema { "id": 342, "name": "David", "weight": 44, }
Silent Failure { "id": 342, "name": "David", "weight": 74, }
{ "id": 342, "name": "Ellie", "weight": "85kg", } { "id": 342, "nom": "Frankie", "weight": 77, } { "id": 342, "name": "Frankie", "weight": -67, }
Continuous deployment It's 11pm. Do you know where your locks
are?
Add NULL and backfill 1-to-1 relation and backfill DBMS-supported type
changes
Hardware choice ZOMG RUN IT ON THE CLOUD
VMs are TERRIBLE at IO Up to 10x slowdown, even
with VT-d.
Memory is king Your database loves it. Don't let other
apps steal it.
Adding more power goes far Especially with PostgreSQL or read-only
replicas
Scaling Breakpoints
Sharding point Datasets paritioned by primary key
Vertical split Entirely unrelated tables
Denormalisation It's not free!
Consistency leeway Can you take inconsistent views?
Load Shapes
Read-heavy Write-heavy Large size
Read-heavy Write-heavy Large size Wikipedia TV show website Minecraft Forums
Amazon Glacier Eventbrite Logging
Read-heavy Write-heavy Large size Offline storage Append formats In-memory cache
/ flat files Many indexes Fewer indexes
Extremes
Extreme Reads Heavy Replication Extreme Writes Sacrifice ordering or consistency
Extreme Size Sacrifice query time
Extreme Longevity Flash in cold storage Extreme Survivability Rad-hardened Flash
Extreme Auditability True append only storage
SSDs Magnetic Tape Hard Drives Consumer Flash CDs/DVDs Long-life Flash
Metal-Carbon DVDs 3-6 months 5-10 years 3-5 years 100+ years Approximate time to bit flip, unpowered at room temperature
Big Data isn't one thing It depends on type, size,
complexity, throughput, latency...
Focus on the current problems Future problems don't matter if
you never get there
Efficiency and iterating fast matters The smaller you are, the
more time is worth
Good architecture affects product You're not writing a system in
a vacuum
Thanks. Andrew Godwin @andrewgodwin