Small Data: Storage For The Rest Of Us
Andrew Godwin
May 26, 2015
A talk I gave at PyWaw Summit 2015.
Transcript
Andrew Godwin @andrewgodwin SMALL DATA: STORAGE FOR THE REST OF US
Andrew Godwin Hi, I'm Django Core Developer Senior Engineer at
Far too many hobbies
BIG DATA: What does it mean? What is 'big'?
1,000 rows? 1,000,000 rows? 1,000,000,000 rows? 1,000,000,000,000 rows?
Scalable designs are a tradeoff: NOW vs LATER
Small company? Agency? Focus on ease of change, not scalability
You don't need to scale from day one. But always leave yourself scaling points.
Rapid development, Continuous deployment, Hardware choice, Scaling 'breakpoints'
Rapid development: It's all about schema change overhead
Explicit Schema
ID (int) | Name (text) | Weight (uint)
1 | Alice | 76
2 | Bob | 84
3 | Charles | 65
Implicit Schema
{ "id": 342, "name": "David", "weight": 44, }
Silent Failure
{ "id": 342, "name": "David", "weight": 74, }
{ "id": 342, "name": "Ellie", "weight": "85kg", }
{ "id": 342, "nom": "Frankie", "weight": 77, }
{ "id": 342, "name": "Frankie", "weight": -67, }
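The four records above differ in ways a schemaless store would accept without complaint. As a minimal sketch (not something shown in the talk), the check below applies the explicit schema's rules in application code. The function name `validate_person` and the rule set are assumptions made for illustration.

```python
# Hypothetical validator: field names come from the slide, the rules are assumed.
EXPECTED_FIELDS = {"id": int, "name": str, "weight": int}


def validate_person(doc):
    """Return a list of problems found in one schemaless document."""
    problems = []
    # Every expected field must be present and have the expected type.
    for field, kind in EXPECTED_FIELDS.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif not isinstance(doc[field], kind):
            problems.append(f"{field} should be {kind.__name__}")
    # Misspelled keys such as "nom" show up as unexpected fields.
    for field in doc:
        if field not in EXPECTED_FIELDS:
            problems.append(f"unexpected field: {field}")
    # An unsigned column would reject this; a document store will not.
    weight = doc.get("weight")
    if isinstance(weight, int) and weight < 0:
        problems.append("weight must not be negative")
    return problems
```

Run against the slide's records, the first one passes and each of the other three reports exactly one kind of problem, which is the work an explicit schema would otherwise do for free.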
Continuous deployment: It's 11pm. Do you know where your locks are?
Add NULL and backfill
1-to-1 relation and backfill
DBMS-supported type changes
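The first of these techniques, adding a nullable column and backfilling it, can be sketched roughly as follows. This is an illustration under assumptions, not code from the talk: it uses SQLite from the standard library, and the table `people`, the new column `weight_kg`, and the function name are all hypothetical.

```python
import sqlite3


def add_column_and_backfill(conn, batch_size=1000):
    """Add a nullable column, then fill it in small batches (illustrative)."""
    # Step 1: add the column without NOT NULL or a computed value, so
    # existing rows do not have to be rewritten as part of the schema change.
    conn.execute("ALTER TABLE people ADD COLUMN weight_kg INTEGER")
    conn.commit()

    # Step 2: backfill by primary-key range, committing after each batch so
    # that no single transaction holds locks for long.
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id FROM people WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        upper = rows[-1][0]
        conn.execute(
            "UPDATE people SET weight_kg = weight WHERE id > ? AND id <= ?",
            (last_id, upper),
        )
        conn.commit()
        last_id = upper
```

Locking behaviour differs between database systems, so the batch size and the cost of the `ALTER TABLE` step would need checking against the one actually in use.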
Hardware choice: ZOMG RUN IT ON THE CLOUD
VMs are TERRIBLE at IO: Up to 10x slowdown, even with VT-d.
Memory is king: Your database loves it. Don't let other apps steal it.
Adding more power goes far: Especially with PostgreSQL or read-only replicas
Scaling Breakpoints
Sharding point: Datasets partitioned by primary key
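One simple way to partition by primary key is modulo routing. The sketch below assumes integer keys and a fixed list of shards; the shard names and the `shard_for` helper are hypothetical and not taken from the talk.

```python
# Hypothetical shard names; in practice these would map to database connections.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]


def shard_for(primary_key, shards=SHARDS):
    """Pick the shard that owns a row, based only on its integer primary key."""
    return shards[primary_key % len(shards)]
```

For example, `shard_for(342)` returns `"db-shard-2"`. Changing the number of shards remaps most keys under this scheme, which is one reason to choose the sharding point deliberately before it is needed.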
Vertical split: Entirely unrelated tables
Denormalisation: It's not free!
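To make the cost concrete, here is a small sketch of a denormalised counter, assuming a hypothetical `posts` table that stores `comment_count` alongside a `comments` table. Every write path now has to update two places, ideally in one transaction. The schema and the function name are illustrative assumptions.

```python
import sqlite3


def add_comment(conn, post_id, body):
    """Insert a comment and keep the denormalised counter in step with it."""
    # "with conn" commits both statements together, or rolls both back on error.
    with conn:
        conn.execute(
            "INSERT INTO comments (post_id, body) VALUES (?, ?)",
            (post_id, body),
        )
        conn.execute(
            "UPDATE posts SET comment_count = comment_count + 1 WHERE id = ?",
            (post_id,),
        )
```

Reads of the count become a single-row lookup, but any code that writes comments without going through this function will leave the counter wrong.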
Consistency leeway: Can you take inconsistent views?
Load Shapes
Read-heavy, Write-heavy, Large size
Wikipedia, TV show website, Minecraft Forums, Amazon Glacier, Eventbrite, Logging
Offline storage, Append formats, In-memory cache / flat files, Many indexes, Fewer indexes
Extremes
Extreme Reads: Heavy replication
Extreme Writes: Sacrifice ordering or consistency
Extreme Size: Sacrifice query time
Extreme Longevity: Flash in cold storage
Extreme Survivability: Rad-hardened Flash
Extreme Auditability: True append only storage
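The slide names append-only storage without saying how to build it. One common way to make an append-only log tamper-evident is to chain each entry to the hash of the previous one. The sketch below uses that technique as an assumption for illustration, not as the approach from the talk; the class `AppendOnlyLog` and its methods are hypothetical.

```python
import hashlib
import json


class AppendOnlyLog:
    """In-memory log where each entry's hash covers the previous entry's hash."""

    def __init__(self):
        self._entries = []

    def append(self, record):
        # Chain to the previous hash so earlier entries cannot change unnoticed.
        prev_hash = self._entries[-1]["hash"] if self._entries else ""
        payload = json.dumps(record, sort_keys=True)
        digest = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
        self._entries.append({"payload": payload, "hash": digest})

    def verify(self):
        """Recompute the chain; return False if any entry has been altered."""
        prev_hash = ""
        for entry in self._entries:
            expected = hashlib.sha256(
                (prev_hash + entry["payload"]).encode("utf-8")
            ).hexdigest()
            if entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True
```

Editing any stored payload makes `verify()` return `False`. A real audit store would also need durable, write-once media, which this in-memory sketch does not attempt.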
Approximate time to bit flip, unpowered at room temperature
Media shown: SSDs, Magnetic Tape, Hard Drives, Consumer Flash, CDs/DVDs, Long-life Flash, Metal-Carbon DVDs
Time ranges shown: 3-6 months, 5-10 years, 3-5 years, 100+ years
Big Data isn't one thing: It depends on type, size, complexity, throughput, latency...
Focus on the current problems: Future problems don't matter if you never get there
Efficiency and iterating fast matter: The smaller you are, the more time is worth
Good architecture affects product: You're not writing a system in a vacuum
Thanks. Andrew Godwin @andrewgodwin