Billing the Cloud
Pierre-Yves Ritschard
May 12, 2017
Programming
Updated "Billing the Cloud" slides for WeAreDevelopers 2017 in Vienna.
Transcript
@pyr Billing the cloud Real world stream processing
Three-line bio
• CTO & co-founder at Exoscale
• Open Source Developer
• Monitoring & Distributed Systems Enthusiast
• Billing resources
• Scaling methodologies
• Our approach
```hcl
provider "exoscale" {
  api_key    = "${var.exoscale_api_key}"
  secret_key = "${var.exoscale_secret_key}"
}

resource "exoscale_instance" "web" {
  template  = "ubuntu 17.04"
  disk_size = "50g"
  profile   = "medium"
  ssh_key   = "production"
}
```
Infrastructure isn’t free! (sorry)
Business Model
• Provide cloud infrastructure
• (???)
• Profit!
10,000-mile-high view
Quantities
• 10 megabytes have been sent from 159.100.251.251 over the last minute
Resources
• Account WAD started instance foo with profile large today at 12:00
• Account WAD stopped instance foo today at 12:15
A bit closer to reality
```clojure
{:type     :usage
 :entity   :vm
 :action   :create
 :time     #inst "2016-12-12T15:48:32.000-00:00"
 :template "ubuntu-16.04"
 :source   :cloudstack
 :account  "geneva-jug"
 :uuid     "7a070a3d-66ff-4658-ab08-fe3cecd7c70f"
 :version  1
 :offering "medium"}
```
A bit closer to reality
```protobuf
message IPMeasure {
  /* Versioning */
  required uint32 header = 1;
  required uint32 saddr  = 2;
  required uint64 bytes  = 3;
  /* Validity */
  required uint64 start  = 4;
  required uint64 end    = 5;
}
```
Theory
Quantities are simple
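Quantities are simple because they are additive. Records can be folded in any order, and partial sums can be merged. The following is a minimal sketch that sums bytes per source address. It assumes a Python stand-in for the IPMeasure message: the `header` field is dropped, and `bytes` is renamed `nbytes`. The helper name is hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class IPMeasure:
    # Hypothetical mirror of the IPMeasure message; `header` omitted.
    saddr: int   # source IPv4 address as an unsigned 32-bit integer
    nbytes: int  # the message's `bytes` field: bytes sent in [start, end)
    start: int   # start of the validity window
    end: int     # end of the validity window


def bytes_per_source(measures):
    """Sum bytes per source address.

    Addition is associative and commutative, so the result does not depend
    on the order in which records arrive."""
    totals = defaultdict(int)
    for m in measures:
        totals[m.saddr] += m.nbytes
    return dict(totals)
```

The address from the earlier slide converts to the integer form with `int(ipaddress.IPv4Address("159.100.251.251"))`.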
Resources are harder
This is per account
Solving for all events
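```python
# fetch_all_events, duration and Usage are helpers left undefined on the slide.
resources = {}  # uuid -> time the resource was started
metering = []   # Usage records produced so far

def usage_metering():
    for event in fetch_all_events():
        uuid = event.uuid()
        time = event.time()
        if event.action() == 'start':
            resources[uuid] = time
        elif event.action() == 'stop' and uuid in resources:
            # pop() closes the interval, so a duplicate stop cannot bill twice
            timespan = duration(resources.pop(uuid), time)
            metering.append(Usage(uuid, timespan))
    return metering
```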
In Practice
• This is a never-ending process
• Minute-precision billing
• Applied every hour
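Billing with minute precision every hour means each usage interval has to be cut at hour boundaries and measured in whole minutes. The sketch below shows one way to do this. It rests on three assumptions:

- Timestamps are naive datetimes in a single timezone.
- Both ends are truncated to the minute. The slides do not state the actual rounding policy.
- `hourly_minutes` is a hypothetical name.

```python
from datetime import datetime, timedelta


def hourly_minutes(start, end):
    """Split [start, end) into hour buckets, returning {hour_start: minutes}.

    Both timestamps are truncated to the minute (an assumed policy)."""
    start = start.replace(second=0, microsecond=0)
    end = end.replace(second=0, microsecond=0)
    buckets = {}
    cursor = start
    while cursor < end:
        hour = cursor.replace(minute=0)
        # Stop at the next hour boundary or at the end of usage, whichever is first.
        boundary = min(hour + timedelta(hours=1), end)
        buckets[hour] = int((boundary - cursor).total_seconds() // 60)
        cursor = boundary
    return buckets
```

For the "instance foo" example (12:00 to 12:15), this yields a single bucket of 15 minutes for the 12:00 hour.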
• Avoid overbilling at all costs
• Avoid underbilling (we need to eat!)
• Keep a small operational footprint
A naive approach
```
# run at minute 30 of every hour, discarding all output
30 * * * * usage-metering >/dev/null 2>&1
```
Advantages
• Low operational overhead
• Simple functional boundaries
• Easy to test
Drawbacks
• High pressure on the SQL server
• Hard to avoid overlapping jobs
• Overlaps result in longer metering intervals
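A common mitigation for overlapping cron runs is an exclusive, non-blocking file lock. A run that cannot take the lock simply exits. The following is a minimal sketch using Python's `fcntl.flock` on POSIX systems. The function names and the lock path are hypothetical. Note that a skipped run still leaves a longer metering interval for the next one.

```python
import fcntl
import os


def try_lock(path):
    """Try to take an exclusive, non-blocking lock on `path`.

    Returns the locked file descriptor, or None if another run holds it.
    The lock is released when the descriptor is closed or the process exits."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def run_once(path, job):
    """Run `job` only if no other run currently holds the lock."""
    fd = try_lock(path)
    if fd is None:
        return False  # previous run still active: skip this one
    try:
        job()
        return True
    finally:
        os.close(fd)
```

From the crontab itself, the util-linux `flock -n <lockfile> <command>` wrapper achieves the same effect.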
You are in a room full of overlapping cron jobs.
You can hear the screams of a dying MySQL server. An Oracle vendor is here. To the West, a door is marked “Map/Reduce”. To the East, a door is marked “Stream Processing”.
> Talk to Oracle
You’ve been eaten by a grue.
> Go West
• Conceptually simple
• Spreads easily
• Data-locality-aware processing

• ETL
• High latency
• High operational overhead
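The Map/Reduce model is conceptually simple: map each record to key/value pairs, group the pairs by key (the shuffle), then reduce each group. The sketch below simulates the three phases in a single process. It assumes hypothetical `(account, minutes)` usage records and does not use any framework API. A real framework such as Hadoop schedules the map and reduce tasks across nodes, preferring nodes that already hold the data.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    # Emit (key, value) pairs; here one pair per (account, minutes) record.
    account, minutes = record
    yield account, minutes


def reduce_phase(key, values):
    # Combine all values that share a key.
    return key, sum(values)


def map_reduce(records):
    # Shuffle: group mapped values by key, as the framework would across nodes.
    groups = defaultdict(list)
    for key, value in chain.from_iterable(map(map_phase, records)):
        groups[key].append(value)
    return dict(reduce_phase(k, vs) for k, vs in groups.items())
```

Each run is a full pass over its input, which is why this approach behaves like ETL and carries high latency for billing.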
> Go East
• Continuous computation on an unbounded stream
• Each record processed as it arrives
• Very low latency

• Conceptually harder
• Where do we store intermediate results?
• How does data flow between computation steps?
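By contrast, a stream processor handles each record as it arrives and keeps intermediate state between records. The minimal sketch below emits a usage record as soon as a resource stops. It rests on three assumptions:

- The event tuples have the hypothetical shape `(account, uuid, action, minute)`.
- The `running` dict is the intermediate state. Here it lives in process memory and is lost on a crash, which is exactly the open question on the slide.
- `stream_usage` is a hypothetical name.

```python
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[str, str, str, int]  # (account, uuid, action, minute timestamp)


def stream_usage(events: Iterable[Event]) -> Iterator[Tuple[str, str, int]]:
    """Consume a possibly unbounded event stream, yielding (account, uuid, minutes)."""
    running: Dict[str, int] = {}  # intermediate state: uuid -> start minute
    for account, uuid, action, ts in events:
        if action == "start":
            running[uuid] = ts
        elif action == "stop" and uuid in running:
            # Emit immediately instead of waiting for a batch window.
            yield account, uuid, ts - running.pop(uuid)
        # A stop with no known start is ignored here; reconciliation must catch it.
```

Because this is a generator, it can sit on top of an endless source and produce results with very low latency.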
Deciding factors
Our shopping list
• Operational simplicity
• Integration through our whole stack
• Room to grow
Operational simplicity
• Experience matters
• Spark and Storm are intimidating
• HBase & Hive discarded
Integration
• HDFS & Kafka require simple integration
• Spark goes hand in hand with Cassandra
Room to grow
• A ton of logs
• A ton of metrics
Small confession
• Previously knew Kafka
• Publish & Subscribe
• Processing
• Store
Publish & Subscribe
• Records are produced on topics
• Topics have a predefined number of partitions
• Records have a key which determines their partition
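Key-based partitioning can be pictured as a hash of the key taken modulo the partition count. Kafka's default partitioner does this for keyed records with a murmur2 hash. The sketch below substitutes `zlib.crc32`, so its partition numbers will not match a real cluster, but the property that matters is the same: equal keys always land on the same partition. `partition_for` is a hypothetical name.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Deterministically map a record key to a partition.

    zlib.crc32 is a stand-in for Kafka's murmur2-based default partitioner;
    it returns an unsigned value in Python 3, so the modulo is non-negative."""
    return zlib.crc32(key) % num_partitions
```

Keying usage events by, for example, account would keep all of one account's events on one partition, and therefore with one consumer, in order. Since the mapping depends on the partition count, this is also why the count is fixed up front: changing it remaps keys.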
• Consumers get assigned a set of partitions
• Consumers store their last consumed offset
• Brokers own partitions, handle replication
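Offset tracking is what lets a consumer resume where it left off after a restart. Kafka keeps committed consumer-group offsets in its internal `__consumer_offsets` topic. The sketch below replaces that with a plain dict and models a partition as a list whose indices are offsets. All names are hypothetical. Committing only after a batch is processed gives at-least-once delivery: a crash before the commit replays the batch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Partition:
    log: List[str] = field(default_factory=list)  # append-only; index == offset


@dataclass
class OffsetStore:
    # Stand-in for committed offsets: consumer group -> next offset to read.
    committed: Dict[str, int] = field(default_factory=dict)


def fetch(partition: Partition, store: OffsetStore, group: str,
          max_records: int = 10) -> Tuple[int, List[str]]:
    """Return (first offset, records) starting at the group's committed offset."""
    start = store.committed.get(group, 0)
    return start, partition.log[start:start + max_records]


def commit(store: OffsetStore, group: str, next_offset: int) -> None:
    """Record the next offset to read; call only after the batch is processed."""
    store.committed[group] = next_offset
```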
• Stable consumer topology
• Memory disaggregation
• Can rely on in-memory storage
• Age expiry and log compaction
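Log compaction keeps at least the latest record for each key, and it preserves the original offsets. A compacted topic keyed by resource can therefore act as a durable snapshot of current state. The minimal sketch below treats a `None` value as a tombstone and drops it immediately. Kafka instead retains tombstones for a configurable period (`delete.retention.ms`). Records are hypothetical `(key, value)` pairs.

```python
def compact(log):
    """Return (offset, key, value) for the latest record of each key,
    in offset order, omitting keys whose latest value is a None tombstone."""
    last = {}  # key -> offset of its latest record
    for offset, (key, _value) in enumerate(log):
        last[key] = offset
    return [
        (offset, key, value)
        for offset, (key, value) in enumerate(log)
        if last[key] == offset and value is not None
    ]
```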
Billing at Exoscale
Problem solved?
• Process crashes
• Undelivered message?
• Avoiding overbilling
Reconciliation
• Snapshot of full inventory
• Converges stored resource state if necessary
• Handles failed deliveries as well
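Reconciliation can be pictured as a set difference between what the biller believes is running and a full inventory snapshot. Each discrepancy yields a corrective event. The sketch below is not the talk's implementation. It makes three assumptions:

- `stored` maps uuid to start time, and `snapshot` is a set of uuids.
- Timestamps are chosen so that corrections can only underbill, in line with "avoid overbilling at all costs".
- `reconcile` and its parameters are hypothetical names.

```python
def reconcile(stored, snapshot, last_seen, now):
    """Compare stored running resources (uuid -> start time) with a full
    inventory snapshot (set of uuids) and return corrective events."""
    events = []
    for uuid in sorted(stored.keys() - snapshot):
        # Missed stop: the latest time it was surely running is the later of
        # its start and the previous snapshot, so stopping there never overbills.
        events.append(("stop", uuid, max(stored[uuid], last_seen)))
    for uuid in sorted(snapshot - stored.keys()):
        # Missed start: begin billing now rather than guessing an earlier time.
        events.append(("start", uuid, now))
    return events
```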
Avoiding overbilling
• The reconciler acts as a logical clock
• When supplying usage, attach a unique transaction ID
• Reject multiple transaction attempts on a single ID
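The transaction-ID rule makes billing idempotent: a redelivered or retried usage record carries the same ID and is rejected. The sketch below makes these assumptions:

- The ID is derived deterministically from the account, the resource, and a hypothetical `generation` counter that each reconciler run would increment. This is one reading of "logical clock", not something the slides spell out.
- The `seen` set would have to be durable in production, for example a unique constraint in the billing database.
- `transaction_id` and `Ledger` are hypothetical names.

```python
import hashlib


def transaction_id(account: str, resource: str, generation: int) -> str:
    """Deterministic ID: retrying the same usage yields the same ID.
    `generation` is a hypothetical counter bumped by each reconciler run."""
    return hashlib.sha256(f"{account}/{resource}/{generation}".encode()).hexdigest()


class Ledger:
    """Accepts each transaction ID at most once."""

    def __init__(self):
        self.seen = set()   # would need to be durable in practice
        self.charges = []   # accepted (account, minutes) charges

    def submit(self, txid: str, account: str, minutes: int) -> bool:
        if txid in self.seen:
            return False  # duplicate delivery or retry: reject, never bill twice
        self.seen.add(txid)
        self.charges.append((account, minutes))
        return True
```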
Parting words
Looking back
• Things stay simple (roughly 600 LoC)
• Room to grow
• Stable and resilient
• DNS, Logs, Metrics, Event Sourcing
What about batch?
• Streaming doesn’t work for everything
• Sometimes throughput matters more than latency
• Building models in batch, applying with stream processing
Thanks! Questions?