Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Streaming Algorithms
Search
Sponsored
·
Your Podcast. Everywhere. Effortlessly.
Share. Educate. Inspire. Entertain. You do you. We'll handle the rest.
→
majek04
March 15, 2016
Technology
600
1
Share
Embed
Copy iframe code
Copy JS code
Copy link
Start on current slide
Streaming Algorithms
majek04
March 15, 2016
More Decks by majek04
See All by majek04
Keeloq_AD_2025__1_.pdf
majek04
0
44
BPF programmable socket lookup
majek04
0
680
Linux at Cloudflare
majek04
3
8.9k
DDoS Landscape
majek04
0
440
Inside Cloudbleed
majek04
3
3k
Golang sucks
majek04
21
53k
Gatelogic - Somewhat functional reactive framework in Python
majek04
1
5.3k
How Cloudflare deals with largest DDoS attacks?
majek04
2
3.6k
Why we chose Service Worker API
majek04
0
2.9k
Other Decks in Technology
See All in Technology
クレデンシャル流出 ― 攻撃 3 時間 vs 復旧 10 時間。この非対称性にどう備えるか
kazzpapa3
3
570
AIペネトレーションテスト・ セキュリティ検証「AgenticSec」紹介資料
laysakura
2
7.6k
SteampipeとExcel Power QueryでAWS構成定義書の作成を自動化する
jhashimoto
0
180
Flow 不死:AI 時代 DevOps 的不變本質
cheng_wei_chen
2
520
AIチャットの改善から見えた、良いAI体験とは / What Constitutes a Good AI Experience: Insights from Improving AI Chat
kubode
0
120
IaC コードを資産へ:AWS CDK 社内ライブラリと横断展開 / aws-summit-japan-2026
gotok365
10
1.6k
フルカイテン株式会社 エンジニア向け採用資料
fullkaiten
0
11k
いまさら聞けない「仕様駆動開発入門」 〜AI活用時代の開発プロセスを考える〜
findy_eventslides
2
210
時期が悪い!それでもRaspberry Piを買って遊んで活用するには / 20260627-osc26do-rpi-jikigawarui
akkiesoft
0
830
AIチャット検索改善の3週間
kworkdev
PRO
2
180
[AWS Summit Japan 2026]迷っているあなたへ_小さな一歩が、やがて自分を助けてくれる
sh_fk2
2
420
データレイクの「見えない問題」を可視化する
sansantech
PRO
1
200
Featured
See All Featured
Visual Storytelling: How to be a Superhuman Communicator
reverentgeek
2
560
Helping Users Find Their Own Way: Creating Modern Search Experiences
danielanewman
31
3.2k
What’s in a name? Adding method to the madness
productmarketing
PRO
24
4.1k
Agile that works and the tools we love
rasmusluckow
331
22k
Intergalactic Javascript Robots from Outer Space
tanoku
273
27k
The AI Revolution Will Not Be Monopolized: How open-source beats economies of scale, even for LLMs
inesmontani
PRO
3
3.5k
Introduction to Domain-Driven Design and Collaborative software design
baasie
1
860
The AI Search Optimization Roadmap by Aleyda Solis
aleyda
1
5.9k
KATA
mclloyd
PRO
35
15k
SEO Brein meetup: CTRL+C is not how to scale international SEO
lindahogenes
1
2.7k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
333
23k
Site-Speed That Sticks
csswizardry
13
1.2k
Transcript
Automatic DoS mitigation (streaming algorithms) Marek Majkowski marek@cloudflare.com @majek04
2 Who we are
3 Large network
4 Content neutral
5 DoS is a problem DoS events per day
6 X example.com Defending from DoS is hard
• L3 - spoofed IP packets • source IP addresses
are fake • very large • this is what you hear in news • L7 - fully established TCP connections • IP reputation is effective 7 Two DoS types
8 L3 Volume per server Packets per second
9 Automatic attack handling Attack Detection Mitigation database Reactive Automation
sflow iptables
10 Automatic attack detection Attack Detection sflow
• infinite data stream on input • approximate 11 Streaming
algorithms Streaming algorithm Data stream Results
• sflow packets samples as input • detected attacks on
output 12 Attack detection is streaming! Streaming algorithms Packet samples Attacks
• EWMA - Exponentially weighted moving average • Counting rates
of packets • Space saving • Known as Top-N or Heavy Hitters • Simplified hierarchical heavy hitters • Hyper log log • Cardinality estimation - Counting unique things 13 Streaming algorithms
14 The problem: PPS ! Mpps Descr! 3.878 --ip=141.245.59.191/32! 2.878
--ip=141.245.59.192/32! 1.878 --ip=141.245.59.193/32! 1.878 --ip=141.245.59.194/32! 1.878 --ip=141.245.59.195/32! 1.878 --ip=141.245.59.196/32! 1.878 --ip=141.245.59.197/32! 1.878 --ip=141.245.59.198/32! 1.878 --ip=141.245.59.199/32! ...!
15 Naive approach pps IP 12.2M 1.2.3.4 2.4M 42.1.2.4 0.01M
2.4.3.1 0.01M 192.168.1.1
16 There is no such thing as pps
17 Naive: Moving average 1.0s 1.1s 1.3s 1.8s 1.99s 2.1s
2.4s 2.41s t=2.50s Precisely 5 samples
18 Not-smoothed values 1.0s 1.1s 1.3s 1.8s 1.99s 2.1s 2.4s
2.41s 100 3 50 5 2 5 10 raw pps=
19 Not-smoothed values
20 Linux load average - charge
21 Linux load average - discharge
22 Better: EWMA old load difference dampening factor measurement frequency
half-life time
23
24
• Smoothed average • The same maths as Linux "load
average" • Charges slow (half-life) • Discharges quickly • Can be also used to count rates of packets 25 EWMA - summary
26 The problem: PPS ! Mpps Descr! 3.878 --ip=141.245.59.191/32! 2.878
--ip=141.245.59.192/32! 1.878 --ip=141.245.59.193/32! 1.878 --ip=141.245.59.194/32! 1.878 --ip=141.245.59.195/32! 1.878 --ip=141.245.59.196/32! 1.878 --ip=141.245.59.197/32! 1.878 --ip=141.245.59.198/32! 1.878 --ip=141.245.59.199/32! ...!
27 The problem: Memory pps IP 12.2M 1.2.3.4 2.4M 42.1.2.4
0.01M 2.4.3.1 0.01M 192.168.1.1 ...
• aka: heavy hitters • A fixed-memory data structure •
That can "count" top-N items • think: top url's, top customer IP's, etc • Count-Min sketch, Space Saving 28 Top-N problem
29 Space saving error count key
30 Space saving error count key 0 1 Alice Alice
31 Space saving error count key 0 2 Alice Alice
32 Space saving error count key 0 2 Alice 0
1 Ben Ben
33 Space saving error count key 0 2 Alice 0
1 Ben 0 1 Charlie Charlie
34 Space saving error count key 0 2 Alice 0
1 Ben 0 1 Charlie Eric?
35 Space saving error count key 0 2 Alice 0
1 Ben 0 1 Charlie Eric?
36 Space saving error count key 0 2 Alice 1
0 Eric 0 1 Charlie + Eric
37 Space saving error count key 0 2 Alice 1
1 Eric 0 1 Charlie Eric
38 Space saving error count key 0 2 Alice 1
1 Eric 0 1 Charlie 2 Counter? 1 .. 2 1
39
What about rates? 40 • It's hard • was: GetAll()
• now: GetAll(time.Time) • No longer O(1) • Instead O(log n)
41
• Top-N / Heavy-hitter algorithm • Fixed memory size •
Strong error guarantees 42 Summary - Space saving
43 Aggregating attacks ! Mpps Descr! 3.878 --ip=141.245.59.191/32! 2.878 --ip=141.245.59.192/32!
1.878 --ip=141.245.59.193/32! 1.878 --ip=141.245.59.194/32! 1.878 --ip=141.245.59.195/32! 1.878 --ip=141.245.59.196/32! 1.878 --ip=141.245.59.197/32! 1.878 --ip=141.245.59.198/32! 1.878 --ip=141.245.59.199/32! ...! ! Mpps Descr! 35.878 --ip=141.245.59.0/24! vs
44 Hierarchical Heavy Hitters
45 Simplified HHH
46 Multiple dimensions pps IP:port 12.2M 1.2.3.4:53 2.4M 42.1.2.4:80 0.01M
2.4.3.1:80 0.01M 192.168.1.1:443 pps IP 12.2M 1.2.3.4 2.4M 42.1.2.4 0.01M 2.4.3.1 0.01M 192.168.1.1 pps subnet 12.2M 1.2.3.0/24 2.4M 42.1.2.0/24 0.01M 2.4.3.0/24 0.01M 192.168.1.0/24
47 Multiple dimensions pps IP:port 12.2M 1.2.3.4:53 2.4M 42.1.2.4:80 0.01M
2.4.3.1:80 0.01M 192.168.1.1:443 pps IP 12.2M 1.2.3.4 2.4M 42.1.2.4 0.01M 2.4.3.1 0.01M 192.168.1.1 pps subnet 12.2M 1.2.3.0/24 2.4M 42.1.2.0/24 0.01M 2.4.3.0/24 0.01M 192.168.1.0/24 incoming sample: 42.1.2.4:80
48 Multiple dimensions pps IP:port 12.2M 1.2.3.4:53 2.4M 42.1.2.4:80 0.01M
2.4.3.1:80 0.01M 192.168.1.1:443 pps IP 12.2M 1.2.3.4 2.4M 42.1.2.4 0.01M 2.4.3.1 0.01M 192.168.1.1 pps subnet 12.2M 1.2.3.0/24 2.4M 42.1.2.0/24 0.01M 2.4.3.0/24 0.01M 192.168.1.0/24 reporting threshold: 1M
49 Attack report ! Mpps Descr! 12.2 --ip=1.2.3.4 --port=53! 2.4
--ip=42.1.2.4 --port=80! 12.2 --ip=1.2.3.4! 2.4 --ip=42.1.2.4! 12.2 --ip=1.2.3.0/24! 2.4 --ip=42.1.2.0/24!
50 Multiple dimensions pps IP:port 12.2M 1.2.3.4:53 2.4M 42.1.2.4:80 0.01M
2.4.3.1:80 0.01M 192.168.1.1:443 pps IP 0.1M 1.2.3.4 0M 42.1.2.4 0.01M 2.4.3.1 0.01M 192.168.1.1 pps subnet 0.1M 1.2.3.0/24 0M 42.1.2.0/24 0.01M 2.4.3.0/24 0.01M 192.168.1.0/24 incoming sample: 42.1.2.4:80
51 Attack report ! Mpps Descr! 12.2 --ip=1.2.3.4 --port=53! 2.4
--ip=42.1.2.4 --port=80!
52 Scales well
• Approximate • High error in pps • Works well
in practice • Scales well • Fast and simple to implement 53 Summary - SHHH
54 Spoofed Source IP? ! Mpps Description! 23.833 --target=173.245.59.2 --agent=WAW
--iface=659 Est= 57364! 23.067 --target=173.245.58.1 --agent=WAW --iface=659 Est= 56995! 7.139 --target=173.245.58.1 --agent=DUS --iface=893 Est= 11493! 6.366 --target=173.245.59.2 --agent=DUS --iface=893 Est= 11240! 2.590 --target=173.245.58.1 --agent=SIN --iface=657 Est=219987! 2.557 --target=173.245.59.2 --agent=SIN --iface=657 Est=220380! 1.045 --target=173.245.58.1 --agent=MAN --iface=756 Est= 207! 1.039 --target=173.245.59.2 --agent=MAN --iface=756 Est= 200!
55 Hyper log log "Alice" 22 unique items! HLL
56 Hyper log log OR 44 unique items ( )
= HLL#1 HLL#2
57
58 What about rates? HLL #1 HLL #2 HLL #3
HLL #4
59 Hard drives
• Attack detection is a streaming problem • Streaming algorithms
are awesome • Applicable to many more problems 60 Summary Thanks! marek@cloudflare.com