Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Probablistic Data Structures
Search
Sergey Arkhipov
November 11, 2017
Programming
0
250
Probablistic Data Structures
My talk on rannts #18 (11.11.2017)
Sergey Arkhipov
November 11, 2017
Tweet
Share
More Decks by Sergey Arkhipov
See All by Sergey Arkhipov
Fingerprinting
9seconds
0
160
Concurrency Models
9seconds
0
220
Own Mustache
9seconds
0
330
Daemonize
9seconds
0
330
Stuff That Works
9seconds
0
340
Evidence
9seconds
0
94
Redneck Monads
9seconds
1
100
Latency
9seconds
0
130
Oh Blindfold Russia!
9seconds
0
300
Other Decks in Programming
See All in Programming
ゼロダウンタイムでミドルウェアの バージョンアップを実現した手法と課題
wind111
0
210
Reactive Thinking with Signals and the new Resource API
manfredsteyer
PRO
0
110
Feature Flags Suck! - KubeCon Atlanta 2025
phodgson
0
150
GeistFabrik and AI-augmented software development
adewale
PRO
0
110
AIの弱点、やっぱりプログラミングは人間が(も)勉強しよう / YAPC AI and Programming
kishida
9
5.1k
Patterns of Patterns (and why we need them)
denyspoltorak
0
100
FlutterKaigi 2025 システム裏側
yumnumm
0
1.1k
仕様がそのままテストになる!Javaで始める振る舞い駆動開発
ohmori_yusuke
8
4.6k
Private APIの呼び出し方
kishikawakatsumi
3
890
詳細の決定を遅らせつつ実装を早くする
shimabox
1
1.3k
Claude Code on the Web を超える!? Codex Cloud の実践テク5選
sunagaku
0
580
AI駆動開発ライフサイクル(AI-DLC)のホワイトペーパーを解説
swxhariu5
0
1.2k
Featured
See All Featured
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
12
1.3k
RailsConf 2023
tenderlove
30
1.3k
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
508
140k
Typedesign – Prime Four
hannesfritz
42
2.9k
Bootstrapping a Software Product
garrettdimon
PRO
307
110k
Reflections from 52 weeks, 52 projects
jeffersonlam
355
21k
Designing Dashboards & Data Visualisations in Web Apps
destraynor
231
54k
Side Projects
sachag
455
43k
The MySQL Ecosystem @ GitHub 2015
samlambert
251
13k
Optimising Largest Contentful Paint
csswizardry
37
3.5k
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
36
6.1k
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
234
17k
Transcript
Вероятностные структуры данных Сергей Архипов, 2017
None
None
curl http://site.com
curl -x myproxy.ru:3128 http://site.com
curl -x proxy.crawlera.com:8010 http://site.com
None
evt evt evt evt evt
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
None
collector collector collector
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" } { "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" } { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 234, "banned": 12, "errors": 3, }
Consumer 1 { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 234, "banned":
12, "errors": 3, } Consumer 2 { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 250, "banned": 3, "errors": 0, } Consumer 3 { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 0, "banned": 124, "errors": 84, }
INSERT INTO stats ( date, user, hostname, ok, ban, error
) VALUES ( :date, :user, :hostname, :ok, :ban, :error ) ON DUPLICATE KEY UPDATE ok = ok + VALUES(ok), ban = ban + VALUES(ban), error = error + VALUES(error);
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "", "response_time":
2861, }
(20 + 10) + 11 = (20 + 11) +
10
F(x)=P{σ<x} { P(x⩽x α )⩾α P(x⩾x α )⩾1−α
Ω(N 1 p )
collector collector collector pworker pworker pworker
None
None
var memCount = 75604275; var memPerSec = 1.38176367782; function updateCount()
{ next = -(1000 / memPerSec) * Math.log(Math.random()); memCountString = ''+memCount; len = memCountString.length; memCountString = memCountString.substr(0, len - 6) + ’ < span style = ”font - size: 8 px” > < /span>’+memCountString.substr(len-6,3)+‘ < span style = ”font - size: 8 px” > < /span>’+memCountString.substr(len-3,3); ge(‘memCount’).innerHTML = memCountString; memCount = memCount + 1; setTimeout(updateCount, next); } addEvent(window, ‘load’, updateCount);
3500 3671 3400 3502 3463 3371 3607 6012 6168 6211
6017
3500 3507 3671 3667 3400 3410 3502 3502 3463 3466
3371 3330 3607 3599 6012 6009 6168 6152 6211 6215 6017 6016
Count-Min Sketch 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Count-Min Sketch 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Count-Min Sketch 0 0 1 0 0 0 0 0
1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1
Count-Min Sketch 32 11 1 18 200 126 184 78
1 0 91 59 30 24 8 82 76 34 48 72 11 200 129 136 14
Count-Min Sketch 32 11 1 18 200 126 184 78
1 0 91 59 30 24 8 82 76 34 48 72 11 200 129 136 14
MinHash J (A , B)= |A∩B| |A∪B| k=[ 1 ε2
]
HyperLogLog 010010000110010101101100011011000110111100100001 b 26 = 64 1001 b = 9
100001 b = 33 σ= 1.04 √2k E= α(k)4k ∑ j 2−M j
t-digest
t-digest
t-digest
t-digest X=x 1 , x 2 ,…, x n X={s
1 ,s 2 ,…,s m } s i ={x l e f t(i) ,…, x r i ght(i) }
t-digest k(q,δ)≝δ (sin−1 (2q−1) π + 1 2 ) K(i)≝k(
r i ght(i) n ,δ)−k( le f t(i)−1 n ,δ) K (i)⩽1 K(i)+K (i+1)>1
t-digest
t-digest
collector collector collector pworker pworker pworker
None
Q/A