Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Probablistic Data Structures
Search
Sergey Arkhipov
November 11, 2017
Programming
0
240
Probablistic Data Structures
My talk on rannts #18 (11.11.2017)
Sergey Arkhipov
November 11, 2017
Tweet
Share
More Decks by Sergey Arkhipov
See All by Sergey Arkhipov
Fingerprinting
9seconds
0
160
Concurrency Models
9seconds
0
210
Own Mustache
9seconds
0
320
Daemonize
9seconds
0
320
Stuff That Works
9seconds
0
330
Evidence
9seconds
0
90
Redneck Monads
9seconds
1
96
Latency
9seconds
0
120
Oh Blindfold Russia!
9seconds
0
290
Other Decks in Programming
See All in Programming
さようなら Date。 ようこそTemporal! 3年間先行利用して得られた知見の共有
8beeeaaat
3
1.4k
Zendeskのチケットを Amazon Bedrockで 解析した
ryokosuge
3
290
Deep Dive into Kotlin Flow
jmatsu
1
310
ソフトウェアテスト徹底指南書の紹介
goyoki
1
150
TDD 実践ミニトーク
contour_gara
1
290
意外と簡単!?フロントエンドでパスキー認証を実現する WebAuthn
teamlab
PRO
2
730
奥深くて厄介な「改行」と仲良くなる20分
oguemon
1
520
Navigation 2 を 3 に移行する(予定)ためにやったこと
yokomii
0
140
2025 年のコーディングエージェントの現在地とエンジニアの仕事の変化について
azukiazusa1
23
12k
そのAPI、誰のため? Androidライブラリ設計における利用者目線の実践テクニック
mkeeda
2
270
Vue・React マルチプロダクト開発を支える Vite
andpad
0
110
Azure SRE Agentで運用は楽になるのか?
kkamegawa
0
2.1k
Featured
See All Featured
Building Better People: How to give real-time feedback that sticks.
wjessup
368
19k
GitHub's CSS Performance
jonrohan
1032
460k
Build your cross-platform service in a week with App Engine
jlugia
231
18k
The Cost Of JavaScript in 2023
addyosmani
53
8.9k
Fireside Chat
paigeccino
39
3.6k
Typedesign – Prime Four
hannesfritz
42
2.8k
Building Applications with DynamoDB
mza
96
6.6k
Improving Core Web Vitals using Speculation Rules API
sergeychernyshev
18
1.1k
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
252
21k
Balancing Empowerment & Direction
lara
3
620
The Power of CSS Pseudo Elements
geoffreycrofte
77
6k
Side Projects
sachag
455
43k
Transcript
Вероятностные структуры данных Сергей Архипов, 2017
None
None
curl http://site.com
curl -x myproxy.ru:3128 http://site.com
curl -x proxy.crawlera.com:8010 http://site.com
None
evt evt evt evt evt
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
None
collector collector collector
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" } { "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" } { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 234, "banned": 12, "errors": 3, }
Consumer 1 { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 234, "banned":
12, "errors": 3, } Consumer 2 { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 250, "banned": 3, "errors": 0, } Consumer 3 { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 0, "banned": 124, "errors": 84, }
INSERT INTO stats ( date, user, hostname, ok, ban, error
) VALUES ( :date, :user, :hostname, :ok, :ban, :error ) ON DUPLICATE KEY UPDATE ok = ok + VALUES(ok), ban = ban + VALUES(ban), error = error + VALUES(error);
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "", "response_time":
2861, }
(20 + 10) + 11 = (20 + 11) +
10
F(x)=P{σ<x} { P(x⩽x α )⩾α P(x⩾x α )⩾1−α
Ω(N 1 p )
collector collector collector pworker pworker pworker
None
None
var memCount = 75604275; var memPerSec = 1.38176367782; function updateCount()
{ next = -(1000 / memPerSec) * Math.log(Math.random()); memCountString = ''+memCount; len = memCountString.length; memCountString = memCountString.substr(0, len - 6) + ’ < span style = ”font - size: 8 px” > < /span>’+memCountString.substr(len-6,3)+‘ < span style = ”font - size: 8 px” > < /span>’+memCountString.substr(len-3,3); ge(‘memCount’).innerHTML = memCountString; memCount = memCount + 1; setTimeout(updateCount, next); } addEvent(window, ‘load’, updateCount);
3500 3671 3400 3502 3463 3371 3607 6012 6168 6211
6017
3500 3507 3671 3667 3400 3410 3502 3502 3463 3466
3371 3330 3607 3599 6012 6009 6168 6152 6211 6215 6017 6016
Count-Min Sketch 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Count-Min Sketch 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Count-Min Sketch 0 0 1 0 0 0 0 0
1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1
Count-Min Sketch 32 11 1 18 200 126 184 78
1 0 91 59 30 24 8 82 76 34 48 72 11 200 129 136 14
Count-Min Sketch 32 11 1 18 200 126 184 78
1 0 91 59 30 24 8 82 76 34 48 72 11 200 129 136 14
MinHash J (A , B)= |A∩B| |A∪B| k=[ 1 ε2
]
HyperLogLog 010010000110010101101100011011000110111100100001 b 26 = 64 1001 b = 9
100001 b = 33 σ= 1.04 √2k E= α(k)4k ∑ j 2−M j
t-digest
t-digest
t-digest
t-digest X=x 1 , x 2 ,…, x n X={s
1 ,s 2 ,…,s m } s i ={x l e f t(i) ,…, x r i ght(i) }
t-digest k(q,δ)≝δ (sin−1 (2q−1) π + 1 2 ) K(i)≝k(
r i ght(i) n ,δ)−k( le f t(i)−1 n ,δ) K (i)⩽1 K(i)+K (i+1)>1
t-digest
t-digest
collector collector collector pworker pworker pworker
None
Q/A