Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Probablistic Data Structures
Search
Sponsored
·
SiteGround - Reliable hosting with speed, security, and support you can count on.
→
Sergey Arkhipov
November 11, 2017
Programming
0
260
Probablistic Data Structures
My talk on rannts #18 (11.11.2017)
Sergey Arkhipov
November 11, 2017
Tweet
Share
More Decks by Sergey Arkhipov
See All by Sergey Arkhipov
Fingerprinting
9seconds
0
170
Concurrency Models
9seconds
0
240
Own Mustache
9seconds
0
350
Daemonize
9seconds
0
340
Stuff That Works
9seconds
0
370
Evidence
9seconds
0
100
Redneck Monads
9seconds
1
120
Latency
9seconds
0
140
Oh Blindfold Russia!
9seconds
0
320
Other Decks in Programming
See All in Programming
API Platformを活用したPHPによる本格的なWeb API開発 / api-platform-book-intro
ttskch
1
110
2026年は Rust 置き換えが流行る! / 20260220-niigata-5min-tech
girigiribauer
0
210
Agent Skills Workshop - AIへの頼み方を仕組み化する
gotalab555
13
7.5k
The Past, Present, and Future of Enterprise Java
ivargrimstad
0
350
あなたはユーザーではない #PdENight
kajitack
4
290
Ruby x Terminal
a_matsuda
5
530
AIエージェントのキホンから学ぶ「エージェンティックコーディング」実践入門
masahiro_nishimi
7
1.2k
CSC307 Lecture 11
javiergs
PRO
0
580
Geminiの機能を調べ尽くしてみた
naruyoshimi
0
190
AIによる開発の民主化を支える コンテキスト管理のこれまでとこれから
mulyu
3
2.2k
AI活用のコスパを最大化する方法
ochtum
0
120
AI巻き込み型コードレビューのススメ
nealle
2
2.5k
Featured
See All Featured
GitHub's CSS Performance
jonrohan
1032
470k
More Than Pixels: Becoming A User Experience Designer
marktimemedia
3
340
エンジニアに許された特別な時間の終わり
watany
106
240k
The Spectacular Lies of Maps
axbom
PRO
1
570
Conquering PDFs: document understanding beyond plain text
inesmontani
PRO
4
2.4k
Between Models and Reality
mayunak
1
210
Building Applications with DynamoDB
mza
96
6.9k
GraphQLの誤解/rethinking-graphql
sonatard
75
11k
End of SEO as We Know It (SMX Advanced Version)
ipullrank
3
4k
Designing Dashboards & Data Visualisations in Web Apps
destraynor
231
54k
Practical Tips for Bootstrapping Information Extraction Pipelines
honnibal
25
1.8k
Future Trends and Review - Lecture 12 - Web Technologies (1019888BNR)
signer
PRO
0
3.2k
Transcript
Вероятностные структуры данных Сергей Архипов, 2017
None
None
curl http://site.com
curl -x myproxy.ru:3128 http://site.com
curl -x proxy.crawlera.com:8010 http://site.com
None
evt evt evt evt evt
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
None
collector collector collector
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" } { "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" } { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 234, "banned": 12, "errors": 3, }
Consumer 1 { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 234, "banned":
12, "errors": 3, } Consumer 2 { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 250, "banned": 3, "errors": 0, } Consumer 3 { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 0, "banned": 124, "errors": 84, }
INSERT INTO stats ( date, user, hostname, ok, ban, error
) VALUES ( :date, :user, :hostname, :ok, :ban, :error ) ON DUPLICATE KEY UPDATE ok = ok + VALUES(ok), ban = ban + VALUES(ban), error = error + VALUES(error);
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "", "response_time":
2861, }
(20 + 10) + 11 = (20 + 11) +
10
F(x)=P{σ<x} { P(x⩽x α )⩾α P(x⩾x α )⩾1−α
Ω(N 1 p )
collector collector collector pworker pworker pworker
None
None
var memCount = 75604275; var memPerSec = 1.38176367782; function updateCount()
{ next = -(1000 / memPerSec) * Math.log(Math.random()); memCountString = ''+memCount; len = memCountString.length; memCountString = memCountString.substr(0, len - 6) + ’ < span style = ”font - size: 8 px” > < /span>’+memCountString.substr(len-6,3)+‘ < span style = ”font - size: 8 px” > < /span>’+memCountString.substr(len-3,3); ge(‘memCount’).innerHTML = memCountString; memCount = memCount + 1; setTimeout(updateCount, next); } addEvent(window, ‘load’, updateCount);
3500 3671 3400 3502 3463 3371 3607 6012 6168 6211
6017
3500 3507 3671 3667 3400 3410 3502 3502 3463 3466
3371 3330 3607 3599 6012 6009 6168 6152 6211 6215 6017 6016
Count-Min Sketch 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Count-Min Sketch 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Count-Min Sketch 0 0 1 0 0 0 0 0
1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1
Count-Min Sketch 32 11 1 18 200 126 184 78
1 0 91 59 30 24 8 82 76 34 48 72 11 200 129 136 14
Count-Min Sketch 32 11 1 18 200 126 184 78
1 0 91 59 30 24 8 82 76 34 48 72 11 200 129 136 14
MinHash J (A , B)= |A∩B| |A∪B| k=[ 1 ε2
]
HyperLogLog 010010000110010101101100011011000110111100100001 b 26 = 64 1001 b = 9
100001 b = 33 σ= 1.04 √2k E= α(k)4k ∑ j 2−M j
t-digest
t-digest
t-digest
t-digest X=x 1 , x 2 ,…, x n X={s
1 ,s 2 ,…,s m } s i ={x l e f t(i) ,…, x r i ght(i) }
t-digest k(q,δ)≝δ (sin−1 (2q−1) π + 1 2 ) K(i)≝k(
r i ght(i) n ,δ)−k( le f t(i)−1 n ,δ) K (i)⩽1 K(i)+K (i+1)>1
t-digest
t-digest
collector collector collector pworker pworker pworker
None
Q/A