Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Probablistic Data Structures
Search
Sponsored
·
Your Podcast. Everywhere. Effortlessly.
Share. Educate. Inspire. Entertain. You do you. We'll handle the rest.
→
Sergey Arkhipov
November 11, 2017
Programming
270
0
Share
Probablistic Data Structures
My talk on rannts #18 (11.11.2017)
Sergey Arkhipov
November 11, 2017
More Decks by Sergey Arkhipov
See All by Sergey Arkhipov
Fingerprinting
9seconds
0
180
Concurrency Models
9seconds
0
280
Own Mustache
9seconds
0
360
Daemonize
9seconds
0
360
Stuff That Works
9seconds
0
380
Evidence
9seconds
0
110
Redneck Monads
9seconds
1
130
Latency
9seconds
0
150
Oh Blindfold Russia!
9seconds
0
320
Other Decks in Programming
See All in Programming
Augmenting AI with the Power of Jakarta EE
ivargrimstad
0
320
Agentic UI beyond Chats Architecture Patterns & Open Standards @ngMunich 05/2026
manfredsteyer
PRO
0
170
Spec-Driven Development with AI-Agents: From High-Level Requirements to Working Software
antonarhipov
2
380
生成AI時代にこそ効くGo | Why Go Works in the Age of Generative AI
mom0tomo
8
2.9k
Technical Debt: Understanding it Rightly, Engaging it Rightly #LaravelLiveJP
shogogg
0
160
OCRを使ってゲームのアイテムをデータ化する
kishikawakatsumi
0
120
Swiftのレキシカルスコープ管理
kntkymt
0
200
デフォルト運用のCodeRabbit、1年で何が変わったか / How CodeRabbit Changed Our Code Review in 1 Year
bake0937
1
110
TypeScriptだけでAIエージェントを作る フロント・エージェント・インフラのフルスタック実践
har1101
6
1.2k
要はバランスからの卒業 #yumemi_grow
kajitack
0
200
ビジネスモデルから紐解く、AI+型駆動開発
hirokiomote
2
3.7k
今さら聞けないCancellationToken
htkym
0
200
Featured
See All Featured
Utilizing Notion as your number one productivity tool
mfonobong
4
310
Into the Great Unknown - MozCon
thekraken
41
2.5k
Sam Torres - BigQuery for SEOs
techseoconnect
PRO
0
280
Abbi's Birthday
coloredviolet
2
7.8k
Lightning talk: Run Django tests with GitHub Actions
sabderemane
0
190
Agile Actions for Facilitating Distributed Teams - ADO2019
mkilby
0
200
Raft: Consensus for Rubyists
vanstee
141
7.5k
Side Projects
sachag
455
43k
Why Our Code Smells
bkeepers
PRO
340
58k
Measuring & Analyzing Core Web Vitals
bluesmoon
9
840
Making Projects Easy
brettharned
120
6.6k
Avoiding the “Bad Training, Faster” Trap in the Age of AI
tmiket
0
160
Transcript
Вероятностные структуры данных Сергей Архипов, 2017
None
None
curl http://site.com
curl -x myproxy.ru:3128 http://site.com
curl -x proxy.crawlera.com:8010 http://site.com
None
evt evt evt evt evt
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
None
collector collector collector
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" } { "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" } { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 234, "banned": 12, "errors": 3, }
Consumer 1 { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 234, "banned":
12, "errors": 3, } Consumer 2 { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 250, "banned": 3, "errors": 0, } Consumer 3 { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 0, "banned": 124, "errors": 84, }
INSERT INTO stats ( date, user, hostname, ok, ban, error
) VALUES ( :date, :user, :hostname, :ok, :ban, :error ) ON DUPLICATE KEY UPDATE ok = ok + VALUES(ok), ban = ban + VALUES(ban), error = error + VALUES(error);
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "", "response_time":
2861, }
(20 + 10) + 11 = (20 + 11) +
10
F(x)=P{σ<x} { P(x⩽x α )⩾α P(x⩾x α )⩾1−α
Ω(N 1 p )
collector collector collector pworker pworker pworker
None
None
var memCount = 75604275; var memPerSec = 1.38176367782; function updateCount()
{ next = -(1000 / memPerSec) * Math.log(Math.random()); memCountString = ''+memCount; len = memCountString.length; memCountString = memCountString.substr(0, len - 6) + ’ < span style = ”font - size: 8 px” > < /span>’+memCountString.substr(len-6,3)+‘ < span style = ”font - size: 8 px” > < /span>’+memCountString.substr(len-3,3); ge(‘memCount’).innerHTML = memCountString; memCount = memCount + 1; setTimeout(updateCount, next); } addEvent(window, ‘load’, updateCount);
3500 3671 3400 3502 3463 3371 3607 6012 6168 6211
6017
3500 3507 3671 3667 3400 3410 3502 3502 3463 3466
3371 3330 3607 3599 6012 6009 6168 6152 6211 6215 6017 6016
Count-Min Sketch 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Count-Min Sketch 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Count-Min Sketch 0 0 1 0 0 0 0 0
1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1
Count-Min Sketch 32 11 1 18 200 126 184 78
1 0 91 59 30 24 8 82 76 34 48 72 11 200 129 136 14
Count-Min Sketch 32 11 1 18 200 126 184 78
1 0 91 59 30 24 8 82 76 34 48 72 11 200 129 136 14
MinHash J (A , B)= |A∩B| |A∪B| k=[ 1 ε2
]
HyperLogLog 010010000110010101101100011011000110111100100001 b 26 = 64 1001 b = 9
100001 b = 33 σ= 1.04 √2k E= α(k)4k ∑ j 2−M j
t-digest
t-digest
t-digest
t-digest X=x 1 , x 2 ,…, x n X={s
1 ,s 2 ,…,s m } s i ={x l e f t(i) ,…, x r i ght(i) }
t-digest k(q,δ)≝δ (sin−1 (2q−1) π + 1 2 ) K(i)≝k(
r i ght(i) n ,δ)−k( le f t(i)−1 n ,δ) K (i)⩽1 K(i)+K (i+1)>1
t-digest
t-digest
collector collector collector pworker pworker pworker
None
Q/A