Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
知っておくと便利な Bloom Filter / Bloom Filter
Search
Yoshiaki Yoshida
October 14, 2016
Technology
6.9k
2
Share
知っておくと便利な Bloom Filter / Bloom Filter
Blog /
https://kakakakakku.hatenablog.com/entry/2016/10/14/220055
Yoshiaki Yoshida
October 14, 2016
More Decks by Yoshiaki Yoshida
See All by Yoshiaki Yoshida
技術ブロガーを育てる!ブログメンタリングで何を教えているのか / Passion for Blog Mentoring
kakakakakku
8
38k
プログラミング初心者に教えるときは「身近な比喩」が重要なのだ! / Metaphor is Important for Beginner Programmer
kakakakakku
2
5.8k
プロジェクトの成功を支える ZenHub と モブプログラミング / ZenHub and Mob Programming
kakakakakku
1
5.9k
楽しく!アウトプットを習慣化しよう / Let's Enjoy Output
kakakakakku
3
7k
さぁ!今すぐプロジェクトリーダーに立候補しよう / Be a Project Leader
kakakakakku
3
10k
プロジェクトをリードする技術 (Kyash 社 再演) / Project Leading is Skill for Kyash
kakakakakku
4
2.3k
プロジェクトをリードする技術 / Project Leading is Skill
kakakakakku
45
52k
Mackerel で ECS をどこまでモニタリングできるのか / Monitoring ECS with Mackerel
kakakakakku
0
14k
[2018/01/30] Redash 初心者向けハンズオン / Redash Meetup #0.1
kakakakakku
0
2.5k
Other Decks in Technology
See All in Technology
Azure Static Web Apps の自動ビルドがタイムアウトしやすくなった状況に対応した件/global-azure2026
thara0402
0
320
MLOps導入のための組織作りの第一歩
akasan
0
210
AI時代における技術的負債への取り組み
codenote
0
280
Zero-Downtime Migration: Moving a Massive, Historic iOS App from CocoaPods to SPM and Tuist without Stopping Feature Delivery
kagemiku
0
240
こんなアーキテクチャ図はいやだ / Anti-pattern in AWS Architecture Diagrams
naospon
1
380
研究開発部メンバーの働き⽅ / Sansan R&D Profile
sansan33
PRO
4
23k
インターネットの技術 / Internet technology
ks91
PRO
0
120
終盤で崩壊させないAI駆動開発
j5ik2o
2
2.2k
猫でもわかるKiro CLI(CDKコーディング編)
kentapapa
1
110
ある製造業の会社全体のAI化に1エンジニアが挑んだ話
kitami
2
990
🀄️ on swiftc
giginet
PRO
0
380
Bill One 開発エンジニア 紹介資料
sansan33
PRO
5
18k
Featured
See All Featured
The Curious Case for Waylosing
cassininazir
0
300
Tips & Tricks on How to Get Your First Job In Tech
honzajavorek
1
490
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
49
9.9k
Design of three-dimensional binary manipulators for pick-and-place task avoiding obstacles (IECON2024)
konakalab
0
400
Testing 201, or: Great Expectations
jmmastey
46
8.1k
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
162
16k
Exploring the relationship between traditional SERPs and Gen AI search
raygrieselhuber
PRO
2
3.8k
The Mindset for Success: Future Career Progression
greggifford
PRO
0
310
Optimising Largest Contentful Paint
csswizardry
37
3.6k
It's Worth the Effort
3n
188
29k
Data-driven link building: lessons from a $708K investment (BrightonSEO talk)
szymonslowik
1
1k
Product Roadmaps are Hard
iamctodd
PRO
55
12k
Transcript
͓ͬͯ͘ͱศརͳ Bloom Filter 2016-10-14 ࣾษڧձ @kakakakakku
Bloom Filter • 1970ʹߟҊ͞Εͨ • ߟҊऀ Burton Howard Bloom ࢯ
• ۭؒޮͷྑ͍֬తσʔλߏ • ཁૉ͕ू߹ͷதʹؚ·ΕΔ͔Λఆ͢ΔͨΊʹ͏ • σʔλྔʹґଘͤͣ O(k) ͷܭࢉྔͰߴʹఆͰ͖Δ
ू߹ ؚ·ΕΔ? ؚ·Εͳ͍? ؚ·ΕΔ? ؚ·Εͳ͍?
׆༻ྫΛΕ Bloom Filter Λ ͬͱۙʹײ͡ΒΕΔͣʂ
• Cassandra • Key Λݕࡧ͢Δͱ͖ͷ I/O Λݮ͢ΔͨΊ SSTable ʹ Bloom
Filter Λॻ͖ࠐΜͰ͍Δ • HBase • HFile ʹಛఆͷσʔλؚ͕·Ε͍ͯͳ͍͜ͱΛ ݕࡧ͢ΔͨΊʹ Bloom Filter Λ׆༻͍ͯ͠Δ Bloom Filter ׆༻ྫ 1
Bloom Filter ׆༻ྫ 2 • H2O • ແବͳαʔόϓογϡΛૹ৴͠ͳ͍ͨΊʹ ϒϥβΩϟογϡใͷ Bloom
Filter Λ ѹॖͯ͠ HTTP Ͱฦ͍ͯ͠Δ CASPER (Cache-Aware Server PushER) • Bitcoin • τϥϯβΫγϣϯͷݕࡧʹ׆༻͍ͯ͠Δ? ʢৄ֬͘͠ೝͰ͖ͯͳ͍ʣ
Bloom Filter ׆༻ྫ 3 • pixiv • ࡞ʹ͍ͨλάͱ pixiv ඦՊࣄయͷλάू߹ͷ
ଘࡏ֬ೝʹ׆༻͍ͯ͠Δ? http://inside.pixiv.net/entry/2014/07/22/132103
ʘ Bloom Filter ͷڍಈ ʗ
0 0 1 0 2 0 3 0 4 0
5 0 6 0 7 0 8 0 9 0 10 0 11 0 12 0 13 0 14 0 15 0 • m bit ͷྻΛ༻ҙ͢Δ • ࠓճ ྻۭؒ = m = 16 ͱ͢Δ • શͯ 0 ͰॳظԽ͓ͯ͘͠
0 0 1 0 2 0 3 0 4 0
5 0 6 0 7 0 8 0 9 0 10 0 11 0 12 0 13 0 14 0 15 0 • ҙͷϋογϡؔΛ༻ҙ͢Δ • ࠓճ k = 2 ݸͷؔΛ͏ • h1(key) = (key * 1) mod m • h2(key) = (key * 2) mod m
0 1 1 0 2 0 3 0 4 0
5 0 6 0 7 0 8 1 9 0 10 0 11 0 12 0 13 0 14 0 15 0 • key = 1000 ΛՃ͢Δ • h1(1000) = 8 • h2(1000) = 0 [ 1000 ]
0 1 1 0 2 1 3 0 4 0
5 0 6 0 7 0 8 1 9 1 10 0 11 0 12 0 13 0 14 0 15 0 • key = 1001 ΛՃ͢Δ • h1(1001) = 9 • h2(1001) = 2 [ 1000, 1001 ]
0 1 1 0 2 1 3 0 4 0
5 0 6 0 7 0 8 1 9 1 10 0 11 0 12 1 13 0 14 0 15 0 • key = 1004 ΛՃ͢Δ • h1(1004) = 12 • h2(1004) = 8 • h1(1000) = 8 ͱॏෳ͍ͯ͠Δ • ϑϥά 1 ͷ··ʹ͢Δ [ 1000, 1001, 1004 ]
0 1 1 0 2 1 3 0 4 0
5 0 6 0 7 0 8 1 9 1 10 0 11 0 12 1 13 0 14 0 15 0 • Query : key = 1005 ଘࡏ͢Δ ? • h1(1005) = 13 • h2(1005) = 10 • h1(1005) = h2(1005) = 0 • ʮଘࡏ͠ͳ͍ʯͱஅݴͰ͖Δ [ 1000, 1001, 1004 ]
0 1 1 0 2 1 3 0 4 0
5 0 6 0 7 0 8 1 9 1 10 0 11 0 12 1 13 0 14 0 15 0 • Query : key = 1000 ଘࡏ͢Δ ? • h1(1000) = 8 • h2(1000) = 0 • h1(1000) = h2(1000) = 1 • ʮଘࡏ͢Δʯ͔͠Εͳ͍ • 1000 ࣮ࡍʹଘࡏ͢Δ [ 1000, 1001, 1004 ]
0 1 1 0 2 1 3 0 4 0
5 0 6 0 7 0 8 1 9 1 10 0 11 0 12 1 13 0 14 0 15 0 • Query : key = 1020 ଘࡏ͢Δ ? • h1(1020) = 12 • h2(1020) = 8 • h1(1000) = h2(1000) = 1 • ʮଘࡏ͢Δʯ͔͠Εͳ͍ • 1020 ࣮ࡍʹଘࡏ͠ͳ͍ [ 1000, 1001, 1004 ]
(ƅшƅ) Űō? ޡఆͯ͠Δ͚Ͳ…?
False Positive ِཅੑ False Negative ِӄੑ ʮଘࡏ͠ͳ͍ʯͱ͖ʹʮଘࡏ͢Δʯͱఆͯ͠͠·͏͜ͱ ʮଘࡏ͢Δʯͱ͖ʹʮଘࡏ͠ͳ͍ʯͱఆͯ͠͠·͏͜ͱ
False Positive ِཅੑ False Negative ِӄੑ ʮଘࡏ͠ͳ͍ʯͱ͖ʹʮଘࡏ͢Δʯͱఆͯ͠͠·͏͜ͱ ʮଘࡏ͢Δʯͱ͖ʹʮଘࡏ͠ͳ͍ʯͱఆͯ͠͠·͏͜ͱ ↑ Bloom
Filter ʹ False Positive ͷՄೳੑ͕͋Δ
False Positive ͷՄೳੑ • O(k) ͰߴʹఆͰ͖Δঈͱͯ͠ False Positive ͷՄೳੑ͕͋Δ •
Αͬͯ key = 1020 ͷΑ͏ʹ ʮଘࡏ͢Δʯͱޡݕͯ͠͠·͏߹͕͋Δ • ͨͩ͠ False Negative 100% ͋Γಘͳ͍
ϝϦοτ • ܭࢉྔ O(k) • ઢܗ୳ࡧͩͱ O(N) • ೋ୳ࡧͩͱ O(log
N) • ϋογϡςʔϒϧͳΒ O(1) ͩͬ͠ͱߴ? • k = 1 ͳΒ Bloom Filter O(1) ʹͳΔ • σʔλΛอ࣋͢Δඞཁ͕ͳۭؒ͘ޮ͕ྑ͍
ʘ Bloom Filter ཁૉ͕আͰ͖ͳ͍ ʗ
0 0 1 0 2 1 3 0 4 0
5 0 6 0 7 0 8 0 9 1 10 0 11 0 12 1 13 0 14 0 15 0 • 1000 Λআ͢Δͱ • h1(1005) = 8 • h2(1005) = 0 • 1004 আ͞Εͯ͠·͏ !!! • h1(1005) = 12 • h2(1005) = 8 [ 1000, 1001, 1004 ]
ʘ ཁૉΛআ͢ΔͳΒ Counting Filter ʗ Bloom Filter Λ֦ுͨ͠ΞϧΰϦζϜ
0 1 1 0 2 0 3 0 4 0
5 0 6 0 7 0 8 1 9 0 10 0 11 0 12 0 13 0 14 0 15 0 • key = 1000 ΛՃ͢Δ • h1(1000) = 8 • h2(1000) = 0 [ 1000 ]
0 1 1 0 2 1 3 0 4 0
5 0 6 0 7 0 8 1 9 1 10 0 11 0 12 0 13 0 14 0 15 0 • key = 1001 ΛՃ͢Δ • h1(1001) = 9 • h2(1001) = 2 [ 1000, 1001 ]
0 1 1 0 2 1 3 0 4 0
5 0 6 0 7 0 8 2 9 1 10 0 11 0 12 1 13 0 14 0 15 0 • key = 1004 ΛՃ͢Δ • h1(1004) = 12 • h2(1004) = 8 • ॏෳͨ͠ΒΠϯΫϦϝϯτ͢Δ [ 1000, 1001, 1004 ] ϏοτͰͳ͘ ΧϯλʔͰදݱ͢Δ͕ Bloom Filter ͱҟͳΔ
0 0 1 0 2 1 3 0 4 0
5 0 6 0 7 0 8 1 9 1 10 0 11 0 12 1 13 0 14 0 15 0 • key = 1000 Λআ͢Δ • h1(1004) = 8 • h2(1004) = 0 • σΫϦϝϯτ͢Δ [ 1000, 1001, 1004 ]
ʘ False Positive ֬ ʗ
Bloom Filter ެࣜͰࢉग़ m : ྻۭؒ (bit) n : ొཁૉ
ৄ͘͠ Wikipedia ʹʂ https://ja.wikipedia.org/wiki/ϒϧʔϜϑΟϧλ False Positive Λ࠷খʹ͢Δ ࠷దͳϋογϡؔͷۙࣅ ࠷దͳ k Λͬͨ߹ͷ False Positive ֬
ʘ ৺ແ༻ ʗ ࠷దͳ k Λ͑ False Positive ΛݶΓͳ͘͘Ͱ͖Δ
ʘ ·ͱΊ ʗ False Positive ͷՄೳੑ͋Δ͠ ཁૉͷআͰ͖ͳ͍͚Ͳ τϨʔυΦϑΛ࠷େݶ׆༻ͯ͠ ߴ &
ۭؒޮͷྑ͍ॲཧ͕Ͱ͖Δʂ
ʘ ͓ͬͯ͘ͱศརͳ Bloom Filter ʗ