Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
確率的データ構造を Java で扱いたい! #JJUG
Search
KOMIYA Atsushi
August 23, 2017
Programming
6
2.3k
確率的データ構造を Java で扱いたい! #JJUG
JJUG ナイト・セミナー 「ビール片手にLT&納涼会 2017」 の発表資料です。
https://jjug.doorkeeper.jp/events/63719
KOMIYA Atsushi
August 23, 2017
Tweet
Share
More Decks by KOMIYA Atsushi
See All by KOMIYA Atsushi
#JJUG Java における乱数生成器とのつき合い方
komiya_atsushi
5
5.3k
#JJUG Fork/Join フレームワークを効率的に正しく使いたい
komiya_atsushi
0
490
[#JSUG] SmartNews における container friendly な Spring Boot アプリケーション開発
komiya_atsushi
1
11k
Java のデータ圧縮ライブラリを極める #jjug_ccc #ccc_c7
komiya_atsushi
4
4.9k
#devsumi 自然言語処理・機械学習によるファクトチェック業務の支援
komiya_atsushi
1
4.5k
SmartNews Ads における機械学習の活用とその運用 #mlops
komiya_atsushi
3
19k
GBDT によるクリック率予測を高速化したい #オレシカナイト vol.4
komiya_atsushi
5
1.3k
Maven central repository の artifact をランキングする #渋谷java
komiya_atsushi
0
1.4k
High-performance Jackson #渋谷Java
komiya_atsushi
2
17k
Other Decks in Programming
See All in Programming
MCP調べてみました! / Exploring MCP
uhzz
2
2.3k
Lambda(Python)の リファクタリングが好きなんです
komakichi
3
200
Compose Hot Reload is here, stop re-launching your apps! (Android Makers 2025)
zsmb
1
500
Memory API : Patterns, Performance et Cas d'Utilisation
josepaumard
0
130
Road to RubyKaigi: Making Tinny Chiptunes with Ruby
makicamel
4
120
「影響が少ない」を自分の目でみてみる
o0h
PRO
2
1.1k
これだけは知っておきたいクラス設計の基礎知識 version 2
masuda220
PRO
24
6.4k
生成AIを使ったQAアプリケーションの作成 - ハンズオン補足資料
oracle4engineer
PRO
3
220
Making TCPSocket.new "Happy"!
coe401_
1
1.2k
Sharing features among Android applications: experience feedback
jbvincey
0
110
スモールスタートで始めるためのLambda×モノリス(Lambdalith)
akihisaikeda
2
280
API for docs
soutaro
2
1.3k
Featured
See All Featured
ReactJS: Keep Simple. Everything can be a component!
pedronauck
666
120k
Fireside Chat
paigeccino
37
3.4k
Easily Structure & Communicate Ideas using Wireframe
afnizarnur
194
16k
Typedesign – Prime Four
hannesfritz
41
2.6k
Rails Girls Zürich Keynote
gr2m
94
13k
For a Future-Friendly Web
brad_frost
176
9.7k
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
31
1.1k
Producing Creativity
orderedlist
PRO
344
40k
個人開発の失敗を避けるイケてる考え方 / tips for indie hackers
panda_program
104
19k
How GitHub (no longer) Works
holman
314
140k
Unsuck your backbone
ammeep
670
57k
Learning to Love Humans: Emotional Interface Design
aarron
273
40k
Transcript
֬తσʔλߏΛ Java Ͱѻ͍͍ͨʂ 2017-08-23 JJUG night seminar LT KOMIYA Atsushi
@komiya_atsushi
Today’s topic
֬తσʔλߏ
֬తσʔλߏͱʁ • ֬తಛੑΛར༻ͨ͠σʔλߏ • ͋ΔΛɺ࣌ؒతۭؒ͘͠తʹޮΑ͘ (≅লϝϞϦͰ) ղ͘͜ͱΛతͱ͢Δ • ࠓճʮۭؒޮͷΑ͍σʔλߏʯʹண •
σʔλߏʹΑͬͯɺݫີղͰͳۙ͘ࣅղ ͕ಘΒΕΔ͜ͱ͕͋Δ • ਫ਼ͱۭؒޮτϨʔυΦϑͷؔ
ͲΜͳͱ͖ʹ͏ͷ͔ʁ
ͲΜͳͱ͖ʹ͏ͷ͔ʁ • ϦΞϧλΠϜ͔ͭେྔʹൃੜ͢ΔσʔλΛ ΦϯϥΠϯͰॲཧ͍ͨ͠ • ϝϞϦʹऩ·Γ͖Βͳ͍େنͳσʔλΛ ඇྗͳ PC Ͱॲཧ͍ͨ͠ •
ࢄॲཧͰ͖Δڥ͕͋ΔͳΒɺ͋͑ͯ ֬తσʔλߏΛ͏ඞཁͳ͍
Java Ͱ ֬తσʔλߏΛѻ͏
ࣗલ࣮ʁ ϥΠϒϥϦ͏ʁ • ଟ͘ͷ֬తσʔλߏɺͦͷจ͕͙͙ ΕӾཡՄೳͳঢ়ଶͰ͙͢ʹݟ͔ͭΔ • ͦΕΛಡΜͰࣗલ࣮͢ΔͷΑ͠ • ҰํͰ Maven
central ʹ͍ͭ͘ͷطଘ࣮ ͕ଘࡏ͍ͯ͠Δ • ڊਓͷݞͷ্ʹཱͭͷ͕ݡ͍Γํ
֬తσʔλߏͷ Java ࣮ • stream-lib ‘com.addthis:stream-lib’ • Membership query /
cardinality estimation / frequency counting / quantile estimation • Google Guava ‘com.google.guava:guava’ • Membership query • java-hll ‘net.agkn:hll’ • Cardinality estimation • t-digest ‘com.tdunning:t-digest’ • Quantile estimation
֬తσʔλߏͷ Java ࣮ • stream-lib ‘com.addthis:stream-lib’ • Membership query /
cardinality estimation / frequency counting / quantile estimation • Google Guava ‘com.google.guava:guava’ • Membership query • java-hll ‘net.agkn:hll’ • Cardinality estimation • t-digest ‘com.tdunning:t-digest’ • Quantile estimation
stream-lib ʹΑΔ ֬తσʔλߏͷར༻ํ๏
http://bit.ly/JJUG-2017-08- probds-code
Membership query
ཁૉ͕ू߹ʹଐ͢Δ͔൱͔Λఆ͢Δ
ཁૉ͕ू߹ʹଐ͢Δ͔൱͔Λఆ͢Δ Set<T> Λ༻ҙͯ͠ Set#contains(T) Ͱଘ൱Λఆ͠ Set#add(T) Ͱू߹ʹཁૉΛՃ͢Δ
Bloom filter • ֬తʹؒҧͬͨ͑ʢଘ൱݁ՌʣΛฦ͢ • ِཅੑ (ଘࡏ͠ͳ͍ͷΛଘࡏ͢Δͱޡೝ͢ Δࣄ) ੜ͡Δ͕ɺِӄੑੜ͡ͳ͍ •
ʮఆ͞ΕΔཁૉͷछྨʯʮڐ༰Ͱ͖Δِ ཅੑͷ֬ʯΛࢦఆͯ͠ɺώʔϓ༻ྔΛ੍ޚ Ͱ͖Δ • ཁૉͷՃͰ͖Δ͕ɺআ͍͠
stream-lib ͷ Bloom filter
stream-lib ͷ Bloom filter ཁૉͱِཅੑ֬Λࢦఆͯ͠ BloomFilter Λ༻ҙ͠ BloomFilter#isPresent(String) Ͱଘ൱Λఆ Set
ͱಉ༷ʹ add() ͢Δ
ώʔϓ༻ྔΛ֬ೝͯ͠ΈΔ • “Lorem ipsum” ͷςΩετΛྫʹɺJOL (Java Object Layout) Ͱώʔϓ༻ྔΛଌఆ •
http://openjdk.java.net/projects/code- tools/jol/ • Set: 6,032 bytes • stream-lib BloomFilter: 136 bytes 97.8% smaller !
Cardinality estimation
ҟͳΓΛٻΊΔ
ҟͳΓΛٻΊΔ Set<T> Λ༻ҙ͠ɺ Set#add() Ͱͻͨ͢ΒಥͬࠐΉ Set#size() ͰҟͳΓ͕ಘΒΕΔ
HyperLogLog++ (1/2) • ҟͳΓΛਪఆ͢Δσʔλߏ • ಘΒΕΔਪఆɺຊདྷͷҟͳΓʹର্ͯ͠ৼΕɾԼৼ Εͱʹى͜Γ͏Δ • Redshift /
BigQuery / Presto ͳͲͰɺCOUNT(DISTINCT x) Λۙࣅ͢Δखஈͱͯ͠ΘΕ͍ͯΔ • https://aws.amazon.com/jp/about-aws/whats-new/ 2013/11/11/amazon-redshift-new-performance-data- loading-security-features/ • https://cloud.google.com/blog/big-data/2017/07/ counting-uniques-faster-in-bigquery-with-hyperloglog
HyperLogLog++ (2/2) • ʮਪఆͷਫ਼ pʯΛௐ͢Δ͜ͱͰɺώʔϓ༻ྔΛ੍ ޚ͢Δ͜ͱ͕Ͱ͖Δ • Λେ͖͘͢Δͱਫ਼͕ߴ͘ͳΔ & ۭؒޮѱԽ͢Δ
• ఆ͞ΕΔҟͳΓඞཁͱ͞ΕΔਫ਼ɺώʔϓͷ੍ Λߟྀͯ͠ p Λܾఆ͢Δ • HyperLogLog ͷΈΛཧղ͢ΔʹɺҎԼͷϒϩάΤϯ τϦ͕͓͢͢Ί • http://blog.brainpad.co.jp/entry/2016/06/27/110000
stream-lib ͷ HyperLogLog++
stream-lib ͷ HyperLogLog++ ਫ਼Λࢦఆͯ͠ HyperLogLogPlus() Λ༻ҙ͢Δ HyperLogLogPlus#offer() ͰཁૉΛՃ͍ͯ͘͠ HyperLogLogPlus#cardinality() ͰҟͳΓ͕ಘΒΕΔ
Frequency counting
ཁૉͷසΛ্͑͛Δ
ཁૉͷසΛ্͑͛Δ Map Ͱཁૉ͝ͱͷΧϯλΛදݱ͢Δ ͻͨ͢Βཁૉ͝ͱʹ্͑͛Δ
Count-min sketch (1/2) • ཁૉͷසΛਪఆ͢ΔσʔλߏͷҰͭ • ࣮ࡍͷසΑΓେ͖͍ਪఆΛฦ͢͜ͱ͕ ͋ΔҰํͰɺখ͍͞ਪఆΛฦ͢͜ͱͳ͍ • ස͕খ͍͞ཁૉ΄Ͳɺ͜ͷόΠΞεͷӨ
ڹΛड͚͘͢ͳΔ
Count-min sketch (2/2) • width ͱ depth ͷೋͭͷύϥϝʔλͰɺۭؒ ޮਫ਼Λ੍ޚ͢Δ •
width * depth ͷݸͷΧϯλ͕࡞ΒΕΔ • Χϯλ 2࣍ݩྻͰදݱ • depth ͷ͚ͩϋογϡ͕࣮ؔߦ͞ΕΔͷ ͰɺతͳύϑΥʔϚϯεʹӨڹΛ༩͑Δ
stream-lib ͷ Count-min sketch
stream-lib ͷ Count-min sketch width:10 * depth:30 ͷΧϯλʹΑΔ Count-Min sketch
Λ༻ҙ͢Δ CountMinSketch#add(String, int) ͰΧϯτ͍ͯ͘͠
Quantile estimation
ύʔηϯλΠϧΛٻΊΔ
ύʔηϯλΠϧΛٻΊΔ ιʔτ͞Εͨঢ়ଶͰྻԽ͢Δ ͋ͱ n ύʔηϯλΠϧΛࢀর͢Δ͚ͩ
t-digest • ྻͷҐΛਪఆ͢Δσʔλߏ • ܦݧΛۙࣅతʹදݱ͢Δ • ύʔηϯλΠϧɺ͜ͷܦݧͷۙࣅදݱ͔Βૠ Λ༻͍ͯࢉग़͞ΕΔ • ʮѹॖύϥϝʔλʯʹΑͬͯɺਫ਼ͱۭؒޮͷτϨʔυ
ΦϑΛௐ͢Δ • Λେ͖͘͢Δ͜ͱͰɺਫ਼ΛߴΊΔ͜ͱ͕Ͱ͖Δ
stream-lib ͷ t-digest
stream-lib ͷ t-digest ѹॖύϥϝʔλΛࢦఆͯ͠ TDigest Λ༻ҙ͢Δ TDigest#add(double) ͰΛՃ͍ͯ͘͠ TDigest#quantile(double) ͰύʔηϯλΠϧΛಘΔ
·ͱΊ
·ͱΊ • ֬తσʔλߏΛ༻͍Δ͜ͱͰɺେنσʔλॲཧ ΦϯϥΠϯॲཧΛޮతʹ࣮ݱͰ͖Δʢ͔ʣ • Java Ͱ֬తσʔλߏΛ͓खܰʹѻ͍͍ͨͳΒɺ ·ͣstream-lib ͷར༻Λݕ౼ͯ͠ΈΔ •
ਪఆਫ਼ͱۭؒޮͷτϨʔυΦϑΛ੍ޚ͢Δ ύϥϝʔλͷௐɺ৬ਓܳʹͳΓ͕ͪ • JOL JMH Λ༻͍ͯɺ࣮ࡍͷۭؒޮͱ࣌ؒޮΛ ͖ͪΜͱଌఆ͠ͳ͕Βௐ͢Δ͜ͱΛ͓͢͢Ί͍ͨ͠
Thank you!