Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
確率的データ構造を Java で扱いたい! #JJUG
Search
KOMIYA Atsushi
August 23, 2017
Programming
6
2.2k
確率的データ構造を Java で扱いたい! #JJUG
JJUG ナイト・セミナー 「ビール片手にLT&納涼会 2017」 の発表資料です。
https://jjug.doorkeeper.jp/events/63719
KOMIYA Atsushi
August 23, 2017
Tweet
Share
More Decks by KOMIYA Atsushi
See All by KOMIYA Atsushi
#JJUG Java における乱数生成器とのつき合い方
komiya_atsushi
5
5.2k
#JJUG Fork/Join フレームワークを効率的に正しく使いたい
komiya_atsushi
0
480
[#JSUG] SmartNews における container friendly な Spring Boot アプリケーション開発
komiya_atsushi
1
11k
Java のデータ圧縮ライブラリを極める #jjug_ccc #ccc_c7
komiya_atsushi
4
4.9k
#devsumi 自然言語処理・機械学習によるファクトチェック業務の支援
komiya_atsushi
1
4.4k
SmartNews Ads における機械学習の活用とその運用 #mlops
komiya_atsushi
3
19k
GBDT によるクリック率予測を高速化したい #オレシカナイト vol.4
komiya_atsushi
5
1.3k
Maven central repository の artifact をランキングする #渋谷java
komiya_atsushi
0
1.3k
High-performance Jackson #渋谷Java
komiya_atsushi
2
17k
Other Decks in Programming
See All in Programming
Return of the Full-Stack Developer
simas
PRO
1
310
本当だってば!俺もTRICK 2022に入賞してたんだってば!
jinroq
0
250
Fluent UI Blazor 5 (alpha)の紹介
tomokusaba
0
140
Let's Take a Peek at PHP Parser 5.x!
inouehi
0
100
PHPによる"非"構造化プログラミング入門 -本当に熱いスパゲティコードを求めて- #phperkaigi
o0h
PRO
0
1.1k
SLI/SLOの設定を進めるその前に アラート品質の改善に取り組んだ話
tanden
2
730
AI時代のプログラミング教育 / programming education in ai era
kishida
23
20k
RCPと宣言型ポリシーについてのお話し
kokitamura
2
150
AHC 044 混合整数計画ソルバー解法
kiri8128
0
300
Modern Angular:Renovation for Your Applications @angularDays 2025 Munich
manfredsteyer
PRO
0
140
‘무차별 LGTM~👍’만 외치던 우리가 ‘고봉밥 코드 리뷰’를?
hannah0731
0
530
フロントエンドテストの育て方
quramy
9
2.5k
Featured
See All Featured
Keith and Marios Guide to Fast Websites
keithpitt
411
22k
Optimizing for Happiness
mojombo
377
70k
Fireside Chat
paigeccino
37
3.3k
Java REST API Framework Comparison - PWX 2021
mraible
29
8.5k
Build your cross-platform service in a week with App Engine
jlugia
229
18k
The Cost Of JavaScript in 2023
addyosmani
48
7.6k
Creating an realtime collaboration tool: Agile Flush - .NET Oxford
marcduiker
28
2k
Why Our Code Smells
bkeepers
PRO
336
57k
Done Done
chrislema
183
16k
Building Your Own Lightsaber
phodgson
104
6.3k
VelocityConf: Rendering Performance Case Studies
addyosmani
328
24k
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
507
140k
Transcript
֬తσʔλߏΛ Java Ͱѻ͍͍ͨʂ 2017-08-23 JJUG night seminar LT KOMIYA Atsushi
@komiya_atsushi
Today’s topic
֬తσʔλߏ
֬తσʔλߏͱʁ • ֬తಛੑΛར༻ͨ͠σʔλߏ • ͋ΔΛɺ࣌ؒతۭؒ͘͠తʹޮΑ͘ (≅লϝϞϦͰ) ղ͘͜ͱΛతͱ͢Δ • ࠓճʮۭؒޮͷΑ͍σʔλߏʯʹண •
σʔλߏʹΑͬͯɺݫີղͰͳۙ͘ࣅղ ͕ಘΒΕΔ͜ͱ͕͋Δ • ਫ਼ͱۭؒޮτϨʔυΦϑͷؔ
ͲΜͳͱ͖ʹ͏ͷ͔ʁ
ͲΜͳͱ͖ʹ͏ͷ͔ʁ • ϦΞϧλΠϜ͔ͭେྔʹൃੜ͢ΔσʔλΛ ΦϯϥΠϯͰॲཧ͍ͨ͠ • ϝϞϦʹऩ·Γ͖Βͳ͍େنͳσʔλΛ ඇྗͳ PC Ͱॲཧ͍ͨ͠ •
ࢄॲཧͰ͖Δڥ͕͋ΔͳΒɺ͋͑ͯ ֬తσʔλߏΛ͏ඞཁͳ͍
Java Ͱ ֬తσʔλߏΛѻ͏
ࣗલ࣮ʁ ϥΠϒϥϦ͏ʁ • ଟ͘ͷ֬తσʔλߏɺͦͷจ͕͙͙ ΕӾཡՄೳͳঢ়ଶͰ͙͢ʹݟ͔ͭΔ • ͦΕΛಡΜͰࣗલ࣮͢ΔͷΑ͠ • ҰํͰ Maven
central ʹ͍ͭ͘ͷطଘ࣮ ͕ଘࡏ͍ͯ͠Δ • ڊਓͷݞͷ্ʹཱͭͷ͕ݡ͍Γํ
֬తσʔλߏͷ Java ࣮ • stream-lib ‘com.addthis:stream-lib’ • Membership query /
cardinality estimation / frequency counting / quantile estimation • Google Guava ‘com.google.guava:guava’ • Membership query • java-hll ‘net.agkn:hll’ • Cardinality estimation • t-digest ‘com.tdunning:t-digest’ • Quantile estimation
֬తσʔλߏͷ Java ࣮ • stream-lib ‘com.addthis:stream-lib’ • Membership query /
cardinality estimation / frequency counting / quantile estimation • Google Guava ‘com.google.guava:guava’ • Membership query • java-hll ‘net.agkn:hll’ • Cardinality estimation • t-digest ‘com.tdunning:t-digest’ • Quantile estimation
stream-lib ʹΑΔ ֬తσʔλߏͷར༻ํ๏
http://bit.ly/JJUG-2017-08- probds-code
Membership query
ཁૉ͕ू߹ʹଐ͢Δ͔൱͔Λఆ͢Δ
ཁૉ͕ू߹ʹଐ͢Δ͔൱͔Λఆ͢Δ Set<T> Λ༻ҙͯ͠ Set#contains(T) Ͱଘ൱Λఆ͠ Set#add(T) Ͱू߹ʹཁૉΛՃ͢Δ
Bloom filter • ֬తʹؒҧͬͨ͑ʢଘ൱݁ՌʣΛฦ͢ • ِཅੑ (ଘࡏ͠ͳ͍ͷΛଘࡏ͢Δͱޡೝ͢ Δࣄ) ੜ͡Δ͕ɺِӄੑੜ͡ͳ͍ •
ʮఆ͞ΕΔཁૉͷछྨʯʮڐ༰Ͱ͖Δِ ཅੑͷ֬ʯΛࢦఆͯ͠ɺώʔϓ༻ྔΛ੍ޚ Ͱ͖Δ • ཁૉͷՃͰ͖Δ͕ɺআ͍͠
stream-lib ͷ Bloom filter
stream-lib ͷ Bloom filter ཁૉͱِཅੑ֬Λࢦఆͯ͠ BloomFilter Λ༻ҙ͠ BloomFilter#isPresent(String) Ͱଘ൱Λఆ Set
ͱಉ༷ʹ add() ͢Δ
ώʔϓ༻ྔΛ֬ೝͯ͠ΈΔ • “Lorem ipsum” ͷςΩετΛྫʹɺJOL (Java Object Layout) Ͱώʔϓ༻ྔΛଌఆ •
http://openjdk.java.net/projects/code- tools/jol/ • Set: 6,032 bytes • stream-lib BloomFilter: 136 bytes 97.8% smaller !
Cardinality estimation
ҟͳΓΛٻΊΔ
ҟͳΓΛٻΊΔ Set<T> Λ༻ҙ͠ɺ Set#add() Ͱͻͨ͢ΒಥͬࠐΉ Set#size() ͰҟͳΓ͕ಘΒΕΔ
HyperLogLog++ (1/2) • ҟͳΓΛਪఆ͢Δσʔλߏ • ಘΒΕΔਪఆɺຊདྷͷҟͳΓʹର্ͯ͠ৼΕɾԼৼ Εͱʹى͜Γ͏Δ • Redshift /
BigQuery / Presto ͳͲͰɺCOUNT(DISTINCT x) Λۙࣅ͢Δखஈͱͯ͠ΘΕ͍ͯΔ • https://aws.amazon.com/jp/about-aws/whats-new/ 2013/11/11/amazon-redshift-new-performance-data- loading-security-features/ • https://cloud.google.com/blog/big-data/2017/07/ counting-uniques-faster-in-bigquery-with-hyperloglog
HyperLogLog++ (2/2) • ʮਪఆͷਫ਼ pʯΛௐ͢Δ͜ͱͰɺώʔϓ༻ྔΛ੍ ޚ͢Δ͜ͱ͕Ͱ͖Δ • Λେ͖͘͢Δͱਫ਼͕ߴ͘ͳΔ & ۭؒޮѱԽ͢Δ
• ఆ͞ΕΔҟͳΓඞཁͱ͞ΕΔਫ਼ɺώʔϓͷ੍ Λߟྀͯ͠ p Λܾఆ͢Δ • HyperLogLog ͷΈΛཧղ͢ΔʹɺҎԼͷϒϩάΤϯ τϦ͕͓͢͢Ί • http://blog.brainpad.co.jp/entry/2016/06/27/110000
stream-lib ͷ HyperLogLog++
stream-lib ͷ HyperLogLog++ ਫ਼Λࢦఆͯ͠ HyperLogLogPlus() Λ༻ҙ͢Δ HyperLogLogPlus#offer() ͰཁૉΛՃ͍ͯ͘͠ HyperLogLogPlus#cardinality() ͰҟͳΓ͕ಘΒΕΔ
Frequency counting
ཁૉͷසΛ্͑͛Δ
ཁૉͷසΛ্͑͛Δ Map Ͱཁૉ͝ͱͷΧϯλΛදݱ͢Δ ͻͨ͢Βཁૉ͝ͱʹ্͑͛Δ
Count-min sketch (1/2) • ཁૉͷසΛਪఆ͢ΔσʔλߏͷҰͭ • ࣮ࡍͷසΑΓେ͖͍ਪఆΛฦ͢͜ͱ͕ ͋ΔҰํͰɺখ͍͞ਪఆΛฦ͢͜ͱͳ͍ • ස͕খ͍͞ཁૉ΄Ͳɺ͜ͷόΠΞεͷӨ
ڹΛड͚͘͢ͳΔ
Count-min sketch (2/2) • width ͱ depth ͷೋͭͷύϥϝʔλͰɺۭؒ ޮਫ਼Λ੍ޚ͢Δ •
width * depth ͷݸͷΧϯλ͕࡞ΒΕΔ • Χϯλ 2࣍ݩྻͰදݱ • depth ͷ͚ͩϋογϡ͕࣮ؔߦ͞ΕΔͷ ͰɺతͳύϑΥʔϚϯεʹӨڹΛ༩͑Δ
stream-lib ͷ Count-min sketch
stream-lib ͷ Count-min sketch width:10 * depth:30 ͷΧϯλʹΑΔ Count-Min sketch
Λ༻ҙ͢Δ CountMinSketch#add(String, int) ͰΧϯτ͍ͯ͘͠
Quantile estimation
ύʔηϯλΠϧΛٻΊΔ
ύʔηϯλΠϧΛٻΊΔ ιʔτ͞Εͨঢ়ଶͰྻԽ͢Δ ͋ͱ n ύʔηϯλΠϧΛࢀর͢Δ͚ͩ
t-digest • ྻͷҐΛਪఆ͢Δσʔλߏ • ܦݧΛۙࣅతʹදݱ͢Δ • ύʔηϯλΠϧɺ͜ͷܦݧͷۙࣅදݱ͔Βૠ Λ༻͍ͯࢉग़͞ΕΔ • ʮѹॖύϥϝʔλʯʹΑͬͯɺਫ਼ͱۭؒޮͷτϨʔυ
ΦϑΛௐ͢Δ • Λେ͖͘͢Δ͜ͱͰɺਫ਼ΛߴΊΔ͜ͱ͕Ͱ͖Δ
stream-lib ͷ t-digest
stream-lib ͷ t-digest ѹॖύϥϝʔλΛࢦఆͯ͠ TDigest Λ༻ҙ͢Δ TDigest#add(double) ͰΛՃ͍ͯ͘͠ TDigest#quantile(double) ͰύʔηϯλΠϧΛಘΔ
·ͱΊ
·ͱΊ • ֬తσʔλߏΛ༻͍Δ͜ͱͰɺେنσʔλॲཧ ΦϯϥΠϯॲཧΛޮతʹ࣮ݱͰ͖Δʢ͔ʣ • Java Ͱ֬తσʔλߏΛ͓खܰʹѻ͍͍ͨͳΒɺ ·ͣstream-lib ͷར༻Λݕ౼ͯ͠ΈΔ •
ਪఆਫ਼ͱۭؒޮͷτϨʔυΦϑΛ੍ޚ͢Δ ύϥϝʔλͷௐɺ৬ਓܳʹͳΓ͕ͪ • JOL JMH Λ༻͍ͯɺ࣮ࡍͷۭؒޮͱ࣌ؒޮΛ ͖ͪΜͱଌఆ͠ͳ͕Βௐ͢Δ͜ͱΛ͓͢͢Ί͍ͨ͠
Thank you!