Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
犬でもわかる Minimal Acyclic Subsequential Transducer...
Search
Takuya Asano
June 27, 2019
Technology
2
1.4k
犬でもわかる Minimal Acyclic Subsequential Transducer / Introduction to Minimal Acyclic Subsequential Transducer
はてなの技術勉強会で LT 発表したときの資料です。
Takuya Asano
June 27, 2019
Tweet
Share
More Decks by Takuya Asano
See All by Takuya Asano
Research Paper Introduction in IR Reading 2022 Fall
takuyaa
0
3.3k
Introducing PTHash - Paper Reading Session (2021-11-19)
takuyaa
0
410
Research Paper Introduction in IR Reading 2021 Fall
takuyaa
0
270
Lucene Index Deep Dive
takuyaa
0
620
Introduction to Apache Lucene
takuyaa
5
1.3k
Research Paper Introduction in IR Reading 2020 Fall
takuyaa
2
4.1k
Research paper introduction (2019-11-12)
takuyaa
2
1k
Research paper introduction (2019-11-07)
takuyaa
1
430
IR Reading 2019秋 論文紹介 / IR Reading 2019Fall
takuyaa
2
1.2k
Other Decks in Technology
See All in Technology
5分でカオスエンジニアリングを分かった気になろう
pandayumi
0
260
まずはマネコンでちゃちゃっと作ってから、それをCDKにしてみよか。
yamada_r
2
120
20250913_JAWS_sysad_kobe
takuyay0ne
2
250
これでもう迷わない!Jetpack Composeの書き方実践ガイド
zozotech
PRO
0
1.1k
Automating Web Accessibility Testing with AI Agents
maminami373
0
1.3k
フルカイテン株式会社 エンジニア向け採用資料
fullkaiten
0
8.8k
IoT x エッジAI - リアルタイ ムAI活用のPoCを今すぐ始め る方法 -
niizawat
0
120
Snowflake×dbtを用いたテレシーのデータ基盤のこれまでとこれから
sagara
0
120
【NoMapsTECH 2025】AI Edge Computing Workshop
akit37
0
230
実践!カスタムインストラクション&スラッシュコマンド
puku0x
0
540
react-callを使ってダイヤログをいろんなとこで再利用しよう!
shinaps
2
270
品質視点から考える組織デザイン/Organizational Design from Quality
mii3king
0
210
Featured
See All Featured
StorybookのUI Testing Handbookを読んだ
zakiyama
31
6.1k
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
358
30k
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
46
7.6k
Designing for Performance
lara
610
69k
[RailsConf 2023] Rails as a piece of cake
palkan
57
5.8k
How to Ace a Technical Interview
jacobian
279
23k
Principles of Awesome APIs and How to Build Them.
keavy
126
17k
Practical Orchestrator
shlominoach
190
11k
JavaScript: Past, Present, and Future - NDC Porto 2020
reverentgeek
52
5.6k
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
12
1.1k
Facilitating Awesome Meetings
lara
55
6.5k
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
252
21k
Transcript
ݘͰΘ͔Δ Minimal Acyclic Subsequential Transducer 2019-06-27 ͯͳٕज़ษڧձ id:takuya-a
FSA ͱ FST • FSA (Finite State Automaton) • ༗ݶঢ়ଶΦʔτϚτϯ
• ೖྗྻΛडཧ͢Δ͔Ͳ͏͔ͷ bool Λฦ͢ • FST (Finite State Transducer) • ༗ݶঢ়ଶมث • FSA ͷҰछ • ೖྗྻΛडཧͨ͠ͱ͖ɺग़ྗྻΛฦ͢ • Minimal Acyclic Subsequential Transducer FST ͷҰछ { “onk” } { “onk” => “͓Μ͘” }
FST ͷ͍Έͪ • ͍ΘΏΔʮࣙॻҾ͖ʯʹ͑Δ • ΩʔͱͷϖΞΛอଘͰ͖ΔʢPerl Ͱ͍͏ͱϋογϡͱͯ͑͠Δʣ • ঢ়ଶΛͨͲΔ͚ͩͳͷͰݕࡧ͕ߴ •
ͱ͘ʹ ڞ௨಄ࣙݕࡧ (common prefix search) Ͱ༗ར • ͪΖΜ શҰகݕࡧ (exact match) Ͱ͖Δ • ಄ࣙඌ͕ࣙڞ༗͞ΕΔͷͰলϝϞϦ
FST ͷԠ༻ઌ • ݕࡧΤϯδϯͷࣙॻͱͯ͠ • Apache Lucene ͷίΞΞϧΰϦζϜͱͯ͠ɺ৭Μͳͱ͜ΖͰΘΕ͍ͯΔ • ओʹ୯ޠΛϧοΫΞοϓ͢ΔͨΊʹΘΕΔ
• ܗଶૉղੳثͷࣙॻͱͯ͠ • Janome (Python), Kuromoji (Java) Ͱ࠾༻͞Ε͍ͯΔ • ߴͳ common prefix search ͕ඞཁ • ԻೝࣝͷݴޠϞσϧͱͯ͠ • ॏΈ͖ FST (Weighted FST; WFST) ͕ΘΕΔ • https://www.slideshare.net/JiroNishitoba/wfst-61929888
Minimal Acyclic Subsequential Transducer Minimal ࠷খͷ Acyclic ϧʔϓͷͳ͍ Subsequential ෦(จࣈ)ྻͷ
Transducer มث “takuya” => “a” “takaya” => “n”
TRIE • ಄ࣙͷΈΛڞ༗͢Δσʔλߏ • πϦʔʹͳΔ • ඌࣙڞ༗Ͱ͖ͳ͍ • TAIL ྻͱ͍͏ςΫχοΫͰ
Ұ෦ڞ༗Ͱ͖Δ FST TRIE
Minimal Acyclic Subsequential Transducer ͷߏங • ཧ্࠷খͷ FST Λஞ࣍తʹߏஙͰ͖ΔΞϧΰϦζϜ͕͋Δ •
ৄ͘͠ҎԼͷจΛಡΜͰʂ • Mihov & Maurel (2001), Direct Construction of Minimal Acyclic Subsequential Transducers http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.3698 • จதͷٙࣅίʔυɺ46ߦ͕ؒҧ͑ͯΔ͔ΒؾΛ͚ͭͯͶ • ޡ: SET_OUTPUT • ਖ਼: SET_STATE_OUTPUT
Minimal Acyclic Subsequential Transducer ͷ࣮ • https://github.com/takuyaa/cdarts • Java Ͱॻ͍ͨ
• Lucene ͷ FST jdartsclone ͱൺֱ͢ΔͨΊ • ଞͷ࣮ • Java: https://github.com/apache/lucene-solr/tree/master/lucene/core/src/java/org/apache/lucene/util/fst • Go: https://github.com/ikawaha/mast • Python: https://github.com/mocobeta/janome/blob/master/janome/fst.py • Rust: https://github.com/BurntSushi/fst
࣮ݧʂ
සग़ӳ୯ޠͷ TRIE ͱ FST • Lucene ͷετοϓϫʔυΛΩʔɺ࿈൪Λͱͯ͠ߏங • શΩʔ: 33
• શจࣈ: 97 • TRIE • ঢ়ଶ: 58 • ભҠ: 57 • FST (Minimal Acyclic Subsequential Transducer) • ঢ়ଶ: 25 • ભҠ: 51 FST TRIE
ϙέϞϯӳมثͷ TRIE ͱ FST • ϙέϞϯͷӳޠ໊ΛΩʔɺຊޠ໊Λͱͯ͠ߏங • શΩʔ: 151 •
શจࣈ: 1103 • TRIE • ঢ়ଶ: 809 • ભҠ: 808 • FST (Minimal Acyclic Subsequential Transducer) • ঢ়ଶ: 459 • ભҠ: 604 FST TRIE
FST Λ֦େͨ͠ͷ ※ UTF-8 ͰΤϯίʔυ͍ͯͯ͠ 1όΠτ͚ͩڞ༗͞ΕͨΓ͢Δ ͷͰද্ࣔจࣈԽ͚ͯ͠·͢
ࢀߟ • Mihov & Maurel (2001), Direct Construction of Minimal
Acyclic Subsequential Transducers http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.3698 • Finite-state automata and directed acyclic graphs http://www.jandaciuk.pl/Fsm_algorithms/ • Changing Bits: Using Finite State Transducers in Lucene http://blog.mikemccandless.com/2010/12/using-finite-state-transducers-in.html • moco(beta)'s backup: [༁] Using Finite State Transducers in Lucene https://mocobeta-backup.tumblr.com/post/105777650158/using-finite-state-transducers-in-lucene • Index 1,600,000,000 Keys with Automata and Rust - Andrew Gallant's Blog https://blog.burntsushi.net/transducers/ • moco(beta)'s backup: Lucene FST ͷΞϧΰϦζϜ (1) ʙਤղฤʙ https://mocobeta-backup.tumblr.com/post/111076688132/lucene-fst-1 • moco(beta)'s backup: Lucene FST ͷΞϧΰϦζϜ (2) ʙ࣮ฤʙ https://mocobeta-backup.tumblr.com/post/113693778372/lucene-fst-2 • LuceneͰΘΕͯΔFSTΛ࣮ͯ͠Έͨʢਖ਼نදݱϚονɿVMΞϓϩʔνͷটʣ - Qiita https://qiita.com/ikawaha/items/be95304a803020e1b2d1 • Minimal Acyclic Subsequential TransducerͰ༡Ϳ - Negative/Positive Thinking https://jetbead.hatenablog.com/entry/20151014/1444756877