Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
犬でもわかる Minimal Acyclic Subsequential Transducer...
Search
Takuya Asano
June 27, 2019
Technology
2
1.3k
犬でもわかる Minimal Acyclic Subsequential Transducer / Introduction to Minimal Acyclic Subsequential Transducer
はてなの技術勉強会で LT 発表したときの資料です。
Takuya Asano
June 27, 2019
Tweet
Share
More Decks by Takuya Asano
See All by Takuya Asano
Research Paper Introduction in IR Reading 2022 Fall
takuyaa
0
3.2k
Introducing PTHash - Paper Reading Session (2021-11-19)
takuyaa
0
390
Research Paper Introduction in IR Reading 2021 Fall
takuyaa
0
260
Lucene Index Deep Dive
takuyaa
0
600
Introduction to Apache Lucene
takuyaa
5
1.2k
Research Paper Introduction in IR Reading 2020 Fall
takuyaa
2
4k
Research paper introduction (2019-11-12)
takuyaa
2
1k
Research paper introduction (2019-11-07)
takuyaa
1
420
IR Reading 2019秋 論文紹介 / IR Reading 2019Fall
takuyaa
2
1.2k
Other Decks in Technology
See All in Technology
2025年8月から始まるAWS Lambda INITフェーズ課金/AWS Lambda INIT phase billing changes
quiver
1
960
GraphQLを活用したリアーキテクチャに対応するSLI/Oの再設計
coconala_engineer
0
220
ソフトウェアテスト 最初の一歩 〜テスト設計技法をワークで体験しながら学ぶ〜 #JaSSTTokyo / SoftwareTestingFirstStep
nihonbuson
PRO
1
140
Computer Use〜OpenAIとAnthropicの比較と将来の展望〜
pharma_x_tech
6
1k
計測による継続的なCI/CDの改善
sansantech
PRO
1
250
使えるデータ基盤を作る技術選定の秘訣 / selecting-the-right-data-technology
pei0804
5
1k
AIによるコードレビューで開発体験を向上させよう!
moongift
PRO
0
420
クラウドネイティブ環境の脅威モデリング
kyohmizu
2
410
Google Cloud Next 2025 Recap アプリケーション開発を加速する機能アップデート / Application development-related features of Google Cloud
ryokotmng
0
140
非root化Androidスマホでも動く仮想マシンアプリを試してみた
arkw
0
120
250510 StepFunctionのテスト自動化始めました vol.1
east_takumi
1
210
Global Azure2025(GitHub Copilot ハンズオン)
tomokusaba
2
730
Featured
See All Featured
Optimising Largest Contentful Paint
csswizardry
37
3.2k
Being A Developer After 40
akosma
91
590k
GitHub's CSS Performance
jonrohan
1031
460k
Art, The Web, and Tiny UX
lynnandtonic
298
20k
Designing for Performance
lara
608
69k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
280
13k
The Cost Of JavaScript in 2023
addyosmani
49
7.8k
The Illustrated Children's Guide to Kubernetes
chrisshort
48
49k
Writing Fast Ruby
sferik
628
61k
Fireside Chat
paigeccino
37
3.4k
How to Ace a Technical Interview
jacobian
276
23k
Gamification - CAS2011
davidbonilla
81
5.3k
Transcript
ݘͰΘ͔Δ Minimal Acyclic Subsequential Transducer 2019-06-27 ͯͳٕज़ษڧձ id:takuya-a
FSA ͱ FST • FSA (Finite State Automaton) • ༗ݶঢ়ଶΦʔτϚτϯ
• ೖྗྻΛडཧ͢Δ͔Ͳ͏͔ͷ bool Λฦ͢ • FST (Finite State Transducer) • ༗ݶঢ়ଶมث • FSA ͷҰछ • ೖྗྻΛडཧͨ͠ͱ͖ɺग़ྗྻΛฦ͢ • Minimal Acyclic Subsequential Transducer FST ͷҰछ { “onk” } { “onk” => “͓Μ͘” }
FST ͷ͍Έͪ • ͍ΘΏΔʮࣙॻҾ͖ʯʹ͑Δ • ΩʔͱͷϖΞΛอଘͰ͖ΔʢPerl Ͱ͍͏ͱϋογϡͱͯ͑͠Δʣ • ঢ়ଶΛͨͲΔ͚ͩͳͷͰݕࡧ͕ߴ •
ͱ͘ʹ ڞ௨಄ࣙݕࡧ (common prefix search) Ͱ༗ར • ͪΖΜ શҰகݕࡧ (exact match) Ͱ͖Δ • ಄ࣙඌ͕ࣙڞ༗͞ΕΔͷͰলϝϞϦ
FST ͷԠ༻ઌ • ݕࡧΤϯδϯͷࣙॻͱͯ͠ • Apache Lucene ͷίΞΞϧΰϦζϜͱͯ͠ɺ৭Μͳͱ͜ΖͰΘΕ͍ͯΔ • ओʹ୯ޠΛϧοΫΞοϓ͢ΔͨΊʹΘΕΔ
• ܗଶૉղੳثͷࣙॻͱͯ͠ • Janome (Python), Kuromoji (Java) Ͱ࠾༻͞Ε͍ͯΔ • ߴͳ common prefix search ͕ඞཁ • ԻೝࣝͷݴޠϞσϧͱͯ͠ • ॏΈ͖ FST (Weighted FST; WFST) ͕ΘΕΔ • https://www.slideshare.net/JiroNishitoba/wfst-61929888
Minimal Acyclic Subsequential Transducer Minimal ࠷খͷ Acyclic ϧʔϓͷͳ͍ Subsequential ෦(จࣈ)ྻͷ
Transducer มث “takuya” => “a” “takaya” => “n”
TRIE • ಄ࣙͷΈΛڞ༗͢Δσʔλߏ • πϦʔʹͳΔ • ඌࣙڞ༗Ͱ͖ͳ͍ • TAIL ྻͱ͍͏ςΫχοΫͰ
Ұ෦ڞ༗Ͱ͖Δ FST TRIE
Minimal Acyclic Subsequential Transducer ͷߏங • ཧ্࠷খͷ FST Λஞ࣍తʹߏஙͰ͖ΔΞϧΰϦζϜ͕͋Δ •
ৄ͘͠ҎԼͷจΛಡΜͰʂ • Mihov & Maurel (2001), Direct Construction of Minimal Acyclic Subsequential Transducers http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.3698 • จதͷٙࣅίʔυɺ46ߦ͕ؒҧ͑ͯΔ͔ΒؾΛ͚ͭͯͶ • ޡ: SET_OUTPUT • ਖ਼: SET_STATE_OUTPUT
Minimal Acyclic Subsequential Transducer ͷ࣮ • https://github.com/takuyaa/cdarts • Java Ͱॻ͍ͨ
• Lucene ͷ FST jdartsclone ͱൺֱ͢ΔͨΊ • ଞͷ࣮ • Java: https://github.com/apache/lucene-solr/tree/master/lucene/core/src/java/org/apache/lucene/util/fst • Go: https://github.com/ikawaha/mast • Python: https://github.com/mocobeta/janome/blob/master/janome/fst.py • Rust: https://github.com/BurntSushi/fst
࣮ݧʂ
සग़ӳ୯ޠͷ TRIE ͱ FST • Lucene ͷετοϓϫʔυΛΩʔɺ࿈൪Λͱͯ͠ߏங • શΩʔ: 33
• શจࣈ: 97 • TRIE • ঢ়ଶ: 58 • ભҠ: 57 • FST (Minimal Acyclic Subsequential Transducer) • ঢ়ଶ: 25 • ભҠ: 51 FST TRIE
ϙέϞϯӳมثͷ TRIE ͱ FST • ϙέϞϯͷӳޠ໊ΛΩʔɺຊޠ໊Λͱͯ͠ߏங • શΩʔ: 151 •
શจࣈ: 1103 • TRIE • ঢ়ଶ: 809 • ભҠ: 808 • FST (Minimal Acyclic Subsequential Transducer) • ঢ়ଶ: 459 • ભҠ: 604 FST TRIE
FST Λ֦େͨ͠ͷ ※ UTF-8 ͰΤϯίʔυ͍ͯͯ͠ 1όΠτ͚ͩڞ༗͞ΕͨΓ͢Δ ͷͰද্ࣔจࣈԽ͚ͯ͠·͢
ࢀߟ • Mihov & Maurel (2001), Direct Construction of Minimal
Acyclic Subsequential Transducers http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.3698 • Finite-state automata and directed acyclic graphs http://www.jandaciuk.pl/Fsm_algorithms/ • Changing Bits: Using Finite State Transducers in Lucene http://blog.mikemccandless.com/2010/12/using-finite-state-transducers-in.html • moco(beta)'s backup: [༁] Using Finite State Transducers in Lucene https://mocobeta-backup.tumblr.com/post/105777650158/using-finite-state-transducers-in-lucene • Index 1,600,000,000 Keys with Automata and Rust - Andrew Gallant's Blog https://blog.burntsushi.net/transducers/ • moco(beta)'s backup: Lucene FST ͷΞϧΰϦζϜ (1) ʙਤղฤʙ https://mocobeta-backup.tumblr.com/post/111076688132/lucene-fst-1 • moco(beta)'s backup: Lucene FST ͷΞϧΰϦζϜ (2) ʙ࣮ฤʙ https://mocobeta-backup.tumblr.com/post/113693778372/lucene-fst-2 • LuceneͰΘΕͯΔFSTΛ࣮ͯ͠Έͨʢਖ਼نදݱϚονɿVMΞϓϩʔνͷটʣ - Qiita https://qiita.com/ikawaha/items/be95304a803020e1b2d1 • Minimal Acyclic Subsequential TransducerͰ༡Ϳ - Negative/Positive Thinking https://jetbead.hatenablog.com/entry/20151014/1444756877