Takuya Asano
June 27, 2019
Minimal Acyclic Subsequential Transducer That Even a Dog Can Understand / Introduction to Minimal Acyclic Subsequential Transducer
Slides from a lightning talk given at a Hatena technical study session.
Transcript
Minimal Acyclic Subsequential Transducer That Even a Dog Can Understand
2019-06-27, Hatena technical study session
id:takuya-a
FSA and FST
• FSA (Finite State Automaton)
  • Returns a bool indicating whether the input sequence is accepted
• FST (Finite State Transducer)
  • A kind of FSA
  • Returns an output sequence when the input sequence is accepted
• A Minimal Acyclic Subsequential Transducer is a kind of FST
• Examples: FSA { “onk” }, FST { “onk” => “おんく” }
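To make the FSA/FST distinction concrete, here is a minimal sketch for the single key “onk” from the slide. The transition table, the state numbers, and the function names `fsa_accepts` and `fst_transduce` are made up for illustration. Placing the whole output on the first transition is just one possible choice.

```python
# Hand-built machine for the single key "onk" (state numbers are arbitrary).
# Each transition maps (state, input char) -> (next state, emitted output).
TRANSITIONS = {
    (0, "o"): (1, "おんく"),  # assumption: the whole output sits on the first arc
    (1, "n"): (2, ""),
    (2, "k"): (3, ""),
}
FINAL_STATES = {3}


def fsa_accepts(word: str) -> bool:
    """FSA view: ignore outputs and only report whether the input is accepted."""
    state = 0
    for ch in word:
        if (state, ch) not in TRANSITIONS:
            return False
        state, _ = TRANSITIONS[(state, ch)]
    return state in FINAL_STATES


def fst_transduce(word: str):
    """FST view: return the concatenated outputs if accepted, otherwise None."""
    state, emitted = 0, []
    for ch in word:
        if (state, ch) not in TRANSITIONS:
            return None
        state, out = TRANSITIONS[(state, ch)]
        emitted.append(out)
    return "".join(emitted) if state in FINAL_STATES else None
```

The same state walk serves both views. The only difference is whether the outputs collected along the path are returned.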
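What FSTs are used for
• Usable for so-called "dictionary lookup"
  • Can store key-value pairs (usable like a hash, in Perl terms)
• Lookup is fast because it only follows states
  • Especially advantageous for common prefix search
  • Exact match search is of course possible too
• Memory-efficient because both prefixes and suffixes are shared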
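The two lookup styles mentioned above can be sketched as follows. This uses a plain nested-dict trie as a stand-in for an FST, since both are searched by following transitions one character at a time. The helper names and the `""` marker key are hypothetical, not taken from any of the libraries in this deck.

```python
def build_trie(pairs):
    """Build a nested-dict trie; the "" entry (a made-up marker) holds a key's value."""
    root = {}
    for key, value in pairs.items():
        node = root
        for ch in key:
            node = node.setdefault(ch, {})
        node[""] = value
    return root


def common_prefix_search(trie, text):
    """Return every (key, value) whose key is a prefix of text, in one pass."""
    results, node = [], trie
    for i, ch in enumerate(text):
        if ch not in node:
            break
        node = node[ch]
        if "" in node:  # a stored key ends at this state
            results.append((text[: i + 1], node[""]))
    return results


def exact_match(trie, key):
    """Return the value stored for key, or None if key is not in the trie."""
    node = trie
    for ch in key:
        if ch not in node:
            return None
        node = node[ch]
    return node.get("")


trie = build_trie({"a": 1, "an": 2, "and": 3, "are": 4})
print(common_prefix_search(trie, "android"))  # [('a', 1), ('an', 2), ('and', 3)]
print(exact_match(trie, "an"))                # 2
```

Common prefix search finds every matching key in a single left-to-right walk, which is why a morphological analyzer can enumerate all dictionary words starting at a given position cheaply.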
Applications of FSTs
• As a dictionary for search engines
  • Used in many places as a core algorithm of Apache Lucene
  • Mainly used to look up terms
• As a dictionary for morphological analyzers
  • Adopted by Janome (Python) and Kuromoji (Java)
  • Fast common prefix search is required
• As a language model for speech recognition
  • Weighted FSTs (WFST) are used
  • https://www.slideshare.net/JiroNishitoba/wfst-61929888
Minimal Acyclic Subsequential Transducer
• Minimal: smallest
• Acyclic: without loops
• Subsequential: of sub(string) sequences
• Transducer: converter
• Example: “takuya” => “a”, “takaya” => “n”
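TRIE
• A data structure that shares prefixes only
• It forms a tree
• Suffixes cannot be shared
  • They can be partially shared with a technique called a TAIL array
(Figure: FST and TRIE diagrams side by side)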
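The cost of not sharing suffixes can be seen on the two keys from the earlier slide. The sketch below builds a trie and then counts how many states remain once nodes accepting the same set of suffixes are merged. It ignores outputs, so it models an acceptor only, and it merges offline rather than incrementally. All function names are hypothetical.

```python
def build_trie(keys):
    """Nested-dict trie; the "" entry (a made-up marker) flags a final state."""
    root = {}
    for key in keys:
        node = root
        for ch in key:
            node = node.setdefault(ch, {})
        node[""] = True
    return root


def count_trie_states(node):
    """In a trie every node is its own state."""
    return 1 + sum(count_trie_states(c) for ch, c in node.items() if ch != "")


def count_minimal_states(root):
    """Count states after merging nodes that accept the same set of suffixes."""
    registry = {}  # signature -> id of the merged state

    def state_id(node):
        # Children are resolved first, so equal subtrees get equal signatures.
        signature = (
            "" in node,
            tuple(sorted((ch, state_id(c)) for ch, c in node.items() if ch != "")),
        )
        return registry.setdefault(signature, len(registry))

    state_id(root)
    return len(registry)


trie = build_trie(["takuya", "takaya"])
print(count_trie_states(trie))     # 10
print(count_minimal_states(trie))  # 7
```

The trie keeps two separate copies of the "ya" tail, while the merged automaton shares it, so the state count drops from 10 to 7.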
Building a Minimal Acyclic Subsequential Transducer
• There is an algorithm that incrementally builds the theoretically minimal FST
• For details, read the following paper!
  • Mihov & Maurel (2001), Direct Construction of Minimal Acyclic Subsequential Transducers http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.3698
• Be careful: line 46 of the pseudocode in the paper is wrong
  • Wrong: SET_OUTPUT
  • Correct: SET_STATE_OUTPUT
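The incremental construction can be sketched roughly as below. This is a simplified reading of the idea in Mihov & Maurel (2001), not a transcription of the paper's pseudocode. It assumes keys are non-empty, unique, and sorted, with one string output per key. The names `State`, `build_mast`, and `lookup` are made up for this sketch.

```python
class State:
    def __init__(self):
        self.clear()

    def clear(self):
        self.final = False
        self.final_out = ""  # state output, appended when a key ends here
        self.trans = {}      # char -> [transition output, target State]

    def signature(self):
        # Targets are already frozen (canonical) states, so id() identifies them.
        arcs = tuple(sorted((c, o, id(t)) for c, (o, t) in self.trans.items()))
        return (self.final, self.final_out, arcs)

    def copy(self):
        s = State()
        s.final, s.final_out = self.final, self.final_out
        s.trans = {c: [o, t] for c, (o, t) in self.trans.items()}
        return s


def common_prefix_len(a, b):
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def build_mast(pairs):
    """pairs: list of (key, output); keys non-empty, unique, sorted ascending."""
    registry = {}  # signature -> frozen state

    def find_minimized(state):
        return registry.setdefault(state.signature(), state.copy())

    temp = [State() for _ in range(max(len(k) for k, _ in pairs) + 1)]
    prev = ""
    for key, output in pairs:
        p = common_prefix_len(prev, key)
        # Freeze the tail of the previous key that the current key does not share.
        for i in range(len(prev), p, -1):
            temp[i - 1].trans[prev[i - 1]][1] = find_minimized(temp[i])
        # Append fresh states for the unshared suffix of the current key.
        for i in range(p + 1, len(key) + 1):
            temp[i].clear()
            temp[i - 1].trans[key[i - 1]] = ["", temp[i]]
        temp[len(key)].final = True
        # Along the shared prefix keep only the common output; push the rest down.
        for j in range(1, p + 1):
            arc = temp[j - 1].trans[key[j - 1]]
            shared = arc[0][: common_prefix_len(arc[0], output)]
            rest = arc[0][len(shared):]
            arc[0] = shared
            for a in temp[j].trans.values():
                a[0] = rest + a[0]
            if temp[j].final:
                temp[j].final_out = rest + temp[j].final_out  # state output
            output = output[len(shared):]
        temp[p].trans[key[p]][0] = output  # leftover output on the first new arc
        prev = key
    for i in range(len(prev), 0, -1):
        temp[i - 1].trans[prev[i - 1]][1] = find_minimized(temp[i])
    return find_minimized(temp[0]), len(registry)


def lookup(root, key):
    state, emitted = root, []
    for ch in key:
        if ch not in state.trans:
            return None
        out, state = state.trans[ch]
        emitted.append(out)
    return "".join(emitted) + state.final_out if state.final else None


root, n_states = build_mast([("takaya", "n"), ("takuya", "a")])
print(lookup(root, "takuya"), lookup(root, "takaya"), n_states)  # a n 7
```

Because the keys arrive sorted, any state past the common prefix with the next key can never change again, so it can be frozen and deduplicated immediately. This keeps memory proportional to the minimal transducer plus one key's worth of temporary states.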
Implementations of Minimal Acyclic Subsequential Transducer
• https://github.com/takuyaa/cdarts
  • Written in Java
  • Made to compare against Lucene's FST and jdartsclone
• Other implementations
  • Java: https://github.com/apache/lucene-solr/tree/master/lucene/core/src/java/org/apache/lucene/util/fst
  • Go: https://github.com/ikawaha/mast
  • Python: https://github.com/mocobeta/janome/blob/master/janome/fst.py
  • Rust: https://github.com/BurntSushi/fst
Experiments!
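TRIE and FST for frequent English words
• Built with Lucene's stop words as keys and sequential numbers as values
• Total keys: 33
• Total characters: 97
• TRIE
  • States: 58
  • Transitions: 57
• FST (Minimal Acyclic Subsequential Transducer)
  • States: 25
  • Transitions: 51
(Figure: FST and TRIE diagrams)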
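The TRIE figures above can be checked with a few lines. The word list below is an assumption: it is what I take to be Lucene's default English stop word set, which is not spelled out in the slides, although it does match the 33 keys and 97 characters reported. `count_trie` is a hypothetical helper.

```python
# Assumed to be Lucene's default English stop word set (not listed in the slides).
STOP_WORDS = [
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
]


def count_trie(keys):
    """Return (states, transitions) of a trie over keys, counting the root."""
    # A trie has exactly one state per distinct prefix; "" is the root.
    prefixes = {key[:i] for key in keys for i in range(len(key) + 1)}
    states = len(prefixes)
    # Every state except the root has exactly one incoming transition.
    return states, states - 1


print(len(STOP_WORDS), sum(len(w) for w in STOP_WORDS))  # 33 97
print(count_trie(STOP_WORDS))                            # (58, 57)
```

In a trie the transition count is always one less than the state count. The FST breaks that relation (25 states, 51 transitions) because merged states have several incoming transitions.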
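TRIE and FST for a Pokémon English-to-Japanese converter
• Built with Pokémon English names as keys and Japanese names as values
• Total keys: 151
• Total characters: 1103
• TRIE
  • States: 809
  • Transitions: 808
• FST (Minimal Acyclic Subsequential Transducer)
  • States: 459
  • Transitions: 604
(Figure: FST and TRIE diagrams)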
Zoomed-in view of the FST
Note: the strings are UTF-8 encoded and sometimes only a single byte is shared, so some labels appear garbled in the rendering.
References
• Mihov & Maurel (2001), Direct Construction of Minimal Acyclic Subsequential Transducers http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.3698
• Finite-state automata and directed acyclic graphs http://www.jandaciuk.pl/Fsm_algorithms/
• Changing Bits: Using Finite State Transducers in Lucene http://blog.mikemccandless.com/2010/12/using-finite-state-transducers-in.html
• moco(beta)'s backup: [訳] Using Finite State Transducers in Lucene (Japanese translation) https://mocobeta-backup.tumblr.com/post/105777650158/using-finite-state-transducers-in-lucene
• Index 1,600,000,000 Keys with Automata and Rust - Andrew Gallant's Blog https://blog.burntsushi.net/transducers/
• moco(beta)'s backup: Lucene FST のアルゴリズム (1) 〜図解編〜 (The Lucene FST algorithm, part 1: illustrated) https://mocobeta-backup.tumblr.com/post/111076688132/lucene-fst-1
• moco(beta)'s backup: Lucene FST のアルゴリズム (2) 〜実装編〜 (The Lucene FST algorithm, part 2: implementation) https://mocobeta-backup.tumblr.com/post/113693778372/lucene-fst-2
• Luceneで使われてるFSTを実装してみた(正規表現マッチ:VMアプローチへの招待) - Qiita (Implementing the FST used in Lucene) https://qiita.com/ikawaha/items/be95304a803020e1b2d1
• Minimal Acyclic Subsequential Transducerで遊ぶ - Negative/Positive Thinking (Playing with Minimal Acyclic Subsequential Transducers) https://jetbead.hatenablog.com/entry/20151014/1444756877