Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
犬でもわかる Minimal Acyclic Subsequential Transducer...
Search
Takuya Asano
June 27, 2019
Technology
2
1.4k
犬でもわかる Minimal Acyclic Subsequential Transducer / Introduction to Minimal Acyclic Subsequential Transducer
はてなの技術勉強会で LT 発表したときの資料です。
Takuya Asano
June 27, 2019
Tweet
Share
More Decks by Takuya Asano
See All by Takuya Asano
Research Paper Introduction in IR Reading 2022 Fall
takuyaa
0
3.4k
Introducing PTHash - Paper Reading Session (2021-11-19)
takuyaa
0
440
Research Paper Introduction in IR Reading 2021 Fall
takuyaa
0
280
Lucene Index Deep Dive
takuyaa
0
660
Introduction to Apache Lucene
takuyaa
5
1.3k
Research Paper Introduction in IR Reading 2020 Fall
takuyaa
2
4.2k
Research paper introduction (2019-11-12)
takuyaa
2
1k
Research paper introduction (2019-11-07)
takuyaa
1
430
IR Reading 2019秋 論文紹介 / IR Reading 2019Fall
takuyaa
2
1.3k
Other Decks in Technology
See All in Technology
Digitization部 紹介資料
sansan33
PRO
1
6.6k
[PR] はじめてのデジタルアイデンティティという本を書きました
ritou
1
810
Databricks Free Edition講座 データエンジニアリング編
taka_aki
0
2.7k
アウトプットはいいぞ / output_iizo
uhooi
0
120
形式手法特論:コンパイラの「正しさ」は証明できるか? #burikaigi / BuriKaigi 2026
ytaka23
17
6.2k
人工知能のための哲学塾 ニューロフィロソフィ篇 第零夜 「ニューロフィロソフィとは何か?」
miyayou
0
470
戰略轉變:從建構 AI 代理人到發展可擴展的技能生態系統
appleboy
0
200
コミュニティが持つ「学びと成長の場」としての作用 / RSGT2026
ama_ch
2
330
田舎で20年スクラム(後編):一個人が企業で長期戦アジャイルに挑む意味
chinmo
1
1.5k
Proxmoxで作る自宅クラウド入門
koinunopochi
0
110
WebDriver BiDi 2025年のふりかえり
yotahada3
1
150
わが10年の叡智をぶつけたカオスなクラウドインフラが、なくなるということ。
sogaoh
PRO
1
630
Featured
See All Featured
Building an army of robots
kneath
306
46k
Tips & Tricks on How to Get Your First Job In Tech
honzajavorek
0
410
Marketing Yourself as an Engineer | Alaka | Gurzu
gurzu
0
110
職位にかかわらず全員がリーダーシップを発揮するチーム作り / Building a team where everyone can demonstrate leadership regardless of position
madoxten
54
49k
StorybookのUI Testing Handbookを読んだ
zakiyama
31
6.5k
Navigating the Design Leadership Dip - Product Design Week Design Leaders+ Conference 2024
apolaine
0
150
Heart Work Chapter 1 - Part 1
lfama
PRO
5
35k
The B2B funnel & how to create a winning content strategy
katarinadahlin
PRO
0
250
What’s in a name? Adding method to the madness
productmarketing
PRO
24
3.9k
コードの90%をAIが書く世界で何が待っているのか / What awaits us in a world where 90% of the code is written by AI
rkaga
58
41k
Sam Torres - BigQuery for SEOs
techseoconnect
PRO
0
170
AI: The stuff that nobody shows you
jnunemaker
PRO
2
170
Transcript
ݘͰΘ͔Δ Minimal Acyclic Subsequential Transducer 2019-06-27 ͯͳٕज़ษڧձ id:takuya-a
FSA ͱ FST • FSA (Finite State Automaton) • ༗ݶঢ়ଶΦʔτϚτϯ
• ೖྗྻΛडཧ͢Δ͔Ͳ͏͔ͷ bool Λฦ͢ • FST (Finite State Transducer) • ༗ݶঢ়ଶมث • FSA ͷҰछ • ೖྗྻΛडཧͨ͠ͱ͖ɺग़ྗྻΛฦ͢ • Minimal Acyclic Subsequential Transducer FST ͷҰछ { “onk” } { “onk” => “͓Μ͘” }
FST ͷ͍Έͪ • ͍ΘΏΔʮࣙॻҾ͖ʯʹ͑Δ • ΩʔͱͷϖΞΛอଘͰ͖ΔʢPerl Ͱ͍͏ͱϋογϡͱͯ͑͠Δʣ • ঢ়ଶΛͨͲΔ͚ͩͳͷͰݕࡧ͕ߴ •
ͱ͘ʹ ڞ௨಄ࣙݕࡧ (common prefix search) Ͱ༗ར • ͪΖΜ શҰகݕࡧ (exact match) Ͱ͖Δ • ಄ࣙඌ͕ࣙڞ༗͞ΕΔͷͰলϝϞϦ
FST ͷԠ༻ઌ • ݕࡧΤϯδϯͷࣙॻͱͯ͠ • Apache Lucene ͷίΞΞϧΰϦζϜͱͯ͠ɺ৭Μͳͱ͜ΖͰΘΕ͍ͯΔ • ओʹ୯ޠΛϧοΫΞοϓ͢ΔͨΊʹΘΕΔ
• ܗଶૉղੳثͷࣙॻͱͯ͠ • Janome (Python), Kuromoji (Java) Ͱ࠾༻͞Ε͍ͯΔ • ߴͳ common prefix search ͕ඞཁ • ԻೝࣝͷݴޠϞσϧͱͯ͠ • ॏΈ͖ FST (Weighted FST; WFST) ͕ΘΕΔ • https://www.slideshare.net/JiroNishitoba/wfst-61929888
Minimal Acyclic Subsequential Transducer Minimal ࠷খͷ Acyclic ϧʔϓͷͳ͍ Subsequential ෦(จࣈ)ྻͷ
Transducer มث “takuya” => “a” “takaya” => “n”
TRIE • ಄ࣙͷΈΛڞ༗͢Δσʔλߏ • πϦʔʹͳΔ • ඌࣙڞ༗Ͱ͖ͳ͍ • TAIL ྻͱ͍͏ςΫχοΫͰ
Ұ෦ڞ༗Ͱ͖Δ FST TRIE
Minimal Acyclic Subsequential Transducer ͷߏங • ཧ্࠷খͷ FST Λஞ࣍తʹߏஙͰ͖ΔΞϧΰϦζϜ͕͋Δ •
ৄ͘͠ҎԼͷจΛಡΜͰʂ • Mihov & Maurel (2001), Direct Construction of Minimal Acyclic Subsequential Transducers http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.3698 • จதͷٙࣅίʔυɺ46ߦ͕ؒҧ͑ͯΔ͔ΒؾΛ͚ͭͯͶ • ޡ: SET_OUTPUT • ਖ਼: SET_STATE_OUTPUT
Minimal Acyclic Subsequential Transducer ͷ࣮ • https://github.com/takuyaa/cdarts • Java Ͱॻ͍ͨ
• Lucene ͷ FST jdartsclone ͱൺֱ͢ΔͨΊ • ଞͷ࣮ • Java: https://github.com/apache/lucene-solr/tree/master/lucene/core/src/java/org/apache/lucene/util/fst • Go: https://github.com/ikawaha/mast • Python: https://github.com/mocobeta/janome/blob/master/janome/fst.py • Rust: https://github.com/BurntSushi/fst
࣮ݧʂ
සग़ӳ୯ޠͷ TRIE ͱ FST • Lucene ͷετοϓϫʔυΛΩʔɺ࿈൪Λͱͯ͠ߏங • શΩʔ: 33
• શจࣈ: 97 • TRIE • ঢ়ଶ: 58 • ભҠ: 57 • FST (Minimal Acyclic Subsequential Transducer) • ঢ়ଶ: 25 • ભҠ: 51 FST TRIE
ϙέϞϯӳมثͷ TRIE ͱ FST • ϙέϞϯͷӳޠ໊ΛΩʔɺຊޠ໊Λͱͯ͠ߏங • શΩʔ: 151 •
શจࣈ: 1103 • TRIE • ঢ়ଶ: 809 • ભҠ: 808 • FST (Minimal Acyclic Subsequential Transducer) • ঢ়ଶ: 459 • ભҠ: 604 FST TRIE
FST Λ֦େͨ͠ͷ ※ UTF-8 ͰΤϯίʔυ͍ͯͯ͠ 1όΠτ͚ͩڞ༗͞ΕͨΓ͢Δ ͷͰද্ࣔจࣈԽ͚ͯ͠·͢
ࢀߟ • Mihov & Maurel (2001), Direct Construction of Minimal
Acyclic Subsequential Transducers http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.3698 • Finite-state automata and directed acyclic graphs http://www.jandaciuk.pl/Fsm_algorithms/ • Changing Bits: Using Finite State Transducers in Lucene http://blog.mikemccandless.com/2010/12/using-finite-state-transducers-in.html • moco(beta)'s backup: [༁] Using Finite State Transducers in Lucene https://mocobeta-backup.tumblr.com/post/105777650158/using-finite-state-transducers-in-lucene • Index 1,600,000,000 Keys with Automata and Rust - Andrew Gallant's Blog https://blog.burntsushi.net/transducers/ • moco(beta)'s backup: Lucene FST ͷΞϧΰϦζϜ (1) ʙਤղฤʙ https://mocobeta-backup.tumblr.com/post/111076688132/lucene-fst-1 • moco(beta)'s backup: Lucene FST ͷΞϧΰϦζϜ (2) ʙ࣮ฤʙ https://mocobeta-backup.tumblr.com/post/113693778372/lucene-fst-2 • LuceneͰΘΕͯΔFSTΛ࣮ͯ͠Έͨʢਖ਼نදݱϚονɿVMΞϓϩʔνͷটʣ - Qiita https://qiita.com/ikawaha/items/be95304a803020e1b2d1 • Minimal Acyclic Subsequential TransducerͰ༡Ϳ - Negative/Positive Thinking https://jetbead.hatenablog.com/entry/20151014/1444756877