I Started Building My Own Full-Text Search Engine (in Go)
kotaroooo0
November 11, 2020
Programming
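After this lightning talk you will:
1. Have a rough idea of how a full-text search engine works
2. Be able to start building a full-text search engine in Go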
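Why build a full-text search engine
- To learn how full-text search engines work
- To build something with a spec along the lines of Apache Lucene, which Elasticsearch uses
- I'm learning Go and wanted to build something
- I'm building a Twitter bot, but no existing full-text search engine is quite the right fit
- I want something lightweight that can also compute a weighted Levenshtein distance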
Full-text search engines in 3 minutes
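How full-text search works: INDEXING
Document 1: "I have a pen."
Document 2: "We have desk."
Each document is passed through the Analyzer (CHAR FILTER → TOKENIZER → TOKEN FILTER), and the resulting terms are recorded in an inverted index:
Term    Documents
have    1, 2
pen     1
we      2
desk    2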
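How full-text search works: SEARCH
Search word: "pen"
The query goes through the same Analyzer (CHAR FILTER → TOKENIZER → TOKEN FILTER), "pen" is looked up in the inverted index (have: 1, 2; pen: 1; we: 2; desk: 2), and Document 1 is a hit.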
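Why is an ANALYZER needed?
- To split text into tokens
  - An inverted index can't be built from "I have a pen." as it is, so we want to split it into the tokens I, have, a, pen
- To absorb variations in how queries are written
  - We want everything normalized to lowercase so that a document containing the word "GOD" is also hit by "god"
- To filter out useless tokens
  - Indexing words such as "the" is wasted effort
Before analysis: "I have a BIG pen!"
After analysis: have, big, pen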
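Concrete ANALYZER examples
- Char Filter (applied before the Tokenizer): zero or more
  - Mapping: e.g., convert emoticons into words
  - HTMLstrip: parse HTML and strip the markup
- Tokenizer (splits text into tokens): exactly one
  - Standard: splits according to rules such as whitespace
  - Kuromoji: splits using morphological analysis
  - Ngram: splits into N-character grams
- Token Filter (applied after the Tokenizer): zero or more
  - Lowercase: converts to lowercase
  - Stopword: removes stop words
  - Stemming: normalizes inflected forms
Analyzer: CHAR FILTER → TOKENIZER → TOKEN FILTER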
Implementation
How the ANALYZER runs
Processing flow:
Before analysis:    I have a lot of TASKs. I am very sad :(
MappingCharFilter:  I have a lot of TASKs. I am very sad _sad_
StandardTokenizer:  I, have, a, lot, of, TASKs, I, am, very, sad, sad
LowercaseFilter:    i, have, a, lot, of, tasks, i, am, very, sad, sad
StopWordFilter:     lot, tasks, am, very, sad, sad
StemmerFilter:      lot, task, am, very, sad, sad
ANALYZER implementation
CHAR FILTER implementation
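This slide's code was not captured either. Below is a sketch of the two char filters listed earlier, Mapping and HTMLstrip, under stated assumptions. The mapping is applied with `strings.Replacer`. The HTML stripper naively drops angle-bracketed spans rather than truly parsing HTML. The type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// CharFilter rewrites raw text before it reaches the tokenizer.
type CharFilter interface {
	Filter(text string) string
}

// MappingCharFilter replaces each key with its value, e.g. emoticons with words.
type MappingCharFilter struct {
	replacer *strings.Replacer
}

func NewMappingCharFilter(mapping map[string]string) MappingCharFilter {
	keys := make([]string, 0, len(mapping))
	for k := range mapping {
		keys = append(keys, k)
	}
	sort.Strings(keys) // Go map order is random; sorting keeps the replacer deterministic
	pairs := make([]string, 0, 2*len(keys))
	for _, k := range keys {
		pairs = append(pairs, k, mapping[k])
	}
	return MappingCharFilter{replacer: strings.NewReplacer(pairs...)}
}

func (m MappingCharFilter) Filter(text string) string { return m.replacer.Replace(text) }

// HTMLStripCharFilter drops '<', '>' and everything in between.
// Deliberately naive: no handling of comments, <script> bodies, or entities.
type HTMLStripCharFilter struct{}

func (HTMLStripCharFilter) Filter(text string) string {
	var b strings.Builder
	inTag := false
	for _, r := range text {
		switch {
		case r == '<':
			inTag = true
		case r == '>':
			inTag = false
		case !inTag:
			b.WriteRune(r)
		}
	}
	return b.String()
}

// applyCharFilters runs the filters in order, as the Analyzer would.
func applyCharFilters(text string, filters ...CharFilter) string {
	for _, f := range filters {
		text = f.Filter(text)
	}
	return text
}

func main() {
	emoticons := NewMappingCharFilter(map[string]string{":(": "_sad_", ":)": "_happy_"})
	fmt.Println(applyCharFilters("<p>I am very sad :(</p>", HTMLStripCharFilter{}, emoticons))
	// I am very sad _sad_
}
```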
TOKENIZER implementation
TOKEN FILTER implementation
INDEX
- Inverted index: map[string][]int
- One inverted index is kept per field name
- A document has an ID and fields
- Text is passed through the Analyzer at indexing time
SEARCH
- AND search and OR search
- Query: "pink orange blue"
- AND search: 3
- OR search: 1, 2, 3, 4, 5, 6
[Figure: Venn diagram of the documents (1 to 6) matching pink, orange, and blue]
Let's run it
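Search examples
- Note: the Analyzer is the same as before
- Text goes through the Analyzer at both indexing and search time
- "foxes" matches documents containing "fox"
- "happy" matches documents containing ":)"
- AND search
  - Documents 1 and 2, which contain all of fine, FaX, foxes, and happy, are hits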
Next steps
- Implement fuzzy search
  - Fuzzy Query, Suggesters
- Implement scoring
  - TF-IDF, BM25
References
- https://github.com/kotaroooo0/stalefish
- https://artem.krylysov.com/blog/2020/07/28/lets-build-a-full-text-search-engine/