I Started Building My Own Full-Text Search Engine (in Go)
kotaroooo0
November 11, 2020
Transcript
2020/11/11 @kotaroooo0 I Started Building My Own Full-Text Search Engine (Go)
After this LT, you will:
1. Have a rough idea of how a full-text search engine works
2. Be able to start building a full-text search engine in Go
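Why build a full-text search engine
- To learn how full-text search engines work
- To build something modeled on Apache Lucene, which Elasticsearch uses
- I am learning Go and want to build something with it
- I am building a Twitter bot, but no existing full-text search engine is a good fit
- I want something lightweight that can also compute weighted Levenshtein distance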
Full-text search engines in 3 minutes
How full-text search works: INDEXING
Document 1: "I have a pen."
Document 2: "We have desk."
Analyzer: CHAR FILTER -> TOKENIZER -> TOKEN FILTER
Term  Documents
have  1, 2
pen   1
we    2
desk  2
How full-text search works: SEARCH
Search word: "pen"
Analyzer: CHAR FILTER -> TOKENIZER -> TOKEN FILTER
Term  Documents
have  1, 2
pen   1
we    2
desk  2
Document 1 is a hit.
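Why is an ANALYZER needed?
- To split text into tokens
  - "I have a pen." cannot go into an inverted index as is, so it should be split into the tokens I, have, a, pen
- To absorb spelling variations in queries
  - A document containing the word "GOD" should be a hit for "god", so tokens are normalized to lowercase
- To filter out useless tokens
  - Words such as "the" are not worth indexing
Before Analyze: "I have a BIG pen!"
After Analyze: have, big, pen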
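Concrete ANALYZER examples
Analyzer: CHAR FILTER -> TOKENIZER -> TOKEN FILTER
- Char Filter (runs before the Tokenizer): zero or more
  - Mapping: e.g. converts emoticons into words
  - HTMLstrip: parses HTML
- Tokenizer (splits text into tokens): exactly one
  - Standard: splits according to rules such as whitespace
  - Kuromoji: splits by morphological analysis
  - Ngram: splits every N characters
- Token Filter (runs after the Tokenizer): zero or more
  - Lowercase: converts tokens to lowercase
  - Stopword: removes stop words
  - Stemming: absorbs spelling variations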
Implementation
How the ANALYZER runs (processing flow)
Before Analyze: I have a lot of TASKs. I am very sad :(
MappingCharFilter: I have a lot of TASKs. I am very sad _sad_
StandardTokenizer: I, have, a, lot, of, TASKs, I, am, very, sad, sad
LowercaseFilter: I, have, a, lot, of, tasks, I, am, very, sad, sad
StopWordFilter: lot, tasks, am, very, sad, sad
StemmerFilter: lot, task, am, very, sad, sad
ANALYZER implementation
CHAR FILTER implementation
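The slide's code is not in the transcript. The following is one possible sketch of a mapping char filter that turns emoticons into words, built on the standard library's `strings.NewReplacer`. The type name mirrors the `MappingCharFilter` label from the processing-flow slide, but the constructor and its signature are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// MappingCharFilter replaces every occurrence of a configured string
// with its mapped value before the text is tokenized.
type MappingCharFilter struct {
	replacer *strings.Replacer
}

// NewMappingCharFilter takes old/new pairs, e.g. ":(", "_sad_".
// (Hypothetical constructor; strings.NewReplacer panics on an odd
// number of arguments.)
func NewMappingCharFilter(pairs ...string) MappingCharFilter {
	return MappingCharFilter{replacer: strings.NewReplacer(pairs...)}
}

// Filter applies all configured replacements to the text.
func (m MappingCharFilter) Filter(text string) string {
	return m.replacer.Replace(text)
}

func main() {
	f := NewMappingCharFilter(":(", "_sad_", ":)", "_happy_")
	fmt.Println(f.Filter("I have a lot of TASKs. I am very sad :("))
}
```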
TOKENIZER implementation
TOKEN FILTER implementation
INDEX
- The inverted index is a map[string][]int
- One inverted index is kept per field name
- A document has an ID and fields
- Text passes through the Analyzer when it is indexed
SEARCH
- AND search and OR search
- Query: "pink orange blue"
- AND search: 3
- OR search: 1, 2, 3, 4, 5, 6
(Venn diagram of the pink, orange, and blue document sets, with regions numbered 1 to 6)
Trying it out
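Search example
- Note: the Analyzer is the same as before
- Both Index and Search go through the Analyzer
  - "foxes" matches documents containing "fox"
  - "happy" matches documents containing ":)"
- AND search
  - Documents 1 and 2, which contain all of fine, FaX, foxes, and happy, are hits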
Next steps
- Implement fuzzy search
  - Fuzzy Query, Suggesters
- Implement score calculation
  - TF-IDF, BM25
References
- https://github.com/kotaroooo0/stalefish
- https://artem.krylysov.com/blog/2020/07/28/lets-build-a-full-text-search-engine/