Fast Full-Text Search with SQLite: Japanese Edition
shoby
August 20, 2014
potatotips #8 (iOS/Android development tips sharing meetup)
Transcript
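Fast Full-Text Search with SQLite: Japanese Edition @shobyshoby
About me: I build apps at pixiv
Main topic
I want fast full-text search over Japanese text using SQLite in an iOS app
Overview
SQLite's full-text search extension / Tokenizing Japanese / Search ranking / Demo
SQLite's full-text search extension
SQLite FTS3 and FTS4 Extensions http://www.sqlite.org/fts3.html
Available by default since the iOS 6 SDK
Basic usage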
FTS4 table
CREATE VIRTUAL TABLE article USING fts4 (title, body);
INSERT
INSERT INTO article(title, body) VALUES ('Apple', 'I have iPhone4 and iPhone5s');
Basic queries
term match:
SELECT * FROM article WHERE body MATCH 'iPhone5s';
term-prefix match:
SELECT * FROM article WHERE body MATCH 'iPhone*';
phrase match (the phrase must be wrapped in double quotes; without them the terms are combined with an implicit AND):
SELECT * FROM article WHERE body MATCH '"have iPhone*"';
○ I have iPhone5s
× I have Android and iPhone5s
Special queries
all columns:
SELECT * FROM article WHERE article MATCH 'iPhone*';
specified column:
SELECT * FROM article WHERE article MATCH 'title:Apple iPhone*';
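To try the basic queries outside an iOS project, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the bundled SQLite library was built with FTS4 enabled, which is common but not guaranteed. The second row ('Mixed') and the helper name `titles` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE article USING fts4 (title, body)")
conn.executemany(
    "INSERT INTO article(title, body) VALUES (?, ?)",
    [
        ("Apple", "I have iPhone4 and iPhone5s"),
        ("Mixed", "I have Android and iPhone5s"),  # hypothetical extra row
    ],
)


def titles(match_expr):
    """Return the titles of rows whose body matches the FTS expression."""
    rows = conn.execute(
        "SELECT title FROM article WHERE body MATCH ? ORDER BY rowid",
        (match_expr,),
    )
    return [title for (title,) in rows]


term_hits = titles("iPhone4")            # exact term
prefix_hits = titles("iPhone*")          # any term starting with "iphone"
phrase_hits = titles('"have iPhone*"')   # "have" immediately followed by iphone...

print(term_hits, prefix_hits, phrase_hits)
```

The phrase query only matches the first row, because in the second row "have" is followed by "Android" rather than a term starting with "iPhone".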
Easy!
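Note: but only for English
With Japanese, the FTS tokenizer cannot split the text into tokens properly
So preprocess the data into a form the FTS tokenizer can handle before inserting it!
Tokenizing Japanese
Insert data that has been split with half-width (ASCII) spaces into the FTS table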
Table holding the original data
CREATE TABLE article (id, title, body);
FTS table
CREATE VIRTUAL TABLE articleTokens USING fts4 (articleId, titleTokens, bodyTokens);
Data to INSERT
INSERT INTO article(id, title, body) VALUES (1, 'アップル', '私はアイフォーン4とアイフォーン5sを持っています');
INSERT INTO articleTokens(articleId, titleTokens, bodyTokens) VALUES (1, 'アップル', '私 は アイフォーン4 と アイフォーン5s を 持っ ています');
(The sample text means "Apple" and "I have an iPhone 4 and an iPhone 5s".)
The SELECT we want to run
SELECT * FROM article JOIN (
  SELECT articleId FROM articleTokens WHERE bodyTokens MATCH 'アイフォーン*' LIMIT 100
) AS result ON article.id = result.articleId ORDER BY article.id;
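The two-table pattern can be tried end to end with Python's built-in sqlite3 module. This is a sketch under a few assumptions: the SQLite build has FTS4 enabled, the text has already been split into tokens by hand (on iOS that would be the tokenizer's job), and the second article plus the `search` helper are invented for illustration. The `id` column is declared `INTEGER PRIMARY KEY` here, which the slide's schema does not do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title, body)")
conn.execute(
    "CREATE VIRTUAL TABLE articleTokens "
    "USING fts4 (articleId, titleTokens, bodyTokens)"
)

# (id, title, original body, body pre-split with ASCII spaces)
rows = [
    (1, "アップル", "私はアイフォーン4とアイフォーン5sを持っています",
     "私 は アイフォーン4 と アイフォーン5s を 持っ ています"),
    (2, "その他", "私はタブレットを持っています",  # hypothetical second article
     "私 は タブレット を 持っ ています"),
]
for article_id, title, body, body_tokens in rows:
    conn.execute(
        "INSERT INTO article(id, title, body) VALUES (?, ?, ?)",
        (article_id, title, body),
    )
    conn.execute(
        "INSERT INTO articleTokens(articleId, titleTokens, bodyTokens) "
        "VALUES (?, ?, ?)",
        (article_id, title, body_tokens),
    )


def search(match_expr):
    """Return ids of articles whose tokenized body matches the expression."""
    sql = """
        SELECT article.id FROM article JOIN (
            SELECT articleId FROM articleTokens
            WHERE bodyTokens MATCH ? LIMIT 100
        ) AS result ON article.id = result.articleId
        ORDER BY article.id
    """
    return [row[0] for row in conn.execute(sql, (match_expr,))]


print(search("アイフォーン*"))  # only article 1 mentions an iPhone
print(search("持っ*"))          # both articles contain this token
```

Keeping the original text in `article` and only the space-separated tokens in `articleTokens` means the displayed text never contains the artificial spaces.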
Tokenizing
CFStringTokenizer
Apparently it uses MeCab internally for morphological analysis http://stackoverflow.com/questions/8280824/how-to-use-cfstringtokenizer-with-chinese-and-japanese
- (NSArray *)tokenArrayWithString:(NSString *)string {
    NSLocale *locale = [[NSLocale alloc] initWithLocaleIdentifier:@"ja"];
    // __bridge casts are needed under ARC; no ownership is transferred.
    CFStringRef cfString = (__bridge CFStringRef)string;
    CFRange range = CFRangeMake(0, CFStringGetLength(cfString));
    CFStringTokenizerRef tokenizer = CFStringTokenizerCreate(kCFAllocatorDefault, cfString, range, kCFStringTokenizerUnitWordBoundary, (__bridge CFLocaleRef)locale);
    NSMutableArray *tokenArray = [NSMutableArray array];
    while (CFStringTokenizerAdvanceToNextToken(tokenizer) != kCFStringTokenizerTokenNone) {
        CFRange tokenRange = CFStringTokenizerGetCurrentTokenRange(tokenizer);
        // Skip tokens whose range could not be determined.
        if (tokenRange.location != kCFNotFound) {
            NSString *token = [string substringWithRange:NSMakeRange(tokenRange.location, tokenRange.length)];
            [tokenArray addObject:token];
        }
    }
    CFRelease(tokenizer);
    return tokenArray;
}
INSERT
Join tokenArray with half-width spaces and INSERT the result
[tokenArray componentsJoinedByString:@" "];
Search
Append * to each token and join them with half-width spaces to build the query
NSMutableArray *searchTokens = [NSMutableArray arrayWithCapacity:tokenArray.count];
for (NSString *token in tokenArray) {
    [searchTokens addObject:[token stringByAppendingString:@"*"]];
}
[searchTokens componentsJoinedByString:@" "];
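The same query-building step, written as a small Python function so it can be checked in isolation. The function name `build_prefix_query` is hypothetical, and dropping empty or whitespace-only tokens is an extra precaution that the slide's code does not take.

```python
def build_prefix_query(tokens):
    """Append '*' to each non-empty token and join with single ASCII spaces."""
    cleaned = [token.strip() for token in tokens]
    return " ".join(token + "*" for token in cleaned if token)


query = build_prefix_query(["アイフォーン", " ", "持っ"])
print(query)  # アイフォーン* 持っ*
```

Because FTS treats space-separated terms as an implicit AND, the resulting expression matches rows that contain a token starting with each of the search tokens.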
Search ranking
We want to return results in descending order of match quality
SQLite lets you define your own SQL functions
Suppose there were an SQL function called rank()…
The SELECT we want to run
SELECT * FROM article JOIN (
  SELECT articleId, rank(matchinfo(articleTokens, 'pcnalx')) AS rank FROM articleTokens WHERE bodyTokens MATCH 'アイフォーン*' LIMIT 100
) AS result ON article.id = result.articleId ORDER BY rank DESC;
matchinfo()
An FTS function that returns match information http://www.sqlite.org/fts3.html#matchinfo
Let's write an SQL function that turns the matchinfo() output into a match score!
In C…
Match score algorithm
Okapi BM25 http://en.wikipedia.org/wiki/Okapi_BM25
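To make the scoring step concrete, here is a rough Python sketch of Okapi BM25 computed from the integers that matchinfo() returns for the format string 'pcnalx'. It is an illustration, not the code from the talk: the function name `bm25_from_matchinfo` and the constants k1 = 1.2 and b = 0.75 (commonly used defaults) are assumptions, and the input is taken to be already decoded into a list of integers.

```python
import math


def bm25_from_matchinfo(info, column, k1=1.2, b=0.75):
    """Score one row for one column from a decoded 'pcnalx' matchinfo array.

    Layout: p (phrase count), c (column count), n (total rows),
    a (average token count per column), l (token count per column, this row),
    x (3 integers per phrase/column pair).
    """
    phrase_count, column_count, total_docs = info[0], info[1], info[2]
    a_offset = 3
    l_offset = a_offset + column_count
    x_offset = l_offset + column_count

    avg_length = info[a_offset + column]
    doc_length = info[l_offset + column]

    score = 0.0
    for phrase in range(phrase_count):
        base = x_offset + 3 * (column + phrase * column_count)
        term_frequency = info[base]      # hits in this row
        docs_with_term = info[base + 2]  # rows with at least one hit
        idf = math.log(
            (total_docs - docs_with_term + 0.5) / (docs_with_term + 0.5)
        )
        norm = term_frequency + k1 * (1 - b + b * doc_length / avg_length)
        score += idf * term_frequency * (k1 + 1) / norm
    return score


# Hand-made example: 1 phrase, 1 column, 3 rows, average length 5,
# this row has length 5 and 2 hits; 1 row in total contains the term.
example = [1, 1, 3, 5, 5, 2, 2, 1]
score = bm25_from_matchinfo(example, column=0)
print(score)
```

In SQLite itself, matchinfo() returns a blob of 32-bit unsigned integers in machine byte order, so a real rank() function has to decode that blob first. Note that the IDF term goes negative when a term appears in more than half of the rows.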
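sqlite-okapi-bm25 https://github.com/rads/sqlite-okapi-bm25
No need to reinvent the wheel. Note: it only supports a single column, so fork it if you need more
Demo
Summary
SQLite's FTS gives you fast full-text search / Japanese text must be tokenized beforehand / Search ranking is handled by writing a custom SQL function
The end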