Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
SQLiteで高速全文検索〜日本語編〜
Search
Sponsored
·
Ship Features Fearlessly
Turn features on and off without deploys. Used by thousands of Ruby developers.
→
shoby
August 20, 2014
Technology
4.1k
3
Share
SQLiteで高速全文検索〜日本語編〜
【第8回】potatotips (iOS/Android開発Tips共有会)
shoby
August 20, 2014
More Decks by shoby
See All by shoby
フリルのCMを支える分析 (メディアバイイング編)
shoby
0
490
FirebaseとBigQueryによるアプリのKPI分析(CMの効果検証編)
shoby
2
2.7k
ユーザーに受け入れられ、問題を起こしづらい大規模リニューアルの進め方
shoby
60
14k
ReactiveCocoaで作る快適な登録フォーム
shoby
0
370
フリルの商品を色で検索できるようにした話
shoby
5
4.4k
Search fashion items by colors
shoby
0
220
iOS 7をサポート対象外にして開発を健全化する
shoby
3
4.1k
Apple Watch Tips
shoby
0
4.2k
WWDC 2015で 発表された新機能と サービスへの活かし方
shoby
2
7.5k
Other Decks in Technology
See All in Technology
OpenClawとHermesAgentでAI新入社員を作った話
takanoriyanada
0
140
シンデレラなんかになりたくない!ガラスの靴が割れた時代にどう歩く?
nomizone
0
220
AI時代から振り返るTerraform drift運用の歴史 / AI Age Reflections on the History of Terraform Drift Operations
aeonpeople
0
590
Kiro CLI v2.0.0がやってきた!
kentapapa
0
220
Oracle AI Database@Azure:サービス概要のご紹介
oracle4engineer
PRO
6
1.8k
基礎から解説!Icebergで紐解くSnowflake×Databricks連携の現在地
cm_yasuhara
0
390
Terraformモジュールは、なぜ「魔境」化するのか
hayama17
1
120
Spring AI × MCP 入門〜AIエージェントへのツール公開、境界設計から始める最小構成 〜
yuyamiyamoto
0
180
GitHub Copilot CLIでWebアクセシビリティを改善した話
tomokusaba
0
130
Agentic Design Patterns
glaforge
0
280
Amazon Bedrock 経由の Claude Cowork を試してみよう・MCP にも繋いでみよう
sugimomoto
0
250
テストコードのないプロジェクトにテストを根付かせる
tttol
0
230
Featured
See All Featured
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
49
3.4k
Git: the NoSQL Database
bkeepers
PRO
432
67k
Kristin Tynski - Automating Marketing Tasks With AI
techseoconnect
PRO
0
260
BBQ
matthewcrist
89
10k
Marketing Yourself as an Engineer | Alaka | Gurzu
gurzu
0
210
GitHub's CSS Performance
jonrohan
1033
470k
Utilizing Notion as your number one productivity tool
mfonobong
4
310
Impact Scores and Hybrid Strategies: The future of link building
tamaranovitovic
0
290
Bioeconomy Workshop: Dr. Julius Ecuru, Opportunities for a Bioeconomy in West Africa
akademiya2063
PRO
1
120
Large-scale JavaScript Application Architecture
addyosmani
515
110k
Practical Tips for Bootstrapping Information Extraction Pipelines
honnibal
25
1.9k
Between Models and Reality
mayunak
4
310
Transcript
42-JUFͰ ߴશจݕࡧ ʙຊޠฤʙ @shobyshoby
ࣗݾհ pixivͰΞϓϦͭͬͯ͘·͢
ຊ
iOSΞϓϦͰ SQLiteΛͬͯ ຊޠΛ ߴʹશจݕࡧ͍ͨ͠
֓ཁ
42-JUFͷશจݕࡧ֦ு ຊޠͷτʔΫϯׂ ݕࡧॱҐ σϞ
42-JUFͷ શจݕࡧ֦ு
SQLite FTS3 and FTS4 Extensions http://www.sqlite.org/fts3.html
iOS 6 SDK͔Β σϑΥϧτͰ͑Δ
جຊతͳ͍ํ
CREATE VIRTUAL TABLE article USING fts4 (title, body); FTS4 ςʔϒϧ
INSERT INSERT INTO article(title, body) VALUES ('Apple', 'I have iPhone4
and iPhone5s');
جຊతͳQuery SELECT * FROM article WHERE body MATCH 'iPhone5s'; term
match SELECT * FROM article WHERE body MATCH 'iPhone*'; term-prefix match
phrase match SELECT * FROM article WHERE body MATCH 'have
iPhone*'; ◦I have iPhone5s ×I have Android and iPhone5s
ಛघͳQuery SELECT * FROM article WHERE article MATCH 'iPhone*'; all
columns SELECT * FROM article WHERE article MATCH 'title:Apple iPhone*'; specified column
؆୯ʂ
※ͨͩ͠ӳޠʹݶΔ
ຊޠͷ߹ɺ FTSͷTokenizer͕ ͏·͘tokenʹ͚ΒΕͳ͍
FTSͷTokenizer͕ ͏·͘ѻ͑Δܗࣜʹ Ճͯ͠σʔλΛೖΕΑ͏ʂ
ຊޠͷ τʔΫϯׂ
FTSͷςʔϒϧʹ ֯εϖʔεͰ ׂͨ͠σʔλΛೖΕΔ
CREATE TABLE article (id, title, body); CREATE VIRTUAL TABLE articleTokens
USING fts4 (articleId, titleTokens, bodyTokens); ݩσʔλΛೖΕΔςʔϒϧ FTSςʔϒϧ
INSERT͍ͨ͠σʔλ INSERT INTO article(id, title, body) VALUES (1, 'Ξοϓϧ', 'ࢲΞΠϑΥʔϯ4ͱΞΠϑΥʔϯ5sΛ͍࣋ͬͯ·
͢'); ! INSERT INTO articleTokens(articleId, titleTokens, bodyTokens) VALUES (1, 'Ξοϓϧ', 'ࢲ ΞΠϑΥʔϯ4 ͱ ΞΠϑΥʔϯ5s Λ ࣋ͬ ͍ͯ·͢');
͍͛ͨSELECT SELECT * FROM article JOIN ( SELECT articleId FROM
articleTokens WHERE bodyTokens MATCH 'ΞΠϑΥʔϯ*' LIMIT 100; ) AS result ON article.id = result.articleId ORDER BY article.id;
τʔΫϯׂ͢Δ
CFStringTokenizer
෦ͰMeCabΛͬͯ ܗଶૉղੳͯ͠ΔΒ͍͠ http://stackoverflow.com/questions/8280824/how-to-use- cfstringtokenizer-with-chinese-and-japanese
- (NSArray *)tokenArrayWithString:(NSString *)string { NSLocale *locale = [[NSLocale alloc]
initWithLocaleIdentifier:@"ja"]; CFRange range = CFRangeMake(0, CFStringGetLength((CFStringRef)string)); CFStringTokenizerRef tokenizer = CFStringTokenizerCreate(kCFAllocatorDefault, (CFStringRef)string, range, kCFStringTokenizerUnitWordBoundary, (CFLocaleRef)locale); NSMutableArray *tokenArray = [NSMutableArray array]; while(CFStringTokenizerAdvanceToNextToken(tokenizer) != kCFStringTokenizerTokenNone) { CFRange tokenRange = CFStringTokenizerGetCurrentTokenRange(tokenizer); if(range.location != kCFNotFound) { NSString *token = [string substringWithRange:NSMakeRange(tokenRange.location, tokenRange.length)]; [tokenArray addObject:token]; } } CFRelease(tokenizer); return tokenArray; }
INSERT
tokenArrayΛ ֯εϖʔεͰ࿈݁ͯ͠ INSERT͢Δ [tokenArray componentsJoinedByString:@" "];
ݕࡧ
֤Tokenʹ * ΛՃ͠ ֯εϖʔεͰ࿈݁ͯ͠ QueryΛΈཱͯΔ NSMutableArray *searchTokens = [NSMutableArray arrayWithCapacity:tokenArray.count];
for (NSString *token in tokenArray) { [searchTokens addObject:[token stringByAppendingString:@"*"]]; } [searchTokens componentsJoinedByString:@" "];
ݕࡧॱҐ
Ϛον͕ߴ͍ॱʹ ݁ՌΛฦ͍ͨ͠
SQLite ಠࣗͷSQLؔΛ ఆٛͰ͖Δ
rank()ͱ͍͏ SQL͕ؔ ͋ͬͨͱ͢Δͱ…
͍͛ͨSELECT SELECT * FROM article JOIN ( SELECT articleId, rank(matchinfo(articleTokens,
'pcnalx')) AS rank FROM articleTokens WHERE bodyTokens MATCH 'ΞΠϑΥʔϯ*' LIMIT 100; ) AS result ON article.id = result.articleId ORDER BY rank DESC;
match_info() ! ϚονใΛฦ͢ FTSͷؔ http://www.sqlite.org/fts3.html#matchinfo
match_info()͔Β ϚονείΞΛฦ͢ SQLؔΛ࡞Ζ͏ʂ
CݴޠͰ…
ϚονείΞͷ ΞϧΰϦζϜ
Okapi BM25 http://en.wikipedia.org/wiki/Okapi_BM25
sqlite-okapi-bm25 https://github.com/rads/sqlite-okapi-bm25
ंྠͷ࠶ൃ໌ ආ͚ΒΕͨ ※ҙ ୯Ұͷcolumnʹ͔͠ରԠ͍ͯ͠ͳ͍ͷͰ ඞཁͳΒFork͢Δ
σϞ
·ͱΊ
42-JUFͷ'54Λ͏ͱɺߴʹ શจݕࡧͰ͖Δ ຊޠࣄલʹ5PLFOׂ͕ඞཁ ݕࡧॱҐಠࣗ42-ؔΛ࡞ͬͯ ରԠ͢Δ
͓͠·͍