Missspell Detection
bk
February 10, 2020
Science
Transcript
Detecting misspelled strings with edit distance: Levenshtein distance and Jaro-Winkler distance
Contents
1. Problem
2. What I built
3. Edit distance
4. Results
5. Summary
6. References
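Problem
We keep an inventory of brand-name goods and list each item for sale on flea by hand.
Goal: cut the effort of listing items such as "GUCCI Tote Bag Black Leather" on flea.
A listing titled "GUCCHI Tote Bag Black Leather", however, misspells the brand, and misspelled brand names are penalized:
• the listing is removed
• the seller rating drops
• the account is suspended
Approach: solve it somehow with AI, using Python and natural language processing.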
What I built
1. Start from the list of listing titles, e.g. "GUCCHI Tote Bag Black Leather", ...
2. Split each title into words to build a list of listing words: GUCCHI, Tote, Bag, Black, Leather, ...
3. Compare every listing word against a list of correct brand names (GUCCI, VUITTON, ...) by brute force.
4. Output the words that are similar to a brand name.
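The brute-force pipeline above can be sketched in Python as follows. This is a minimal illustration and not the code in the author's repository. Several details are assumptions: the function name `find_misspelled_brands`, whitespace tokenization, case-insensitive comparison, and the threshold `max_distance=2`. The sketch uses Levenshtein distance (explained in the next section) as the similarity measure and skips exact matches, because those are correct spellings.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, deletions and insertions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution or match
        prev = curr
    return prev[-1]


def find_misspelled_brands(titles, brands, max_distance=2):
    """Hypothetical helper: compare every listing word with every brand name.

    Returns (title, word, brand, distance) for words that are close to a brand
    name but not identical to it. max_distance=2 is an assumed threshold.
    """
    hits = []
    for title in titles:
        for word in title.split():               # split the title into words
            for brand in brands:                  # brute force over the brand list
                d = levenshtein(word.upper(), brand.upper())
                if 0 < d <= max_distance:         # similar, but not an exact match
                    hits.append((title, word, brand, d))
    return hits


titles = ["GUCCHI Tote Bag Black Leather", "GUCCI Tote Bag Black Leather"]
brands = ["GUCCI", "VUITTON"]
for hit in find_misspelled_brands(titles, brands):
    print(hit)
```

Only the first title is reported, because "GUCCHI" is one edit away from "GUCCI". A small fixed threshold like this could also flag short ordinary words near short brand names, so in practice it would need tuning.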
Edit distance
Two ways to measure how close "GUCCHI" is to "GUCCI":
1. Levenshtein distance
2. Jaro-Winkler distance

1. Levenshtein distance
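Edit distance (Levenshtein distance)
Characters of the original string are edited until it matches the string it is compared with. The allowed operations are substitution, deletion and insertion, and the minimum number of operations is the distance.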
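Example (substitution): the original string is G U T T I and the compared string is G U C C I. Substituting both Ts with Cs takes 2 operations, so the distance is 2.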
Edit method  | Original string | Compared string | Edits (distance)
Substitution | GUTTI           | GUCCI           | 2
Deletion     | GUCCHI          | GUCCI           | 1
Insertion    | GUCI            | GUCCI           | 1
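The minimum number of operations is usually computed with a dynamic-programming table (the Wagner-Fischer algorithm). The sketch below is an illustrative implementation with hypothetical function names. It is not the python-Levenshtein library cited later. It reproduces the three rows of the table above.

```python
def levenshtein_matrix(source: str, target: str) -> list:
    """dp[i][j] = minimum edits turning source[:i] into target[:j]."""
    rows, cols = len(source) + 1, len(target) + 1
    dp = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dp[i][0] = i                      # delete all i characters
    for j in range(cols):
        dp[0][j] = j                      # insert all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (free if characters match)
            )
    return dp


def levenshtein(source: str, target: str) -> int:
    # The bottom-right cell holds the distance between the full strings.
    return levenshtein_matrix(source, target)[-1][-1]


for source, target in [("GUTTI", "GUCCI"), ("GUCCHI", "GUCCI"), ("GUCI", "GUCCI")]:
    print(source, target, levenshtein(source, target))
```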
2. Jaro-Winkler distance
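Edit distance (Jaro-Winkler distance)
Jaro distance measures how well two strings partially match. Despite the name it is a similarity score: the larger the value, the closer the strings.
$$D_j = \frac{1}{3}\left(\frac{m}{|s_1|} + \frac{m}{|s_2|} + \frac{m - t/2}{m}\right)$$
|s_1|, |s_2|: lengths of the two strings
m: number of matching characters within the search window (D_j = 0 when m = 0)
t: number of matching characters that appear in a different order (t/2 is the number of transpositions)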
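m: number of matching characters within the window
Two equal characters only count as a match if their positions differ by at most
$$\left\lfloor \frac{\max(|s_1|, |s_2|)}{2} \right\rfloor - 1$$
Original string: GCCUHI → 6 characters. Compared string: GUCCI → 5 characters. The window is therefore ⌊max(6, 5)/2⌋ − 1 = 2.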
Each character of the original string G C C U H I is searched for in the compared string G U C C I within the window, and every match found is counted. G, C, C, U and I match and H does not, so m = 5.
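The window and matching step can be sketched as follows. This is a minimal illustration: `jaro_matches` is a hypothetical helper name, and clamping the window at 0 for one-character strings is an assumption beyond the formula on the slide. Each character of the first string takes the first unused equal character of the second string inside the window.

```python
def jaro_matches(s1: str, s2: str):
    """Return (window, matched characters of s1, matched characters of s2).

    Both matched sequences keep the order in which the characters appear in
    their own string; that order is what t is computed from later.
    """
    window = max(0, max(len(s1), len(s2)) // 2 - 1)  # clamped at 0 (assumption)
    used = [False] * len(s2)
    matched1 = []
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not used[j] and s2[j] == ch:   # first unused equal character
                used[j] = True
                matched1.append(ch)
                break
    matched2 = [ch for j, ch in enumerate(s2) if used[j]]
    return window, "".join(matched1), "".join(matched2)


window, m1, m2 = jaro_matches("GCCUHI", "GUCCI")
print(window, len(m1), m1, m2)  # 2 5 GCCUI GUCCI
```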
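t: number of matching characters out of order
Extract the matched characters in their original order: G C C U I from the original string and G U C C I from the compared string. Comparing them position by position, the 2nd and 4th characters differ (C/U and U/C). So t = 2, the number of characters that must be swapped to make the two sequences identical.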
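Given the two matched sequences, t is simply the number of positions where they disagree. A minimal sketch, with `count_out_of_order` as a hypothetical name:

```python
def count_out_of_order(matched1: str, matched2: str) -> int:
    """t as used on the slides: positions where the matched sequences differ.

    Each transposition (swap) accounts for two such positions, hence t/2
    in the Jaro formula.
    """
    return sum(a != b for a, b in zip(matched1, matched2))


print(count_out_of_order("GCCUI", "GUCCI"))  # 2
```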
For GCCUHI and GUCCI (|s_1| = 6, |s_2| = 5, m = 5, t = 2):
$$D_j = \frac{1}{3}\left(\frac{5}{6} + \frac{5}{5} + \frac{5 - 2/2}{5}\right) = \frac{79}{90} = 0.8777\ldots$$
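The arithmetic can be checked exactly with Python's `fractions.Fraction`. Here `jaro_from_counts` is a hypothetical helper that takes the string lengths, m and t directly. It returns 0 when m = 0, the usual convention, since the last term is undefined there.

```python
from fractions import Fraction


def jaro_from_counts(len1: int, len2: int, m: int, t: int) -> Fraction:
    """D_j = 1/3 * (m/|s1| + m/|s2| + (m - t/2)/m), taken as 0 when m == 0."""
    if m == 0:
        return Fraction(0)
    return Fraction(1, 3) * (Fraction(m, len1) + Fraction(m, len2) + (m - Fraction(t, 2)) / m)


dj = jaro_from_counts(6, 5, m=5, t=2)
print(dj)  # 79/90
```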
Jaro-Winkler distance adds extra weight to characters that match at the start of the strings:
$$D_{jw} = D_j + l \cdot \frac{1}{10} \cdot (1 - D_j)$$
D_j: Jaro distance
l: number of matching characters from the start of the strings (l ≤ 4)
For the original string G C C U H I and the compared string G U C C I, only the first character G matches before the strings diverge, so l = 1.
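The prefix length l can be computed by walking both strings until they first differ, stopping at 4. A minimal sketch; `common_prefix_length` is a hypothetical name:

```python
def common_prefix_length(s1: str, s2: str, max_len: int = 4) -> int:
    """l in the formula: identical leading characters, capped at max_len (4)."""
    length = 0
    for a, b in zip(s1, s2):
        if a != b or length == max_len:
            break
        length += 1
    return length


print(common_prefix_length("GCCUHI", "GUCCI"))  # 1
print(common_prefix_length("GUCCHI", "GUCCI"))  # 4 (GUCC)
```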
$$D_{jw} = \frac{79}{90} + 1 \cdot \frac{1}{10} \cdot \left(1 - \frac{79}{90}\right) = \frac{801}{900} = 0.89$$
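The same step in exact arithmetic shows that 801/900 reduces to 89/100, so 0.89 is exact rather than rounded. `jaro_winkler_from_jaro` is a hypothetical helper. The scaling factor p = 1/10 is the constant from the formula above.

```python
from fractions import Fraction


def jaro_winkler_from_jaro(dj: Fraction, prefix_len: int,
                           p: Fraction = Fraction(1, 10)) -> Fraction:
    """D_jw = D_j + l * p * (1 - D_j), with l = prefix_len and p = 1/10."""
    return dj + prefix_len * p * (1 - dj)


djw = jaro_winkler_from_jaro(Fraction(79, 90), prefix_len=1)
print(djw)  # 89/100
```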
Original string | Compared string | Levenshtein* | Jaro-Winkler**
GUCCHI          | GUCCI           | 1            | 0.97
GUTTI           | GUCCI           | 2            | 0.79
GCCUHI          | GUCCI           | 3            | 0.89
グッチ裕三      | GUCCI           | 5            | 0.00
* https://github.com/ztane/python-Levenshtein/
** https://github.com/nap/jaro-winkler-distance
Levenshtein: smaller is closer. Jaro-Winkler: larger is closer.

Jaro-Winkler rewards the presence of matching characters even when they are out of order. As a result, the two measures rank the candidates differently. Levenshtein puts GUTTI (2) ahead of GCCUHI (3), while Jaro-Winkler puts GCCUHI (0.89) ahead of GUTTI (0.79).
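The steps above combine into a compact Jaro-Winkler function. This sketch assumes the standard definition: window ⌊max/2⌋ − 1 clamped at 0, prefix capped at 4, p = 0.1. It is not the jaro-winkler-distance package from the table, whose API is not shown here. Rounded to two decimals, it reproduces the Jaro-Winkler column, and sorting by it shows the ordering that differs from Levenshtein.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: 1/3 * (m/|s1| + m/|s2| + (m - t/2)/m), 0 when m == 0."""
    if not s1 or not s2:
        return 0.0
    window = max(0, max(len(s1), len(s2)) // 2 - 1)
    used = [False] * len(s2)
    matched1 = []
    for i, ch in enumerate(s1):
        # Look for an unused equal character of s2 inside the window around i.
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not used[j] and s2[j] == ch:
                used[j] = True
                matched1.append(ch)
                break
    m = len(matched1)
    if m == 0:
        return 0.0
    matched2 = [ch for j, ch in enumerate(s2) if used[j]]
    t = sum(a != b for a, b in zip(matched1, matched2))  # out-of-order matches
    return (m / len(s1) + m / len(s2) + (m - t / 2) / m) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """D_jw = D_j + l * p * (1 - D_j), where l is the common prefix length (max 4)."""
    dj = jaro(s1, s2)
    prefix_len = 0
    for a, b in zip(s1, s2):
        if a != b or prefix_len == 4:
            break
        prefix_len += 1
    return dj + prefix_len * p * (1 - dj)


candidates = ["GUCCHI", "GUTTI", "GCCUHI", "グッチ裕三"]
for word in sorted(candidates, key=lambda w: jaro_winkler(w, "GUCCI"), reverse=True):
    print(word, round(jaro_winkler(word, "GUCCI"), 2))
```

Run as written, it lists GUCCHI (0.97), GCCUHI (0.89), GUTTI (0.79) and グッチ裕三 (0.0) from closest to farthest. The Levenshtein column, by contrast, ranks GUTTI ahead of GCCUHI.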
Results
The .py script works, more or less, so that's good enough.
* https://github.com/bk-18/Misspelled-Brand-Name-Detector
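Summary
• The problem: misspelled brand names when listing items for sale
• Misspelling detection by brute-force comparison against a list of brand names
• Levenshtein distance
• Jaro-Winkler distance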
References
• Quantifying the similarity of two strings: an explanation of Levenshtein distance and Jaro-Winkler distance, nkdkccmbr.hateblo.jp, http://nkdkccmbr.hateblo.jp/entry/2016/08/18/102727
• Edit distance (Levenshtein Distance), naoya's Hatena Diary, https://naoya-2.hatenadiary.org/entry/20090329/1238307757
• Evaluating string similarity: Levenshtein distance / Jaro-Winkler distance, grahamian.hatenablog.com, http://grahamian.hatenablog.com/entry/word_similarity
• Yaoshu Wang, Jianbin Qin, and Wei Wang: Efficient Approximate Entity Matching Using Jaro-Winkler Distance, University of New South Wales, http://qinjianbin.com/files/wise2017-wang.pdf
ENJOY! ENJAY! EMJOY! ENJOI! ENZYOI!