Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval
Scatter Lab Inc.
August 07, 2020
Transcript
ML Seminar S6E3: Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval
Junseong Kim, ML Research Scientist, Pingpong
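Contents
1. Introduction
   1. Problem
   2. Limitations of existing methods
2. Approach
   1. Overview of the approach
   2. Asynchronous training routine
3. Experiment
   1. Experimental design
   2. Experimental results
   3. Implementation details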
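Problem [1/2]
• The problem this paper ultimately aims to solve is the Open-Domain Question Answering (QA) task.
• Open-Domain QA can be defined as the task of finding, for a question that is not restricted to any domain, the answer contained somewhere in a held collection of documents (~1M+).
• For example, assuming every document on Wikipedia can be consulted, the task is to find the answer to "What percentage of all life did Thanos wipe out?"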
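Problem [2/2]
• Deep-learning models can find more accurate answers, but running them over every document (hundreds of millions or more) is very inefficient and rules out real-time serving.
• To get around this speed limit, prior work splits the problem into two stages:
• 1. Document Retrieval: find the documents relevant to the given question.
• 2. Reading Comprehension: a model derives the specific answer to the question by consulting the relevant documents.
• The paper presented today proposes a method for improving Document Retrieval performance.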
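Limitations of existing methods [1/3]
• Most prior work relies mainly on lexical features for Document Retrieval.
• Examples: BM25, TF-IDF, keyword matching, and so on (features provided by Elastic Search).
• These methods cannot understand implied (semantic) meaning, so they cannot find the related answer.
• Example: Q. "Who is Tesla's [...]?" -> searching on the two keywords (Tesla, [...]) may not find any document that contains both.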
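To make the lexical-matching limitation concrete, the following is a minimal sketch of BM25 scoring in plain Python. The toy corpus, the query, the whitespace tokenizer, and the parameter values `k1=1.5` and `b=0.75` are all illustrative assumptions and do not come from the slides. It shows that a document with no term in common with the query scores zero, however relevant it is.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with a basic BM25 formula."""
    tokenized = [d.lower().split() for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # a term that never appears contributes nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical two-document corpus.
docs = [
    "elon musk leads the electric car maker",     # relevant, shares no query term
    "tesla coils were invented by nikola tesla",  # shares a term, not relevant
]
scores = bm25_scores("who runs tesla", docs)
print(scores)
```

In this toy case the relevant document is invisible to the lexical scorer, which is the gap that dense representations are meant to close.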
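Limitations of existing methods [2/3]
• Recent work (Lee et al., 2019; Guu et al., 2020; Seo et al., 2019) proposes representing the question and the documents as BERT representations, which can carry more semantic information.
• These methods use a Bi-Encoder model and are trained with In-Batch Negatives.
• Once training is complete, the Document Encoder is used to encode the documents in advance.
• At inference time, only the question's representation is computed with BERT, and an Approximate Nearest Neighbor search tool such as FAISS directly finds the Top-K documents closest to that representation.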
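The offline/online split described above can be sketched as follows. This is only an illustration under stated assumptions: the 3-dimensional vectors are made up, no real encoder is involved, and an exact inner-product scan (the hypothetical helper `top_k`) stands in for an approximate nearest neighbor library such as FAISS.

```python
import heapq

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def top_k(query_vec, doc_vecs, k):
    """Exact inner-product search; an ANN index would approximate this result."""
    scored = ((dot(query_vec, d), i) for i, d in enumerate(doc_vecs))
    return [i for _, i in heapq.nlargest(k, scored)]

# Offline: document vectors are computed once and stored (toy values here).
doc_vecs = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.7, 0.2, 0.1],
    [0.0, 0.1, 0.9],
]
# Online: only the question is encoded at inference time.
query_vec = [1.0, 0.0, 0.0]
result = top_k(query_vec, doc_vecs, k=2)
print(result)
```

Because the document side is precomputed, the per-query cost is one encoder pass plus one nearest-neighbor lookup.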
Bi-encoder [diagram: question and document encoders]
Training method: In-Batch Negative
• Training dataset: the pairs (Q1, D1), (Q2, D2), (Q3, D3), (Q4, D4).
• Encoded batch: Q: (4, 512), D: (4, 512).
• Q ⋅ D^T -> (4, 4), a score matrix with rows Q1..Q4 and columns D1..D4.
• Softmax is applied to each row of Q ⋅ D^T.
• Training objective: in each row, the correct document should get the highest value (in the slide's example, 0.99 for the matching document and 0.01 for every other one).
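The In-Batch Negative objective can be written down in a few lines. The sketch below assumes the usual formulation, row-wise softmax cross-entropy over Q ⋅ D^T with the diagonal as the positive; the function name `in_batch_negative_loss` and the toy 4-dimensional vectors (the slides use 512) are illustrative assumptions.

```python
import math

def in_batch_negative_loss(q_vecs, d_vecs):
    """Mean cross-entropy where row i's positive is column i of Q . D^T."""
    n = len(q_vecs)
    total = 0.0
    for i, q in enumerate(q_vecs):
        # Row i of the score matrix: query i against every document in the batch.
        logits = [sum(a * b for a, b in zip(q, d)) for d in d_vecs]
        m = max(logits)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]  # equals -log softmax(row i)[i]
    return total / n

# Toy batch of 4 (question, document) pairs.
aligned_q = [[5.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
aligned_d = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
uniform_d = [[1.0] * 4 for _ in range(4)]

loss_good = in_batch_negative_loss(aligned_q, aligned_d)  # diagonal dominates
loss_bad = in_batch_negative_loss(aligned_q, uniform_d)   # all scores equal
print(loss_good, loss_bad)
```

When every document scores the same, each row's softmax is uniform and the loss is log 4; it falls toward zero as the diagonal entry dominates its row.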
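Limitations of existing methods [2/3]
• The paper argues that the In-Batch Negatives used to train Dense Retrieval models are a problem.
• Its hypothesis: In-Batch Negative training is effective for narrowing down to roughly similar documents, but has a fundamental limit when it comes to pinpointing the relevant document.
• The reason is that learning to pick the one relevant document out of completely unrelated candidates is different from learning to pick it out of candidates that are themselves related.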
Limitations of existing methods [2/3]
• The representations of the negative samples were visualized with t-SNE for analysis.
• The Random and BM25-based negatives in common use were distributed very differently from the actual relevant documents.
• In addition, a model trained with Random Negatives failed to catch the actual relevant documents when used for Dense Retrieval.
• The model also has to be trained on the question "Which document in here is the relevant one?"
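Approach
• This paper proposes a new way to select the negative samples used during training.
• Approximate nearest neighbor Negative Contrastive Estimation (ANCE)
• It builds hard negative samples from the retrieval results of the model as it stands partway through training.
• The faiss index is updated asynchronously every N steps, and the negative samples are refreshed continually.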
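A rough sketch of the negative-refresh idea follows. It is a simplification under stated assumptions, not the paper's implementation: the vectors are fixed toy values rather than encoder outputs, an exact sort replaces the faiss index, the refresh runs inline rather than asynchronously, and the names `mine_hard_negatives`, `train`, and `refresh_every` are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mine_hard_negatives(q_vecs, d_vecs, positives, k):
    """For each query, take the top-k retrieved documents that are not its positive."""
    negatives = []
    for qi, q in enumerate(q_vecs):
        ranked = sorted(range(len(d_vecs)),
                        key=lambda di: dot(q, d_vecs[di]), reverse=True)
        negatives.append([di for di in ranked if di != positives[qi]][:k])
    return negatives

def train(q_vecs, d_vecs, positives, steps, refresh_every, k=2):
    """Training-loop skeleton: negatives are re-mined every `refresh_every` steps."""
    refresh_steps = []
    negatives = None
    for step in range(steps):
        if step % refresh_every == 0:
            # In the described method this re-encoding and re-indexing runs
            # asynchronously; it is done inline here to keep the sketch short.
            negatives = mine_hard_negatives(q_vecs, d_vecs, positives, k)
            refresh_steps.append(step)
        # A real loop would compute a contrastive loss on
        # (query, positive, negatives) here and update the encoders,
        # which changes the vectors seen by the next refresh.
    return negatives, refresh_steps

q_vecs = [[1.0, 0.0], [0.0, 1.0]]
d_vecs = [[0.9, 0.1], [0.8, 0.3], [0.1, 0.9], [0.2, 0.7], [-1.0, -1.0]]
positives = [0, 2]  # index of the relevant document for each query
negatives, refresh_steps = train(q_vecs, d_vecs, positives, steps=10, refresh_every=4)
print(negatives, refresh_steps)
```

The mined negatives are the documents the current model ranks closest to each query, so the clearly unrelated document (index 4) is never selected.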
Approach
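Experiment
• The evaluation task is the TREC 2019 Deep Learning Track.
• It is a dataset in which relevant documents are labeled for more than one million queries submitted to the Bing search engine.
• The authors state that they chose this dataset because it is large, recent, and reflects the most realistic setting.
• The evaluation metrics are MRR, Recall@1k, and NDCG.
• Most of the results measure retrieval performance; in addition, the ability of the DR model to rerank the relevant documents within a given set of 100 candidates is also verified (the part labeled Rerank in the results).
• As with DPR, evaluation is also run on OpenQA task datasets, which are QA datasets with no domain restriction. The metric checks whether the passage we are targeting is actually included in the Top-N.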
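For readers unfamiliar with the metrics named above, here is a small sketch of how they are commonly computed. The function names and the toy rankings are illustrative assumptions, and NDCG is shown with binary relevance for brevity, whereas graded relevance labels would use graded gains.

```python
import math

def mean_reciprocal_rank(runs):
    """Average over queries of 1/rank of the first relevant document."""
    total = 0.0
    for ranked, relevant in runs:
        for rank, doc in enumerate(ranked, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(runs)

def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant documents that appear in the top k."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """NDCG with binary gains: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(r + 1)
              for r, doc in enumerate(ranked[:k], start=1) if doc in relevant)
    ideal = sum(1.0 / math.log2(r + 1)
                for r in range(1, min(k, len(relevant)) + 1))
    return dcg / ideal

# Hypothetical rankings for two queries.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
runs = [(ranked, relevant), (["d5", "d9"], {"d5"})]

mrr = mean_reciprocal_rank(runs)
recall = recall_at_k(ranked, relevant, k=3)
ndcg = ndcg_at_k(ranked, relevant, k=4)
print(mrr, recall, ndcg)
```

MRR rewards placing the first relevant document early, Recall@k measures coverage of the candidate set, and NDCG weighs every relevant document by its position.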
Experiment
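Experiment
• The existing approach is a Two-Stage method: Document Retrieval with BM25, followed by reranking with BERT.
• Its inference takes 1.42 seconds in total.
• This paper, by contrast, uses ANN-based Dense Retrieval, so inference is faster: it takes only 11.6 ms, roughly 120 times faster than 1.42 seconds.
• Even so, its reported performance is higher than that of the Two-Stage method.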
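Conclusion
• Training Dense Retrieval with In-Batch Negatives alone has clear limitations.
• The ability to decide the relative order of candidates is lacking.
• Training should assume that confusable candidate documents will appear, and push the correct answer to be the closest.
• To that end, the paper proposes running ANN indexing during training exactly as at inference and drawing the negatives by retrieval. This is done asynchronously so that training can proceed continuously.
• In the experiments, the proposed training method shows better performance on real tasks.
• Document Retrieval performance was evaluated on a search retrieval task and on Open-Domain QA.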
References
• https://codertimo.github.io/2020/07/20/ANN-negative-contrastive-learning/