Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
サブセット探索を用いた高速なkNNニューラル機械翻訳
Search
Hiroyuki Deguchi
March 22, 2024
Research
0
110
サブセット探索を用いた高速なkNNニューラル機械翻訳
第8回AAMTセミナー
AAMT若手翻訳研究会
最優秀賞
Hiroyuki Deguchi
March 22, 2024
Tweet
Share
More Decks by Hiroyuki Deguchi
See All by Hiroyuki Deguchi
20250226 NLP colloquium: "SoftMatcha: 10億単語規模コーパス検索のための柔らかくも高速なパターンマッチャー"
de9uch1
0
420
20240820: Minimum Bayes Risk Decoding for High-Quality Text Generation Beyond High-Probability Text
de9uch1
0
240
20240226_AAMT-Japio
de9uch1
0
140
Searching for Needles in a Haystack: On the Role of Incidental Bilingualism in PaLM’s Translation Capability
de9uch1
0
120
Paper Reading: Sampling-Based Approximations to Minimum Bayes Risk Decoding for Neural Machine Translation
de9uch1
0
160
My Research Environmental Setup
de9uch1
0
280
Nearest Neighbor Machine Translation
de9uch1
0
230
Paper Reading - Dynamic Programming Encoding for Subword Segmentation in Neural Machine Translation
de9uch1
0
280
paper reading - Tree Transformer
de9uch1
0
230
Other Decks in Research
See All in Research
定性データ、どう活かす? 〜定性データのための分析基盤、はじめました〜 / How to utilize qualitative data? ~We have launched an analysis platform for qualitative data~
kaminashi
6
970
在庫管理のための機械学習と最適化の融合
mickey_kubo
3
1k
90 分で学ぶ P 対 NP 問題
e869120
16
7.2k
Mathematics in the Age of AI and the 4 Generation University
hachama
0
150
(NULLCON Goa 2025)Windows Keylogger Detection: Targeting Past and Present Keylogging Techniques
asuna_jp
1
480
電力システム最適化入門
mickey_kubo
1
530
データxデジタルマップで拓く ミラノ発・地域共創最前線
mapconcierge4agu
0
150
NLP2025SharedTask翻訳部門
moriokataku
0
280
JSAI NeurIPS 2024 参加報告会(AI アライメント)
akifumi_wachi
5
1k
Batch Processing Algorithm for Elliptic Curve Operations and Its AVX-512 Implementation
herumi
0
170
RapidPen: AIエージェントによるペネトレーションテスト 初期侵入全自動化の研究
laysakura
0
1k
実行環境に中立なWebAssemblyライブマイグレーション機構/techtalk-2025spring
chikuwait
0
190
Featured
See All Featured
The MySQL Ecosystem @ GitHub 2015
samlambert
251
12k
10 Git Anti Patterns You Should be Aware of
lemiorhan
PRO
656
60k
Building a Scalable Design System with Sketch
lauravandoore
462
33k
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
45
7.3k
Building an army of robots
kneath
306
45k
How STYLIGHT went responsive
nonsquared
100
5.6k
Building Flexible Design Systems
yeseniaperezcruz
329
39k
Side Projects
sachag
453
42k
jQuery: Nuts, Bolts and Bling
dougneiner
63
7.8k
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
34
2.3k
Art, The Web, and Tiny UX
lynnandtonic
298
21k
個人開発の失敗を避けるイケてる考え方 / tips for indie hackers
panda_program
105
19k
Transcript
𝒌
◼ ⚫ ⚫ ◼ ⚫ (Zhang+, NAACL2018; Gu+, AAAI2018; Khandelwal+,
ICLR2021) ▶ (Nagao, 1984) ▶ ⚫ 𝑘 (Khandelwal+, ICLR2021) ▶ ▶ ▶ Guiding Neural Machine Translation with Retrieved Translation Pieces (Zhang+, NAACL2018) Search Engine Guided Neural Machine Translation (Gu+, AAAI2018) Nearest Neighbor Machine Translation (Khandelwal+, ICLR2021) A framework for a mechanical translation between Japanese and English by analogy principle (Nagao, 1984)
◼ ◼ ⚫ ⚫
𝒌 (Khandelwal+, ICLR2021) ◼ ⚫ ⚫ ⚫ ◼ ⚫ ▶
⚫ ▶ ≈ Nearest Neighbor Machine Translation (Khandelwal+, ICLR2021) 𝒙 𝒚
𝒌 (Khandelwal+, ICLR2021) 𝒌𝑖 ∈ ℝ𝐷 𝑓 𝒙, 𝒚<𝑡 ∈
ℝ𝐷 Nearest Neighbor Machine Translation (Khandelwal+, ICLR2021) ◼ 𝑘 ◼ ⚫ ⚫ 𝑝𝑘NN 𝑦𝑡 𝒙, 𝒚<𝑡 ∝ 𝑖=1 𝑘 𝟙𝑦𝑡=𝑣𝑖 exp − 𝒌𝑖 − 𝑓 𝒙, 𝒚<𝑡 2 2 𝜏 ◼ 𝑘
𝒌 ◼ (Martins+, EMNLP2022) ◼ (Meng+, ACLFindings2022) ⚫ 𝑘 𝑘
𝜆 = 0.5 𝑘 = 16 Chunk-based Nearest Neighbor Machine Translation (Martins+, EMNLP2022) Fast Nearest Neighbor Machine Translation (Meng+, ACL Findings2022)
𝒌 ◼ 𝑘 ◼ ⚫ 𝑘 (Matsui+, ACMMM2018) ⚫ 𝑘
𝑘 𝑘 Reconfigurable Inverted Index (Matsui+, ACMMM2018) 𝒌
◼ ⚫ 𝑘 ⚫ 𝑘 ◼ ◼ 𝑘
𝑛 𝑘 1 1 1 1 1 1 1 1
1
𝑛 𝑘 1 1 1 1 1 1 1 1
1
𝑛 𝑘 1 1 1 1 1 1 1 1
1
⚫ ⚫ ⚫ ⚫ ⚫ 𝑘 𝜆 = 0.5 𝑘
= 16 𝑛 = 56
𝑘 𝑘 ◼ 𝑘 ⚫ ▶ ⚫ ▶
◼ 𝑘 𝒌 𝒌
◼ ⚫ 𝑘
𝒌 𝒌 ◼ ⚫ ⚫ ◼ 𝑘 ⚫ ⚫ ◼
⚫
⚫ ⚫ ▶ ⚫ ▶