Understanding Back-Translation at Scale
ysasano
February 12, 2019
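This deck introduces a paper that examines what happens when back-translation, one of the data augmentation methods for machine translation, is evaluated on large-scale data.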
Transcript
Understanding Back-Translation at Scale
Yasumasa Sasano (@SquirrelYellow)
Reading the data in the back-translation paper: Edunov et al. 2018, EMNLP 2018
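What is Back-Translation (BT)?
(1) Train a translation model in the direction opposite to the main model (for a Japanese-to-English system, train English-to-Japanese).
Diagram: the target sentence data and the source sentence data are used to train the back-translation (BT) model.
The wording here is borrowed from this article: https://qiita.com/tkmaroon/items/4b8f469db1534d5e265b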
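What is Back-Translation (BT)?
(2) Use BT to increase the amount of data.
Diagram: the BT model runs inference on target-side monolingual data and produces synthetic source-side data, which sits alongside the existing target and source sentence data.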
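What is Back-Translation (BT)?
(3) Train on the augmented data.
Diagram: the main model is trained on the target and source sentence data together with the target monolingual data paired with its synthetic source.
The paper does not say so, but I take the reason for deliberately translating in "reverse" to be that we want to optimize against correct sentences as the training target.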
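The three steps above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `back_translate`, `toy_reverse_translate`, and `toy_lexicon` are hypothetical names, and the word-lookup function merely stands in for a trained target-to-source model.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(
    bitext: List[Pair],
    target_monolingual: List[str],
    reverse_translate: Callable[[str], str],
) -> List[Pair]:
    """Return the real bitext plus synthetic pairs built from target-only text.

    Each synthetic pair keeps the human-written target sentence as the label
    and uses the machine-generated source sentence as the input.
    """
    synthetic = [(reverse_translate(t), t) for t in target_monolingual]
    return bitext + synthetic


# Hypothetical stand-in for a trained target-to-source model: a word lookup.
toy_lexicon = {"i": "watashi", "like": "suki", "dogs": "inu"}


def toy_reverse_translate(target_sentence: str) -> str:
    words = target_sentence.lower().split()
    return " ".join(toy_lexicon.get(w, "<unk>") for w in words)


bitext = [("inu", "dogs")]
augmented = back_translate(bitext, ["I like dogs"], toy_reverse_translate)
print(augmented)
```

The main model would then be trained on `augmented`, so the side it learns to produce is always real text.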
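BT drew attention for its large accuracy gains: http://deeplearning.hatenablog.com/entry/back_translation
Why I chose this paper
This is actually not a paper that proposes a new method. It examines what happens when existing methods are evaluated on large amounts of data ("at Scale"). I wanted to read the experimental data on data augmentation and discuss it.
BT is a kind of data augmentation.
 - Because of my work, I have a reason to make full use of the data I already have.
 - Much about which kinds of data augmentation are effective is still unexplained, which interests me.
To avoid confusion, the slides distinguish four kinds of statement: evidence, the paper's claims, personal impressions, and points of interest.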
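Synthetic data generation method: about the synthetic data produced by BT
Evaluating differences due to how the synthetic data is generated
Greedy Search: fix the top-ranked word at each step, then move on to the next word.
Beam Search: pick the output with the highest probability over the whole sequence. Exhaustive search is infeasible, so the search runs with a finite beam width.
Using argmax removes the diversity of the translations, which is a problem.
[Figure: candidate next words such as "cold", "a cold", "today", and "yesterday" ranked by probability (words on one axis, sorted probability on the other), contrasting "rank 1" with "rank 1 given the surrounding context". Methods compared: Greedy Search, Beam Search, Top 10, Sampling, Beam + Noise; group labels: Argmax, Noised, Middle.]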
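The contrast between greedy search and beam search can be illustrated on a toy model. This is a sketch under assumptions: `beam_search`, `toy_next_probs`, and the probability table are made up for illustration, and greedy search is expressed as beam search with width 1.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[List[str], float]  # (tokens so far, log-probability)


def beam_search(
    next_probs: Callable[[List[str]], Dict[str, float]],
    beam_width: int,
    max_len: int,
    eos: str = "</s>",
) -> List[str]:
    """Keep the `beam_width` best prefixes at every step; width 1 is greedy."""
    beams: List[Hypothesis] = [([], 0.0)]
    for _ in range(max_len):
        candidates: List[Hypothesis] = []
        for tokens, score in beams:
            if tokens and tokens[-1] == eos:
                candidates.append((tokens, score))  # finished, carry over
                continue
            for word, p in next_probs(tokens).items():
                candidates.append((tokens + [word], score + math.log(p)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]


# Toy next-word distributions (hypothetical numbers).
TABLE = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.5, "y": 0.5},
    ("b",): {"z": 1.0},
}


def toy_next_probs(prefix: List[str]) -> Dict[str, float]:
    return TABLE.get(tuple(prefix), {"</s>": 1.0})


greedy_out = beam_search(toy_next_probs, beam_width=1, max_len=3)
beam_out = beam_search(toy_next_probs, beam_width=2, max_len=3)
print(greedy_out, beam_out)
```

Greedy commits to "a" because it is the best first word, and ends with a sequence of probability 0.3. With width 2 the search keeps "b" alive and finds "b z", which has probability 0.4. Both are still argmax-style methods, which is why neither adds diversity.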
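Evaluating differences due to how the synthetic data is generated
Sampling: random sampling from the model's distribution. It gives the generated sentences diversity. As a text generation technique it is old, used in Graves et al. 2003 among others. According to Ott et al. 2018a, the uncertainty is quite large, so there is a high chance of producing odd words. First used for this purpose in Imamura et al. 2018 (NICT).
Top 10: random sampling restricted to ranks 1 through 10. k = 5, 10, 20, and 50 were tried, but changing k made no difference.
Beam + Noise: noise is applied to the beam search output: deleting words (p=0.1), replacing words with BLANK (p=0.1), and reordering words (uniform, maximum shift of 3). Proposed as part of an unsupervised learning method in Lample et al. 2018a.
[Figure: the same ranked candidate words as on the previous slide, now including unlikely candidates such as "sweater" and "refrigerator"; axes are words and sorted probability.]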
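The non-argmax generation methods on this slide can be sketched as follows. The function names and the toy distribution are assumptions for illustration. The noise parameters (p=0.1, p=0.1, maximum shift 3) are the ones listed on the slide, and the local shuffle is implemented by sorting on "index plus uniform noise", which is one common way to bound how far a word moves.

```python
import random
from typing import Dict, List


def greedy_pick(probs: Dict[str, float]) -> str:
    """Argmax: always the single most likely word."""
    return max(probs, key=probs.get)


def sample(probs: Dict[str, float], rng: random.Random) -> str:
    """Unrestricted sampling from the full distribution."""
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]


def top_k_sample(probs: Dict[str, float], k: int, rng: random.Random) -> str:
    """Sample only among the k most likely words (renormalized by weights)."""
    top = sorted(probs, key=probs.get, reverse=True)[:k]
    return rng.choices(top, weights=[probs[w] for w in top], k=1)[0]


def add_noise(
    tokens: List[str],
    rng: random.Random,
    p_drop: float = 0.1,
    p_blank: float = 0.1,
    max_shift: int = 3,
) -> List[str]:
    """Delete words, replace words with a filler, then shuffle locally."""
    kept = [t for t in tokens if rng.random() >= p_drop]
    blanked = ["<BLANK>" if rng.random() < p_blank else t for t in kept]
    # Sort by index + U(0, max_shift + 1): a word moves at most max_shift places.
    keys = [i + rng.uniform(0, max_shift + 1) for i in range(len(blanked))]
    order = sorted(range(len(blanked)), key=lambda i: keys[i])
    return [blanked[i] for i in order]


rng = random.Random(0)
probs = {"cold": 0.5, "refrigerator": 0.3, "sweater": 0.2}  # toy numbers
print(greedy_pick(probs), sample(probs, rng), top_k_sample(probs, 2, rng))
print(add_noise("it is very cold today in the city".split(), rng))
```

In a real system these choices are made at every decoding step (or, for the noise, on the finished beam output); the sketch only shows the selection rule itself.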
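Evaluating differences due to how the synthetic data is generated
Sampling and beam+noise score 1.7-2.0 BLEU better than beam and greedy.
Top 10 is better than beam and greedy but worse than sampling and beam+noise.
At one of the data sizes, sampling and beam+noise improve performance by nearly twice as much as beam.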
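Analysis of the generated sentences
Greedy search and beam search narrow what would otherwise be diverse, rich data. According to Ott et al. 2018a, they tend to stop producing low-frequency words, so sampling methods work better.
Similarity to denoising autoencoders: sentences produced by sampling and beam+noise are unrealistic, but "substitution" and "reordering" do occur in ordinary text, so including such processing makes the model more robust.
Because the next word cannot be predicted, the task becomes harder and accuracy goes up.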
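Analysis of the generated sentences
Coloring acceptable words blue (OK) and clearly wrong words red (NG) shows that the clearly wrong words appear only "locally".
Hypothesis: did the generalization performance of the remaining normal parts improve, so that the model is not thrown off by whatever noise word appears? In other words, generalization improves through "local noise".
One could also read this as "diversity increased regardless of quality, so it is fine", but even so the accuracy gain seems too large, so I want to dig a little deeper. (Personal speculation)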
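(Personal speculation, continued)
Many natural language processing models can be fooled easily by changing the input only slightly (Deep Text Classification Can be Fooled, Liang et al. 2016).
Example: "I like dogs" (Japanese: 犬が好きです) translates to "I like dog", but swapping one word to give "I like the god Cthulhu" (クトゥルフ神が好きです) yields "I am scared of Cthulhu". The unseen word derails the whole translation.
Adding local noise may act as a countermeasure to this problem: even if "Cthulhu" is an unknown noun, 好き should still map to "like".
[Diagram: translation of unseen data, with backpropagation of the error.]
(Backpropagating error through the noise portion is completely wasted effort, so this may be something that can be improved.)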
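Low Resource & High Resource: on differences in the amount of parallel data that BT starts from
What happens when the starting data is small?
Diagram: the main model is trained on the target and source data plus target monolingual data paired with synthetic source. The amount of real target/source data is what is small here (around 80K sentences), roughly the amount of text in paperback books (112万字, 80 characters per sentence).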
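What happens when the starting data is small?
At 80K sentences, the ranking of sampling and beam search is reversed.
The more data there is, the stronger sampling becomes.
When the starting data is small, the BT model is not very accurate, so training becomes vulnerable to the harmful effect of the noise that sampling adds.
The accuracy of the BT model needs to be raised.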
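Mitigating the small-starting-data problem
(1) Build language models from monolingual data and apply transfer learning.
Diagram: a source language model feeds the Encoder and a target language model feeds the Decoder, each by transfer learning or weight sharing.
The problem that "transferring language models is difficult" was resolved by Devlin et al. 2018 (BERT), so there may be progress here.
Before I knew it, a remarkable paper had already been published. Reference: Lample et al. 2019 (XLM). It applies BERT-style transfer learning, trains translation as a single language model rather than in Encoder-Decoder form, and sets a new BLEU SOTA for unsupervised WMT German-English translation (arXiv submission).
Mitigating the small-starting-data problem
(2) Dual Learning
Diagram: the main model and a "dual" model are trained on target monolingual data and source monolingual data. The data does not have to be parallel.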
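Domain of synthetic data: examining the domain of the synthetic data
Domain adaptation
Diagram: the main model is trained on the target and source sentence data plus news-domain target monolingual data paired with synthetic news-domain source.
Can the model get stronger on news even without parallel news data?
When the evaluation data is in the same domain as the BT data (news), the improvement reaches 83% of what real data would give.
When the evaluation data and the BT data domain (news) do not match at all, the improvement is 32.5% of what real data would give.
Both cases improve, but when the domains match, accuracy exceeds that of general-purpose data.
Even without parallel data for a given genre, translation for that genre can be strengthened if monolingual data for it is available.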
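One way to read the 83% and 32.5% figures is as the share of the gain from real parallel data that the synthetic data recovers. The sketch below assumes that reading. The function name and the BLEU values are hypothetical and are not taken from the paper.

```python
def share_of_real_gain(
    baseline_bleu: float, synthetic_bleu: float, real_bleu: float
) -> float:
    """Fraction of the real-data BLEU gain that synthetic data recovers."""
    real_gain = real_bleu - baseline_bleu
    if real_gain <= 0:
        raise ValueError("real data must improve on the baseline")
    return (synthetic_bleu - baseline_bleu) / real_gain


# Hypothetical scores: baseline, baseline + BT data, baseline + real bitext.
ratio = share_of_real_gain(20.0, 23.0, 24.0)
print(f"{ratio:.0%} of the real-data improvement")
```

Under this reading, a value near 1.0 means monolingual in-domain text is almost as useful as parallel in-domain text.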
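Summary
Adding back-translation raises accuracy with any of the methods, but how the back-translation is generated can double the size of the gain.
When data is scarce, sampling performs relatively worse, so it cannot be used without care.
BT can be used for domain adaptation.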