Understanding Back-Translation at Scale
ysasano
February 12, 2019
Technology
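This deck introduces a paper that examines what happens when back-translation, one of the data augmentation methods for machine translation, is evaluated on large amounts of data.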
Transcript
Understanding Back-Translation at Scale
Yasumasa Sasano (@SquirrelYellow)
Reading the data in the back-translation paper
Edunov et al. 2018, EMNLP 2018
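What is Back-Translation (BT)? (1) Train a translation model in the reverse direction of the main model
(for example, if the main model translates from English into another language, the reverse model translates from that language into English)
Diagram: target text data and source text data are used to train the reverse translation model (BT).
The wording here is borrowed from this article: https://qiita.com/tkmaroon/items/4b8f469db1534d5e265b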
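What is Back-Translation (BT)? (2) Use BT to increase the data
Diagram: the reverse translation model (BT) runs inference on target-language monolingual data and produces synthetic source-language data, which sits alongside the original target and source text data.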
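What is Back-Translation (BT)? (3) Train on the augmented data
Diagram: the main model is trained on the original target and source text data plus the target monolingual data paired with the synthetic source data.
The paper does not say so, but I take the point of deliberately translating in the "reverse" direction to be that the model is then optimized with correct sentences as the supervision signal.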
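The three steps above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `toy_reverse_model` is a hypothetical stand-in for the trained reverse model from step (1), and the German-English word list and example sentences are invented for the sketch.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(
    parallel: List[Pair],
    target_monolingual: List[str],
    reverse_translate: Callable[[str], str],
) -> List[Pair]:
    """Step (2): turn target-only sentences into synthetic (source, target) pairs.

    The returned list is what the main model would be trained on in step (3).
    """
    synthetic = [(reverse_translate(t), t) for t in target_monolingual]
    return parallel + synthetic


def toy_reverse_model(target_sentence: str) -> str:
    """Hypothetical stand-in for a trained target-to-source model (step 1)."""
    lexicon = {"I": "ich", "like": "mag", "dogs": "Hunde"}
    return " ".join(lexicon.get(word, word) for word in target_sentence.split())


parallel = [("ich mag Katzen", "I like cats")]  # real parallel data
mono = ["I like dogs"]                          # target-side monolingual data
augmented = back_translate(parallel, mono, toy_reverse_model)
```

Note that the target side of every synthetic pair is a real sentence; only the source side is machine-generated.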
BT is reported to give a large accuracy improvement: http://deeplearning.hatenablog.com/entry/back_translation
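Why I chose this paper
It is not actually a paper proposing a new method.
It examines what happens when existing methods are evaluated on large amounts of data (at Scale).
I want to read the verification data on data augmentation and discuss it.
BT is a kind of data augmentation.
- Because of my work, I have a reason to make full use of the data that already exists.
- Much is still unexplained about which kinds of data augmentation are effective, so I am interested.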
To avoid confusion: evidence / the paper's claims / personal impressions / points to note
Synthetic data generation method: on the synthetic data produced by BT
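Evaluating differences due to how the synthetic data is generated
Greedy Search: fix the top-ranked word at each step and move on to the next word.
Beam Search: choose the sequence with high probability overall. An exhaustive search is infeasible, so search within a beam of finite width.
Using argmax removes diversity from the translations, which is a problem.
Figure: example decodings of a short Japanese sentence with each method, and a plot of words against probability (sorted) locating Greedy Search, Beam Search, Top 10, Sampling, and Beam + Noise (labels: Argmax, Noised, Middle).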
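The difference between the two argmax-style decoders can be seen on a toy model. The sketch below is an illustration under stated assumptions: `MODEL` is a made-up table of next-word probabilities (not data from the paper), and sequences have a fixed length of two words.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical next-word distributions, keyed by the prefix generated so far.
MODEL: Dict[Tuple[str, ...], Dict[str, float]] = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.55, "y": 0.45},
    ("b",): {"x": 0.9, "y": 0.1},
}
LENGTH = 2  # fixed output length for this toy example


def greedy_search() -> Tuple[List[str], float]:
    """Commit to the single most probable word at every step."""
    prefix: Tuple[str, ...] = ()
    logp = 0.0
    for _ in range(LENGTH):
        word, p = max(MODEL[prefix].items(), key=lambda kv: kv[1])
        prefix += (word,)
        logp += math.log(p)
    return list(prefix), math.exp(logp)


def beam_search(beam_width: int) -> Tuple[List[str], float]:
    """Keep the beam_width best partial sequences by total log-probability."""
    beams: List[Tuple[Tuple[str, ...], float]] = [((), 0.0)]
    for _ in range(LENGTH):
        candidates = [
            (prefix + (word,), logp + math.log(p))
            for prefix, logp in beams
            for word, p in MODEL[prefix].items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    best, logp = beams[0]
    return list(best), math.exp(logp)
```

In this table greedy search commits to "a" at the first step and ends with a lower-probability sequence than a beam of width 2, which keeps "b" alive long enough to find the better continuation. Both decoders still return a single high-probability output, which is the lack of diversity the slide points out.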
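Evaluating differences due to how the synthetic data is generated
Top 10: random sampling restricted to the words ranked 1st to 10th. k = 5, 10, 20, and 50 were tried, but changing k made no difference.
Sampling: random sampling. It can give the generated sentences diversity. As a text generation technique it is old, used in Graves et al. 2003 and elsewhere. First used in Imamura et al. 2018 (NICT). According to Ott et al. 2018a, the uncertainty is quite large and there is a high chance of producing odd words.
Beam + Noise: word deletion with p=0.1, replacement with BLANK with p=0.1, and uniform shuffling with a maximum movement of 3. Proposed in an unsupervised learning method, Lample et al. 2018a.
Figure: example decodings in which odd words such as セーター (sweater) and 冷蔵庫 (refrigerator) appear, and a plot of words against probability (sorted).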
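Unrestricted sampling and top-k sampling differ only in whether the distribution is truncated before drawing. The following is a minimal sketch, assuming a hypothetical next-word distribution `PROBS` (the words and probabilities are invented for illustration); `sample_word` is an illustrative helper, not an API from any translation toolkit.

```python
import random
from typing import Dict

# Hypothetical next-word distribution for one decoding step.
PROBS: Dict[str, float] = {
    "cold": 0.5,
    "chilly": 0.3,
    "sweater": 0.15,
    "refrigerator": 0.05,
}


def sample_word(probs: Dict[str, float], rng: random.Random, top_k: int = 0) -> str:
    """Draw one word; if top_k > 0, only the k most probable words are eligible."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]
    words = [word for word, _ in ranked]
    weights = [p for _, p in ranked]
    # random.choices treats weights as relative, so the truncated
    # distribution needs no explicit renormalization.
    return rng.choices(words, weights=weights, k=1)[0]


rng = random.Random(0)
unrestricted = sample_word(PROBS, rng)           # any of the four words
restricted = sample_word(PROBS, rng, top_k=2)    # only "cold" or "chilly"
```

With `top_k=1` this reduces to argmax, and with no restriction even very unlikely words can be drawn, which is where the odd words in sampled sentences come from.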
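Evaluating differences due to how the synthetic data is generated
sampling and beam+noise outperform beam and greedy by 1.7-2.0 BLEU.
top10 is better than beam and greedy but worse than sampling and beam+noise.
At the …M data size, sampling and beam+noise improve performance nearly twice as much as beam.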
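Analysis of the generated sentences
Greedy search and beam search do not produce diverse, rich data. According to Ott et al. 2018a, low-frequency words tend to stop appearing, so sampling methods are preferable.
Similarity to denoising autoencoders: sentences produced by sampling or beam+noise are unrealistic, but phenomena such as "substitution" and "reordering" do occur in ordinary text, so including that kind of processing makes the model robust.
Because the next word cannot be predicted, the task becomes harder and accuracy goes up.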
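The noise applied to beam outputs (deletion, BLANK replacement, bounded reordering) can be sketched as follows. This is an illustration under assumptions: the `<BLANK>` token string and the function name `add_noise` are hypothetical, and the bounded shuffle uses the common trick of sorting positions after adding uniform noise to each index, which is one way to keep every word within `max_shift` positions of where it started.

```python
import random
from typing import List

BLANK = "<BLANK>"  # hypothetical filler token


def add_noise(
    words: List[str],
    rng: random.Random,
    p_drop: float = 0.1,
    p_blank: float = 0.1,
    max_shift: int = 3,
) -> List[str]:
    """Apply deletion, BLANK replacement, and a bounded local shuffle."""
    # 1. Delete each word independently with probability p_drop.
    kept = [w for w in words if rng.random() >= p_drop]
    # 2. Replace each surviving word with BLANK with probability p_blank.
    blanked = [BLANK if rng.random() < p_blank else w for w in kept]
    # 3. Shuffle locally: add noise in [0, max_shift + 1) to each index and
    #    sort, so no word moves more than max_shift positions.
    keys = [i + rng.random() * (max_shift + 1) for i in range(len(blanked))]
    order = sorted(range(len(blanked)), key=keys.__getitem__)
    return [blanked[i] for i in order]


rng = random.Random(0)
noised = add_noise("the weather is cold today".split(), rng)
```

The damage stays local: most words survive, and those that move stay near their original position, so the target sentence can still be largely recovered from the noised source.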
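Analysis of the generated sentences
Coloring acceptable words (OK) blue and clearly wrong words (NG) in another color shows that the clearly wrong words are "local".
Hypothesis: did generalization improve on the remaining normal parts, so that the model copes no matter what noise word appears? That is, improved generalization from "local noise".
One could also read it as "diversity increased regardless of quality, so it works", but even so the accuracy gain seems too large, so I want to dig a little deeper (personal analysis).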
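(Personal analysis, continued)
Many natural language processing models can be fooled easily by changing the input only slightly (Deep Text Classification Can be Fooled, Liang et al. 2016).
Adding local noise may be acting as a countermeasure to this problem.
Figure: 犬が好きです is translated as "I like dog", while the unseen input クトゥルフ神が好きです is translated as "I am scared of Cthulhu"; the error is then backpropagated.
Even supposing クトゥルフ (Cthulhu) is an unknown noun, 「好き」 is still "like".
(Backpropagating the error into the noise part as well is entirely wasteful, so this might be improvable.)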
Low Resource & High Resource: on differences in the amount of parallel resources that BT starts from
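What happens when the starting resources are small?
Diagram: the main model is trained on target and source data together with target monolingual data and synthetic source data.
The amount of parallel data here is small (about 80K sentences), roughly the scale of paperback books (1.12 million characters, 80 characters per sentence).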
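What happens when the starting resources are small?
With 80K sentences, the ranking of sampling and beam search is reversed.
The more data there is, the stronger sampling becomes.
When the starting resources are small, the BT model is not very accurate, so it is fragile against the harmful effect of adding noise through sampling.
The accuracy of BT needs to be raised.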
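Mitigating the problem of small starting resources
(1) Build language models from monolingual data and apply transfer learning
Diagram: an Encoder-Decoder from Source to Target, with a Source language model connected to the Encoder and a Target language model connected to the Decoder through transfer learning or weight sharing.
The problem that "transferring language models is difficult" was resolved by Devlin et al. 2018 (BERT), so there may be progress here.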
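An impressive paper had been published before I noticed
Reference paper: Lample et al. 2019 (XLM)
It transfers BERT, trains translation as a single language model rather than in Encoder-Decoder form, and sets a new BLEU state of the art (SOTA) for unsupervised WMT German-English translation (arXiv submission).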
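Mitigating the problem of small starting resources
(2) Dual learning
Diagram: the main model and a "dual" model are trained using target monolingual data and source monolingual data.
The data does not have to be parallel.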
Domain of synthetic data: examining the domain of the synthetic data
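Domain adaptation
Diagram: the main model is trained on target and source text data plus news-domain target monolingual data and the synthetic source data generated from it.
Can the model become strong on news even without parallel news data?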
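Domain adaptation
When the BT domain (news) matches the domain of the evaluation data, BT achieves 83% of the improvement obtained with real data.
Even when the BT domain (news) does not match the domain of the evaluation data at all, it achieves 32.5% of the improvement obtained with real data.
Both cases improve, and when the domains match, accuracy is at least that of general-purpose data.
Even without parallel data for a given genre, translation for that genre can be strengthened if monolingual data is available.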
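Summary
Adding back-translation improves accuracy with any of the methods, but how the back-translation is generated can double the size of the gain.
When data is scarce, performance drops in relative terms, so sampling cannot be used casually.
It can also be used for domain adaptation.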