TensorFlow & DeepMind Lab & UNREAL
Kosuke Miyoshi
April 20, 2017
Solving DeepMind Lab's 3D mazes with the UNREAL algorithm implemented in TensorFlow
Transcript
TensorFlow & DeepMind Lab. Kosuke Miyoshi, narrative nights Inc. TensorFlow User Group
DeepMind Lab
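UNREAL: builds on A3C, a reinforcement learning algorithm, and combines it with auxiliary tasks that use Experience Replay, achieving a multi-fold speedup in learning on 3D mazes.
Reinforcement Learning with Unsupervised Auxiliary Tasks
Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki et al. (DeepMind, 2016)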
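Animal dreams
• While dreaming, animals replay events they have experienced, consolidating memories in the hippocampus and neocortex.
• They dream especially often about events tied to positive or negative rewards, and learn from them.
• Example: "I saw a lion at the watering hole and ended up in danger."
• UNREAL takes its inspiration from this.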
Reinforcement learning: the agent sends an action (⬆ ➡ ⬇) to the environment, and the environment returns the state s and the reward r.
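The road to UNREAL: DQN, then A3C, then UNREAL
A3C (Asynchronous Advantage Actor-Critic)
• Runs multiple environments asynchronously and in parallel, which speeds up and stabilizes learning.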
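A3C network architecture: Conv, Conv, FC, LSTM, followed by two output heads.
• Policy π (softmax): the probability of taking each action (⬆ ➡ ⬇).
• V (linear): the value of the current state.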
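Each local network computes only the gradient dθ from its own learning step. It does not apply dθ to its own weights; each worker applies it individually to the global weights θ. The global weights are then copied back to each local network's weights.
(Diagram: several local networks each send dθ to the global θ.)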
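The weight flow above can be sketched in plain Python. This is a minimal single-process illustration, not the author's TensorFlow code: the toy quadratic loss, the function names `local_gradient`, `apply_to_global`, and `run_workers`, and the choice to run workers in turn instead of in real threads are all assumptions made for the example.

```python
# Toy sketch of the A3C-style weight flow: each worker computes a gradient
# on its local copy, applies it to the GLOBAL weights only, then re-syncs.
# The quadratic loss and all names here are illustrative assumptions.

def local_gradient(local_theta, target):
    """Gradient of the toy loss 0.5 * (theta - target)^2, per weight."""
    return [w - t for w, t in zip(local_theta, target)]

def apply_to_global(global_theta, d_theta, alpha):
    """Gradient-descent step applied in place to the shared weights."""
    for i, g in enumerate(d_theta):
        global_theta[i] -= alpha * g

def run_workers(global_theta, worker_targets, alpha=0.1, steps=200):
    # Every worker starts from a copy of the global weights.
    local_thetas = [list(global_theta) for _ in worker_targets]
    for _ in range(steps):
        for k, target in enumerate(worker_targets):
            d_theta = local_gradient(local_thetas[k], target)  # local: gradient only
            apply_to_global(global_theta, d_theta, alpha)      # update global weights
            local_thetas[k] = list(global_theta)               # copy global -> local
    return global_theta

theta = run_workers([0.0, 0.0], worker_targets=[[1.0, 2.0], [3.0, 4.0]])
```

Because each worker's local copy can be slightly stale by the time its gradient is applied, the global weights settle at a compromise between the workers' objectives, which is the behaviour the asynchronous scheme accepts in exchange for parallelism.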
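Gradients of the policy π and the value V
[Equations: the return R, the gradient for the V network, and the gradient for the policy network]
• V is updated to move closer to R.
• If R − V is positive, the update raises the probability of the action that was taken.
• If R − V is negative, the update lowers the probability of the action that was taken.
Note: in this notation V uses gradient descent and the policy uses gradient ascent: θv = θv - α * dθv, θ = θ + α * dθ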
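For reference, the updates described above match the accumulated gradients in the A3C paper's formulation. The block below is a reconstruction under that assumption, not a copy of the slide's equations; γ (the discount factor) and the primed symbols θ', θ'_v (the local copies of the weights) are notation taken from that paper rather than from this transcript.

```latex
% Return, accumulated backwards over the rollout
R \leftarrow r_i + \gamma R

% Policy network: gradient ascent on log-probability weighted by (R - V)
d\theta \leftarrow d\theta + \nabla_{\theta'} \log \pi(a_i \mid s_i; \theta') \,\bigl(R - V(s_i; \theta'_v)\bigr)

% V network: gradient descent on the squared error between R and V
d\theta_v \leftarrow d\theta_v + \frac{\partial \bigl(R - V(s_i; \theta'_v)\bigr)^2}{\partial \theta'_v}
```

The sign of (R − V) is what makes the policy step raise or lower the probability of the action taken.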
UNREAL (UNsupervised REinforcement and Auxiliary Learning)
• Adds auxiliary tasks to A3C that make effective use of Experience Replay, speeding up learning further.
• Pixel Control
• Reward Prediction
• Value Function Replay
Experience Replay
• Stores a large number of [state, action, reward, next state] tuples and trains the network on samples drawn from them.
• DQN did not learn stably without it.
• A3C does not use it.
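A replay buffer of the kind described above can be sketched with the Python standard library. This is a generic illustration, not the buffer used in the author's repository; the class name `ReplayBuffer`, its methods, and the capacity value are assumptions.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition once it is full.
        self.transitions = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        self.transitions.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Uniformly sample transitions without replacement."""
        return random.sample(list(self.transitions), batch_size)

    def __len__(self):
        return len(self.transitions)

buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.add(state=t, action=0, reward=0.0, next_state=t + 1)
batch = buffer.sample(2)
```

With a capacity of 3 and five insertions, only the three most recent transitions remain available for sampling.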
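Pixel Control
• The agent is encouraged to act so that the pixels on screen change as much as possible.
• An auxiliary task that treats the change in screen pixels as a pseudo-reward.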
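Pixel Control
• The screen is divided into a grid of pixel cells, and Q-learning is run for each cell.
• The Q-learning uses a Dueling Network.
• The output holds one Q-value per grid cell per action, where Q is the discounted sum of rewards and the reward is the mean pixel change within that cell.
Note: the Q-values obtained from Pixel Control are not used to select actions.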
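The per-cell pseudo-reward can be sketched as follows. This is a simplified illustration that assumes grayscale frames stored as 2D lists whose sides are multiples of the cell size; the function name `pixel_change_rewards` and the parameter `cell_size` are hypothetical, and the actual implementation may crop the frame or handle colour channels differently.

```python
def pixel_change_rewards(prev_frame, frame, cell_size):
    """Mean absolute pixel change per grid cell (the pseudo-reward).

    Frames are 2D lists of grayscale intensities with equal shape whose
    height and width are multiples of cell_size (an assumption of this sketch).
    """
    height, width = len(frame), len(frame[0])
    rewards = []
    for top in range(0, height, cell_size):
        row = []
        for left in range(0, width, cell_size):
            total = 0.0
            for y in range(top, top + cell_size):
                for x in range(left, left + cell_size):
                    total += abs(frame[y][x] - prev_frame[y][x])
            row.append(total / (cell_size * cell_size))
        rewards.append(row)
    return rewards

prev = [[0.0] * 4 for _ in range(4)]
curr = [[0.0] * 4 for _ in range(4)]
curr[0][0] = 1.0  # one pixel changes in the top-left cell
rewards = pixel_change_rewards(prev, curr, cell_size=2)
```

Each entry of `rewards` would serve as the reward for the Q-learning problem attached to that grid cell.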
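Reward Prediction
• An auxiliary task that takes consecutive frames from Experience Replay and predicts whether the reward of the frame that follows is positive, negative, or zero.
• Samples are drawn so that rewarding (+ or −) and zero-reward cases appear in a fixed ratio. Useful reward events, though rare, are therefore sampled often.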
Reward Prediction: predicts whether the next reward is +, −, or 0.
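The skewed sampling and the three-way label can be sketched as below. This covers only the choice of which frame to predict and its label, not the stacking of the preceding frames or the classifier itself. The names `reward_class` and `sample_reward_index` are hypothetical, and the value `p_nonzero=0.5` is an assumption of this sketch, since the ratio on the slide is not legible in the transcript.

```python
import random

def reward_class(reward):
    """Map a scalar reward to a 3-way label: 0 = zero, 1 = positive, 2 = negative."""
    if reward > 0:
        return 1
    if reward < 0:
        return 2
    return 0

def sample_reward_index(rewards, rng, p_nonzero=0.5):
    """Pick a frame index, choosing a rewarding frame with probability p_nonzero.

    p_nonzero is an assumed parameter; it skews sampling toward the rare
    rewarding frames instead of sampling uniformly.
    """
    nonzero = [i for i, r in enumerate(rewards) if r != 0]
    zero = [i for i, r in enumerate(rewards) if r == 0]
    if nonzero and (not zero or rng.random() < p_nonzero):
        return rng.choice(nonzero)
    return rng.choice(zero)

rng = random.Random(0)
rewards = [0.0] * 99 + [1.0]  # a single rewarding frame out of 100
picks = [sample_reward_index(rewards, rng) for _ in range(1000)]
share = sum(1 for i in picks if rewards[i] != 0) / len(picks)
```

Under uniform sampling the rewarding frame would be picked about 1% of the time; here `share` lands near `p_nonzero` instead.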
Value Function Replay
• Repeats the state-value (V) estimation done in A3C (the critic side of actor-critic) on frames sampled from Experience Replay.
• Unlike Reward Prediction, the sampling is not deliberately skewed.
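As a rough sketch of what re-running the critic on replayed frames involves, the code below builds discounted return targets for a replayed reward sequence and measures how far the value estimates are from them. The function names, the `gamma` value, and the use of a plain mean squared error are assumptions for illustration, not details taken from the slides.

```python
def discounted_returns(rewards, bootstrap_value, gamma=0.99):
    """Targets R_t = r_t + gamma * R_{t+1}, seeded with the value of the state
    that follows the sequence (0.0 if the episode ended there)."""
    returns = []
    running = bootstrap_value
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns

def value_loss(values, returns):
    """Mean squared error that pulls V toward R."""
    return sum((R - v) ** 2 for v, R in zip(values, returns)) / len(values)

replayed_rewards = [0.0, 0.0, 1.0]
targets = discounted_returns(replayed_rewards, bootstrap_value=0.0, gamma=0.5)
loss = value_loss([0.0, 0.0, 0.0], targets)
```

In the real network the value estimates would come from the shared Conv and LSTM layers, so minimizing this loss on replayed data also shapes the shared features.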
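The auxiliary tasks do not affect action selection directly. However, they share the Conv and LSTM weights with the base A3C network, so adding them produces feature representations that are effective for solving those tasks, and this in turn influences action selection indirectly.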
Loss function: the sum of the Base A3C, Value Function Replay, Pixel Control (over the grid cells), and Reward Prediction losses.
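Written out, the combined objective takes the following form in the UNREAL paper's formulation. The weighting coefficients λ and the sum over Pixel Control cells c follow that paper and are assumptions here, since the slide lists only the four terms.

```latex
\mathcal{L}_{\mathrm{UNREAL}}(\theta)
  = \mathcal{L}_{\mathrm{A3C}}
  + \lambda_{\mathrm{VR}} \, \mathcal{L}_{\mathrm{VR}}
  + \lambda_{\mathrm{PC}} \sum_{c} \mathcal{L}_{Q}^{(c)}
  + \lambda_{\mathrm{RP}} \, \mathcal{L}_{\mathrm{RP}}
```

Here VR stands for Value Function Replay, PC for Pixel Control, and RP for Reward Prediction.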
Comparison with A3C: a multi-fold speedup on average in DeepMind Lab environments.
Picking up an apple gives a reward. Reaching the warp point also gives a reward and warps the agent to a random location.
Video of the reproduction experiment (YouTube link in the slides).
Note: the URL is not clickable when the deck is viewed on Speaker Deck, so download the PDF and click the link there.
Pixel Control: each grid cell's pixel change from the previous frame, and each grid cell's Q-value (the Q-value for the action taken).
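Policy (π): the probability of taking each action (forward, backward, turn left/right, strafe left/right). As training progresses, the policy comes to choose its action almost deterministically.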
Value Function: the value of the current state, which rises as the agent approaches the warp point.
Reward Prediction: the network predicts that a positive reward is coming.
Source
• https://github.com/miyosuda/unreal