DRL Combinatorial Optimization
newzy
November 24, 2021
Research
Transcript
POMO: Policy Optimization with Multiple Optima for Reinforcement Learning
Kwon, Yeong-Dae, et al. NeurIPS, 2020, vol. 33
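Overview
• An end-to-end approximate method for combinatorial optimization problems, based on deep reinforcement learning.
• Substantially improves both computation time and accuracy over existing deep reinforcement learning methods.
• Validated on the traveling salesman problem and other problems.
Introduction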
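Combinatorial optimization
• Problems that ask for an optimal combination, typified by the traveling salesman problem, the vehicle routing (delivery planning) problem, and the knapsack problem.

Method               Accuracy       Computation time
Exact methods        Optimal        Slow
Approximate methods  Near-optimal   Fast

https://onl.tw/vzkASMX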
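To make the trade-off between exact and approximate methods concrete, the following Python sketch compares brute-force enumeration with a nearest-neighbor heuristic on a small TSP instance. The function names and the choice of nearest neighbor as the approximate method are illustrative assumptions, not taken from the slides.

```python
import itertools
import math

def tour_length(points, order):
    """Total length of the closed tour that visits points in the given order."""
    return sum(
        math.dist(points[order[k]], points[order[(k + 1) % len(order)]])
        for k in range(len(order))
    )

def exact_tsp(points):
    """Exact method: enumerate every tour that starts at city 0 (factorial time)."""
    rest = range(1, len(points))
    best = min(
        ((0,) + perm for perm in itertools.permutations(rest)),
        key=lambda order: tour_length(points, order),
    )
    return list(best), tour_length(points, best)

def nearest_neighbor_tsp(points, start=0):
    """Approximate method: always move to the closest unvisited city (quadratic time)."""
    order = [start]
    unvisited = set(range(len(points))) - {start}
    while unvisited:
        here = points[order[-1]]
        nxt = min(sorted(unvisited), key=lambda c: math.dist(here, points[c]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order, tour_length(points, order)
```

The exact search always returns an optimal tour but becomes infeasible beyond a handful of cities, while the heuristic stays fast and returns a tour that is never shorter than the optimum.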
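Reinforcement learning (RL)
• RL: a method for solving sequential decision-making problems. The goal is to find a policy that maximizes the cumulative reward.
To set up a problem, a state set, an action set, and a reward function must be defined.
https://onl.tw/98fQVvW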
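As a sketch of what defining a state set, an action set, and a reward function can look like, the Python class below casts the traveling salesman problem as a sequential decision problem. The class name `TSPEnv`, its methods, and the use of negative tour length as a terminal reward are illustrative assumptions, not definitions from the slides.

```python
import math

class TSPEnv:
    """Hypothetical minimal environment: TSP as a sequential decision problem."""

    def __init__(self, points):
        self.points = points
        self.tour = []

    def state(self):
        # State: the cities visited so far, in visiting order.
        return tuple(self.tour)

    def actions(self):
        # Action set: every city that has not been visited yet.
        return [c for c in range(len(self.points)) if c not in self.tour]

    def step(self, city):
        """Visit a city. The reward is 0 until the tour is complete."""
        if city not in self.actions():
            raise ValueError("city already visited or out of range")
        self.tour.append(city)
        done = len(self.tour) == len(self.points)
        # Reward function: negative closed-tour length, so shorter is better.
        reward = -self._closed_length() if done else 0.0
        return self.state(), reward, done

    def _closed_length(self):
        n = len(self.tour)
        return sum(
            math.dist(self.points[self.tour[k]], self.points[self.tour[(k + 1) % n]])
            for k in range(n)
        )
```

With this reward, maximizing the cumulative reward is the same as minimizing the tour length.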
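Policy-gradient-based REINFORCE
• Policy $\pi(s)$: a function that outputs the action $a$ to take in state $s$
• $\pi_\theta$: a policy parameterized by $\theta$
• Policy update rule, where $\alpha$ is the learning rate and $J(\pi_\theta)$ is the objective function:
$$\theta \leftarrow \theta + \alpha \nabla_\theta J(\pi_\theta)$$
• Policy gradient, where $\mathbb{E}$ is the expectation, $R_t$ is the return, and $b(s)$ is the baseline:
$$\nabla_\theta J(\pi_\theta) = \mathbb{E}_{\pi_\theta}\left[\nabla_\theta \log \pi_\theta \cdot \left(R_t - b(s)\right)\right]$$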
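The update rule and the policy gradient on this slide can be tried out on a toy problem. The sketch below assumes a single-state problem with a softmax policy over a few actions with fixed rewards; the function names, the running-average baseline, and the hyperparameters are illustrative choices rather than anything specified in the slides.

```python
import math
import random

def softmax(theta):
    """Action probabilities pi_theta for a single-state softmax policy."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_step(theta, rewards, alpha, baseline, rng):
    """One sampled update: theta <- theta + alpha * (R - b) * grad log pi_theta(a)."""
    probs = softmax(theta)
    action = rng.choices(range(len(theta)), weights=probs)[0]
    advantage = rewards[action] - baseline
    # For a softmax policy, d/d theta_k log pi(a) = 1[k == a] - pi(k).
    grad_log = [(1.0 if k == action else 0.0) - probs[k] for k in range(len(theta))]
    new_theta = [t + alpha * advantage * g for t, g in zip(theta, grad_log)]
    return new_theta, action

def train(rewards, steps=2000, alpha=0.1, seed=0):
    """Run REINFORCE with a running-average baseline and return the final policy."""
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)
    baseline = 0.0
    for _ in range(steps):
        theta, action = reinforce_step(theta, rewards, alpha, baseline, rng)
        # Baseline b(s): exponential moving average of observed returns.
        baseline += 0.05 * (rewards[action] - baseline)
    return softmax(theta)
```

Calling `train([0.0, 1.0])` should yield a policy that places most of its probability on the second action, since that action has the higher return.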
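Prior work
Pointer Networks
A network used for combinatorial optimization
• Selects input elements without duplication and generates a sequence as its output pattern.
• Consists of an encoder that extracts features from the input and a decoder that uses the encoder's output to produce the route that forms the answer.
• Uses LSTMs for both the encoder and the decoder.
Attention Model
An improved version of Pointer Networks
• Like Pointer Networks, a model that uses an encoder and a decoder.
• Drops the LSTM and adopts multi-head attention.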
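Method
Idea behind the paper's method
The first action strongly influences the agent's later actions.
Exploit the symmetry often seen in combinatorial optimization problems.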
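POMO
• REINFORCE with baseline: uses a typical policy-gradient-based RL algorithm.
• Specifies multiple different starting actions and obtains multiple action sequences (trajectories).
• Does not use a <START> token.
Figure: conventional method vs. POMO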
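The multi-start idea can be sketched without a neural network. In the Python example below, a nearest-neighbor rule stands in for the learned policy (an assumption made only to keep the example self-contained), and one trajectory is generated per possible first node. All function names are hypothetical.

```python
import math

def rollout_from(points, start):
    """One trajectory: fix the first action, then follow the stand-in policy."""
    tour = [start]
    unvisited = set(range(len(points))) - {start}
    while unvisited:
        here = points[tour[-1]]
        # Stand-in for the learned policy: go to the closest unvisited node.
        nxt = min(sorted(unvisited), key=lambda c: math.dist(here, points[c]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def tour_length(points, tour):
    """Length of the closed tour."""
    return sum(
        math.dist(points[tour[k]], points[tour[(k + 1) % len(tour)]])
        for k in range(len(tour))
    )

def multi_start_rollouts(points):
    """N trajectories for N nodes, each forced to begin at a different node."""
    return [rollout_from(points, s) for s in range(len(points))]

def best_rollout(points):
    """Keep the shortest of the N trajectories."""
    return min(multi_start_rollouts(points), key=lambda t: tour_length(points, t))
```

Because every node of a TSP tour can serve as the starting point of the same cycle, forcing different first actions explores several routes to equally valid solutions instead of relying on a single start.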
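POMO
$$\nabla_\theta J(\theta) \approx \frac{1}{N} \sum_{i=1}^{N} \left(R(\tau^i) - b_i(s)\right) \nabla_\theta \log p_\theta(\tau^i \mid s)$$
where
$$p_\theta(\tau^i \mid s) \equiv \prod_{t=2}^{M} p_\theta\left(a_t^i \mid s, a_{1:t-1}^i\right)$$
Trajectories: $\tau^i = (a_1^i, a_2^i, \ldots, a_M^i)$ for $i = 1, 2, \ldots, N$
Shared baseline: $b_i(s) = b_{\text{shared}}(s) = \frac{1}{N} \sum_{j=1}^{N} R(\tau^j)$ for $i = 1, 2, \ldots, N$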
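A minimal Python sketch of the shared baseline on this slide follows. It takes the N trajectory returns and the N trajectory log-probabilities as plain lists of floats; in real training the log-probabilities would come from a differentiable model, which is outside this example. The function names are hypothetical.

```python
def shared_baseline_advantages(returns):
    """Advantage R(tau^i) - b_shared(s), where b_shared is the mean of all N returns."""
    baseline = sum(returns) / len(returns)
    return [r - baseline for r in returns]

def pomo_surrogate_loss(returns, log_probs):
    """Negated average of advantage * log p_theta(tau^i | s).

    If log_probs were differentiable in theta, the gradient of this scalar
    would be the negative of the gradient estimate on the slide, so that
    minimizing it performs gradient ascent on J.
    """
    advantages = shared_baseline_advantages(returns)
    n = len(returns)
    return -sum(a * lp for a, lp in zip(advantages, log_probs)) / n
```

Since the baseline is the mean of the same N returns, the advantages always sum to zero: trajectories that beat the average are reinforced and the rest are discouraged.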
Pseudocode for training
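Instance Augmentation: an inference technique
• Inspired by data augmentation in image processing.
• The coordinates used here lie in the unit square (first quadrant).
Figure: the instance augmentation used here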
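The sketch below illustrates instance augmentation on unit-square coordinates in Python. It assumes the eight flips and axis swaps of the unit square described in the POMO paper; the transcript of this slide does not list them, and the function names and the `solve` callback are hypothetical.

```python
def augment_8fold(points):
    """Return 8 equivalent instances: flips and axis swaps of the unit square."""
    transforms = [
        lambda x, y: (x, y),
        lambda x, y: (y, x),
        lambda x, y: (x, 1 - y),
        lambda x, y: (y, 1 - x),
        lambda x, y: (1 - x, y),
        lambda x, y: (1 - y, x),
        lambda x, y: (1 - x, 1 - y),
        lambda x, y: (1 - y, 1 - x),
    ]
    return [[f(x, y) for x, y in points] for f in transforms]

def best_over_augmentations(points, solve):
    """Run a solver on every augmented instance and keep the smallest cost.

    `solve` is any function mapping a list of (x, y) points to a cost,
    for example the tour length found by a trained policy.
    """
    return min(solve(instance) for instance in augment_8fold(points))
```

Each transform preserves all pairwise distances, so the optimal tour length is unchanged, yet a learned policy may produce a different (and possibly better) tour for each variant.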
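Pseudocode for inference
Experiments
Experiments: overview
• POMO was used to solve the following problems, and the results were compared with other representative methods.
  Traveling salesman problem
  Capacitated vehicle routing problem
  Knapsack problem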
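Learning curves: traveling salesman problem (panels: 50 nodes, 100 nodes)
Traveling salesman problem (TSP)
Traveling salesman problem (TSP), continued
Capacitated vehicle routing problem (CVRP)
Knapsack problem (KP)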
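Summary of experiments
• For three combinatorial optimization problems with different settings, effective results were obtained using the same training method and the same neural network architecture.
• Both POMO, as a training and inference method, and Instance Augmentation, as an inference method, were confirmed to be effective.
Conclusion
This paper introduced a method that exploits symmetry in combinatorial optimization problems to improve the sample efficiency and accuracy of RL and to shorten inference time.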
References
Kwon, Yeong-Dae, et al. "POMO: Policy Optimization with Multiple Optima for Reinforcement Learning." Advances in Neural Information Processing Systems 33 (2020).
Kool, Wouter, Herke van Hoof, and Max Welling. "Attention, Learn to Solve Routing Problems!" International Conference on Learning Representations.
Vinyals, Oriol, Meire Fortunato, and Navdeep Jaitly. "Pointer Networks." Advances in Neural Information Processing Systems.