[Paper Survey] Survey on Visualization in Deep Reinforcement Learning of Game Tasks 2

1/16 Survey on Visualization in Deep Reinforcement Learning of Game Tasks 2 (2024/05/15)
❏ Visualization of Deep Reinforcement Learning using Grad-CAM: How AI Plays Atari Games?, Ho-Taek Joo et al. (Institute of Integrated Technology GIST) [IEEE CoG'19] (Cited by: 54)
❏ Atari-HEAD: Atari Human Eye-Tracking and Demonstration Dataset, Ruohan Zhang et al. (Department of Computer Science, University of Texas at Austin et al.) [AAAI'20] (Cited by: 66)
❏ Explain Your Move: Understanding Agent Actions Using Specific and Relevant Feature Attribution, Nikaash Puri et al. (Adobe Systems Inc et al.) [ICLR'20] (Cited by: 72)

2/16 Background knowledge | XAI (Explainable Artificial Intelligence)
❏ XAI is being pursued across machine learning models in general
Source: zero2one, モデルの解釈 (Model Interpretation)

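3/16 Background | XRL (Explainable Reinforcement Learning)
❏ Visualizing the features inside the network: uses t-SNE
❏ Building a reinforcement learning model that is itself explainable: hierarchical DRL models
❏ Explaining the basis for a decision in natural language: linguistic explanation
❏ Visual explanation: for reinforcement learning whose observations are images
Works shown on the slide:
❏ Visualizing Dynamics: from t-SNE to SEMI-MDPs [ICML'16]
❏ Hierarchical and Interpretable Skill Acquisition in Multi-task Reinforcement Learning [ICLR'18]
❏ Read and Reap the Rewards: Learning to Play Atari with the Help of Instruction Manuals [NeurIPS'23]
Reference: 深層強化学習における視覚的説明 (Visual Explanation in Deep Reinforcement Learning) [Journal of the Robotics Society of Japan]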

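4/16 Atari-HEAD: Atari Human Eye-Tracking and Demonstration Dataset | Overview
Proposes a large-scale dataset containing gaze data and action data from humans playing Atari games
❖ Gaze data
  ➢ Recorded with an EyeLink 1000 eye tracker
❖ Human action data
  1. The game is paused at every frame until the human player enters an action
  2. This lets the human player choose the "best action" and reach high scores
  3. Because the "best action" is chosen, the data quality improves
  4. The dataset can serve as a standard benchmark in research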

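5/16 Atari-HEAD: Atari Human Eye-Tracking and Demonstration Dataset | Dataset
Dataset
❖ About 117 hours of gameplay data across 20 Atari games
❖ About 8 million action samples
❖ About 330 million gaze samples
Types of data
❖ Foveated Rendering
  ➢ The region the human is attending to is shown at high resolution
  ➢ Everything outside that region is shown at low resolution
❖ Saliency
  ➢ Features extracted from image color, intensity, and edges
❖ Optic Flow
  ➢ Motion features extracted between the previous frame and the current frame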
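The foveated rendering idea above (full resolution near the gaze point, low resolution elsewhere) can be illustrated with a toy example. This is a minimal sketch in plain Python, not the rendering procedure used for Atari-HEAD: the function name `foveate`, the hard circular cutoff `radius`, and the block-averaging used to imitate low resolution are all assumptions made for illustration.

```python
def foveate(frame, gaze, radius, block=2):
    """Keep pixels within `radius` of `gaze`; replace the rest with block averages.

    Hypothetical helper: a hard circular cutoff plus block averaging stands in
    for a real acuity-based foveation model.
    """
    height, width = len(frame), len(frame[0])
    gaze_y, gaze_x = gaze
    out = [list(row) for row in frame]
    for y in range(height):
        for x in range(width):
            if (y - gaze_y) ** 2 + (x - gaze_x) ** 2 <= radius ** 2:
                continue  # attended region: keep full resolution
            # Unattended region: average over the block that contains (y, x)
            y0, x0 = (y // block) * block, (x // block) * block
            cells = [frame[j][i]
                     for j in range(y0, min(y0 + block, height))
                     for i in range(x0, min(x0 + block, width))]
            out[y][x] = sum(cells) / len(cells)
    return out


# Toy 4x4 grayscale frame with the gaze at the top-left corner
frame = [[1, 3, 5, 7],
         [1, 3, 5, 7],
         [2, 4, 6, 8],
         [2, 4, 6, 8]]
foveated = foveate(frame, gaze=(0, 0), radius=1)
```

Pixels within one step of the gaze point keep their original values, while the remaining pixels collapse to the mean of their 2x2 block.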

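6/16 Atari-HEAD: Atari Human Eye-Tracking and Demonstration Dataset | Methods
❖ Saliency Prediction
  ➢ Goal: predict the attended region from gaze data images
  ➢ Method: extract features with a CNN and generate a saliency map
❖ Imitation Learning
  ➢ Goal: imitate the actions in the proposed dataset
  ➢ Method: train an action prediction model on the proposed dataset (behavior cloning)
❖ Attention Guided Imitation Learning
  ➢ Goal: use gaze data images to strengthen imitation learning
  ➢ Method: strengthen the action prediction model with the product of the input image and the saliency map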
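The "product of the input image and the saliency map" in the attention-guided step can be read as an element-wise mask on the frame. The following is a minimal sketch in plain Python, assuming a grayscale frame and a saliency map of the same shape with values in [0, 1]; the function name `apply_gaze_mask` and the toy values are hypothetical, and the network that consumes the masked frame is not shown.

```python
def apply_gaze_mask(frame, saliency):
    """Element-wise product of a frame and a saliency map of the same shape."""
    same_shape = len(frame) == len(saliency) and all(
        len(f_row) == len(s_row) for f_row, s_row in zip(frame, saliency)
    )
    if not same_shape:
        raise ValueError("frame and saliency must have the same shape")
    return [
        [pixel * weight for pixel, weight in zip(f_row, s_row)]
        for f_row, s_row in zip(frame, saliency)
    ]


# Toy 2x2 example: salient pixels pass through, unattended pixels are damped
frame = [[10, 20], [30, 40]]
saliency = [[1.0, 0.5], [0.0, 0.25]]
masked = apply_gaze_mask(frame, saliency)
```

In such a setup the masked frame would be fed to the action prediction model, typically alongside the unmasked frame, so that regions the human looked at carry more weight.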

7/16 Atari-HEAD: Atari Human Eye-Tracking and Demonstration Dataset | Experimental results
Images shown on the slide:
1. Gaze data image
2. Foveated Rendering image
3. Saliency image
4. Optic Flow image
5. Saliency Prediction image

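8/16 Atari-HEAD: Atari Human Eye-Tracking and Demonstration Dataset | Experimental results: scores from human play
❏ AtariHEAD: obtained by imitation learning on the proposed dataset
❏ Community Record: the official world record (the best score achieved by a human player)
❏ RL: the score of a reinforcement learning agent (the algorithm is not stated)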

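9/16 Explain Your Move: Understanding Agent Actions Using Specific and Relevant Feature Attribution | Overview
❏ SARFA (Specific and Relevant Feature Attribution): a saliency map method that highlights only the important features relevant to the agent's action
❏ Compared with existing methods, it generates more relevant saliency maps for chess, Atari, and Go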

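10/16 Explain Your Move: Understanding Agent Actions Using Specific and Relevant Feature Attribution | Method
① State perturbation
  ❏ Create a perturbed state by blurring part of the image
② Q-value computation
  ❏ Compute the Q-values for the original state s and the perturbed state s'
  ❏ A Q-value is the expected reward for each action
③ Specificity computation
  ❏ Evaluates the selected action â
  ❏ Uses the softmax function to compute the probability that each action is selected
  ❏ The difference between the probability of â in the original state s and in the perturbed state s' is the specificity Δp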
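Steps ② and ③ can be sketched as follows. This is a minimal sketch in plain Python that assumes the Q-values for s and s' are already available as lists; the perturbation itself (blurring) and the agent are not shown, and the names `softmax`, `specificity`, `q_s`, and `q_s_pert` are hypothetical.

```python
import math


def softmax(q_values):
    """Turn Q-values into action-selection probabilities."""
    shift = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp(q - shift) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def specificity(q_orig, q_pert, action):
    """Delta p: drop in the chosen action's probability after perturbation."""
    return softmax(q_orig)[action] - softmax(q_pert)[action]


# Toy Q-values for three actions in the original and perturbed states
q_s = [2.0, 0.0, 0.0]
q_s_pert = [0.0, 0.0, 0.0]
a_hat = max(range(len(q_s)), key=q_s.__getitem__)  # greedy action in s
delta_p = specificity(q_s, q_s_pert, a_hat)
```

A large positive Δp means that the perturbed region mattered for the probability of selecting â; a value near zero means the perturbation left that probability almost unchanged.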

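11/16 Explain Your Move: Understanding Agent Actions Using Specific and Relevant Feature Attribution | Method
④ Relevance computation
  ❏ Evaluates the actions other than the selected action â
  ❏ Computed for both the original state s and the perturbed state s'
  ❏ KL divergence measures the difference between the probability distributions in the original and perturbed states
  ❏ When it is high (the distributions differ greatly): the perturbed state s' strongly affects the selection probabilities of the other actions
⑤ Saliency map generation
  ❏ K is defined through the reciprocal of the KL divergence (in the paper, K = 1 / (1 + D_KL))
  ❏ When K is small: the perturbed state s' strongly affects the selection probabilities of the other actions
  ❏ The saliency map is generated from the harmonic mean of K and Δp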
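Steps ④ and ⑤ can be sketched in the same style. This is a minimal sketch in plain Python, assuming K = 1 / (1 + D_KL) over the softmax of the non-selected actions and the harmonic mean 2KΔp / (K + Δp) as the saliency score. Returning 0 when Δp is not positive is an assumption of this sketch, and all function and variable names are hypothetical.

```python
import math


def softmax(values):
    shift = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def relevance_k(q_orig, q_pert, action):
    """K = 1 / (1 + KL) over the distribution of the non-selected actions."""
    others = [i for i in range(len(q_orig)) if i != action]
    p_rem = softmax([q_orig[i] for i in others])
    p_rem_pert = softmax([q_pert[i] for i in others])
    kl = sum(pp * math.log(pp / po) for pp, po in zip(p_rem_pert, p_rem))
    return 1.0 / (1.0 + kl)


def sarfa_saliency(q_orig, q_pert, action):
    """Harmonic mean of specificity (delta p) and relevance (K)."""
    delta_p = softmax(q_orig)[action] - softmax(q_pert)[action]
    if delta_p <= 0:
        return 0.0  # assumption: the perturbation did not hurt the chosen action
    k = relevance_k(q_orig, q_pert, action)
    return 2 * k * delta_p / (k + delta_p)


q_s = [2.0, 0.0, 0.0]
k_same = relevance_k(q_s, [0.0, 0.0, 0.0], action=0)     # other actions unchanged
k_shifted = relevance_k(q_s, [0.0, 3.0, 0.0], action=0)  # other actions reshuffled
score = sarfa_saliency(q_s, [0.0, 0.0, 0.0], action=0)
```

When the perturbation leaves the relative ranking of the other actions untouched, K stays at 1 and the score is driven by Δp alone; when it reshuffles the other actions, K drops and pulls the score down. Repeating this for every perturbed region yields one score per region, which together form the saliency map.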

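12/16 Explain Your Move: Understanding Agent Actions Using Specific and Relevant Feature Attribution | Experimental results
❏ Existing methods: regions unrelated to the paddle and ball are also highlighted
❏ Proposed method: the regions related to the paddle and ball are clearly highlighted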

13/16 Visualization of Deep Reinforcement Learning using Grad-CAM: How AI Plays Atari Games? | Overview
❏ Proposes a method that applies Grad-CAM to the reinforcement learning algorithm A3C

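14/16 Visualization of Deep Reinforcement Learning using Grad-CAM: How AI Plays Atari Games? | Method
❖ E-A3C model
  ➢ A max pooling layer is added after each CNN layer
  ➢ The number of output channels of each CNN layer is increased
❖ Grad-CAM procedure
  1. Take a 160 × 160 RGB input image
  2. Pass it through the CNN layers to obtain feature maps
  3. Backpropagate from the predicted action and compute the gradients
  4. Apply global average pooling to the gradients
  5. Multiply the pooled gradients by the feature maps and sum over channels
  6. Apply the ReLU function to generate the Grad-CAM map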
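Steps 4 to 6 of the procedure above can be sketched without a deep learning framework. This is a minimal sketch in plain Python that assumes the feature maps (step 2) and the gradients of the predicted action's score with respect to them (step 3) have already been obtained from the network; the function name `grad_cam` and the toy 2x2 values are hypothetical.

```python
def grad_cam(feature_maps, gradients):
    """Grad-CAM heatmap from per-channel feature maps and their gradients.

    Both arguments are lists of K channels, each an H x W nested list.
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    # Step 4: global average pooling of the gradients gives one weight per channel
    weights = [sum(sum(row) for row in grad) / (height * width)
               for grad in gradients]
    # Step 5: weighted sum of the feature maps over channels
    cam = [[sum(w * fmap[y][x] for w, fmap in zip(weights, feature_maps))
            for x in range(width)]
           for y in range(height)]
    # Step 6: ReLU keeps only regions that support the predicted action
    return [[max(0.0, value) for value in row] for row in cam]


# Toy example with two channels of size 2x2
feature_maps = [[[1, 2], [3, 4]],
                [[4, 3], [2, 1]]]
gradients = [[[1.0, 1.0], [1.0, 1.0]],        # channel weight 1.0
             [[-0.5, -0.5], [-0.5, -0.5]]]    # channel weight -0.5
heatmap = grad_cam(feature_maps, gradients)
```

In practice the resulting map has the spatial size of the last convolutional feature maps, so it is usually upsampled to the input resolution before being overlaid on the game frame.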

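15/16 Visualization of Deep Reinforcement Learning using Grad-CAM: How AI Plays Atari Games? | Experimental results
The method detects the enemies, the bullets, and the player, and visualizes their movement
❖ Enemies: blue
❖ Bullets: red
❖ Player: green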

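16/16 Summary
❏ Visualization of Deep Reinforcement Learning using Grad-CAM: How AI Plays Atari Games?: applies Grad-CAM to E-A3C
❏ Atari-HEAD: Atari Human Eye-Tracking and Demonstration Dataset: proposes a large-scale Atari dataset that includes human gaze data
❏ Explain Your Move: Understanding Agent Actions Using Specific and Relevant Feature Attribution: proposes a saliency map method that highlights only the important features
❏ Trends and outlook
  ❏ Work combining Transformer-based reinforcement learning with visualization methods seems likely to appear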


tt1717
