Paper commentary: DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models

Paper commentary: DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models
Takehiro Matsuda

2 Paper information
• Title: DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models
• Paper: https://arxiv.org/html/2309.16292v3
• Code: https://github.com/PJLab-ADG/DiLu
• Venue: ICLR 2024
• Authors: Licheng Wen, Daocheng Fu, Xin Li, Xinyu Cai, Tao Ma, Pinlong Cai, Min Dou, Botian Shi, Liang He, Yu Qiao
• Affiliations: Shanghai Artificial Intelligence Laboratory, East China Normal University, The Chinese University of Hong Kong
Why I chose this paper:
• It calls itself a knowledge-driven approach, and I wanted to understand how that is actually realized.

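3 knowledge-driven
Drawing inspiration from the profound question posed by LeCun (2022): “Why can an adolescent learn to drive a car in about 20 hours of practice and know how to act in many situations he/she has never encountered before?”
Data-driven approaches that combine large datasets with DNNs have shown high performance in autonomous-driving technologies such as image recognition. However, they can perform poorly on scenes that were not in the training data (edge cases and rare cases), a weakness that sets them apart from how humans learn.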

4 LLM as an embodiment of human knowledge
LLMs (Large Language Models) are currently regarded as holding human knowledge in its most generalized form, and several studies build applications on that premise.
• PaLM-E: https://palm-e.github.io/
• Instruct2Act: https://github.com/OpenGVLab/Instruct2Ac ("Put the polka dot block into the green container.")
• Voyager: https://voyager.minedojo.org/

5 Empower LLM to Autonomous driving
Asking an LLM to solve the driving task directly does not give very good performance. To realize knowledge-driven decision-making with an LLM, the authors designed the following components:
(1) an environment with which an agent can interact;
(2) a driver agent with recall, reasoning, and reflection abilities;
(3) a memory component to persist experiences.

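6 The framework of DiLu
The traffic situation is verbalized so that it can be given to the LLM. Similar past situations are retrieved from the Memory module and attached to the prompt. The LLM then returns an instruction for the ego vehicle's motion. If acting on that instruction causes a problem, the LLM is asked to analyze what went wrong and how to fix it, and the corrected content is stored in Memory.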

7 Demo screen Highway-env

8 Memory module
Initialization (teaching the baseline way of driving): We select a few scenarios and manually outline the correct reasoning and decision-making processes for these situations to form the initial memory. Much like learning at a driving school before going out on public roads, correct reasoning and decision-making are written by hand for a few scenarios and saved as the initial memory.
Memory recall (making use of past driving experience): Before making a decision, the current driving scenario is embedded into a vector, which serves as the memory key. This key is then clustered and searched to find the closest scenarios in the memory module and their corresponding reasoning processes, or memories. Past scenarios are embedded as vectors so that similar scenarios can be searched.
Memory storage (accumulating driving experience): Scenarios with correct reasoning and decisions are saved.
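The recall step described above can be sketched as a nearest-neighbor search over embedded scenario descriptions. This is only an illustration under stated assumptions: the `Memory` class, `embed`, and the sample scenarios are hypothetical, and a toy bag-of-words vector stands in for the real embedding model and vector database.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Memory:
    """Hypothetical in-memory store of (key vector, scenario, reasoning)."""

    def __init__(self):
        self.items = []

    def store(self, scenario, reasoning):
        self.items.append((embed(scenario), scenario, reasoning))

    def recall(self, scenario, k=3):
        """Return the k stored experiences most similar to the query scenario."""
        query = embed(scenario)
        ranked = sorted(self.items, key=lambda it: cosine(query, it[0]), reverse=True)
        return [(s, r) for _, s, r in ranked[:k]]


# Initialization: a few hand-written experiences (made up for this sketch).
memory = Memory()
memory.store("vehicle ahead is slower and close in the same lane",
             "Decelerate to keep a safe gap.")
memory.store("road ahead is clear in all lanes",
             "Stay IDLE and keep the current speed.")
memory.store("right lane is empty and ego must exit",
             "Change to the right lane.")

hits = memory.recall("vehicle ahead is close and slower", k=1)
print(hits[0][1])
```

Raising `k` corresponds to the number of shots handed to the LLM.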

9 Reasoning module
Decision-making for a traffic scenario combines experiences from the Memory module with the LLM's common-sense knowledge:
(1) encode the scenario by a descriptor;
(2) recall several experiences from the Memory module;
(3) generate the prompt;
(4) feed the prompt into the LLM;
(5) decode the action from the LLM's response.
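The five steps can be read as a small pipeline. The sketch below is not the DiLu code: the function names, the prompt wording, the `Final action:` marker, and the action names are assumptions made for illustration, and the LLM is replaced by a stub so that the example runs offline.

```python
import re

# Hypothetical action names, loosely following the list shown on the slides.
ACTIONS = ("IDLE", "TURN_LEFT", "TURN_RIGHT", "ACCELERATION", "DECELERATION")


def build_prompt(description, experiences):
    """Step 3: few-shot prompt made of recalled experiences plus the current scene."""
    shots = "\n\n".join(
        f"Past scenario: {s}\nReasoning and decision: {r}" for s, r in experiences
    )
    return (
        f"{shots}\n\nCurrent scenario: {description}\n"
        "Reason step by step, then end with 'Final action: <ACTION>', "
        f"where <ACTION> is one of {', '.join(ACTIONS)}."
    )


def decode_action(response):
    """Step 5: extract a valid action name, or None if none is found."""
    match = re.search(r"Final action:\s*([A-Z_]+)", response)
    if match and match.group(1) in ACTIONS:
        return match.group(1)
    return None


def decide(state, describe, recall, llm, shots=3):
    description = describe(state)                    # (1) encode the scenario
    experiences = recall(description, shots)         # (2) recall experiences
    prompt = build_prompt(description, experiences)  # (3) generate the prompt
    response = llm(prompt)                           # (4) query the LLM
    return decode_action(response)                   # (5) decode the action


# Stubs standing in for the descriptor, the memory, and the LLM.
def stub_describe(state):
    return (f"Ego at {state['ego_speed']} m/s, front vehicle "
            f"{state['gap']} m ahead at {state['front_speed']} m/s.")


def stub_recall(description, k):
    return [("Front vehicle close and slower.", "Decelerate to keep the gap.")][:k]


def stub_llm(prompt):
    return "The front vehicle is slower and only 12 m ahead.\nFinal action: DECELERATION"


action = decide({"ego_speed": 25, "gap": 12, "front_speed": 20},
                stub_describe, stub_recall, stub_llm)
print(action)
```

Returning `None` for an undecodable response leaves the fallback policy to the caller, which is a design choice of this sketch rather than something stated in the slides.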

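10 Reflection module
However, our goal is to make the autonomous driving system learn from mistakes on its own. We discover that LLM can effectively act as a mistake rectifier.
When a decision turns out to be wrong, for example by causing a collision, the LLM is asked to explain the situation and propose a correction. Storing the corrected content in Memory makes the same mistake less likely in similar situations.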
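A minimal sketch of this correct-then-store loop follows. It simplifies heavily: the frame format, the prompt wording, and `update_memory` are hypothetical, and a stub replaces the real LLM call.

```python
def reflect(scenario, wrong_reasoning, llm):
    """Ask the LLM to explain the mistake and return corrected reasoning."""
    prompt = (
        "The decision below led to a collision.\n"
        f"Scenario: {scenario}\n"
        f"Original reasoning: {wrong_reasoning}\n"
        "Explain what went wrong and give the corrected reasoning and decision."
    )
    return llm(prompt)


def update_memory(frames, memory, llm):
    """frames: list of (scenario, reasoning, collided) tuples.

    Successful frames are stored as they are; frames that led to a
    collision are stored with the LLM-corrected reasoning instead.
    """
    for scenario, reasoning, collided in frames:
        if collided:
            reasoning = reflect(scenario, reasoning, llm)
        memory.append((scenario, reasoning))
    return memory


def stub_llm(prompt):
    # Stand-in for the real model; returns a fixed correction.
    return "The gap in the right lane was too small. Corrected decision: decelerate."


frames = [
    ("Road ahead clear.", "Keep speed. Decision: idle.", False),
    ("Right-lane vehicle 8 m ahead and slower.",
     "Gap is appropriate. Decision: change to right lane.", True),
]
memory = update_memory(frames, [], stub_llm)
print(memory[1][1])
```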

11 Experiments
Highway-env (https://github.com/Farama-Foundation/HighwayEnv) is used as the closed-loop simulation environment.
• Number of shots recalled from the Memory module: 0-shot, 1-shot, 3-shot, and 5-shot are compared.
• Memory initialization: 5 human-crafted experiences.
• Number of experiences stored in Memory: 5, 20, and 40 experiences are compared.
Each setting is run 10 times with different seeds.

12 Experiments
• LLMs: GPT-3.5 and GPT-4.
• Vector DB: Chroma. On how to use Chroma, see https://note.com/mahlab/n/nb6677d0fc7c2
• Scenarios are converted to vectors with OpenAI's text-embedding-ada-002 model and stored.
• Highway-env provides the position, speed, and acceleration of each vehicle.
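Since the simulator exposes only numeric state per vehicle, a descriptor has to turn it into text before it can be embedded or put in a prompt. The sketch below shows one way this could look. The `Vehicle` fields, the lane-index convention (larger index means further right), and the sentence templates are assumptions, not the format used by DiLu.

```python
from dataclasses import dataclass


@dataclass
class Vehicle:
    name: str
    lane: int      # lane index; larger means further right (assumed convention)
    x: float       # longitudinal position in m
    speed: float   # m/s
    accel: float   # m/s^2


def describe_scene(ego, others):
    """Turn numeric vehicle states into a plain-text scenario description."""
    lines = [
        f"You are driving in lane {ego.lane} at {ego.speed:.1f} m/s "
        f"with acceleration {ego.accel:.1f} m/s^2."
    ]
    # List the nearest vehicles first.
    for v in sorted(others, key=lambda v: abs(v.x - ego.x)):
        gap = v.x - ego.x
        where = "ahead of" if gap >= 0 else "behind"
        if v.lane == ego.lane:
            lane_txt = "in your lane"
        elif v.lane > ego.lane:
            lane_txt = "in a lane to your right"
        else:
            lane_txt = "in a lane to your left"
        lines.append(
            f"- {v.name} is {abs(gap):.1f} m {where} you {lane_txt}, "
            f"speed {v.speed:.1f} m/s, acceleration {v.accel:.1f} m/s^2."
        )
    return "\n".join(lines)


ego = Vehicle("ego", 1, 100.0, 25.0, 0.0)
others = [
    Vehicle("veh1", 1, 112.0, 20.0, -0.5),
    Vehicle("veh2", 2, 95.0, 27.0, 0.0),
]
text = describe_scene(ego, others)
print(text)
```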

13 Reasoning module prompt example
Task description given to the LLM (GPT). This is fixed content covering the input, the desired output format, and so on.

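14 Reasoning module prompt example
• Text describing the situation in the current Highway-env frame. It is vectorized and given to Memory as a query to retrieve similar scenarios from those stored.
• Driving guidelines: avoid collisions, drive safely, and so on (these can be changed).
• Available actions: IDLE, Turn-right, Acceleration, Deceleration, ...
These are given to the LLM after the system prompts, as CoT (Chain of Thought).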

15 Example of extracted similar experiences from Memory
Of the 3 shots retrieved as similar past scenarios, two chose Idle and one chose Deceleration.

16 Reflection module prompt example
Task description given to the LLM (GPT). This is fixed content.

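17 Case study1 Reasoning
• Current lane: there is some distance to the vehicle ahead, which is only slightly faster than the ego vehicle.
• Right lane: there is a little distance to the vehicle ahead, which is considerably faster than the ego vehicle.
• Decision: move to the right lane.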

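18 Case study2 Reasoning
• The driving intention is changed: the ego vehicle needs to move to the rightmost lane in order to exit the highway.
• Right lane: there is distance to the vehicle ahead, which is slower than the ego vehicle.
• Decision: move to the right lane.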

19 Case study3 Reflection
In this scenario, the original decision-making moved to the right lane and ended in a collision.

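20 Case study3 Reflection
Collision analysis and lesson: the relative distance and speed to the vehicle in the right lane were not properly taken into account. (They were computed, but the lane change was still judged "appropriate".)
Corrected reasoning: the relative distance and speed to the vehicle in the right lane and the time to collision are computed, the move to the right lane is judged dangerous, and the decision becomes to decelerate.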
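The time-to-collision check in the corrected reasoning is simple arithmetic: the gap divided by the closing speed. The sketch below works through one case. The function names, the sample numbers, and the 4-second threshold are assumptions for illustration; the slides do not give the values used in the paper.

```python
import math


def time_to_collision(gap_m, rear_speed, front_speed):
    """Seconds until the rear vehicle reaches the front one at constant speeds.

    Returns infinity when the rear vehicle is not closing in.
    """
    closing_speed = rear_speed - front_speed
    if closing_speed <= 0:
        return math.inf
    return gap_m / closing_speed


def right_lane_change_is_safe(ego_speed, front_gap, front_speed,
                              rear_gap, rear_speed, min_ttc=4.0):
    """Check both target-lane neighbors against an assumed TTC threshold."""
    ttc_front = time_to_collision(front_gap, ego_speed, front_speed)
    ttc_rear = time_to_collision(rear_gap, rear_speed, ego_speed)
    return min(ttc_front, ttc_rear) >= min_ttc


# Ego at 25 m/s; right-lane vehicle 10 m ahead at 20 m/s: 10 / (25 - 20) = 2 s.
ttc = time_to_collision(10.0, 25.0, 20.0)
safe = right_lane_change_is_safe(25.0, 10.0, 20.0, 30.0, 22.0)
print(ttc, safe)
```

With these made-up numbers the lane change is rejected, so decelerating is the fallback, which mirrors the corrected decision in the case study.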

21 Results
• A driving task counts as complete after 30 steps.
• With 20 or more experiences, 5-shot completes the driving task.
• With 40 experiences, the median success steps exceed 25 for every shot count.
• With 0-shot, the median is 5 or less.
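To make the metric concrete, the sketch below computes the median number of successful steps and the completion rate over 10 seeded runs. The per-run step counts are made up for illustration and are not results from the paper.

```python
from statistics import median

MAX_STEPS = 30  # a task counts as complete after 30 steps


def summarize(steps_per_run):
    """Median successful steps and fraction of runs that completed the task."""
    completed = sum(1 for s in steps_per_run if s >= MAX_STEPS)
    return median(steps_per_run), completed / len(steps_per_run)


# Hypothetical success steps for 10 runs with different seeds.
runs = [30, 30, 12, 30, 27, 30, 8, 30, 30, 21]
median_steps, completion_rate = summarize(runs)
print(median_steps, completion_rate)
```

A median of 30 means at least half of the runs reached the end of the task, which is why the median is a useful summary when a few runs crash early.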

22 Compare with Reinforcement learning method
DiLu is compared with GRAD (Graph Representation for Autonomous Driving), a SOTA Reinforcement Learning (RL) method on Highway-env. It generates a global scene representation that includes estimated future trajectories of other vehicles.
• Both methods are trained on lane-4-density-2.
• They are tested in three environments: lane-4-density-2, lane-5-density-2.5, and lane-5-density-3.
• DiLu: 40 experiences in Memory. GRAD: 600,000 training episodes.
GRAD degrades sharply in environments that differ from the training one. Most of its failures are collisions with the vehicle ahead because it could not brake in time.

23 Experiments on Generalization
Can 20 experiences collected in the lane-4-density-2 environment adapt to the lane-5-density-3 environment?
• Median: 13→5
• Median: 30→23
A noticeable drop is visible, but the drop is smaller when more shots are available.

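24 Experiments on Transformation
The scenarios stored in the Memory module are written in natural language, so they should remain usable when the environment changes. 20 experiences are collected in each of two environments, Highway-env and CitySim, and tested on the lane-4-density-2 and lane-5-density-3 scenarios. The spread in success steps across scenarios looks fairly large, but the real-world vehicle trajectories from CitySim also appear to help in complex scenarios such as lane-5-density-3.
CitySim: based on drone footage of real road conditions. https://github.com/UCF-SST-Lab/UCF-SST-CitySim1-Dataset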

25 Effectiveness of Reflection module
Settings compared:
• Baseline: 20 experiences
• + 12 success experiences
• + 6 correction experiences
• + 12 success and 6 correction experiences
Adding experiences to memory has a visible effect. Even a small number of corrected experiences helps.

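26 Impressions
• GPT is used through an API, so latency is high (5 to 10 seconds).
• Are experiments with a larger memory hard to run? (Would latency grow even further?)
• The number of Highway-env steps and the number of memories used in the experiments are on the small side. Challenges remain before this reaches real autonomous-driving decision-making.
• The approach is knowledge-driven as opposed to data-driven, but it still relies on GPT, which was trained on a large amount of data. (Little task-specific data or training is involved, hence generalized knowledge.)
• GPT is fully leveraged as a stand-in for human generalized knowledge.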
