Sebastian Schwarz - A Competitive Time-Trial AI for Need for Speed: Most Wanted Using Deep Reinforcement Learning

15 years ago, at the height of my eSports career, I uploaded an (unofficial) world record for Need for Speed: Most Wanted (2005) (NFS:MW) to YouTube. Since then, Deep Reinforcement Learning has become popular, and ever since I have dreamt of creating a competitive AI for my favorite racing game of all time: NFS:MW. Now the time was finally right: the hardware is fast enough, good software is available, and Sony's AI research has proven the task is actually doable.
This talk presents in detail the challenges and achievements in creating a competitive time-trial AI for NFS:MW from scratch, including but not limited to: hacking the game to create a custom API, building a custom (real-time) OpenAI gym environment, steering the game using a virtual controller, and finally successfully training an AI using the Soft Actor-Critic algorithm. All code, including the API, is written in Python and will be open sourced after the talk.

MunichDataGeeks

April 26, 2023

Transcript

  1. A Competitive Time-Trial AI for Need for Speed: Most Wanted Using Deep Reinforcement Learning

     Munich Datageeks Meetup, by Sebastian Schwarz from E.ON (Data.ON), LinkedIn, 2023-04-25. 1 / 61
  2. Introduction to Need for Speed: Most Wanted (2005)

     Arcade racing game, best-selling game of the franchise with more than 16 million copies sold.
     Popular eSports title with major tournaments at the Electronic Sports League (ESL) and World Cyber Games (WCG).
     Game Mode: Circuit (Basics)
     - Played in races on circuits in 1:1 mode
     - Usually 5-6 laps with standing start, first to complete all laps wins
     - All tuning (except Junkman parts by WCG rules) and all cars allowed (best: Lotus Elise and Porsche Carrera GT)
     - NOS (reason: cheating) and collision (reason: lags) disabled by ESL and WCG rules
     4 / 61
  3. Introduction: Unofficial World Record at Heritage Heights*

     Heritage Heights 1:06.85, Lotus Elise, no NOS, HD.
     [*] By ESL rules and to the best of my knowledge at the time. There are no official leaderboards.
     5 / 61
  4. Introduction: Sony Published Gran Turismo Sophy in 2022

     Sony[1] F. Fuchs, Y. Song, E. Kaufmann, D. Scaramuzza and P. Dürr: Super-Human Performance in Gran Turismo Sport Using Deep Reinforcement Learning. IEEE Robotics and Automation Letters, vol. 6, no. 3, pp. 4257-4264, July 2021, doi: 10.1109/LRA.2021.3064284
     Sony[2] Wurman, P.R., Barrett, S., Kawamoto, K. et al.: Outracing champion Gran Turismo drivers with deep reinforcement learning. Nature 602, 223-228 (2022), doi: 10.1038/s41586-021-04357-7
     6 / 61
  5. A Competitive Time-Trial AI for NFS:MW Using Deep RL

     8 / 61
  6. A Competitive Time-Trial AI for NFS:MW Using Deep RL: YES, but Why...? Because Why Not?!

     - Hardware to run the game at 100+ fps and train a deep reinforcement learning model in real-time has become commodity
     - Software for deep reinforcement learning is available in Python with stable-baselines3
     - Sony AI has proven it is possible with what later became Gran Turismo Sophy
     Use cases:
     - Fine-tuning: car setups, usually trial and error
     - Pushing the boundary: what's the theoretical best?
     - Better AI: competitive in-game AI
     9 / 61
  7. Implementing The Algorithm: Getting Started

     Neither a Game API Nor Any Code is Publicly Available: A Start from Scratch
     1. Custom gym Environment: implement a custom (real-time) training environment (action, observation, reward, done) using OpenAI gym
     2. Hack Game for API: create a (real-time) game API in Python with all necessary functions by "hacking" the game's memory and accessing it using pymem
     3. Virtual Gamepad: control the game in real-time with a virtual gamepad via vgamepad
     4. Agent Training: train the deep reinforcement learning agent using the SAC algorithm (Soft Actor-Critic) from stable_baselines3 with a PyTorch backend
     11 / 61
  8. Implementing The Algorithm: Custom gym Environment

     Basic Methods Need to be Implemented in NfsAiHotLap
         class NfsAiHotLap(gym.Env):
             """Custom Environment: NfsAiHotLap that follows the gym interface"""

             def __init__(self):
                 """method to initialize especially the action and observation spaces"""
                 self.action_space
                 self.observation_space

             def step(self, action):
                 """method to perform one action and return the reward and the next observation"""
                 return observation, reward, done, info

             def reset(self):
                 """method to reset the game and return the initial observation"""
                 return observation

             def render(self, mode="human"):
                 """method that outputs a render of the env, e.g. screenshot"""
     13 / 61
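     To make the skeleton concrete: a rough, illustrative sketch of what the step() logic could look like once the virtual pad and the NfsMw API (both introduced on the following slides) are wired in. The function name nfs_step and the prev_completion bookkeeping are mine, not from the original code.
         def nfs_step(pad, nfs, action, prev_completion):
             """Illustrative step logic: send one action, then derive reward and done."""
             # action = [steering, acceleration], both in [-1, +1]
             pad.left_joystick_float(x_value_float=float(action[0]), y_value_float=0.0)
             pad.right_joystick_float(x_value_float=0.0, y_value_float=float(action[1]))
             pad.update()

             completion = nfs.lap_completion()          # in [0, 1]
             reward = completion - prev_completion      # delta lap completion
             done = completion >= 1.0 or nfs.vehicle_reverse()
             return reward, done, completion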
  13. Implementing The Algorithm: Custom gym Environment

     - action: combined steering and acceleration input; give the agent a virtual Xbox 360 controller to steer and accelerate in-game
     - reward: immediate reward for the action taken (target variable); proxy-reward the agent by how quickly it goes around the track via delta lap completion
     - observation: the feature vector for the algorithm; give the agent telemetry (speed, acceleration, ...), lidar, and a GPS navigation system
     - done: indicator if an episode has ended; tell the agent when it's "Game Over" (one of lap completion, time limit, or reverse)
     18 / 61
  14. Implementing The Algorithm: Hack Game for API

     There is No Game API
     - Real-Time Data Required: information from the game needs to be available in real-time, e.g. to calculate the reward or the end of an episode (done), even if the input to the game were a screen capture
     - Extra Options: NFSMW Extra Options provides windowed mode, more than 5 laps, teleporting the car at a given speed and direction (for resetting after an episode), and debug print
     Just Build One... With This Strategy
     - Access speed.exe: the game's variables are stored in the computer's memory (RAM), for NFS:MW about 300 MB
     - Search RAM: find the right addresses (pointers) or patterns (dynamic) where the data you want is stored, using a hex editor and scanning memory at various in-game situations for changes
     - Read RAM: use pymem to read variables from the game's memory in nanoseconds
     - Real-Time Calculation: calculate everything else in Python in real-time
     20 / 61
  15. Implementing The Algorithm: Hack Game for API

     Example: Tracking x, y, z and speed
         from pymem import Pymem

         class TrackVehicle:
             def __init__(self):
                 # NFS:MW process is called "speed.exe"
                 self.pm = Pymem("speed.exe")

             def track(self):
                 x = self.pm.read_float(0x00914560)      # x-coordinate
                 y = self.pm.read_float(0x00914564)      # y-coordinate
                 z = self.pm.read_float(0x00914568)      # z-coordinate
                 speed = self.pm.read_float(0x009142C8)  # speed
                 return (x, y, z, speed)
     21 / 61
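     A minimal usage sketch, assuming the game is running and the addresses above are valid for this game version: poll the tracker in a loop.
         import time

         tracker = TrackVehicle()          # attaches to the running speed.exe process
         for _ in range(10):               # read ten samples, one per second
             x, y, z, speed = tracker.track()
             print(f"pos=({x:.1f}, {y:.1f}, {z:.1f}) m, speed={speed:.1f} m/s")
             time.sleep(1.0)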
  16. Implementing The Algorithm: Hack Game for API

     Currently Identified Variables
     - x, y, z (float, e.g. -365, 1000, 156): coordinates of the car in the game in m (like GPS: lat, long, elevation)
     - speed (float, e.g. 88.76): speed of the car in m/s
     - surface_l, surface_r (int, e.g. 0): surface the car is driving on with the left/right wheels respectively, e.g. asphalt, grass, ...
     - angle (int, e.g. 0xFB12): direction of the car, must be a mapping of the Euler angle from [-pi, +pi]
     - lap (int, e.g. 3): the current lap of the race
     - steering, throttle (float, e.g. -0.6, 0.3): steering and throttle (currently only for Logitech RumblePad 2)
     - gamestate (int, e.g. 6): dummy variable for the game state, e.g. menu or within race
     Based on these variables everything else is calculated in the NfsMw API.
     22 / 61
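     As one example of the real-time post-processing, a hypothetical conversion of the raw angle value to an Euler angle in [-pi, +pi]. The assumption that the game encodes the heading as an unsigned 16-bit integer over one full turn is mine and not confirmed.
         import math

         def raw_angle_to_euler(raw: int) -> float:
             """Map a raw 16-bit heading (0..0xFFFF) to an Euler angle in [-pi, +pi].
             Assumption: the game encodes one full turn linearly in 16 bits."""
             angle = (raw / 0x10000) * 2.0 * math.pi   # [0, 2*pi)
             if angle > math.pi:                       # wrap into [-pi, +pi]
                 angle -= 2.0 * math.pi
             return angle

         # e.g. raw_angle_to_euler(0xFB12) -> about -0.12 rad under this assumption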
  17. Implementing The Algorithm: Hack Game for API Example: Racing Line

    and Track Boundaries (x, y, speed) for One Lap (01:09.810) 25 / 61
  18. Implementing The Algorithm: Hack Game for API Example: Steering and

    Speed (steering, speed) for One Lap (01:09.810) 26 / 61
  19. Implementing The Algorithm: Hack Game for API

     Some Available Methods in My NfsMw API: Vehicle Related
         class NfsMw():
             def vehicle_telemetry(self):
                 """returns telemetry data: x, y, z, speed, sfc_l, sfc_r, direction"""
             def vehicle_lidar(self, resolution_degree=1):
                 """returns distances to the next border for 180 degrees ahead"""
             def vehicle_collision(self):
                 """returns if there is a collision"""
             def vehicle_airtime(self):
                 """returns if the vehicle is airborne"""
             def vehicle_reverse(self, rev_angle_threshold=0.6*np.pi):
                 """returns if the vehicle is reversing"""
     27 / 61
  20. Implementing The Algorithm: Hack Game for API

     Some Available Methods in My NfsMw API: Lap Related
         class NfsMw():
             def lap(self):
                 """returns the current lap"""
             def laptime(self):
                 """returns the lap time clock"""
             def lap_completion(self):
                 """returns the lap completion"""
             def lap_angle_ahead(self, n_ahead=150):
                 """returns the direction of the track n points ahead"""
             def lap_radii_ahead(self, n_ahead=150, inverse=True):
                 """returns the inverse radii n points ahead"""
     28 / 61
  21. Implementing The Algorithm: Hack Game for API

     Some Available Methods in My NfsMw API: Game Related
         class NfsMw():
             def state(self):
                 """returns the gamestate"""
             def reset_vehicle(self):
                 """reset car to saved start location with hotkey from NFS:MW ExtraOpts"""
             def restart_race(self):
                 """reset the whole race and start at lap 1 again"""
             def screenshot(self):
                 """returns screenshot as numpy array"""
     29 / 61
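     Purely illustrative, based only on the method names and docstrings listed above, reading the game state once per step might look roughly like this:
         nfs = NfsMw()                                          # attaches to the running game

         x, y, z, speed, sfc_l, sfc_r, direction = nfs.vehicle_telemetry()
         distances = nfs.vehicle_lidar(resolution_degree=1)     # 180-degree sweep ahead

         if nfs.vehicle_reverse():                              # e.g. end the episode when reversing
             nfs.reset_vehicle()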
  22. Implementing The Algorithm: Virtual Gamepad

     Just Give it a Virtual Xbox 360 Controller...
     - In order to steer the car, inputs need to be sent to the game
     - Controlling the game via keyboard is known to be too choppy and therefore slow
     - I use the vgamepad library in Python, a wrapper for ViGEm (Virtual Gamepad Emulation Framework), to create a virtual Xbox 360 controller
     - Steering is mapped to the left analogue stick x-axis, acceleration is mapped to the right analogue stick y-axis (default in the game)
     - The signal is continuous between [-1, +1]
     31 / 61
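     A minimal sketch of driving such a virtual pad with vgamepad; the stick mapping follows the slide, while the helper name apply_action is mine:
         import vgamepad as vg

         pad = vg.VX360Gamepad()                  # virtual Xbox 360 controller via ViGEm

         def apply_action(steering, acceleration):
             """Send one [-1, +1] steering/acceleration pair to the game."""
             pad.left_joystick_float(x_value_float=steering, y_value_float=0.0)
             pad.right_joystick_float(x_value_float=0.0, y_value_float=acceleration)
             pad.update()                         # flush the report to the virtual device

         apply_action(-0.3, 1.0)                  # 30% left, full throttle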
  23. Implementing The Algorithm: Custom gym Environment

     action: Combined Steering and Acceleration Input
     Give the agent a virtual Xbox 360 controller to steer and accelerate in-game.
     - Type: float32, Size: 2
     - Components: steering: left analogue stick x-axis, full left to full right encoded within [-1, +1]; acceleration: right analogue stick y-axis, full brake to full throttle encoded within [-1, +1]
     - Example: action=[-0.3, +1.0] would be 30% steering left and full throttle
     - Notes: for steering every value makes sense; for acceleration only [-0.4, 0.7, 1.0] make sense (brake, but not reverse; lift; full throttle)
     33 / 61
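     In the env's __init__ this action could be declared as a 2-dimensional Box space; a sketch consistent with the description above, not necessarily the original code:
         import numpy as np
         from gym import spaces

         # steering and acceleration, both continuous in [-1, +1]
         action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
         sample = action_space.sample()   # e.g. array([-0.3, 1.0], dtype=float32)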
  24. Implementing The Algorithm: Custom gym Environment

     reward: Immediate Reward for the Action Taken (Target Variable)
     Proxy-reward the agent by how quickly it goes around the track via delta lap completion.
     - Type: float32, Size: 1
     - Components: delta in lap completion between two steps
     - Example: reward=0.00035 would mean 0.035% of additional lap completion achieved
     - Notes: lap completion is within [0, 1] and measured to the in-game millimeter; a high enough discount factor [0.98, 0.99] ensures finding the racing line (unlike rewarding speed)
     34 / 61
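     A sketch of computing this delta-lap-completion reward per step, using the lap_completion() API method from above; the wrap-around handling at the start of a new lap is my assumption:
         class RewardTracker:
             """Reward = delta in lap completion between two consecutive steps."""

             def __init__(self, nfs):
                 self.nfs = nfs
                 self.prev_completion = nfs.lap_completion()

             def step_reward(self):
                 completion = self.nfs.lap_completion()   # in [0, 1]
                 delta = completion - self.prev_completion
                 if delta < -0.5:                         # assumed wrap-around when a new lap starts
                     delta += 1.0
                 self.prev_completion = completion
                 return delta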
  25. Implementing The Algorithm: Custom gym Environment

     observation: The Feature Vector for the Algorithm
     Give the agent telemetry (speed, acceleration, ...), lidar, and a GPS navigation system.
     - Type: float32, Size: 593
     - Components:
         vehicle_telemetry [4]: speed, acceleration, surface, direction
         vehicle_lidar [181]: 180-degree distances to the nearest boundary in 1-degree steps
         vehicle_collision [1]: collision indicator
         vehicle_reverse [1]: reverse indicator
         lap_radii_ahead [200]: inverse curve radii for 200 points (ca. 300 m) ahead
         lap_angle_ahead [200]: curve angles for 200 points (ca. 300 m) ahead
         steering_t5 [5]: last 5 steering inputs
     - Notes: this is the feature vector for the algorithm, and the only information it gets; the agent is not trained on images as this would be slow; all features need to be calculated in real-time from the game's variables; feature calculation takes 10 ms on my machine
     35 / 61
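     A sketch of how the observation space and feature vector could be put together; the ordering and dtype handling are assumptions, the component groups follow the list above:
         import numpy as np
         from gym import spaces

         # inside NfsAiHotLap.__init__: one flat float32 vector (size 593 as above)
         observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(593,), dtype=np.float32)

         def build_observation(telemetry, lidar, collision, reverse, radii, angles, steering_t5):
             """Concatenate the feature groups listed above into one flat float32 vector."""
             return np.concatenate([
                 telemetry,                 # speed, acceleration, surface, direction
                 lidar,                     # 180-degree distances in 1-degree steps
                 [collision, reverse],      # indicators
                 radii,                     # inverse curve radii ahead
                 angles,                    # curve angles ahead
                 steering_t5,               # last 5 steering inputs
             ]).astype(np.float32)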
  26. Implementing The Algorithm: Custom gym Environment

     done: Indicator if an Episode Has Ended
     Tell the agent when it's "Game Over" (one of lap completion, time limit, or reverse).
     - Type: bool, Size: 1
     - Components: indicator if the episode has ended
     - Example: done=False would mean the episode has not ended
     - Notes: there are 3 reasons for done=True:
         lap completion (best case)
         time limit (here: 180 seconds)
         vehicle reverse (more than a 110 degree turn relative to the track direction)
     40 / 61
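     A sketch of the three termination checks; the function and constant names are mine, the thresholds follow the slide:
         EPISODE_TIME_LIMIT_S = 180.0

         def compute_done(lap_completion: float, elapsed_s: float, reversing: bool) -> bool:
             """Episode ends on lap completion, time limit, or driving in reverse."""
             if lap_completion >= 1.0:          # best case: lap finished
                 return True
             if elapsed_s >= EPISODE_TIME_LIMIT_S:
                 return True
             if reversing:                      # > 110 degrees relative to track direction
                 return True
             return False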
  27. Implementing The Algorithm: Agent Training

     That's Actually the Easy Part With stable_baselines3 (Basic Code)
         from stable_baselines3 import SAC
         from control.vnfsgamepad import VNfsGamePad
         from env import NfsAiHotLap

         # init virtual game pad
         pad = VNfsGamePad()
         # create environment
         nfsmwai = NfsAiHotLap(pad)
         # create model
         model = SAC("MlpPolicy", nfsmwai, gamma=0.985)
         # learn
         model.learn(total_timesteps=1e7)
     The main code has some more features like a logging and saving callback.
     42 / 61
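     Such a saving callback is not shown on the slide; a sketch using stable_baselines3's built-in CheckpointCallback could look like this (save_freq and paths are illustrative):
         from stable_baselines3.common.callbacks import CheckpointCallback

         checkpoint_callback = CheckpointCallback(
             save_freq=10_000,            # save every 10k environment steps
             save_path="./checkpoints/",
             name_prefix="sac_nfsmw",
         )
         model.learn(total_timesteps=1e7, callback=checkpoint_callback)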
  28. Implementing The Algorithm: Agent Training

     That's Actually the Easy Part With stable_baselines3 (Basic Code, continued)
     pad = VNfsGamePad() instantiates the virtual Xbox 360 gamepad; it needs to run before the game starts.
     There is also a VNfsKeyboard class, but it is slower (only discrete -1 or +1 input).
     43 / 61
  29. Implementing The Algorithm: Agent Training

     That's Actually the Easy Part With stable_baselines3 (Basic Code, continued)
     nfsmwai = NfsAiHotLap(pad) instantiates the game environment, i.e. the custom env.
     It uses the NfsMw API to interact with the game and to calculate observation, reward, done, action.
     44 / 61
  30. Implementing The Algorithm: Agent Training

     That's Actually the Easy Part With stable_baselines3 (Basic Code, continued)
     SAC("MlpPolicy", ...) defines the model as Soft Actor-Critic (SAC), using a multilayer perceptron (MLP), i.e. a fully connected feed-forward DNN.
     The discount factor for future reward is set to 0.985 (default 0.99).
     45 / 61
  31. Implementing The Algorithm: Agent Training

     That's Actually the Easy Part With stable_baselines3 (Basic Code, continued)
     model.learn(...) starts the training process.
     Training in real-time at 30-60 Hz takes about 20 h on my 7-year-old gaming PC (Intel Xeon E3-1231 v3, Nvidia GeForce GTX 1070).
     46 / 61
  32. Conclusion

     Challenges with the approach: great cornering, bad on straights
     - Reward: the instant reward is only tiny and not very sensitive to jitter (a higher discount factor (gamma) does not work); penalties on steering did not work so far
     - Real-Time: learning/input steps and the game are not synced; possible fix: Real-Time Gym (rtgym)
     - Game FPS: unstable frame times of the game
     - Exploration Trade-Off: potentially too high exploration coefficient (ent_coef) (manually lowering it did not work)
     - Filter: filtering at test time does not work: inexperienced with high speed at corners after a long straight
     - Observation: noisy and maybe incomplete input features; the agent does not see many things, e.g. airtime, surface ahead, obstacles, ...
     55 / 61
  33. Conclusion

     Time-Trial AI for Need for Speed: Most Wanted
     - Works Well: using SAC from stable_baselines3 and some basic features, a good lap time (1:10.47) and a very stable model can be achieved in a short time
     - Accessible: training in real-time at 30-60 Hz is possible on my 7-year-old gaming PC (Intel Xeon E3-1231 v3, Nvidia GeForce GTX 1070)
     - Comparably Fast: training takes about 24-48 h (single instance of the game, single car) in real-time (Sony[1]: 56-73 h, but 80 cars in parallel on 4 PS4s; Sony[2]: up to 1000 PS4s in parallel)
     - Generalization: the trained model generalizes very well to other cars without re-training
     - Track Specific: however, it generalizes poorly to other tracks; it completes the track but makes many mistakes (same for humans, btw.)
     56 / 61
  34. Thank You

     Special Thanks
     - Inko Elgezua Fernández, robotics and reinforcement learning expert and colleague at E.ON, for many helpful comments, fruitful discussions, and motivation
     - Anıl Gezergen (@nlgxzef) from NFSMW Extra Options, who does a sick job reverse engineering in-game functions and helped with some comments to get me started
     - Florian Fuchs from Sony AI for pointing me to a pre-print version with more appendices for Sony[1] and some comments
     - Yann Bouteiller, author and creator of vgamepad, who runs a similar project (tmrl for TrackMania), for a helpful comment
     58 / 61
  35. Thank You: Now Make a Pull-Request! Let's Get the WR!

     Code and Models: open source, properly structured, and ok-ish documented from today onwards: gitlab.com/schw4rz/nfsmwai
     A Competitive Time-Trial AI for NFS:MW Using Deep RL
     59 / 61
  36. A Competitive Time-Trial AI for NFS:MW Using Deep RL

     Need for Speed: Most Wanted is a Good Fit
     - In-depth knowledge about the game from my time as an eSports athlete: benchmark (WR) and car setups available
     - Stability of conditions: no damage, (tire) degradation, or weather effects on car or track
     - Game properties: clear track boundaries, no weird driving techniques like wall-rides or double-steering required
     - Active Community: active time-trial and modding community, e.g. NFSMW Extra Options
     61 / 61