March 2016 - Beats Lee Sedol (9-dan professional) in a five-game match (version: AlphaGo Lee)
May 2017 - Beats Ke Jie, the world's top Go player (version: AlphaGo Master)
October 2017 - AlphaGo Zero beats AlphaGo Lee 100-0 with an algorithm based solely on reinforcement learning, without human data.
AlphaGo
"Mastering the game of Go without human knowledge". Nature. 19 October 2017. Retrieved 19 October 2017.
psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. [1]
Like a human, our agents learn for themselves to achieve successful strategies that lead to the greatest long-term rewards. This paradigm of learning by trial and error, solely from rewards or punishments, is known as reinforcement learning (RL). [2]
[1] https://en.wikipedia.org/wiki/Reinforcement_learning
[2] https://deepmind.com/blog/deep-reinforcement-learning/
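One common way to make "cumulative reward" concrete is the discounted return, in which a reward received t steps in the future is weighted by gamma ** t. The sketch below is only an illustration: the function name `discounted_return` and the discount factor `gamma` are assumptions and do not come from the slides.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of rewards, each weighted by gamma ** t (t = time step).

    `gamma` is an assumed discount factor between 0 and 1.
    """
    total = 0.0
    # Walk backwards so every earlier step applies one more factor of gamma.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Three steps with reward 1 each and gamma = 0.5: 1 + 0.5 + 0.25
print(discounted_return([1, 1, 1], gamma=0.5))  # prints 1.75
```

A gamma close to 1 makes the agent value long-term rewards almost as much as immediate ones; a gamma close to 0 makes it short-sighted.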
learning algorithms. MIT License, Last commit: November 2017
• baselines: high-quality implementations of reinforcement learning algorithms, MIT License, Last commit: November 2017
• TensorForce: a TensorFlow library for applied reinforcement learning, Apache 2 License, Last commit: November 2017
• DeepRL: highly modularized implementation of popular deep RL algorithms in PyTorch, Apache 2 License, Last commit: November 2017
• RLlab: a framework for developing and evaluating reinforcement learning algorithms, MIT License, Last commit: July 2017
• AgentNet: Python library for deep reinforcement learning using Theano+Lasagne, MIT License, Last commit: August 2017
• RLPy: the Reinforcement Learning Library for Education and Research, 3-Clause BSD License, Last commit: April 2016
• PyBrain: the Python Machine Learning Library, 3-Clause BSD License, Last commit: March 2016
Python libraries for reinforcement learning
the State / Observation
Gym
• action_space: the Space object corresponding to valid actions
• observation_space: the Space object corresponding to valid observations
• reward_range: a tuple corresponding to the min and max possible rewards
Baselines
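To show what these three attributes describe, here is a minimal sketch using toy stand-in classes. It does not import Gym: `ToyDiscrete`, `ToyBox` and `ToyEnv` are hypothetical names invented for this illustration and only mimic the idea of a Space object that can sample and validate values.

```python
import random


class ToyDiscrete:
    """Hypothetical stand-in for a space of n discrete actions (0 .. n-1)."""

    def __init__(self, n):
        self.n = n

    def sample(self):
        # Pick one valid action uniformly at random.
        return random.randrange(self.n)

    def contains(self, x):
        return isinstance(x, int) and 0 <= x < self.n


class ToyBox:
    """Hypothetical stand-in for a space of bounded real-valued vectors."""

    def __init__(self, low, high):
        self.low, self.high = list(low), list(high)

    def contains(self, x):
        return len(x) == len(self.low) and all(
            lo <= v <= hi for lo, v, hi in zip(self.low, x, self.high)
        )


class ToyEnv:
    """Toy environment exposing the three attributes listed above."""

    action_space = ToyDiscrete(2)               # two valid actions: 0 and 1
    observation_space = ToyBox([-1.0], [1.0])   # one number in [-1, 1]
    reward_range = (0.0, 1.0)                   # (min, max) possible reward


env = ToyEnv()
action = env.action_space.sample()
print(env.action_space.contains(action))        # a sampled action is always valid
print(env.observation_space.contains([0.5]))
print(env.reward_range)
```

An agent only needs these descriptions to know which actions it may send and what shape of observation it will get back, which is what lets one algorithm run against many environments.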
Observation
OpenAI Gym
• action_space: left, right
• observation_space: x, x_dot, theta, theta_dot
• reward_range: 1 while the pole has not fallen, 0 once the pole has fallen
"The Algorithm", e.g. DeepQ
Baselines
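The interaction between the environment and "the algorithm" can be sketched as a loop: observe, act, receive a reward, repeat until the pole falls. The code below is a simplified illustration and makes several assumptions: `ToyPoleEnv` is a hypothetical toy with made-up dynamics (not Gym's physics), its `step` returns only three values, and `lean_policy` is a hand-written heuristic standing in for a learned policy such as DeepQ.

```python
import random

LEFT, RIGHT = 0, 1


class ToyPoleEnv:
    """Hypothetical, heavily simplified pole-balancing task."""

    def __init__(self, seed=0, max_steps=200):
        self.rng = random.Random(seed)
        self.max_steps = max_steps

    def reset(self):
        self.steps = 0
        # Observation layout follows the slide: x, x_dot, theta, theta_dot
        self.state = [0.0, 0.0, self.rng.uniform(-0.05, 0.05), 0.0]
        return tuple(self.state)

    def step(self, action):
        x, x_dot, theta, theta_dot = self.state
        push = 1.0 if action == RIGHT else -1.0
        dt = 0.02
        # Toy dynamics (an assumption): the pole accelerates away from
        # upright, and pushing the cart toward the lean pulls it back.
        theta_acc = 5.0 * theta - 2.0 * push
        x_dot += push * dt
        x += x_dot * dt
        theta_dot += theta_acc * dt
        theta += theta_dot * dt
        self.state = [x, x_dot, theta, theta_dot]
        self.steps += 1
        fallen = abs(theta) > 0.2
        reward = 0.0 if fallen else 1.0   # 1 while upright, 0 once fallen
        done = fallen or self.steps >= self.max_steps
        return tuple(self.state), reward, done


def lean_policy(observation):
    """Hand-written heuristic: push toward the side the pole leans to."""
    _, _, theta, _ = observation
    return RIGHT if theta > 0 else LEFT


def run_episode(env, policy):
    """Run one episode and return the total (undiscounted) reward."""
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(observation)
        observation, reward, done = env.step(action)
        total_reward += reward
    return total_reward


print(run_episode(ToyPoleEnv(seed=0), lean_policy))
```

A learning algorithm would replace `lean_policy` with a policy that it improves from the rewards collected in this loop.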