[1] Recurrent Experience Replay in Distributed Reinforcement Learning. ICLR2019 submitted. https://openreview.net/forum?id=r1lyTjAqYX
[2] Volodymyr Mnih, et al. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, pp. 1928–1937, 2016. https://arxiv.org/abs/1602.01783
[3] Matthew Hausknecht and Peter Stone. Deep recurrent Q-learning for partially observable MDPs. arXiv preprint arXiv:1507.06527, 2015. https://arxiv.org/abs/1507.06527
[4] Dan Horgan, et al. Distributed prioritized experience replay. In International Conference on Learning Representations, 2018. https://arxiv.org/abs/1803.00933
[5] Lasse Espeholt, et al. IMPALA: Scalable distributed deep-RL with importance weighted actor-learner architectures. arXiv preprint arXiv:1802.01561, 2018. https://arxiv.org/abs/1802.01561
[6] Tom Schaul, et al. Prioritized experience replay. In International Conference on Learning Representations, 2016. https://arxiv.org/abs/1511.05952
[7] Hado van Hasselt, et al. Deep reinforcement learning with double Q-learning. In AAAI Conference on Artificial Intelligence, 2016. https://arxiv.org/abs/1509.06461
[8] Ziyu Wang, et al. Dueling network architectures for deep reinforcement learning. In International Conference on Machine Learning, 2016. https://arxiv.org/abs/1511.06581
[9] Richard Sutton and Andrew Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, 1998.
[10] Meire Fortunato, et al. Noisy networks for exploration. arXiv preprint arXiv:1706.10295, 2017. https://arxiv.org/abs/1706.10295
[11] Marc Bellemare, et al. Unifying count-based exploration and intrinsic motivation. In Advances in Neural Information Processing Systems, pp. 1471–1479, 2016. https://arxiv.org/abs/1606.01868
[12] Todd Hester, et al. Deep Q-learning from demonstrations. arXiv preprint arXiv:1704.03732, 2017. https://arxiv.org/abs/1704.03732
[13] Yusuf Aytar, et al. Playing hard exploration games by watching YouTube. arXiv preprint arXiv:1805.11592, 2018. https://arxiv.org/abs/1805.11592
[14] Learning Montezuma’s Revenge from a Single Demonstration. OpenAI Blog, 2018. https://blog.openai.com/learning-montezumas-revenge-from-a-single-demonstration/
[15] David Ha and Jürgen Schmidhuber. World models. arXiv preprint arXiv:1803.10122, 2018. https://arxiv.org/abs/1803.10122
[16] The Laplacian in RL: Learning Representations with Efficient Approximations. ICLR2019 submitted. https://openreview.net/forum?id=HJlNpoA5YQ
[17] Jack Harmer, et al. Imitation learning with concurrent actions in 3D games. arXiv preprint arXiv:1803.05402, 2018. https://arxiv.org/abs/1803.05402
[18] Yee Whye Teh, et al. Distral: Robust multitask reinforcement learning. In Advances in Neural Information Processing Systems, pp. 4496–4506, 2017. https://arxiv.org/abs/1707.04175
[19] Matteo Hessel, et al. Multi-task deep reinforcement learning with PopArt. arXiv preprint arXiv:1809.04474, 2018. https://arxiv.org/abs/1809.04474
[20] Arun Nair, et al. Massively parallel methods for deep reinforcement learning. arXiv preprint arXiv:1507.04296, 2015. https://arxiv.org/abs/1507.04296
[21] Mohammad Babaeizadeh, et al. Reinforcement learning through asynchronous advantage actor-critic on a GPU. arXiv preprint arXiv:1611.06256, 2016. https://arxiv.org/abs/1611.06256
[22] Lex Weaver and Nigel Tao. The optimal reward baseline for gradient-based reinforcement learning. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, pp. 538–545, 2001. https://arxiv.org/abs/1301.2315
[23] Rémi Munos, et al. Safe and efficient off-policy reinforcement learning. In Advances in Neural Information Processing Systems, pp. 1054–1062, 2016. https://arxiv.org/abs/1606.02647