
Reinforcement Learning and Its Application to the Game of Go (Cited by: 33)
Abstract: Reinforcement learning is a particular type of machine learning in which a decision-making policy is learned through autonomous interaction with the environment, so that the long-term cumulative reward received by the policy is maximized. Recently, reinforcement learning has been successfully applied to the game of Go and to video games, where it reached human-expert-level play, and it has consequently attracted wide attention. This paper gives a brief introduction to reinforcement learning, focusing on methods based on function approximation, and reviews its applications in domains such as the game of Go.
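As a companion to the abstract, the following is a minimal sketch of temporal-difference learning with linear function approximation, the class of methods the paper focuses on. The toy random-walk environment, the one-hot feature map, and the step-size and discount values are illustrative assumptions, not details taken from the article.

```python
# Minimal sketch (not from the paper): TD(0) with linear function approximation
# on a 5-state random-walk chain. States 0 and 4 are terminal; reaching state 4
# yields reward 1. The feature map and hyperparameters are assumptions.
import numpy as np

n_states = 5          # states 0..4; 0 and 4 are terminal
gamma = 0.9           # discount factor (assumed)
alpha = 0.1           # step size (assumed)

def features(s):
    """One-hot features; any feature map (kernels, neural networks) could be substituted."""
    x = np.zeros(n_states)
    x[s] = 1.0
    return x

w = np.zeros(n_states)            # weights of the linear value estimate V(s) = w . x(s)
rng = np.random.default_rng(0)

for episode in range(2000):
    s = 2                                   # start in the middle of the chain
    while s not in (0, n_states - 1):
        s_next = s + rng.choice([-1, 1])    # random policy: step left or right
        r = 1.0 if s_next == n_states - 1 else 0.0
        v_next = 0.0 if s_next in (0, n_states - 1) else w @ features(s_next)
        # TD(0) update: move w toward the bootstrapped target r + gamma * V(s')
        td_error = r + gamma * v_next - w @ features(s)
        w += alpha * td_error * features(s)
        s = s_next

print("estimated state values:", np.round(w, 3))
```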
Authors: Chen Xingguo (陈兴国), Yu Yang (俞扬)
Source: Acta Automatica Sinica (《自动化学报》; EI, CSCD, Peking University Core Journal), 2016, No. 5, pp. 685-695 (11 pages)
Funding: Supported by the National Natural Science Foundation of China (61403208, 61375061) and the Introduced Talent Scientific Research Startup Fund of Nanjing University of Posts and Telecommunications (NY214014)
Keywords: reinforcement learning, function approximation, kernel methods, neural networks, additive model, deep reinforcement learning
About the authors: Chen Xingguo is a lecturer at the School of Computer Science and School of Software, Nanjing University of Posts and Telecommunications. He received his Ph.D. from the Department of Computer Science, Nanjing University, in 2014. His research interests include machine learning and reinforcement learning. E-mail: chenxg@njupt.edu.cn. Yu Yang is an associate professor in the Department of Computer Science, Nanjing University, where he received his Ph.D. in 2011. His research interests include machine learning, evolutionary learning, and reinforcement learning. He is the corresponding author. E-mail: yuy@nju.edu.cn
