
Survey of apprenticeship learning based on reward function approximation (Cited by: 2)
Abstract: This paper reviews the development of apprenticeship learning based on reward function approximation, surveys the main current work, and summarizes the general methodology of apprenticeship learning. Reward function estimation is discussed under both linear and nonlinear assumptions, and two well-known families of approximation methods, inverse reinforcement learning (IRL) and maximum margin planning (MMP), are compared. IRL-based apprenticeship learning is an iterative process that approximates the true reward function with a linear combination of basis reward functions, while MMP can be viewed as a family of gradient-based optimization methods. Combining techniques such as filtering and probabilistic policy representations can relax the requirement that the expert demonstrations be optimal. Finally, open problems in the field are identified and future research directions are proposed, such as apprenticeship learning under the partially observable Markov decision process (POMDP) framework and learning from nondeterministic policies.
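The IRL-based procedure described in the abstract, iteratively approximating the true reward with a linear combination of basis (feature) rewards until the learner matches the expert, can be sketched as follows. This is a toy illustration in the spirit of the projection variant of Abbeel and Ng's algorithm; the candidate-policy setup and all names are assumptions, not code from the surveyed papers:

```python
# Toy sketch of apprenticeship learning via IRL: the unknown reward is assumed
# linear in features, r(s) = w . phi(s), and the weight vector w is re-estimated
# each round from the gap between expert and learner feature expectations.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def apprenticeship_irl(mu_expert, candidate_mus, eps=1e-6, max_iter=100):
    """Match the expert's feature expectations mu_expert (projection variant).

    candidate_mus stands in for the RL inner loop: each entry is the
    feature-expectation vector of one available policy, and the "optimal
    policy under reward weights w" is the candidate maximizing w . mu.
    """
    mu_bar = list(candidate_mus[0])         # feature expectations matched so far
    for _ in range(max_iter):
        w = sub(mu_expert, mu_bar)          # reward weights = unmatched features
        if dot(w, w) ** 0.5 <= eps:         # expert matched within tolerance
            break
        mu = max(candidate_mus, key=lambda m: dot(w, m))  # best-response policy
        d = sub(mu, mu_bar)
        if dot(d, d) == 0:                  # no further progress possible
            break
        # Project mu_expert onto the segment [mu_bar, mu] and move there,
        # i.e. mix the new policy into the current policy mixture.
        alpha = max(0.0, min(1.0, dot(d, w) / dot(d, d)))
        mu_bar = [b + alpha * di for b, di in zip(mu_bar, d)]
    return mu_bar, sub(mu_expert, mu_bar)

# A 2-feature toy: an even mixture of the two pure policies reproduces
# the expert, so the residual gap shrinks to (near) zero.
mu_bar, residual = apprenticeship_irl([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]])
```

The loop terminates when the residual `mu_expert - mu_bar` is small, at which point any reward that is linear in these features assigns the learner's policy mixture nearly the same value as the expert's.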
Source: Journal of Huazhong University of Science and Technology (Natural Science Edition), 2008, No. S1: 288-290, 294 (4 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: Major Project of the Zhejiang Provincial Department of Science and Technology (2006c13096)
Keywords: apprenticeship learning; reward function; survey; inverse reinforcement learning; maximum margin planning

References (10)

  • 1 Ratliff N D, Bagnell J A, Zinkevich M A. Maximum margin planning. Proceedings of the 23rd International Conference on Machine Learning. 2006.
  • 2 Ng A Y, Russell S J. Algorithms for inverse reinforcement learning. Proceedings of the Seventeenth International Conference on Machine Learning. 2000.
  • 3 Abbeel P, Ng A Y. Apprenticeship learning via inverse reinforcement learning. Proceedings of the Twenty-first International Conference on Machine Learning. 2004.
  • 4 Kolter J Z, Abbeel P, Ng A Y. Hierarchical apprenticeship learning with application to quadruped locomotion. Advances in Neural Information Processing Systems. 2008.
  • 5 Taskar B, Lacoste-Julien S, Jordan M. Structured prediction via the extragradient method. Proceedings of Neural Information Processing Systems. 2005.
  • 6 Abbeel P, Ng A Y. Exploration and apprenticeship learning in reinforcement learning. Proceedings of the 22nd International Conference on Machine Learning. 2005.
  • 7 Kolter J Z, Rodgers M P, Ng A Y. A complete control architecture for quadruped locomotion over rough terrain. Proceedings of the International Conference on Robotics and Automation. 2008.
  • 8 Rebula J R, Neuhaus P D, Bonnlander B V, et al. A controller for the LittleDog quadruped walking on rough terrain. IEEE International Conference on Robotics and Automation. 2007.
  • 9 Ratliff N, Bagnell J A, Srinivasa S. Imitation learning for locomotion and manipulation. CMU-RI-TR-07-45. 2007.
  • 10 Kaelbling L P, Littman M L, Moore A W. Reinforcement learning: a survey. Journal of Artificial Intelligence Research. 1996.


Citing papers: 2

Secondary citing papers: 3
