
Survey of apprenticeship learning based on reward function learning (回报函数学习的学徒学习综述)

Cited by: 2
Abstract  This paper surveys apprenticeship learning based on reward function learning, covering both the historical development of the field and a broad selection of current work. Methods are discussed under linear and nonlinear reward-function assumptions. Under the linear assumption, two families of algorithms are compared: apprenticeship learning based on inverse reinforcement learning (IRL) and on the maximum margin planning (MMP) framework. The former admits an efficient approximate algorithm but makes a strong assumption that the demonstrations are optimal; the latter takes a form that is easier to extend but is computationally expensive. Finally, open problems and directions for future research are outlined: applying apprenticeship learning in partially observable Markov decision process (POMDP) environments and in continuous or high-dimensional spaces, using approximate algorithms such as point-based value iteration (PBVI), or extracting learning features with dimension-reduction methods such as principal component analysis (PCA) to alleviate the heavy computation brought on by high dimensionality (the curse of dimensionality).
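The IRL side of the comparison above rests on matching discounted feature expectations between the learner and the expert demonstrations. The following is a minimal sketch in the style of Abbeel and Ng's projection algorithm; the integer states, one-hot features, discount factor, and toy trajectories are illustrative assumptions, not taken from the paper, and the inner step of solving an MDP for the reward w·phi is omitted:

```python
import numpy as np

def feature_expectations(trajectories, phi, gamma=0.9):
    """Empirical discounted feature expectations:
    mu = (1/m) * sum over trajectories of sum_t gamma^t * phi(s_t)."""
    mu = np.zeros_like(phi(trajectories[0][0]), dtype=float)
    for traj in trajectories:
        for t, s in enumerate(traj):
            mu += (gamma ** t) * phi(s)
    return mu / len(trajectories)

def projection_step(mu_expert, mu_bar, mu_new):
    """One Abbeel-Ng projection update: move the running convex
    combination mu_bar toward the newest policy's feature expectations."""
    d = mu_new - mu_bar
    coef = float(d @ (mu_expert - mu_bar)) / float(d @ d)
    return mu_bar + coef * d

# Illustrative use: states are integers 0..2, features are one-hot vectors.
phi = lambda s: np.eye(3)[s]
expert_trajs = [[0, 1], [0, 2]]                     # demonstrated trajectories
mu_E = feature_expectations(expert_trajs, phi, gamma=0.5)
mu_bar = feature_expectations([[2, 2]], phi, gamma=0.5)  # some learner policy
w = mu_E - mu_bar          # reward weights for the next inner RL problem
margin = np.linalg.norm(w) # terminate when this falls below a tolerance
```

The strong optimality assumption mentioned above enters here: the expert's mu_E is treated as the target to match. MMP instead folds the demonstrations into a max-margin structured objective, which is more flexible in form but requires solving a planning problem at every subgradient step, hence the larger computational cost.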
Source: CAAI Transactions on Intelligent Systems (《智能系统学报》), 2009, Issue 3, pp. 208-212 (5 pages).
Funding: National Natural Science Foundation of China (90820306); major project of the Science and Technology Department of Zhejiang Province (006c13096).
Keywords: apprenticeship learning; reward function; inverse reinforcement learning; maximum margin planning
About the authors: JIN Zhuojun, male, born in 1984, Ph.D. candidate; main research interest: machine learning. Corresponding author: QIAN Hui, E-mail: qianhui@zju.edu.cn. QIAN Hui, male, born in 1974, associate professor, member of the Intelligent Robotics Committee of the Chinese Association for Artificial Intelligence; main research interests: artificial intelligence and computer vision. CHEN Shenyi, male, born in 1980, Ph.D. candidate; main research interest: machine learning.

References (22 total; first 10 listed below)

  • 1 ATKESON C G, SCHAAL S. Robot learning from demonstration[C]//Proceedings of the Fourteenth International Conference on Machine Learning. Nashville, USA, 1997: 12-20.
  • 2 RATLIFF N D, BAGNELL J A, ZINKEVICH M A. Maximum margin planning[C]//Proceedings of the 23rd International Conference on Machine Learning. Pittsburgh, USA, 2006: 729-736.
  • 3 JIN Zhuojun, QIAN Hui, CHEN Shenyi, ZHU Miaoliang. Survey of apprenticeship learning based on reward function approximation[J]. Journal of Huazhong University of Science and Technology (Natural Science Edition), 2008, 36(S1): 288-290.
  • 4 NG A Y, RUSSELL S J. Algorithms for inverse reinforcement learning[C]//Proceedings of the Seventeenth International Conference on Machine Learning. San Francisco, USA, 2000: 663-670.
  • 5 ABBEEL P, NG A Y. Apprenticeship learning via inverse reinforcement learning[C]//Proceedings of the Twenty-first International Conference on Machine Learning. Banff, Canada, 2004: 1-8.
  • 6 KOLTER J Z, ABBEEL P, NG A Y. Hierarchical apprenticeship learning with application to quadruped locomotion[C]//Advances in Neural Information Processing Systems. Cambridge, USA: MIT Press, 2008.
  • 7 RATLIFF N, BAGNELL J A, ZINKEVICH M A. Subgradient methods for maximum margin structured learning[C]//Workshop on Learning in Structured Output Spaces at ICML. Pittsburgh, USA, 2006.
  • 8 SYED U, BOWLING M, SCHAPIRE R E. Apprenticeship learning using linear programming[C]//Proceedings of the 25th International Conference on Machine Learning (ICML 2008). Helsinki, Finland, 2008: 1032-1039.
  • 9 SYED U, SCHAPIRE R E. A game-theoretic approach to apprenticeship learning[C]//Advances in Neural Information Processing Systems. Cambridge, USA: MIT Press, 2008.
  • 10 GRIMES D B, RAJESH D R, RAO R P N. Learning nonparametric models for probabilistic imitation[C]//Proceedings of Neural Information Processing Systems. Cambridge, USA: MIT Press, 2007: 521-528.


