
Value-approximation-based online policy for vehicle routing problem with stochastic demand
Abstract: With the development of efficient real-time logistics, uncertain vehicle routing problems (VRPs) face a new challenge: routing decisions must be both accurate and made in real time. This paper takes the vehicle routing problem with stochastic demand (VRPSD), the most widely applied uncertain VRP, as an example and proposes an effective online decision method. First, with multiple vehicles online simultaneously, we formulate a multi-vehicle Markov decision process (M-MDP) whose objective is to minimize the total travel cost. Credibility constraints and a neighborhood radius reduction strategy are introduced to shrink the action space and improve solution efficiency. Second, we design a value approximation algorithm from reinforcement learning to solve the model. The expected future cost (cost-to-go) is estimated by a set of basis functions, and the solution process is split into an offline training phase and an online decision phase: the basis-function weights are trained offline and then used online, which reduces online decision time. A dynamic update mechanism for the neighborhood radius is also embedded in the algorithm. Finally, numerical experiments on multiple groups of instances show that the proposed method performs well in both solution quality and time efficiency.
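The split between offline training and online decision described in the abstract can be illustrated with a minimal sketch. This is not the authors' M-MDP or their algorithm: it assumes a single vehicle with capacity normalized to 1.0, a hypothetical set of five basis functions, a uniform demand distribution, a fixed neighborhood radius, and plain Monte Carlo gradient updates of the weights. All function names (`basis`, `decide`, `run_episode`, `train_offline`) are illustrative, and the credibility constraints and the dynamic radius update are not modeled.

```python
import math
import random

DEPOT = (0.0, 0.0)  # vehicle capacity is normalized to 1.0 throughout


def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


def dot(w, phi):
    return sum(wi * pi for wi, pi in zip(w, phi))


def basis(pos, load, unserved):
    """Hypothetical basis functions of the state reached after a move."""
    nearest = min((dist(pos, c) for c in unserved), default=0.0)
    return [1.0, 0.1 * len(unserved), nearest, dist(pos, DEPOT), load]


def decide(w, pos, load, unserved, radius):
    """Online step: minimize travel cost plus approximate cost-to-go."""
    # Neighborhood radius: only nearby customers are candidate actions;
    # fall back to all unserved customers if none lies within the radius.
    candidates = [c for c in unserved if dist(pos, c) <= radius] or unserved

    def score(c):
        rest = [u for u in unserved if u != c]  # assumes distinct customers
        return dist(pos, c) + dot(w, basis(c, load, rest))

    return min(candidates, key=score)


def run_episode(w, customers, rng, radius, eps=0.0):
    """Simulate one route; return total cost and (features, cost-to-go) pairs."""
    pos, load, unserved = DEPOT, 1.0, list(customers)
    total, trace = 0.0, []
    while unserved:
        if rng.random() < eps:  # occasional random move during training
            target = rng.choice(unserved)
        else:
            target = decide(w, pos, load, unserved, radius)
        unserved.remove(target)
        total += dist(pos, target)
        trace.append((basis(target, load, unserved), total))
        demand = rng.uniform(0.1, 0.5)  # assumed distribution, revealed on arrival
        if demand > load:  # route failure: round trip to the depot to restock
            total += 2 * dist(target, DEPOT)
            load += 1.0
        load -= demand
        pos = target
    total += dist(pos, DEPOT)  # final return to the depot
    return total, [(phi, total - spent) for phi, spent in trace]


def train_offline(customers, episodes=300, alpha=0.02, radius=0.6, seed=0):
    """Offline phase: fit the weights to simulated costs-to-go."""
    rng = random.Random(seed)
    w = [0.0] * 5
    for _ in range(episodes):
        _, samples = run_episode(w, customers, rng, radius, eps=0.2)
        for phi, cost_to_go in samples:
            err = cost_to_go - dot(w, phi)
            w = [wi + alpha * err * p for wi, p in zip(w, phi)]
    return w


if __name__ == "__main__":
    customers = [(0.2, 0.7), (0.9, 0.1), (0.5, 0.5), (0.8, 0.8), (0.1, 0.3), (0.6, 0.2)]
    w = train_offline(customers)
    cost, _ = run_episode(w, customers, random.Random(1), radius=0.6)
    print(f"weights={w}, simulated cost={cost:.3f}")
```

In this sketch the online call to `decide` only evaluates a dot product per candidate customer, which is the reason for moving the weight fitting offline; the radius filter further limits how many candidates are scored.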
Authors: ZHANG Xiao-nan; ZHANG Jian-xiong (College of Management and Economics, Tianjin University, Tianjin 300072, China; College of Mechanical and Electrical Engineering, Shaanxi University of Science and Technology, Xi'an, Shaanxi 710021, China)
Source: Control Theory & Applications, 2022, No. 2, pp. 241-254 (14 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
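Funding: Supported by the National Natural Science Foundation of China (71802120, 71971152), the Innovation Capability Support Program of Shaanxi Province (2020KRM024), and the Special Scientific Research Program of the Shaanxi Provincial Department of Education (19JK0125).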
Keywords: routing problems; stochastic demand; Markov decision process; reinforcement learning; value approximation iteration
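About the authors: ZHANG Xiao-nan, lecturer, Ph.D.; current research interests are logistics system optimization and intelligent optimization algorithms. E-mail: WLxn_2010@126.com. Corresponding author: ZHANG Jian-xiong, professor, Ph.D., doctoral supervisor, a highly cited Chinese researcher in decision science; current research interests are dynamic supply chain management and the modeling and optimization of complex systems. E-mail: jxzhang@tju.edu.cn.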