Adaptive Multi-Step Evaluation Design With Stability Guarantee for Discrete-Time Optimal Learning Control (cited by: 7)


Abstract: This paper is concerned with a novel integrated multi-step heuristic dynamic programming (MsHDP) algorithm for solving optimal control problems. It is shown that, initialized by the zero cost function, MsHDP can converge to the optimal solution of the Hamilton-Jacobi-Bellman (HJB) equation. Then, the stability of the system is analyzed using control policies generated by MsHDP. Also, a general stability criterion is designed to determine the admissibility of the current control policy. That is, the criterion is applicable not only to traditional value iteration and policy iteration but also to MsHDP. Further, based on the convergence and the stability criterion, the integrated MsHDP algorithm using immature control policies is developed to accelerate learning efficiency greatly. Besides, an actor-critic structure is utilized to implement the integrated MsHDP scheme, where neural networks are used as the parametric architecture to evaluate and improve the iterative policy. Finally, two simulation examples are given to demonstrate that the learning effectiveness of the integrated MsHDP scheme surpasses that of other fixed or integrated methods.
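The multi-step evaluation idea in the abstract can be illustrated on a toy problem. The sketch below is not the paper's algorithm or its examples: it assumes a scalar linear system x_{k+1} = a*x_k + b*u_k with stage cost q*x^2 + r*u^2, a quadratic cost function V_i(x) = p_i*x^2, and the commonly used multi-step update in which the greedy policy is rolled out for n_steps stages before bootstrapping on V_i. The function names, the system parameters, and the update form are all assumptions made for illustration. With n_steps = 1 the recursion reduces to ordinary value iteration.

```python
import math


def mshdp_scalar_lq(a, b, q, r, n_steps, iterations=50):
    """Toy multi-step HDP for x_{k+1} = a*x + b*u with cost q*x^2 + r*u^2.

    Hypothetical illustration: the cost function is V_i(x) = p_i * x^2,
    started from the zero cost function (p_0 = 0).
    """
    p = 0.0
    history = [p]
    for _ in range(iterations):
        # Policy improvement: u = -k*x minimizes q*x^2 + r*u^2 + p*(a*x + b*u)^2.
        k = a * b * p / (r + b * b * p)
        c = a - b * k                      # closed-loop gain under the greedy policy
        stage = q + r * k * k              # stage cost per unit x^2 under that policy
        # Multi-step evaluation: sum n_steps stage costs along the closed-loop
        # trajectory, then bootstrap on the previous cost function.
        rollout = sum(c ** (2 * j) for j in range(n_steps))
        p = stage * rollout + c ** (2 * n_steps) * p
        history.append(p)
    return p, history


def riccati_solution(a, b, q, r):
    """Positive root of the scalar discrete-time algebraic Riccati equation."""
    lin = r - q * b * b - a * a * r
    return (-lin + math.sqrt(lin * lin + 4.0 * b * b * q * r)) / (2.0 * b * b)


if __name__ == "__main__":
    p_opt = riccati_solution(1.1, 1.0, 1.0, 1.0)
    for n in (1, 3):
        p, hist = mshdp_scalar_lq(1.1, 1.0, 1.0, 1.0, n_steps=n)
        print(n, p, p_opt, hist[:4])
```

In this linear-quadratic setting the optimal cost coefficient is known in closed form, so the iterates can be checked against it; for the nonlinear systems treated in the paper, the cost function and policy would instead be approximated, for example by the critic and actor networks mentioned in the abstract.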

Authors: Ding Wang, Jiangyu Wang, Mingming Zhao, Peng Xin, Junfei Qiao

Affiliation: Faculty of Information Technology, Beijing University of Technology

Source: IEEE/CAA Journal of Automatica Sinica (自动化学报（英文版）), indexed in SCIE, EI, CSCD; 2023, No. 9, pp. 1797-1809 (13 pages)

Funding: the National Key Research and Development Program of China (2021ZD0112302), the National Natural Science Foundation of China (62222301, 61890930-5, 62021003), and the Beijing Natural Science Foundation (JQ19013).

Keywords: adaptive critic; artificial neural networks; Hamilton-Jacobi-Bellman (HJB) equation; multi-step heuristic dynamic programming; multi-step reinforcement learning; optimal control

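Classification codes: O232 [Science: Operations Research and Cybernetics]; TP18 [Automation and Computer Technology: Control Theory and Control Engineering]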

Author biographies:

Ding Wang (corresponding author; Senior Member, IEEE) received the Ph.D. degree in control theory and control engineering from the Institute of Automation, Chinese Academy of Sciences, in 2012. He was an Associate Professor with The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences. He is currently a Full Professor with the Faculty of Information Technology, Beijing University of Technology. His current research interests include adaptive critic control with industrial applications, reinforcement learning, and intelligent systems. He has authored or co-authored over 120 journal and conference papers and four monographs. He was selected as a Clarivate Highly Cited Researcher in each year from 2020 to 2022. He is a Member of the IEEE/CAA Journal of Automatica Sinica Early Career Advisory Board. He currently or formerly serves as an Associate Editor of IEEE Transactions on Neural Networks and Learning Systems; IEEE Transactions on Systems, Man, and Cybernetics: Systems; Neural Networks; Engineering Applications of Artificial Intelligence; International Journal of Robust and Nonlinear Control; International Journal of Adaptive Control and Signal Processing; Neurocomputing; and Acta Automatica Sinica. E-mail: dingwang@bjut.edu.cn

Jiangyu Wang received the B.E. degree in automation from Tianjin University of Technology in 2021. He is a master's student in control science and engineering at Beijing University of Technology. His research interests include adaptive dynamic programming, reinforcement learning with industrial applications, and intelligent systems. E-mail: wangjiangyu@emails.bjut.edu.cn

Mingming Zhao received the B.E. degree in automation from Henan Polytechnic University in 2019, and the M.E. degree in control engineering from Beijing University of Technology in 2022. He is currently a Ph.D. candidate in control science and engineering at Beijing University of Technology. His research interests include adaptive dynamic programming, reinforcement learning with industrial applications, and intelligent systems. E-mail: zhaomm@emails.bjut.edu.cn

Peng Xin received the B.E. degree in automation and the M.E. degree in control engineering from Lanzhou University of Technology, in 2018 and 2021, respectively. He is currently a Ph.D. candidate in control science and engineering at Beijing University of Technology. His research interests include adaptive dynamic programming, model predictive control, reinforcement learning with industrial applications, and intelligent systems. E-mail: xinpeng@emails.bjut.edu.cn

Junfei Qiao (Senior Member, IEEE) received the B.E. and M.E. degrees in control engineering from Liaoning Technical University, in 1992 and 1995, respectively, and the Ph.D. degree in control theory and control engineering from Northeastern University in 1998. He is currently a Professor with the Faculty of Information Technology, Beijing University of Technology, where he is also the Director of the Beijing Laboratory of Smart Environmental Protection. His current research interests include neural networks, intelligent systems, self-adaptive systems, and process control. E-mail: adqiao@bjut.edu.cn


