Abstract
This paper presents a novel integrated multi-step heuristic dynamic programming (MsHDP) algorithm for solving optimal control problems. It is shown that, when initialized with the zero cost function, MsHDP converges to the optimal solution of the Hamilton-Jacobi-Bellman (HJB) equation. The stability of the system under the control policies generated by MsHDP is then analyzed. In addition, a general stability criterion is designed to determine the admissibility of the current control policy; the criterion applies not only to traditional value iteration and policy iteration but also to MsHDP. Building on the convergence result and the stability criterion, an integrated MsHDP algorithm that uses immature control policies is developed to greatly accelerate learning. An actor-critic structure is used to implement the integrated MsHDP scheme, with neural networks serving as the parametric architecture for evaluating and improving the iterative policy. Finally, two simulation examples demonstrate that the learning effectiveness of the integrated MsHDP scheme surpasses that of other fixed or integrated methods.
Funding
The National Key Research and Development Program of China (2021ZD0112302)
The National Natural Science Foundation of China (62222301, 61890930-5, 62021003)
The Beijing Natural Science Foundation (JQ19013)
Author biographies
Corresponding author: Ding Wang (Senior Member, IEEE) received the Ph.D. degree in control theory and control engineering from the Institute of Automation, Chinese Academy of Sciences, in 2012. He was an Associate Professor with the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences. He is currently a Full Professor with the Faculty of Information Technology, Beijing University of Technology. His current research interests include adaptive critic control with industrial applications, reinforcement learning, and intelligent systems. He has authored or co-authored over 120 journal and conference papers and four monographs. He was selected as a Clarivate Highly Cited Researcher in each year from 2020 to 2022. He is a member of the Early Career Advisory Board of the IEEE/CAA Journal of Automatica Sinica. He serves or has served as an Associate Editor of IEEE Transactions on Neural Networks and Learning Systems; IEEE Transactions on Systems, Man, and Cybernetics: Systems; Neural Networks; Engineering Applications of Artificial Intelligence; International Journal of Robust and Nonlinear Control; International Journal of Adaptive Control and Signal Processing; Neurocomputing; and Acta Automatica Sinica. E-mail: dingwang@bjut.edu.cn
Jiangyu Wang received the B.E. degree in automation from Tianjin University of Technology in 2021. He is a master's student in control science and engineering at Beijing University of Technology. His research interests include adaptive dynamic programming, reinforcement learning with industrial applications, and intelligent systems. E-mail: wangjiangyu@emails.bjut.edu.cn
Mingming Zhao received the B.E. degree in automation from Henan Polytechnic University in 2019 and the M.E. degree in control engineering from Beijing University of Technology in 2022. He is currently a Ph.D. candidate in control science and engineering at Beijing University of Technology. His research interests include adaptive dynamic programming, reinforcement learning with industrial applications, and intelligent systems. E-mail: zhaomm@emails.bjut.edu.cn
Peng Xin received the B.E. degree in automation and the M.E. degree in control engineering from Lanzhou University of Technology in 2018 and 2021, respectively. He is currently a Ph.D. candidate in control science and engineering at Beijing University of Technology. His research interests include adaptive dynamic programming, model predictive control, reinforcement learning with industrial applications, and intelligent systems. E-mail: xinpeng@emails.bjut.edu.cn
Junfei Qiao (Senior Member, IEEE) received the B.E. and M.E. degrees in control engineering from Liaoning Technical University in 1992 and 1995, respectively, and the Ph.D. degree in control theory and control engineering from Northeastern University in 1998. He is currently a Professor with the Faculty of Information Technology, Beijing University of Technology, where he is also the Director of the Beijing Laboratory of Smart Environmental Protection. His current research interests include neural networks, intelligent systems, self-adaptive systems, and process control. E-mail: adqiao@bjut.edu.cn