Robust reinforcement learning with UUB guarantee for safe motion control of autonomous robots (Cited by: 1)

Abstract: This paper addresses the issue of safety in reinforcement learning (RL) with disturbances and its application in the safety-constrained motion control of autonomous robots. To tackle this problem, a robust Lyapunov value function (rLVF) is proposed. The rLVF is obtained by introducing a data-based LVF under the worst-case disturbance of the observed state. Using the rLVF, a uniformly ultimate boundedness (UUB) criterion is established. This criterion is designed to ensure that the cost function, which serves as a safety criterion, ultimately converges to a bounded range under the policy to be designed. Moreover, to mitigate the drastic variation of the rLVF caused by differences in states, a smoothing regularization of the rLVF is introduced. To train policies with safety guarantees under the worst-case disturbances of the observed states, an off-policy robust RL algorithm is proposed. The proposed algorithm is applied to motion control tasks of an autonomous vehicle and a cartpole, which involve external disturbances and variations of the model parameters, respectively. The experimental results demonstrate the effectiveness of the theoretical findings and the advantages of the proposed algorithm in terms of robustness and safety.
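
To illustrate what a uniformly ultimate boundedness criterion implies, the following minimal sketch iterates the worst case of a generic decrease condition of the form v[t+1] <= (1 - alpha) * v[t] + beta. This form, the symbols alpha and beta, and the function names are assumptions chosen for illustration; they are not taken from the paper, whose rLVF-based criterion may differ. Under this assumed condition, the value does not converge to zero but ultimately settles within beta / alpha.

```python
def ultimate_bound(alpha, beta):
    """Limit of v[t+1] = (1 - alpha) * v[t] + beta for 0 < alpha < 1 (assumed form)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    return beta / alpha


def rollout(v0, alpha, beta, steps):
    """Iterate the worst case (equality) of the assumed decrease condition.

    v0 is a hypothetical initial value of a Lyapunov-like cost; the returned
    list holds v[0], v[1], ..., v[steps].
    """
    values = [v0]
    for _ in range(steps):
        values.append((1.0 - alpha) * values[-1] + beta)
    return values


if __name__ == "__main__":
    trajectory = rollout(v0=10.0, alpha=0.2, beta=0.1, steps=100)
    # The final value should be close to the ultimate bound beta / alpha.
    print(trajectory[-1], ultimate_bound(0.2, 0.1))
```

A larger beta (for example, a stronger disturbance in this toy model) enlarges the ultimate bound, while a larger alpha shrinks it and speeds up the decay toward it.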

Authors: ZHANG RuiXian, HAN YiNing, SU Man, LIN ZeFeng, LI HaoWei, ZHANG LiXian

Affiliations: School of Astronautics; School of Management; Beijing Institute of Tracking and Telecommunication Technology

Source: Science China (Technological Sciences) (中国科学（技术科学英文版）), 2024, No. 1, pp. 172-182 (11 pages). Indexed in: SCIE, EI, CAS, CSCD.

Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62225305 and 12072088); the Fundamental Research Funds for the Central Universities, China (Grant Nos. HIT.BRET.2022004, HIT.OCEF.2022047, and HIT.DZIJ.2023049); Grant JCKY2022603C016, State Key Laboratory of Robotics and System (HIT); and the Heilongjiang Touyan Team.

Keywords: motion control; reinforcement learning; robustness; stability

Classification: TP242 [Automation and Computer Technology: Detection Technology and Automatic Equipment]

Corresponding author: ZHANG LiXian, email: lixianzhang@hit.edu.cn.