Safe Reinforcement Learning Algorithm and Its Application in Intelligent Control for CPS (cited by: 5)
Abstract: Safe controller design for cyber-physical systems (CPS) is an active research topic. Existing safe controller design based on formal methods suffers from excessive reliance on system models and poor scalability. Intelligent control based on deep reinforcement learning can handle high-dimensional, nonlinear, complex systems as well as uncertain systems, and is becoming a promising CPS control technology, but it lacks safety guarantees. This study addresses the safety shortcomings of reinforcement learning control through a case study of a typical industrial oil pump control system, covering both the design of a new safe reinforcement learning algorithm and its application to intelligent control. First, the safe reinforcement learning problem for industrial oil pump control is formalized, and a simulation environment for the oil pump is built. Then, by designing the structure and activation function of the output layer, a neural network oil pump controller is constructed so that the linear inequality constraints on the pump's switching times are satisfied. Finally, to better balance the safety and optimality control objectives, a new safe reinforcement learning algorithm is designed and implemented based on the augmented Lagrange multiplier method. Comparative experiments on the industrial oil pump case show that the controllers generated by the proposed algorithm surpass those of existing comparable algorithms in both safety and optimality. In further evaluation, the generated neural network controllers pass rigorous formal verification with a probability of 90%; meanwhile, compared with the theoretically optimal controller, they incur a loss in optimal objective value as low as 2%. The proposed method is expected to extend to more application scenarios, and the case study may serve as a reference for other researchers in safe intelligent control and formal verification.
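The abstract names the augmented Lagrange multiplier method as the basis for trading off safety against optimality. The following is a minimal sketch of that general technique on a toy one-dimensional problem, not the paper's algorithm: it minimizes f(x) = (x - 2)^2 subject to g(x) = x - 1 <= 0. The objective, the constraint, the function names, and all hyperparameters (rho, lr, iteration counts) are assumptions chosen for illustration. In a reinforcement learning setting, x would presumably stand for policy parameters, f for the negated expected return, and g for a safety cost minus its threshold.

```python
def objective_grad(x):
    # Gradient of the toy objective f(x) = (x - 2)^2 (hypothetical stand-in
    # for the optimality objective).
    return 2.0 * (x - 2.0)


def constraint(x):
    # Toy safety constraint g(x) = x - 1, required to satisfy g(x) <= 0
    # (hypothetical stand-in for a safety cost).
    return x - 1.0


def augmented_lagrangian_solve(x=0.0, lam=0.0, rho=10.0, lr=0.05,
                               outer_iters=20, inner_iters=200):
    """Minimize f subject to g <= 0 with the augmented Lagrangian
    L(x, lam) = f(x) + (max(0, lam + rho * g(x))**2 - lam**2) / (2 * rho).
    """
    for _ in range(outer_iters):
        # Inner loop: gradient descent on x with the multiplier held fixed.
        for _ in range(inner_iters):
            shifted = max(0.0, lam + rho * constraint(x))
            # dL/dx = f'(x) + shifted * g'(x), and g'(x) = 1 here.
            grad = objective_grad(x) + shifted
            x -= lr * grad
        # Outer step: multiplier update, projected onto lam >= 0.
        lam = max(0.0, lam + rho * constraint(x))
    return x, lam


x_opt, lam_opt = augmented_lagrangian_solve()
print(f"x = {x_opt:.6f}, lambda = {lam_opt:.6f}")
```

For this toy problem the constrained minimizer is x = 1 with multiplier 2, which follows from the KKT stationarity condition 2(x - 2) + lambda = 0 at x = 1. Unlike a plain penalty method, the multiplier update lets the iterate reach the constraint boundary without driving rho to infinity.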
Authors: ZHAO Heng-Jun (赵恒军); LI Quan-Zhong (李权忠); ZENG Xia (曾霞); LIU Zhi-Ming (刘志明) (College of Computer and Information Science, College of Software, Southwest University, Chongqing 400715, China; Centre for Intelligent and Embedded Software, Northwestern Polytechnical University, Xi’an 710129, China; Centre for Research and Innovation in Software Engineering, Southwest University, Chongqing 400715, China)
Source: Journal of Software (《软件学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2022, No. 7, pp. 2538-2561 (24 pages)
Funding: National Natural Science Foundation of China (61902325, 62032019, 61972385, 61732019, 61702425); Southwest University National Talent Development Project (SWU116007)
Keywords: reinforcement learning; intelligent control; cyber-physical system; safety verification; industrial oil pump
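About the authors: ZHAO Heng-Jun (b. 1985), male, Ph.D., lecturer, CCF professional member; research interests: cyber-physical systems, formal methods. LI Quan-Zhong (b. 1995), male, master's student; research interests: reinforcement learning, intelligent control. Corresponding author: ZENG Xia (b. 1987), female, Ph.D., lecturer; research interests: cyber-physical systems, numeric-symbolic computation; E-mail: xzeng0712@swu.edu.cn. LIU Zhi-Ming (b. 1961), male, Ph.D., professor, doctoral supervisor, CCF senior member; research interests: software theory and methods.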