Abstract
The concept of reward is fundamental in reinforcement learning, with a wide range of applications in the natural and social sciences. Seeking an interpretable reward for decision-making, which largely shapes the system's behavior, has long been a challenge in reinforcement learning. In this work, we explore a discrete-time reward for reinforcement learning in continuous time and action spaces, which represent many phenomena governed by physical laws. We find that the discrete-time reward enables the extraction of a unique continuous-time decision law and improves computational efficiency by dropping the integral operator that appears in classical results with integral rewards. We apply this finding to solve output-feedback design problems in power systems. The results reveal that our approach removes the intermediate stage of identifying dynamical models. Our work suggests that the discrete-time reward is efficient in searching for the desired decision law, providing a computational tool to understand and modify the behavior of large-scale engineering systems using the learned optimal decision.
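To make the contrast concrete, here is a minimal sketch of the two reward formulations; the notation (state x, action u, stage reward r, discount rate \gamma, sampling period \Delta t) is illustrative and not taken from the paper. The classical continuous-time formulation evaluates an integral reward, whereas a discrete-time reward sums sampled stage rewards along the continuous-time trajectory and thereby avoids the integral operator:

% Hedged sketch; symbols are assumptions for illustration only.
V(x(t)) = \int_{t}^{\infty} e^{-\gamma(\tau - t)} \, r\big(x(\tau), u(\tau)\big) \, d\tau % classical integral reward
V(x(t_k)) = \sum_{i=k}^{\infty} e^{-\gamma (t_i - t_k)} \, r\big(x(t_i), u(t_i)\big), \quad t_i = i\,\Delta t % discrete-time reward at sampling instants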
Funding
Supported by the Guangdong Basic and Applied Basic Research Foundation (2024A1515011936) and the National Natural Science Foundation of China (62320106008).
Author Information
Corresponding authors: Ci Chen (ci.chen@gdut.edu.cn), Lihua Xie (elhxie@ntu.edu.sg), and Shengli Xie (shlxie@gdut.edu.cn).