
Learning the continuous-time optimal decision law from discrete-time rewards (Cited by: 1)

Abstract: The concept of reward is fundamental in reinforcement learning, with a wide range of applications in the natural and social sciences. Seeking an interpretable reward for decision-making, one that largely shapes the system's behavior, has long been a challenge in reinforcement learning. In this work, we explore a discrete-time reward for reinforcement learning in continuous time and action spaces, which represent many phenomena governed by physical laws. We find that the discrete-time reward leads to extraction of the unique continuous-time decision law and improves computational efficiency by dropping the integral operator that appears in classical results with integral rewards. We apply this finding to output-feedback design problems in power systems. The results show that our approach removes the intermediate stage of identifying dynamical models. Our work suggests that the discrete-time reward is efficient in the search for the desired decision law, providing a computational tool for understanding and modifying the behavior of large-scale engineering systems through the learned optimal decision.
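The abstract does not spell out the learning algorithm itself. As a rough illustration of the general idea only, the hypothetical sketch below learns a linear state-feedback gain for a continuous-time plant from discrete-time reward samples via least-squares policy evaluation. The double-integrator plant, the Euler discretization, and the use of the known input matrix B in the improvement step are assumptions made for brevity and are not taken from the paper (which addresses output-feedback, model-free design).

```python
import numpy as np

def quad_features(x):
    """Quadratic basis chosen so that quad_features(x) @ theta == x.T @ P @ x."""
    n = len(x)
    return np.array([x[i] * x[j] * (1.0 if i == j else 2.0)
                     for i in range(n) for j in range(i, n)])

def theta_to_P(theta, n):
    """Rebuild the symmetric value matrix P from its parameter vector."""
    P = np.zeros((n, n))
    k = 0
    for i in range(n):
        for j in range(i, n):
            P[i, j] = P[j, i] = theta[k]
            k += 1
    return P

# Hypothetical double-integrator plant; none of these numbers come from the paper.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
dt = 0.01
K = np.array([[1.0, 1.0]])                     # initial stabilizing gain
rng = np.random.default_rng(0)

for it in range(8):                            # policy iteration
    Phi, rew = [], []
    for rollout in range(40):                  # short rollouts from random states for excitation
        x = rng.standard_normal(2)
        for k in range(50):
            u = -K @ x
            x_next = x + dt * (A @ x + B @ u)          # Euler step of the ODE
            r = float(x @ Q @ x + u @ R @ u) * dt      # discrete-time reward sample
            Phi.append(quad_features(x) - quad_features(x_next))
            rew.append(r)
            x = x_next
    # Least-squares policy evaluation: V(x_k) - V(x_{k+1}) ~= r_k
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(rew), rcond=None)
    P = theta_to_P(theta, 2)
    K = np.linalg.solve(R, B.T @ P)            # improvement step (uses B only for brevity)

print("learned gain K:", K)
```

The only point of this sketch is that the reward enters as sampled point values r_k rather than through an integral operator, which echoes the efficiency argument made in the abstract.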
Source: National Science Open (国家科学进展(英文)), 2024, No. 5, pp. 130-147 (18 pages).
Funding: Supported by the Guangdong Basic and Applied Basic Research Foundation (2024A1515011936) and the National Natural Science Foundation of China (62320106008).
Corresponding authors: Ci Chen (ci.chen@gdut.edu.cn), Lihua Xie (elhxie@ntu.edu.sg), and Shengli Xie (shlxie@gdut.edu.cn).