Abstract
This paper studies distributed online composite optimization under several types of delayed feedback: gradient feedback, one-point Bandit feedback, and two-point Bandit feedback. Each agent's local objective function is the sum of a strongly convex, smooth function and a convex, nonsmooth regularizer. In the distributed setting, each agent is subject to its own time-varying delay. Building on the proximal gradient descent algorithm, a distributed online composite optimization algorithm is designed for each of the three delayed-feedback cases, and upper bounds on the dynamic regret are derived. The analysis shows that, in expectation, the dynamic regret bounds under delayed gradient feedback and delayed two-point Bandit feedback are of the same order, whereas the bound under delayed one-point Bandit feedback is slightly worse. Hence, in the presence of delays, two-point Bandit feedback achieves, in expectation, a dynamic regret bound of the same order as gradient feedback; moreover, with suitably chosen step sizes, the average delay contributes the same order to the dynamic regret for all three feedback types. Finally, simulation experiments verify the performance of the algorithms and the theoretical results.
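The abstract describes three feedback models feeding a distributed proximal gradient update. The following is a minimal sketch of that structure, not the authors' algorithm, under assumptions that do not come from the paper: quadratic local losses f_{i,t}(x) = ½‖x − c_{i,t}‖², an ℓ1 regularizer λ‖x‖₁ (so the proximal map is soft-thresholding), a doubly stochastic mixing matrix W, and the standard sphere-sampling one-point and two-point gradient estimators. Delays are simplified so that at round t agent i uses the feedback generated at round t − τ_{i,t}. All function names (`simulate`, `prox_step`, `one_point_grad`, `two_point_grad`) are hypothetical.

```python
import numpy as np


def soft_threshold(v, tau):
    """Proximal operator of tau * ||x||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def sphere_sample(d, rng):
    """Direction drawn uniformly from the unit sphere in R^d."""
    u = rng.standard_normal(d)
    return u / np.linalg.norm(u)


def one_point_grad(f, x, delta, rng):
    """One-point bandit estimate: (d / delta) * f(x + delta*u) * u."""
    u = sphere_sample(x.size, rng)
    return (x.size / delta) * f(x + delta * u) * u


def two_point_grad(f, x, delta, rng):
    """Two-point bandit estimate: (d / (2*delta)) * (f(x + delta*u) - f(x - delta*u)) * u."""
    u = sphere_sample(x.size, rng)
    return (x.size / (2 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u


def prox_step(X, W, G, alpha, lam):
    """Mix neighbours' iterates with W, take a gradient step, then apply the prox."""
    return soft_threshold(W @ X - alpha * G, alpha * lam)


def simulate(targets, W, delays, feedback="gradient", alpha=0.05, lam=0.0,
             delta=0.1, seed=0):
    """Hypothetical delayed distributed online proximal gradient loop.

    targets: array (T, n, d) of centres c_{i,t} of f_{i,t}(x) = 0.5*||x - c_{i,t}||^2.
    delays:  integer array (T, n); at round t agent i only has the feedback
             of round s = max(t - delays[t, i], 0) (simplified delay model).
    """
    rng = np.random.default_rng(seed)
    T, n, d = targets.shape
    X = np.zeros((n, d))
    history = []  # history[s] holds the iterates X_s of round s
    for t in range(T):
        history.append(X.copy())
        G = np.zeros((n, d))
        for i in range(n):
            s = max(t - int(delays[t, i]), 0)  # round whose feedback has arrived
            c, x_s = targets[s, i], history[s][i]

            def f(z, c=c):
                return 0.5 * np.sum((z - c) ** 2)

            if feedback == "gradient":
                G[i] = x_s - c  # exact (but delayed) gradient of f at x_s
            elif feedback == "two_point":
                G[i] = two_point_grad(f, x_s, delta, rng)
            elif feedback == "one_point":
                G[i] = one_point_grad(f, x_s, delta, rng)
            else:
                raise ValueError(f"unknown feedback type: {feedback}")
        X = prox_step(X, W, G, alpha, lam)
    return X
```

In the static, unregularized case (λ = 0, c_{i,t} = c_i) with a small enough step size, the network average converges to the mean of the c_i, which minimizes the summed loss, even with delayed gradients. Switching `feedback` to `"two_point"` or `"one_point"` replaces the exact gradient with a zeroth-order estimate built from two or one function values; the one-point estimate has much larger variance, which is consistent with the weaker bound the abstract reports for that case.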
Authors
HOU Rui-Jie, LI Xiu-Xian, YI Xin-Lei, HONG Yi-Guang, XIE Li-Hua
(Department of Control Science and Engineering, College of Electronics and Information Engineering, Tongji University, Shanghai 201800, China; National Key Laboratory of Autonomous Intelligent Unmanned System, Frontiers Science Center for Intelligent Autonomous Systems, Ministry of Education, Shanghai Research Institute for Intelligent Autonomous Systems, and Shanghai Institute of Intelligent Science and Technology, Tongji University, Shanghai 201210, China; Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge 02139, USA; School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798, Singapore)
Source
Acta Automatica Sinica (Peking University core journal)
2025, No. 4, pp. 835-856 (22 pages)
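Funding
Supported by the National Natural Science Foundation of China (62473292, 62088101) and the Shanghai Municipal Science and Technology Major Project (2021SHZDZX0100).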
Keywords
Distributed online convex optimization
composite optimization
feedback delays
Bandit feedback
dynamic regret
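About the authors
HOU Rui-Jie: Ph.D. candidate in the Department of Control Science and Engineering, College of Electronics and Information Engineering, Tongji University. Received the bachelor's degree from Lanzhou University in 2021. Main research interest: distributed online optimization. E-mail: hourj21@tongji.edu.cn
LI Xiu-Xian (corresponding author): Professor in the Department of Control Science and Engineering, College of Electronics and Information Engineering, Tongji University, and at the Shanghai Research Institute for Intelligent Autonomous Systems. Main research interests: distributed control and optimization, intelligent algorithms, game theory, and autonomous intelligent unmanned systems. E-mail: xli@tongji.edu.cn
YI Xin-Lei: Previously a postdoctoral researcher at the Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, USA; currently a tenure-track professor at Tongji University. Main research interests: distributed optimization, online optimization, meta-learning, and graph neural networks. E-mail: xinleiy@kth.se
HONG Yi-Guang: Professor in the Department of Control Science and Engineering, College of Electronics and Information Engineering, Tongji University, and at the Shanghai Research Institute for Intelligent Autonomous Systems. He is an IEEE Fellow and a CAA Fellow. Main research interests: nonlinear control, multi-agent systems, distributed optimization and games, machine learning, and social networks. E-mail: yghong@tongji.edu.cn
XIE Li-Hua: Professor at the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. He is a Fellow of the Academy of Engineering Singapore, an IEEE Fellow, an IFAC Fellow, and a CAA Fellow. Main research interests: robust control and estimation, networked control systems, distributed optimization, multi-agent networks, localization, and unmanned systems. E-mail: elhxie@ntu.edu.sg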