4 articles found
Distributed Weighted Data Aggregation Algorithm in End-to-Edge Communication Networks Based on Multi-armed Bandit (cited by: 1)
1
Authors: Yifei ZOU, Senmao QI, Cong'an XU, Dongxiao YU. 《计算机科学》 (CSCD, PKU Core), 2023, No. 2, pp. 13-22 (10 pages)
As a combination of edge computing and artificial intelligence, edge intelligence has become a promising technique that provides its users with a series of fast, precise, and customized services. In edge intelligence, when learning agents are deployed on the edge side, data aggregation from the end side to the designated edge devices is an important research topic. Considering the varying importance of end devices, this paper studies the weighted data aggregation problem in a single-hop end-to-edge communication network. First, to ensure that end devices with different weights are treated fairly in data aggregation, a distributed end-to-edge cooperative scheme is proposed. Then, to handle the massive contention on the wireless channel caused by end devices, a multi-armed bandit (MAB) algorithm is designed to help the end devices find their most appropriate update rates. Unlike traditional data aggregation work, combining the scheme with MAB gives our algorithm a higher efficiency in data aggregation. With a theoretical analysis, we show that the efficiency of our algorithm is asymptotically optimal. Comparative experiments with previous works are also conducted to show the strength of our algorithm.
Keywords: Weighted data aggregation; End-to-edge communication; multi-armed bandit; Edge intelligence
Strict greedy design paradigm applied to the stochastic multi-armed bandit problem
2
Author: Joey Hong. 《机床与液压》 (PKU Core), 2015, No. 6, pp. 1-6 (6 pages)
The process of making decisions is something humans do inherently and routinely, to the extent that it appears commonplace. However, in order to achieve good overall performance, decisions must take into account both the outcomes of past decisions and the opportunities of future ones. Reinforcement learning, which is fundamental to sequential decision-making, consists of the following components: (1) a set of decision epochs; (2) a set of environment states; (3) a set of available actions to transition between states; (4) state-action dependent immediate rewards for each action. At each decision, the environment state provides the decision maker with a set of available actions from which to choose. As a result of selecting a particular action in the state, the environment generates an immediate reward for the decision maker and shifts to a different state and decision epoch. The ultimate goal for the decision maker is to maximize the total reward after a sequence of time steps. This paper will focus on an archetypal example of reinforcement learning, the stochastic multi-armed bandit problem. After introducing the dilemma, I will briefly cover the most common methods used to solve it, namely the UCB and ε_n-greedy algorithms. I will also introduce my own greedy implementation, the strict-greedy algorithm, which more tightly follows the greedy pattern in algorithm design, and show that it performs comparably to the two accepted algorithms.
Keywords: Greedy algorithms; Allocation strategy; Stochastic multi-armed bandit problem
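The two baseline methods named in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: the Bernoulli reward model, the function names (`run_bandit`, `ucb1`, `make_epsilon_n_greedy`), and the parameters `c` and `d` are hypothetical choices made here, with the exploration schedule taken as min(1, cK / (d² t)). The strict-greedy algorithm is not shown because the abstract does not specify it.

```python
import math
import random


def run_bandit(means, horizon, choose, seed=0):
    """Play a Bernoulli bandit for `horizon` steps; return (total reward, pulls per arm)."""
    rng = random.Random(seed)
    counts = [0] * len(means)   # number of pulls of each arm
    sums = [0.0] * len(means)   # accumulated reward of each arm
    total = 0.0
    for t in range(1, horizon + 1):
        arm = choose(t, counts, sums, rng)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts


def ucb1(t, counts, sums, rng):
    """UCB1: pull every arm once, then maximise mean + sqrt(2 ln t / n)."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(
        range(len(counts)),
        key=lambda a: sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
    )


def make_epsilon_n_greedy(c=1.0, d=0.3):
    """epsilon_n-greedy: explore uniformly with probability min(1, c*K / (d^2 * t))."""
    def choose(t, counts, sums, rng):
        k = len(counts)
        eps = min(1.0, c * k / (d * d * t))
        if rng.random() < eps:
            return rng.randrange(k)
        # Exploit: arm with the best empirical mean (unpulled arms count as 0).
        return max(range(k), key=lambda a: sums[a] / counts[a] if counts[a] else 0.0)
    return choose
```

Calling `run_bandit([0.2, 0.5, 0.8], 5000, ucb1)` or passing `make_epsilon_n_greedy()` as the policy returns the total reward and the per-arm pull counts, which is enough to compare how quickly each policy concentrates on the best arm.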
Impedance control of multi-arm space robot for the capture of non-cooperative targets (cited by: 7)
3
Authors: GE Dongming, SUN Guanghui, ZOU Yuanjie, SHI Jixin. 《Journal of Systems Engineering and Electronics》 (SCIE, EI, CSCD), 2020, No. 5, pp. 1051-1061 (11 pages)
Robotic systems are expected to play an increasingly important role in future space activities. Robotic on-orbit servicing, whose key is capture technology, has become a research hot spot in recent years. This paper studies the dynamics modeling and impedance control of a multi-arm free-flying space robotic system capturing a non-cooperative target. First, a control-oriented dynamics model is essential in control algorithm design and code realization. Instead of a numerical algorithm, an analytical approach is adopted. Using a general and a quasi-coordinate Lagrangian formulation, the kinematics and dynamics equations are derived. Then, an impedance control algorithm is developed which allows coordinated control of the multiple manipulators to capture a target. By enforcing a reference impedance, the end-effectors behave like a mass-damper-spring system fixed in inertial space in reaction to any contact force between the capture hands and the target. Meanwhile, the position and attitude of the base are held stable by using gas jet thrusters to counteract the manipulators' reaction. Finally, a simulation of a space robot with two manipulators and a free-floating non-cooperative target is presented to verify the effectiveness of the proposed method.
Keywords: multi-arm space robot; impedance control; non-cooperative target; capture
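The mass-damper-spring behaviour described in the abstract above can be illustrated with a one-axis sketch. It assumes a scalar reference impedance M x'' + D x' + K x = F with a constant contact force, integrated by semi-implicit Euler; the function name `simulate_impedance` and all numeric values are hypothetical, and this is not the paper's multi-arm controller or its dynamics model.

```python
def simulate_impedance(mass, damping, stiffness, force, dt=1e-3, steps=10000):
    """Integrate M*x'' + D*x' + K*x = F from rest; return the displacement history."""
    x, v = 0.0, 0.0
    trajectory = []
    for _ in range(steps):
        # Acceleration demanded by the reference impedance under the contact force.
        a = (force - damping * v - stiffness * x) / mass
        v += dt * a   # semi-implicit Euler: update velocity first,
        x += dt * v   # then position with the new velocity
        trajectory.append(x)
    return trajectory
```

With a constant force the displacement settles near F / K, so a softer stiffness yields a more compliant end-effector; for example, `simulate_impedance(1.0, 20.0, 100.0, 10.0)` uses a critically damped setting (D = 2·sqrt(K·M)).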
Optimal index shooting policy for layered missile defense system (cited by: 2)
4
Authors: LI Longyue, FAN Chengli, XING Qinghua, XU Hailong, ZHAO Huizhen. 《Journal of Systems Engineering and Electronics》 (SCIE, EI, CSCD), 2020, No. 1, pp. 118-129 (12 pages)
In order to cope with the increasing threat of ballistic missiles (BMs) within a shorter reaction time, the shooting policy of the layered defense system needs to be optimized. The main decision-making problem of shooting optimization is how to choose the next BM to shoot according to the previous engagements and their results, thus maximizing the expected return of BMs killed or minimizing the cost of BM penetration. Motivated by this, this study aims to determine an optimal shooting policy for a two-layer missile defense (TLMD) system. This paper considers a scenario in which the TLMD system wishes to shoot at a collection of BMs one at a time and to maximize the return obtained from BMs killed before the system's demise. To provide a policy analysis tool, this paper develops a general model for shooting decision-making in which the shooting engagements are described as a discounted-reward Markov decision process. The index shooting policy is a strategy that can effectively balance the shooting returns against the risk that the defense mission fails. The numerical results show that the index policy is better than a range of competitors, especially in terms of mean return and mean number of BMs killed.
Keywords: Gittins index; shooting policy; layered missile defense; multi-armed bandits problem; Markov decision process