无人机辅助通信系统是未来无线通信系统的重要组成部分。为进一步提高无人机辅助通信系统中时频资源的利用率,本文研究了一种基于非正交多址技术的无人机辅助通信架构,并提出了一种基于双延迟深度确定性策略梯度的TD3-TOPATM(twin delay...无人机辅助通信系统是未来无线通信系统的重要组成部分。为进一步提高无人机辅助通信系统中时频资源的利用率,本文研究了一种基于非正交多址技术的无人机辅助通信架构,并提出了一种基于双延迟深度确定性策略梯度的TD3-TOPATM(twin delayedtrajectory optimization and power allocation for total throughput maximization)算法,以最大化总吞吐量为目标,在满足最大功率约束、空间约束、最大飞行速度和服务质量(quality of service,QoS)约束的情况下,联合优化无人机的功率分配策略和3D轨迹。仿真实验分析结果表明,与随机算法相比,TD3-TOPATM算法能够实现98%的性能增益;与基于DQN(deep Q-network)的轨迹优化与资源分配算法相比,TD3-TOPATM算法获得的性能增益为19.4%;与基于深度确定性策略梯度的轨迹优化与资源分配算法相比,TD3-TOPATM算法得到的总吞吐量增加了9.7%;与基于正交多址技术的无人机辅助通信方案相比,基于非正交多址技术的无人机辅助通信方案实现了55%的性能增益。展开更多
Low earth orbit(LEO)satellites with wide coverage can carry the mobile edge computing(MEC)servers with powerful computing capabilities to form the LEO satellite edge computing system,providing computing services for t...Low earth orbit(LEO)satellites with wide coverage can carry the mobile edge computing(MEC)servers with powerful computing capabilities to form the LEO satellite edge computing system,providing computing services for the global ground users.In this paper,the computation offloading problem and resource allocation problem are formulated as a mixed integer nonlinear program(MINLP)problem.This paper proposes a computation offloading algorithm based on deep deterministic policy gradient(DDPG)to obtain the user offloading decisions and user uplink transmission power.This paper uses the convex optimization algorithm based on Lagrange multiplier method to obtain the optimal MEC server resource allocation scheme.In addition,the expression of suboptimal user local CPU cycles is derived by relaxation method.Simulation results show that the proposed algorithm can achieve excellent convergence effect,and the proposed algorithm significantly reduces the system utility values at considerable time cost compared with other algorithms.展开更多
The low Earth orbit(LEO)satellite networks have outstanding advantages such as wide coverage area and not being limited by geographic environment,which can provide a broader range of communication services and has bec...The low Earth orbit(LEO)satellite networks have outstanding advantages such as wide coverage area and not being limited by geographic environment,which can provide a broader range of communication services and has become an essential supplement to the terrestrial network.However,the dynamic changes and uneven distribution of satellite network traffic inevitably bring challenges to multipath routing.Even worse,the harsh space environment often leads to incomplete collection of network state data for routing decision-making,which further complicates this challenge.To address this problem,this paper proposes a state-incomplete intelligent dynamic multipath routing algorithm(SIDMRA)to maximize network efficiency even with incomplete state data as input.Specifically,we model the multipath routing problem as a markov decision process(MDP)and then combine the deep deterministic policy gradient(DDPG)and the K shortest paths(KSP)algorithm to solve the optimal multipath routing policy.We use the temporal correlation of the satellite network state to fit the incomplete state data and then use the message passing neuron network(MPNN)for data enhancement.Simulation results show that the proposed algorithm outperforms baseline algorithms regarding average end-to-end delay and packet loss rate and performs stably under certain missing rates of state data.展开更多
The coordinated optimization problem of the electricity-gas-heat integrated energy system(IES)has the characteristics of strong coupling,non-convexity,and nonlinearity.The centralized optimization method has a high co...The coordinated optimization problem of the electricity-gas-heat integrated energy system(IES)has the characteristics of strong coupling,non-convexity,and nonlinearity.The centralized optimization method has a high cost of communication and complex modeling.Meanwhile,the traditional numerical iterative solution cannot deal with uncertainty and solution efficiency,which is difficult to apply online.For the coordinated optimization problem of the electricity-gas-heat IES in this study,we constructed a model for the distributed IES with a dynamic distribution factor and transformed the centralized optimization problem into a distributed optimization problem in the multi-agent reinforcement learning environment using multi-agent deep deterministic policy gradient.Introducing the dynamic distribution factor allows the system to consider the impact of changes in real-time supply and demand on system optimization,dynamically coordinating different energy sources for complementary utilization and effectively improving the system economy.Compared with centralized optimization,the distributed model with multiple decision centers can achieve similar results while easing the pressure on system communication.The proposed method considers the dual uncertainty of renewable energy and load in the training.Compared with the traditional iterative solution method,it can better cope with uncertainty and realize real-time decision making of the system,which is conducive to the online application.Finally,we verify the effectiveness of the proposed method using an example of an IES coupled with three energy hub agents.展开更多
Device-to-device(D2D)communications underlying cellular networks enabled by unmanned aerial vehicles(UAV)have been regarded as promising techniques for next-generation communications.To mitigate the strong interferenc...Device-to-device(D2D)communications underlying cellular networks enabled by unmanned aerial vehicles(UAV)have been regarded as promising techniques for next-generation communications.To mitigate the strong interference caused by the line-of-sight(LoS)airto-ground channels,we deploy a reconfigurable intelligent surface(RIS)to rebuild the wireless channels.A joint optimization problem of the transmit power of UAV,the transmit power of D2D users and the RIS phase configuration are investigated to maximize the achievable rate of D2D users while satisfying the quality of service(QoS)requirement of cellular users.Due to the high channel dynamics and the coupling among cellular users,the RIS,and the D2D users,it is challenging to find a proper solution.Thus,a RIS softmax deep double deterministic(RIS-SD3)policy gradient method is proposed,which can smooth the optimization space as well as reduce the number of local optimizations.Specifically,the SD3 algorithm maximizes the reward of the agent by training the agent to maximize the value function after the softmax operator is introduced.Simulation results show that the proposed RIS-SD3 algorithm can significantly improve the rate of the D2D users while controlling the interference to the cellular user.Moreover,the proposed RIS-SD3 algorithm has better robustness than the twin delayed deep deterministic(TD3)policy gradient algorithm in a dynamic environment.展开更多
文摘无人机辅助通信系统是未来无线通信系统的重要组成部分。为进一步提高无人机辅助通信系统中时频资源的利用率,本文研究了一种基于非正交多址技术的无人机辅助通信架构,并提出了一种基于双延迟深度确定性策略梯度的TD3-TOPATM(twin delayedtrajectory optimization and power allocation for total throughput maximization)算法,以最大化总吞吐量为目标,在满足最大功率约束、空间约束、最大飞行速度和服务质量(quality of service,QoS)约束的情况下,联合优化无人机的功率分配策略和3D轨迹。仿真实验分析结果表明,与随机算法相比,TD3-TOPATM算法能够实现98%的性能增益;与基于DQN(deep Q-network)的轨迹优化与资源分配算法相比,TD3-TOPATM算法获得的性能增益为19.4%;与基于深度确定性策略梯度的轨迹优化与资源分配算法相比,TD3-TOPATM算法得到的总吞吐量增加了9.7%;与基于正交多址技术的无人机辅助通信方案相比,基于非正交多址技术的无人机辅助通信方案实现了55%的性能增益。
基金supported by National Natural Science Foundation of China No.62231012Natural Science Foundation for Outstanding Young Scholars of Heilongjiang Province under Grant YQ2020F001Heilongjiang Province Postdoctoral General Foundation under Grant AUGA4110004923.
文摘Low earth orbit(LEO)satellites with wide coverage can carry the mobile edge computing(MEC)servers with powerful computing capabilities to form the LEO satellite edge computing system,providing computing services for the global ground users.In this paper,the computation offloading problem and resource allocation problem are formulated as a mixed integer nonlinear program(MINLP)problem.This paper proposes a computation offloading algorithm based on deep deterministic policy gradient(DDPG)to obtain the user offloading decisions and user uplink transmission power.This paper uses the convex optimization algorithm based on Lagrange multiplier method to obtain the optimal MEC server resource allocation scheme.In addition,the expression of suboptimal user local CPU cycles is derived by relaxation method.Simulation results show that the proposed algorithm can achieve excellent convergence effect,and the proposed algorithm significantly reduces the system utility values at considerable time cost compared with other algorithms.
文摘The low Earth orbit(LEO)satellite networks have outstanding advantages such as wide coverage area and not being limited by geographic environment,which can provide a broader range of communication services and has become an essential supplement to the terrestrial network.However,the dynamic changes and uneven distribution of satellite network traffic inevitably bring challenges to multipath routing.Even worse,the harsh space environment often leads to incomplete collection of network state data for routing decision-making,which further complicates this challenge.To address this problem,this paper proposes a state-incomplete intelligent dynamic multipath routing algorithm(SIDMRA)to maximize network efficiency even with incomplete state data as input.Specifically,we model the multipath routing problem as a markov decision process(MDP)and then combine the deep deterministic policy gradient(DDPG)and the K shortest paths(KSP)algorithm to solve the optimal multipath routing policy.We use the temporal correlation of the satellite network state to fit the incomplete state data and then use the message passing neuron network(MPNN)for data enhancement.Simulation results show that the proposed algorithm outperforms baseline algorithms regarding average end-to-end delay and packet loss rate and performs stably under certain missing rates of state data.
基金supported by The National Key R&D Program of China(2020YFB0905900):Research on artificial intelligence application of power internet of things.
文摘The coordinated optimization problem of the electricity-gas-heat integrated energy system(IES)has the characteristics of strong coupling,non-convexity,and nonlinearity.The centralized optimization method has a high cost of communication and complex modeling.Meanwhile,the traditional numerical iterative solution cannot deal with uncertainty and solution efficiency,which is difficult to apply online.For the coordinated optimization problem of the electricity-gas-heat IES in this study,we constructed a model for the distributed IES with a dynamic distribution factor and transformed the centralized optimization problem into a distributed optimization problem in the multi-agent reinforcement learning environment using multi-agent deep deterministic policy gradient.Introducing the dynamic distribution factor allows the system to consider the impact of changes in real-time supply and demand on system optimization,dynamically coordinating different energy sources for complementary utilization and effectively improving the system economy.Compared with centralized optimization,the distributed model with multiple decision centers can achieve similar results while easing the pressure on system communication.The proposed method considers the dual uncertainty of renewable energy and load in the training.Compared with the traditional iterative solution method,it can better cope with uncertainty and realize real-time decision making of the system,which is conducive to the online application.Finally,we verify the effectiveness of the proposed method using an example of an IES coupled with three energy hub agents.
基金supported by the National Natural Science Foundation of China under Grant Nos.62201462 and 62271412.
文摘Device-to-device(D2D)communications underlying cellular networks enabled by unmanned aerial vehicles(UAV)have been regarded as promising techniques for next-generation communications.To mitigate the strong interference caused by the line-of-sight(LoS)airto-ground channels,we deploy a reconfigurable intelligent surface(RIS)to rebuild the wireless channels.A joint optimization problem of the transmit power of UAV,the transmit power of D2D users and the RIS phase configuration are investigated to maximize the achievable rate of D2D users while satisfying the quality of service(QoS)requirement of cellular users.Due to the high channel dynamics and the coupling among cellular users,the RIS,and the D2D users,it is challenging to find a proper solution.Thus,a RIS softmax deep double deterministic(RIS-SD3)policy gradient method is proposed,which can smooth the optimization space as well as reduce the number of local optimizations.Specifically,the SD3 algorithm maximizes the reward of the agent by training the agent to maximize the value function after the softmax operator is introduced.Simulation results show that the proposed RIS-SD3 algorithm can significantly improve the rate of the D2D users while controlling the interference to the cellular user.Moreover,the proposed RIS-SD3 algorithm has better robustness than the twin delayed deep deterministic(TD3)policy gradient algorithm in a dynamic environment.