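Abstract: To address the poor computational accuracy and insufficient adaptability to strong disturbances of existing reentry guidance methods based on the deep deterministic policy gradient (DDPG) algorithm, a reentry guidance method based on long short-term memory-DDPG (LSTM-DDPG) is proposed, built on the DDPG training framework. The method decouples longitudinal and lateral guidance. For longitudinal guidance, the state and action spaces required for reinforcement learning are first constructed for the reentry guidance problem; next, the decision points and the command calculation strategy within a guidance cycle are determined, and a reward function that accounts for overall performance is designed; then, an LSTM network is introduced to build the reinforcement learning training network, and an online update strategy is used to improve the multi-task applicability of the algorithm. Lateral guidance uses a dynamic bank reversal method based on crossrange error to obtain the sign of the bank angle. Simulations of reentry glide for the U.S. common aero vehicle-hypersonic (CAV-H) show that, compared with the traditional numerical predictor-corrector method, the proposed method achieves comparable terminal accuracy with higher computational efficiency; compared with existing DDPG-based reentry guidance methods, it achieves comparable computational efficiency with higher terminal accuracy and robustness.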
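The lateral logic described in this abstract (pick the bank-angle sign from the crossrange error) can be illustrated with a minimal sketch. The abstract does not give the reversal rule, so everything here is an assumption: the function names `bank_sign` and `corridor_width`, the sign convention, the linear velocity-dependent corridor, and the reference values are hypothetical and are not taken from the paper. The bank-angle magnitude is assumed to come from the longitudinal channel.

```python
def bank_sign(prev_sign: int, crossrange_error: float, threshold: float) -> int:
    """Return the bank-angle sign (+1 or -1) for the next guidance cycle.

    Assumed convention (hypothetical): a positive crossrange error is reduced
    by banking with sign -1, a negative error by banking with sign +1.
    """
    if crossrange_error > threshold:
        return -1
    if crossrange_error < -threshold:
        return 1
    # Inside the corridor: keep the current sign, so no reversal is commanded.
    return prev_sign


def corridor_width(velocity: float, v_ref: float = 7000.0,
                   width_ref: float = 50e3) -> float:
    """Hypothetical 'dynamic' corridor: the allowed error shrinks linearly
    with velocity, so reversals become tighter as the vehicle slows down."""
    return width_ref * max(velocity, 0.0) / v_ref


if __name__ == "__main__":
    sign = 1
    # (velocity in m/s, crossrange error in m); illustrative numbers only.
    for velocity, error in [(7000.0, 10e3), (6000.0, 45e3), (3000.0, -30e3)]:
        sign = bank_sign(sign, error, corridor_width(velocity))
        print(velocity, error, sign)
```

Keeping the previous sign inside the corridor gives the rule hysteresis, which limits the number of bank reversals; whether the paper uses this exact mechanism is not stated.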
Funding: National Natural Science Foundation of China (62373187); Forward-looking Layout Special Projects (ILA220591A22).
Abstract: Existing methods for calculating the attack area of air-to-air missiles in modern air combat are limited by real-time calculation requirements, the accuracy-efficiency trade-off, and the absence of a three-dimensional attack area model, which restricts their practical application. To address these issues, an improved backtracking algorithm is proposed that significantly reduces solution time while maintaining accuracy in the three-dimensional attack area. Furthermore, to meet real-time requirements, the age-layered population structure genetic programming (ALPS-GP) algorithm is introduced to determine an analytical polynomial model of the three-dimensional attack area. The accuracy of the polynomial model is enhanced by correcting its coefficients with an improved gradient descent algorithm. The analytical polynomial model achieves a mean error of 91.89 m and is solved in just 10^(-4) s, thus meeting the requirements of real-time combat scenarios.
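The coefficient-correction step mentioned in this abstract can be pictured with a small sketch. The abstract does not describe the "improved" gradient descent, so the code below uses plain batch gradient descent on mean squared error, applied to a hypothetical one-variable polynomial; the function names, the learning rate, and the sample data are all assumptions for illustration, not the authors' model.

```python
def predict(coeffs, x):
    """Evaluate c0 + c1*x + c2*x**2 + ... at x."""
    return sum(c * x**k for k, c in enumerate(coeffs))


def correct_coefficients(coeffs, samples, lr=0.2, steps=2000):
    """Refine polynomial coefficients with plain batch gradient descent
    on the mean squared error over (x, y) samples."""
    coeffs = list(coeffs)
    n = len(samples)
    for _ in range(steps):
        grads = [0.0] * len(coeffs)
        for x, y in samples:
            err = predict(coeffs, x) - y
            for k in range(len(coeffs)):
                # d/dc_k of mean((prediction - y)**2)
                grads[k] += 2.0 * err * x**k / n
        coeffs = [c - lr * g for c, g in zip(coeffs, grads)]
    return coeffs


if __name__ == "__main__":
    # Hypothetical reference data generated from y = 1 + 2x + 3x^2 on [-1, 1].
    samples = [(i / 10, 1 + 2 * (i / 10) + 3 * (i / 10) ** 2)
               for i in range(-10, 11)]
    rough = [0.8, 2.3, 2.5]  # stand-in for a model structure found by search
    print(correct_coefficients(rough, samples))
```

The split mirrors the abstract's description: a structure search supplies the polynomial form with approximate coefficients, and a cheap numerical pass then tightens the coefficients against reference data.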
Funding: Supported by the Key Research and Development Program of Shaanxi (2022GY-089) and the Natural Science Basic Research Program of Shaanxi (2022JQ-593).
Abstract: The deep deterministic policy gradient (DDPG) algorithm is an off-policy method that combines two mainstream reinforcement learning approaches, one based on value iteration and one based on policy iteration. Using the DDPG algorithm, agents can explore and summarize the environment to make autonomous decisions in continuous state and action spaces. In this paper, a DDPG-based cooperative defense using swarms of unmanned aerial vehicles (UAVs) is developed and validated, and it shows promising practical value for defense. We address the sparse-reward problem of reinforcement learning in a long-term task by building a reward function for UAV swarms, and we optimize the learning process of the artificial neural network based on the DDPG algorithm to reduce oscillation during learning. The experimental results show that the DDPG algorithm can guide the UAV swarm to perform the defense task efficiently, meeting the swarm's requirements for decentralization and autonomy and promoting the intelligent development of UAV swarms and their decision-making process.
Funding: The National Natural Science Foundation of China (60502045).
Abstract: A novel space-borne antenna adaptive anti-jamming method is presented that is based on the genetic algorithm (GA) combined with gradient-like reproduction operators and searches for the best weights for pattern synthesis in radio frequency (RF). The method combines the GA's global search capability, which is not limited by the selection of the initial parameters, with the gradient algorithm's advantage of fast searching. The proposed method requires a smaller initial population and has lower computational complexity, so it is well suited to implementation in real-time systems. With the proposed algorithm, the designer can efficiently control both main-lobe shaping and side-lobe level. Simulation results based on spot survey data show that the proposed algorithm is efficient and feasible.
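The idea of pairing a GA with a gradient-like reproduction operator can be sketched as follows. This is only an illustration under stated assumptions: the abstract gives neither the operators nor the antenna cost function, so `cost` is a hypothetical stand-in (squared distance to a made-up target weight vector), and the selection, crossover, mutation, and finite-difference step are generic choices rather than the authors' design.

```python
import random


def cost(w):
    """Hypothetical stand-in objective: squared distance to a target vector.
    A real pattern-synthesis cost would replace this."""
    target = (0.5, -1.0, 2.0)
    return sum((wi - ti) ** 2 for wi, ti in zip(w, target))


def gradient_like_step(w, step=0.2, eps=1e-4):
    """Move w downhill along a forward finite-difference gradient estimate."""
    base = cost(w)
    grad = []
    for i in range(len(w)):
        bumped = list(w)
        bumped[i] += eps
        grad.append((cost(bumped) - base) / eps)
    return [wi - step * gi for wi, gi in zip(w, grad)]


def evolve(pop_size=8, generations=40, seed=0):
    """Small-population GA whose offspring are refined by a gradient-like step."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-3, 3) for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        parents = pop[: pop_size // 2]  # truncation selection, parents survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Arithmetic crossover plus small Gaussian mutation.
            child = [(x + y) / 2 + rng.gauss(0, 0.05) for x, y in zip(a, b)]
            children.append(gradient_like_step(child))
        pop = parents + children
    return min(pop, key=cost)


if __name__ == "__main__":
    best = evolve()
    print(best, cost(best))
```

Because each child takes a local downhill step, a population of only eight candidates is enough to reach the neighborhood of the optimum in this toy problem, which is consistent with the abstract's claim that the hybrid needs a smaller initial population.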