Funding: supported by the National Natural Science Foundation of China (60874040)
Abstract: The passive detection problem discussed in this paper involves searching for and locating an aerial emitter by dual aircraft using passive radars. To improve detection probability and accuracy, a fuzzy Q-learning algorithm for dual-aircraft flight path planning is proposed. The passive detection task model of the dual aircraft is set up based on the partition of the target active radar's radiation area. The problem is formulated as a Markov decision process (MDP) by using fuzzy theory to generalize the state space and by properly defining the transition functions, action space, and reward function. Details of the path planning algorithm are presented. Simulation results indicate that the algorithm provides adaptive strategies for the dual aircraft to control their flight paths and detect a non-maneuvering or maneuvering target.
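To make the fuzzy generalization of the state space concrete, the following is a minimal fuzzy Q-learning sketch. The one-dimensional state, the triangular membership functions, the action set, and the hyperparameters are illustrative assumptions, not the paper's task model; the abstract does not specify the fuzzy partition or the reward function.

```python
# Minimal fuzzy Q-learning sketch (assumptions: 1-D continuous state,
# triangular membership functions, epsilon-greedy exploration).
import numpy as np

class FuzzyQLearner:
    def __init__(self, centers, n_actions, alpha=0.1, gamma=0.9, eps=0.1):
        self.centers = np.asarray(centers, dtype=float)  # fuzzy set centers
        self.q = np.zeros((len(centers), n_actions))     # one Q-row per fuzzy set
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def memberships(self, s):
        # Triangular memberships, normalized so firing strengths sum to 1.
        width = np.diff(self.centers).mean()
        mu = np.clip(1.0 - np.abs(self.centers - s) / width, 0.0, None)
        return mu / (mu.sum() + 1e-12)

    def q_values(self, s):
        # Q(s, a) is the membership-weighted mix of the rule Q-values,
        # which is what generalizes the Q-table over the continuous state.
        return self.memberships(s) @ self.q

    def act(self, s, rng):
        if rng.random() < self.eps:
            return int(rng.integers(self.q.shape[1]))
        return int(np.argmax(self.q_values(s)))

    def update(self, s, a, r, s_next):
        # TD error on the blended Q-values, credited to each fired rule
        # in proportion to its membership degree.
        td = r + self.gamma * self.q_values(s_next).max() - self.q_values(s)[a]
        self.q[:, a] += self.alpha * td * self.memberships(s)

# Toy usage (illustrative values only):
agent = FuzzyQLearner(centers=[0.0, 0.5, 1.0], n_actions=3)
rng = np.random.default_rng(0)
a = agent.act(0.3, rng)
agent.update(0.3, a, r=1.0, s_next=0.4)
```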
Abstract: Increasingly frequent bird activity poses a serious threat to the safe operation of transmission lines, and existing acoustic bird-repelling devices lack intelligence and cannot repel birds effectively over the long term. To solve this problem, this paper proposes an acoustic bird-repelling strategy based on an improved Q-learning algorithm. First, to evaluate the repelling effect of each audio clip, fuzzy theory is applied to quantify the birds' behavior after hearing an audio clip into different reaction types. Then, single-audio repelling experiments are designed and the repelling-effect data of each audio clip are collected to obtain its initial weight, providing an experimental basis for audio selection by the acoustic device. To make the computed audio weights better match the experimental conditions, the weight calculation formula of the CRITIC (Criteria Importance Through Intercriteria Correlation) method is optimized. Finally, the experimentally obtained audio weights are used to improve the Q-learning algorithm, and comparison experiments against other acoustic repelling strategies are conducted. The experimental data show that the improved Q-learning strategy outperforms the other three strategies, converges quickly, repels birds stably, and reduces bird habituation.
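The weighting step can be illustrated with a short sketch. The code below uses the standard CRITIC formula (contrast intensity times conflict), not the paper's optimized variant, which the abstract does not spell out; the score matrix and the use of the weights as initial Q-values are assumptions for illustration.

```python
# Standard CRITIC weighting sketch applied to audio scores, with the
# weights used to initialize a Q-table (both steps are assumptions).
import numpy as np

def critic_weights(scores):
    # scores: (n_trials, n_audios) matrix of quantified bird-reaction scores.
    # Normalize each column to [0, 1] so dispersions are comparable.
    lo, hi = scores.min(axis=0), scores.max(axis=0)
    norm = (scores - lo) / (hi - lo + 1e-12)
    sigma = norm.std(axis=0)                    # contrast intensity
    corr = np.corrcoef(norm, rowvar=False)      # inter-criteria correlation
    conflict = (1.0 - corr).sum(axis=0)         # conflict with other audios
    info = sigma * conflict                     # information content C_j
    return info / info.sum()

# Initialize Q-values from the experimental weights so the agent starts
# by favoring audios that worked in the single-audio experiments.
rng = np.random.default_rng(0)
scores = rng.random((30, 5))                    # placeholder experiment data
w = critic_weights(scores)
n_states = 4                                    # e.g., discretized bird-activity levels
Q = np.tile(w, (n_states, 1))                   # Q[s, a] seeded with audio weights
```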
Funding: supported by the National Natural Science Foundation of China (61070143, 61173088)
Abstract: For multi-agent reinforcement learning in Markov games, knowledge extraction and sharing are key research problems. State list extracting computes the optimal shared state path from state trajectories that contain cycles. A state list extracting algorithm checks the cyclic state lists of the current state in the state trajectory and condenses the optimal action set of that state. By reinforcing the selected optimal action, the action policy for cyclic states is gradually optimized. The extracted state lists are learned repeatedly and used as experience knowledge shared within teams, and this experience sharing speeds up the agents' rate of convergence. Predator-prey competition games are used for the experiments. The results show that the proposed algorithms overcome the lack of experience in the initial stage, speed up learning, and improve performance.
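The core of state list extracting, removing cycles from a trajectory so that only the condensed path remains for teammates to share, can be sketched as follows. The (state, action) trajectory format is an assumption for illustration; the abstract does not give the exact data structure or the reinforcement step.

```python
# Cycle-removal sketch for state list extracting: revisiting a state
# erases the loop taken since the first visit, leaving the condensed
# state-action path (trajectory format is an assumption).
def extract_state_list(trajectory):
    # trajectory: list of (state, action) pairs visited by an agent.
    path, index = [], {}                 # index maps state -> position in path
    for state, action in trajectory:
        if state in index:
            # Cycle detected: drop every step taken inside the loop.
            cut = index[state]
            for s, _ in path[cut:]:
                index.pop(s, None)
            path = path[:cut]
        index[state] = len(path)
        path.append((state, action))
    return path

# Example: the agent loops A -> B -> C -> A before reaching goal G.
traj = [("A", "up"), ("B", "right"), ("C", "down"), ("A", "right"), ("G", None)]
print(extract_state_list(traj))          # [('A', 'right'), ('G', None)]
```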