Funding: Supported by the National Natural Science Foundation of China (Grant No. 12072090).
Abstract: This work proposes a recorded recurrent twin delayed deep deterministic (RRTD3) policy gradient algorithm for constructing guidance laws that intercept endoatmospheric maneuvering missiles under uncertainties and observation noise. The attack-defense engagement scenario is modeled as a partially observable Markov decision process (POMDP). Because recurrent neural networks (RNNs) are well suited to processing sequence information, an RNN layer is incorporated into the agent's policy network to alleviate the bottleneck that traditional deep reinforcement learning methods face when dealing with POMDPs. Since the detection frequency of an interceptor is usually higher than its guidance frequency, the measurements from the interceptor's seeker during each guidance cycle are combined into one sequence as the input to the policy network. During training, the hidden states of the RNN layer in the policy network are recorded to overcome the partial observability problem that this RNN layer causes inside the agent. The training curves show that the proposed RRTD3 improves data efficiency, training speed, and training stability. The test results confirm the advantages of the RRTD3-based guidance laws over some conventional guidance laws.
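The two mechanisms described in this abstract, feeding all seeker measurements of one guidance cycle to a recurrent policy as a sequence and recording the RNN hidden state alongside each stored transition, can be illustrated with a minimal sketch. Everything named here (`TinyRecurrentPolicy`, `collect_episode`, the Elman-style cell, the dictionary layout of a transition) is a hypothetical stand-in chosen for illustration; the abstract does not specify the network architecture, the replay buffer format, or the TD3 update, and none of those are reproduced.

```python
import math
import random


class TinyRecurrentPolicy:
    """Hypothetical Elman-style recurrent policy written with plain lists."""

    def __init__(self, obs_dim, hidden_dim, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        self.w_in = mat(hidden_dim, obs_dim)      # input-to-hidden weights
        self.w_rec = mat(hidden_dim, hidden_dim)  # hidden-to-hidden weights
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]

    def initial_hidden(self):
        return [0.0] * self.hidden_dim

    def act(self, measurements, hidden):
        """Consume every measurement of one guidance cycle, then emit one action."""
        for obs in measurements:
            hidden = [
                math.tanh(
                    sum(w * x for w, x in zip(self.w_in[i], obs))
                    + sum(w * h for w, h in zip(self.w_rec[i], hidden))
                )
                for i in range(self.hidden_dim)
            ]
        # Bounded scalar command in [-1, 1]; the scaling to a real command is omitted.
        action = math.tanh(sum(w * h for w, h in zip(self.w_out, hidden)))
        return action, hidden


def collect_episode(policy, cycles):
    """Roll the policy over the guidance cycles and record hidden states per transition."""
    buffer = []
    hidden = policy.initial_hidden()
    for measurements in cycles:
        action, next_hidden = policy.act(measurements, hidden)
        buffer.append({
            "measurements": measurements,
            "hidden_in": list(hidden),        # recorded state the action was computed from
            "action": action,
            "hidden_out": list(next_hidden),  # recorded state handed to the next cycle
        })
        hidden = next_hidden
    return buffer
```

Under these assumptions, a transition sampled from the middle of an episode can be re-evaluated from its recorded `hidden_in` instead of from a zero state, so the policy sees the same internal context during training that it had when the action was taken.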
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 72271124, 72071111).
Abstract: To address the multi-stage and (extremely) small-sample characteristics of identifying the key effectiveness indexes of a weapon equipment system-of-systems (WESoS), a Bayesian intelligent identification and inference model for system effectiveness assessment indexes based on dynamic grey incidence is proposed. First, the method uses multi-layer Bayesian techniques, makes full use of historical statistics and empirical information, and determines the Bayesian estimation of the incidence degree of the indexes, which effectively addresses both the small sample size of effectiveness indexes and the difficulty of obtaining incidence rules between indexes. Second, the method quantifies the incidence relationship between evaluation indexes and combat effectiveness based on the Bayesian posterior grey incidence, and then identifies the key system effectiveness evaluation indexes. Finally, the proposed method is applied to a case of screening key effectiveness indexes of a missile defense system. The analysis results show that the proposed method can fuse multi-moment information, extract multi-stage key indexes, and extract data well in the case of small samples.
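As background for the term "grey incidence degree", the sketch below computes the classical Deng grey incidence (grey relational) degree between a reference sequence, such as combat effectiveness, and several candidate index sequences. It is not the Bayesian multi-layer model proposed in the abstract: the function name `grey_incidence_degrees`, the initial-value normalisation, and the distinguishing coefficient `rho = 0.5` are assumptions made only for illustration.

```python
def grey_incidence_degrees(reference, candidates, rho=0.5):
    """Classical Deng grey incidence degree of each candidate w.r.t. the reference."""

    def normalise(seq):
        # Initial-value operator; assumes the first element is non-zero.
        first = seq[0]
        return [value / first for value in seq]

    ref = normalise(reference)
    # Absolute differences between the reference and each normalised candidate.
    diffs = [[abs(r - c) for r, c in zip(ref, normalise(seq))] for seq in candidates]
    flat = [d for row in diffs for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0.0:
        # Every candidate coincides with the reference after normalisation.
        return [1.0] * len(candidates)

    degrees = []
    for row in diffs:
        # Grey incidence coefficient at each point, then its mean over the sequence.
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        degrees.append(sum(coeffs) / len(coeffs))
    return degrees
```

A larger degree indicates that a candidate index follows the shape of the reference more closely, so ranking the degrees gives a simple, non-Bayesian screening of key indexes. A multi-stage variant could apply the same function to the data of each stage separately.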