Future unmanned battles urgently require intelligent combat policies, and multi-agent reinforcement learning offers a promising solution. However, due to the complexity of combat operations and the large size of the combat group, this task suffers from the credit assignment problem more than other reinforcement learning tasks. This study uses reward shaping to relieve the credit assignment problem and improve policy training for the new generation of large-scale unmanned combat operations. We first prove that multiple reward shaping functions do not change the Nash equilibrium in stochastic games, providing theoretical support for their use. According to the characteristics of combat operations, we propose tactical reward shaping (TRS), which comprises maneuver shaping advice and threat assessment-based attack shaping advice. We then investigate the effects of different types and combinations of shaping advice on combat policies through experiments. The results show that TRS improves both the efficiency and attack accuracy of combat policies, with the combination of maneuver reward shaping advice and ally-focused attack shaping advice achieving the best performance relative to the baseline strategy.
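To make the claim about combining several shaping functions concrete, the following is a minimal sketch that assumes the classical potential-based form F(s, s') = γΦ(s') − Φ(s); the abstract does not state which form TRS uses. The two potentials, the 2-D state, and all names are hypothetical illustrations. The sketch shows that the sum of several potential-based terms is itself potential-based, so the discounted shaping total along a trajectory telescopes and depends only on the end states.

```python
from typing import Callable, Sequence, Tuple

State = Tuple[float, float]  # hypothetical 2-D position of one agent
Potential = Callable[[State], float]


def shaping_term(phi: Potential, s: State, s_next: State, gamma: float) -> float:
    """Potential-based shaping: gamma * phi(s') - phi(s)."""
    return gamma * phi(s_next) - phi(s)


def shaped_reward(base_reward: float, potentials: Sequence[Potential],
                  s: State, s_next: State, gamma: float = 0.99) -> float:
    """Add every shaping term to the environment reward."""
    return base_reward + sum(shaping_term(phi, s, s_next, gamma) for phi in potentials)


def maneuver_potential(s: State) -> float:
    """Assumed example: higher potential when closer to a target at the origin."""
    return -((s[0] ** 2 + s[1] ** 2) ** 0.5)


def threat_potential(s: State) -> float:
    """Assumed example: higher potential when x is near a threat line at x = 1."""
    return -abs(s[0] - 1.0)


POTENTIALS = [maneuver_potential, threat_potential]

if __name__ == "__main__":
    trajectory = [(3.0, 4.0), (2.0, 2.0), (1.0, 0.0)]
    gamma = 0.9
    # Discounted sum of shaping terms only (base reward set to zero).
    total = sum(
        gamma ** t * shaped_reward(0.0, POTENTIALS, trajectory[t], trajectory[t + 1], gamma)
        for t in range(len(trajectory) - 1)
    )
    print(total)
```

Because the total telescopes to γ^T Φ(s_T) − Φ(s_0) with Φ the sum of the individual potentials, the shaping changes how quickly credit arrives without changing which policies are optimal, which is the intuition behind the equilibrium-invariance result.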
Shock waves caused by a sudden release of high energy, such as an explosion or blast, usually affect a large area. Using a uniform fine mesh to capture sharp shock waves and obtain precise results is inefficient in terms of computational resources. This is particularly evident when large-scale fluid field simulations are conducted with significant differences in computational domain size. In this work, a variable-domain-size adaptive mesh enlargement (vAME) method is developed based on the proposed adaptive mesh enlargement (AME) method for modeling multi-explosive explosion problems. The vAME method reduces the division of numerous empty areas or unnecessary computational domains by adaptively suspending the enlargement operation in one or two directions, rather than in all directions as in the AME method. A series of numerical tests using AME and vAME with varying nonintegral enlargement ratios and different mesh numbers are run to verify the efficiency and order of accuracy. An estimate of the speedup ratio is analyzed for further efficiency comparison. Several large-scale near-ground explosion experiments with single and multiple explosives are performed to analyze the shock wave superposition formed by the incident wave, reflected wave, and Mach wave. Additionally, the vAME method is employed to validate the accuracy and to investigate the behavior of the fluid field and shock wave propagation, considering explosive quantities ranging from 1 to 5 while maintaining a constant total mass. The results show a satisfactory correlation between the overpressure versus time curves for experiments and numerical simulations. The vAME method is competitively efficient, increasing the computational speed by factors of 3.0 and approximately 120,000 relative to AME and the fully fine mesh method, respectively. This indicates that the vAME method reduces the computational cost with minimal impact on the results for such large-scale high-energy release problems with significant differences in computational domain size.
This paper addresses the time-varying formation-containment (FC) problem for nonholonomic multi-agent systems with a desired trajectory constraint, where only the leaders can acquire information about the desired trajectory. The leaders are given a fixed time-varying formation template to execute while also tracking the desired trajectory, and the followers must converge to the convex hull spanned by the leaders. First, the dynamic models of the nonholonomic systems are linearized to second-order dynamics. Then, based on the desired trajectory and formation template, FC control protocols are proposed. Sufficient conditions for achieving FC are introduced, and an algorithm is proposed to determine the control parameters by solving an algebraic Riccati equation. The system is shown to achieve FC, with the average position and velocity of the leaders converging asymptotically to the desired trajectory. Finally, the theoretical results are verified in simulations of a multi-agent system composed of virtual human individuals.
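The containment half of the problem can be illustrated with a deliberately simplified sketch. It assumes stationary leaders, one-dimensional single-integrator followers, and a fixed hand-picked gain, so it is not the paper's protocol (which handles nonholonomic, second-order dynamics and obtains its parameters from a Riccati equation). The agent names, topology, and gain are hypothetical.

```python
from typing import Dict, List


def containment_step(followers: Dict[str, float], leaders: Dict[str, float],
                     neighbors: Dict[str, List[str]], gain: float) -> Dict[str, float]:
    """One synchronous update: each follower moves toward its neighbors."""
    positions = {**leaders, **followers}
    updated = {}
    for name, x in followers.items():
        pull = sum(positions[n] - x for n in neighbors[name])
        updated[name] = x + gain * pull
    return updated


def simulate(followers: Dict[str, float], leaders: Dict[str, float],
             neighbors: Dict[str, List[str]], gain: float = 0.2,
             steps: int = 300) -> Dict[str, float]:
    """Repeat the update; leaders stay fixed in this simplified setting."""
    for _ in range(steps):
        followers = containment_step(followers, leaders, neighbors, gain)
    return followers


# Hypothetical setup: two leaders span the interval [0, 10];
# followers start outside or inside it and form a chain between the leaders.
LEADERS = {"L0": 0.0, "L1": 10.0}
FOLLOWERS = {"F0": -5.0, "F1": 20.0, "F2": 3.0}
NEIGHBORS = {"F0": ["L0", "F1"], "F1": ["F0", "F2"], "F2": ["F1", "L1"]}

if __name__ == "__main__":
    print(simulate(FOLLOWERS, LEADERS, NEIGHBORS))
```

In one dimension the convex hull of the leaders is simply the interval between them, so containment here means every follower ends up between the two leader positions.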
Accurate positioning is one of the essential requirements for numerous applications of remote sensing data, especially in the event of a noisy or unreliable satellite signal. Toward this end, we present a novel framework for aircraft geo-localization over a large range that requires only a downward-facing monocular camera, an altimeter, a compass, and an open-source Vector Map (VMAP). The algorithm combines matching and particle filter methods. A shape vector and the correlation between two building contour vectors are defined, and a coarse-to-fine building vector matching (CFBVM) method is proposed in the matching stage, for which the original matching results are described by a Gaussian mixture model (GMM). Subsequently, in the particle filter stage, an improved resampling strategy is designed to reduce computing expense with a huge number of initial particles, and a credibility indicator is designed to avoid localization mistakes. An experimental evaluation of the approach based on flight data is provided. On a flight at a height of 0.2 km over a flight distance of 2 km, the aircraft is geo-localized in a reference map of 11,025 km² using 0.09 km² aerial images without any prior information. The absolute localization error is less than 10 m.
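For readers unfamiliar with the resampling step of a particle filter, here is a minimal sketch of standard systematic resampling. It is a common baseline and is not the improved strategy described in the abstract, whose details are not given here. The function name and the example weights are hypothetical.

```python
import random
from typing import List, Sequence


def systematic_resample(weights: Sequence[float], rng: random.Random) -> List[int]:
    """Return particle indices drawn by systematic resampling.

    A single random offset is drawn, then n evenly spaced pointers are swept
    across the cumulative weight distribution.
    """
    n = len(weights)
    total = sum(weights)
    step = 1.0 / n
    pointer = rng.random() * step          # offset in [0, 1/n)
    cumulative = weights[0] / total
    index = 0
    chosen = []
    for _ in range(n):
        # Advance to the first particle whose cumulative weight exceeds the pointer.
        while pointer >= cumulative and index < n - 1:
            index += 1
            cumulative += weights[index] / total
        chosen.append(index)
        pointer += step
    return chosen


if __name__ == "__main__":
    rng = random.Random(0)
    # Unnormalized weights: particle 2 carries most of the mass.
    print(systematic_resample([0.1, 0.1, 3.0, 0.1], rng))
```

Systematic resampling needs only one random draw per resampling pass and runs in linear time, which is why it is often chosen when the particle count is large.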
As an important mechanism in multi-agent interaction, communication allows agents to form complex team relationships rather than remain a simple set of independent agents. However, existing communication schemes can introduce considerable timing redundancy and irrelevant messages, which seriously limits their practical application. To solve this problem, this paper proposes a targeted multi-agent communication algorithm based on state control (SCTC). SCTC uses a gating mechanism based on state control to reduce the timing redundancy of communication between agents, and it determines the interaction relationships between agents and the importance weight of each communication message through a series connection of hard- and self-attention mechanisms, realizing targeted message processing. In addition, by minimizing the difference between the fusion message generated from each agent's real communication message and the fusion message generated from the buffered message, the correctness of the agent's final action choice is ensured. Our evaluation on a challenging set of StarCraft II benchmarks indicates that SCTC can significantly improve learning performance and reduce the communication overhead between agents, thus ensuring better cooperation between agents.
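The idea of gating communication on state change, with receivers falling back to a buffered message, can be sketched as follows. This is an illustrative stand-in that uses a fixed distance threshold; the abstract does not specify the gate, and in SCTC it is presumably learned. The class names, the threshold, and the state vectors are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


def _distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two state vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


@dataclass
class GatedSender:
    """Sends a message only when the state has moved past a threshold."""
    threshold: float
    last_sent: Optional[List[float]] = None
    sent_count: int = 0

    def maybe_send(self, state: Sequence[float]) -> Optional[List[float]]:
        if self.last_sent is None or _distance(state, self.last_sent) > self.threshold:
            self.last_sent = list(state)
            self.sent_count += 1
            return list(state)
        return None  # gate closed: nothing is transmitted this step


@dataclass
class BufferedReceiver:
    """Keeps the most recent message and reuses it when nothing arrives."""
    buffer: Optional[List[float]] = None

    def receive(self, message: Optional[List[float]]) -> Optional[List[float]]:
        if message is not None:
            self.buffer = message
        return self.buffer


if __name__ == "__main__":
    sender, receiver = GatedSender(threshold=0.1), BufferedReceiver()
    for state in ([0.0, 0.0], [0.01, 0.0], [1.0, 1.0], [1.02, 1.0]):
        print(receiver.receive(sender.maybe_send(state)))
    print("messages sent:", sender.sent_count)
```

With the four states above, only the first and third cross the threshold, so half of the transmissions are skipped while the receiver still has a usable message at every step.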
Funding: supported by the National Natural Science Foundation of China (Grant Nos. 12302435 and 12221002).