Abstract: This paper addresses the time-varying formation-containment (FC) problem for nonholonomic multi-agent systems with a desired trajectory constraint, where only the leaders can acquire information about the desired trajectory. The leaders are given a predefined time-varying formation template and must execute it while tracking the desired trajectory, and the followers must converge to the convex hull spanned by the leaders. First, the dynamic models of the nonholonomic systems are linearized to second-order dynamics. Then, based on the desired trajectory and the formation template, FC control protocols are proposed. Sufficient conditions for achieving FC are given, and an algorithm is proposed to determine the control parameters by solving an algebraic Riccati equation. The system is shown to achieve FC, with the average position and velocity of the leaders converging asymptotically to the desired trajectory. Finally, the theoretical results are verified in simulations of a multi-agent system composed of virtual human individuals.
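As a rough illustration of how a feedback gain can be obtained from an algebraic Riccati equation, the sketch below uses a double-integrator model (one common second-order form) with Q = I and R = 1. The matrices, the closed-form solution, and the helper names `matmul`, `transpose`, and `riccati_residual` are assumptions made for illustration; the abstract does not state the paper's actual equation or weights.

```python
import math

# Assumed second-order (double-integrator) model: x = [position, velocity].
A = [[0.0, 1.0], [0.0, 0.0]]
B = [[0.0], [1.0]]
Q = [[1.0, 0.0], [0.0, 1.0]]  # assumed state weight; R = 1 is also assumed


def matmul(X, Y):
    """Multiply two matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    """Transpose a matrix given as nested lists."""
    return [list(row) for row in zip(*X)]


def riccati_residual(P):
    """Return A^T P + P A - P B B^T P + Q, which is zero at a solution."""
    AtP = matmul(transpose(A), P)
    PA = matmul(P, A)
    PB = matmul(P, B)
    PBBtP = matmul(PB, transpose(PB))  # valid because P is symmetric
    return [[AtP[i][j] + PA[i][j] - PBBtP[i][j] + Q[i][j]
             for j in range(2)] for i in range(2)]


# Closed-form stabilizing solution for this particular A, B, Q, R.
s3 = math.sqrt(3.0)
P = [[s3, 1.0], [1.0, s3]]

# Feedback gain K = R^{-1} B^T P (R = 1 here), giving K = [1, sqrt(3)].
K = matmul(transpose(B), P)[0]
residual = riccati_residual(P)
```

With this gain the closed-loop matrix A - BK has characteristic polynomial s^2 + sqrt(3) s + 1, so both poles lie in the left half-plane. For larger models a numerical Riccati solver would normally replace the closed form.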
Funding: Projects (61173026, 61373045, 61202039) supported by the National Natural Science Foundation of China; Projects (K5051223008, BDY221411) supported by the Fundamental Research Funds for the Central Universities of China; Project (2012AA02A603) supported by the High-Tech Research and Development Program of China.
Abstract: Because the ability of a single agent is limited and the information and resources in multi-agent systems are distributed, agents must cooperate to accomplish complex tasks. In the open and changeable environment of the Internet, it is important to study systems that are flexible and capable of dynamic evolution, and to find a collaboration method for agents that can be used during that evolution. With such a method, agents accomplish tasks toward an overall target while the collaborative relationships among agents are adjusted as the environment changes. A method for task decomposition and agent collaboration based on an improved contract net protocol is introduced. Finally, the experimental results are analyzed to verify that the improved contract net protocol can greatly increase the efficiency of communication and collaboration in multi-agent systems.
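For readers unfamiliar with the contract net protocol, the following minimal sketch shows one basic announce, bid, and award round. It illustrates the classic protocol only, not the improved variant the abstract refers to; the `Contractor` class, the cost-based bids, and the task names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Contractor:
    """A hypothetical agent that can bid on announced tasks."""
    name: str
    skills: dict  # task type -> estimated cost (lower is better)
    busy: bool = False

    def bid(self, task_type):
        """Return a cost bid, or None to decline the announcement."""
        if self.busy or task_type not in self.skills:
            return None
        return self.skills[task_type]


def award(task_type, contractors):
    """One contract-net round: announce, collect bids, award the lowest."""
    bids = [(c.bid(task_type), c) for c in contractors]
    bids = [(cost, c) for cost, c in bids if cost is not None]
    if not bids:
        return None  # no agent is able to take the task
    _, winner = min(bids, key=lambda pair: pair[0])
    winner.busy = True  # the winner commits to the contract
    return winner.name


agents = [
    Contractor("a1", {"scan": 4.0, "lift": 9.0}),
    Contractor("a2", {"scan": 2.5}),
    Contractor("a3", {"lift": 3.0}),
]

# A manager decomposes a job into subtasks and runs one round per subtask.
assignment = [(task, award(task, agents)) for task in ["scan", "lift", "scan"]]
# assignment == [("scan", "a2"), ("lift", "a3"), ("scan", "a1")]
```

Each round here sends an announcement to every agent, which is the kind of communication cost that improved variants typically try to reduce.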
Funding: Supported by the National Natural Science Foundation of China (61070143, 61173088).
Abstract: For multi-agent reinforcement learning in Markov games, knowledge extraction and sharing are key research problems. State list extraction computes the optimal shared state path from state trajectories that contain cycles. A state list extraction algorithm checks the cyclic state lists of the current state in the state trajectory and condenses the optimal action set of that state. By reinforcing the selected optimal action, the action policy for cyclic states is gradually optimized. The extracted state lists are learned repeatedly and used as experience knowledge that is shared by teams. Agents speed up convergence through experience sharing. Predator-prey competition games are used for the experiments. The experimental results show that the proposed algorithms overcome the lack of experience in the initial stage, speed up learning, and improve performance.
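One simple way to picture extracting a cycle-free state path from a trajectory is sketched below. This is only an assumed reading of "state list extraction": the function `remove_cycles` and the state labels are hypothetical, and the paper's algorithm also handles action sets, which this sketch omits.

```python
def remove_cycles(trajectory):
    """Return the trajectory with every revisited-state loop cut out."""
    path = []
    position = {}  # state -> index of that state in `path`
    for state in trajectory:
        if state in position:
            # A cycle closed here: drop everything after the first visit.
            cut = position[state] + 1
            for dropped in path[cut:]:
                del position[dropped]
            path = path[:cut]
        else:
            position[state] = len(path)
            path.append(state)
    return path


states = ["s0", "s1", "s2", "s1", "s3", "s0", "s4"]
shortened = remove_cycles(states)
# shortened == ["s0", "s4"]
```

The loop s1, s2, s1 is removed first, and the later return to s0 removes everything visited since the start, leaving only the direct step from s0 to s4.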
Abstract: Future unmanned battles urgently require intelligent combat policies, and multi-agent reinforcement learning offers a promising solution. However, due to the complexity of combat operations and the large size of the combat group, this task suffers from the credit assignment problem more than other reinforcement learning tasks. This study uses reward shaping to relieve the credit assignment problem and improve policy training for the new generation of large-scale unmanned combat operations. We first prove that multiple reward shaping functions do not change the Nash equilibrium in stochastic games, providing theoretical support for their use. Based on the characteristics of combat operations, we propose tactical reward shaping (TRS), which comprises maneuver shaping advice and threat assessment-based attack shaping advice. We then investigate the effects of different types and combinations of shaping advice on combat policies through experiments. The results show that TRS improves both the efficiency and the attack accuracy of combat policies, with the combination of maneuver reward shaping advice and ally-focused attack shaping advice achieving the best performance relative to the baseline strategy.
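A well-known form of shaping that leaves optimal behavior unchanged is potential-based shaping, F(s, s') = gamma * phi(s') - phi(s). The abstract does not say whether TRS takes exactly this form, so the sketch below is an assumption-laden illustration: the `potential` function, the one-dimensional states, and the discount factor are all hypothetical.

```python
GAMMA = 0.9  # assumed discount factor


def potential(state):
    """Hypothetical potential: closer to the target position 10 is better."""
    return -abs(10 - state)


def shaping(state, next_state, gamma=GAMMA):
    """Potential-based shaping term F(s, s') = gamma * phi(s') - phi(s)."""
    return gamma * potential(next_state) - potential(state)


def discounted_sum(values, gamma=GAMMA):
    """Sum of gamma^t * value_t over a sequence of per-step values."""
    return sum((gamma ** t) * v for t, v in enumerate(values))


trajectory = [0, 2, 5, 4, 8, 10]
bonuses = [shaping(s, s2) for s, s2 in zip(trajectory, trajectory[1:])]

# The discounted bonuses telescope to gamma^T * phi(s_T) - phi(s_0), which
# depends only on the start state, the end state, and the horizon T, not on
# the states visited in between.
total = discounted_sum(bonuses)
steps = len(trajectory) - 1
expected = GAMMA ** steps * potential(trajectory[-1]) - potential(trajectory[0])
```

Because the shaping contribution collapses to endpoint terms, it can speed up learning by giving denser feedback without changing which policies are best.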
Abstract: As an important mechanism in multi-agent interaction, communication allows agents to form complex team relationships rather than remain a simple set of independent agents. However, existing communication schemes introduce considerable timing redundancy and irrelevant messages, which seriously limits their practical application. To solve this problem, this paper proposes a targeted multi-agent communication algorithm based on state control (SCTC). SCTC uses a gating mechanism based on state control to reduce the timing redundancy of communication between agents, and it determines the interaction relationships between agents and the importance weight of each communication message through a series connection of hard- and self-attention mechanisms, realizing targeted message processing. In addition, the correctness of each agent's final action choice is ensured by minimizing the difference between the fusion message generated from the agent's real communication message and the fusion message generated from the buffered message. Our evaluation on a challenging set of StarCraft II benchmarks indicates that SCTC can significantly improve learning performance and reduce communication overhead between agents, thus ensuring better cooperation between agents.
Funding: Supported by the Hunan Provincial Natural Science Foundation (2024JJ5173, 2023JJ50047), the Hunan Provincial Department of Education Scientific Research Project (23A0494), and the Hunan Provincial Innovation Foundation for Postgraduate (CX20231221).
Abstract: To solve the problem of multi-platform collaborative use in anti-ship missile (ASM) path planning, this paper proposes a multi-operator real-time constraints particle swarm optimization (MRC-PSO) algorithm. The MRC-PSO algorithm uses a semi-rasterization environment modeling technique and integrates the geometric gradient law of ASMs; it distinguishes itself from other collaborative path planning algorithms by fully considering the coupling between collaborative paths. The algorithm then performs chunked stepwise recursive evolution of particles while incorporating circumvent, coordination, and smoothing operators, which facilitates local selection optimization of paths, gradually reduces the algorithmic space, accelerates convergence, and enhances path cooperativity. Simulation experiments comparing the MRC-PSO algorithm with the PSO algorithm, the genetic algorithm, and the operational area cluster real-time restriction (OACRR)-PSO algorithm demonstrate that the MRC-PSO algorithm converges faster, with the average number of iterations reduced by approximately 75%. The experiments also show that it is equally effective in complex scenarios involving multiple obstacles. Moreover, it effectively addresses the problem of path crossing and better satisfies the requirements of multi-platform collaborative path planning. The experiments are conducted in three collaborative operation modes, namely three-to-two, three-to-three, and four-to-two, and the outcomes demonstrate that the algorithm possesses strong universality.
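For context, the baseline PSO that MRC-PSO is compared against can be sketched as below. This is the plain textbook update only, without the circumvent, coordination, or smoothing operators; the function `pso`, its parameter values, and the toy `path_cost` objective are assumptions chosen for illustration.

```python
import random


def pso(cost, dim, bounds, particles=20, iters=50,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Plain PSO; returns (initial best cost, final best cost, best position)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    initial_best = gbest_cost

    for _ in range(iters):
        for i in range(particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the new position to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return initial_best, gbest_cost, gbest


def path_cost(p):
    """Hypothetical stand-in for a path cost: squared distance to (3, -2)."""
    return (p[0] - 3.0) ** 2 + (p[1] + 2.0) ** 2


start_cost, end_cost, best = pso(path_cost, dim=2, bounds=(-10.0, 10.0))
```

The global best is only ever replaced by a strictly better position, so `end_cost` can never exceed `start_cost`; how quickly it falls depends on the assumed inertia and acceleration coefficients.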
Funding: Supported by the Fundamental Research Funds for the Central Universities (2024JBZX038) and the National Natural Science Foundation of China (62076023).
Abstract: With the rapid development of the low-altitude economy and unmanned aerial vehicle (UAV) deployment technology, aerial-ground collaborative delivery (AGCD) is emerging as a novel mode of last-mile delivery in which the vehicle and its onboard UAVs are used efficiently. Vehicles not only provide delivery services to customers but also function as mobile warehouses and launch/recovery platforms for UAVs. This paper addresses the vehicle routing problem with UAVs considering time windows and UAV multi-delivery (VRPU-TW&MD). A mixed integer linear programming (MILP) model is developed to minimize delivery costs while incorporating constraints related to UAV energy consumption. Subsequently, a micro-evolution augmented large neighborhood search (MEALNS) algorithm, which combines adaptive large neighborhood search (ALNS) with a micro-evolution mechanism, is proposed. Numerical experiments demonstrate the effectiveness of both the model and the algorithm in solving the VRPU-TW&MD. The impact of key parameters on delivery performance is explored through sensitivity analysis.
Funding: Fundamental Research Funds for the Central Universities (2024JBZX038); National Natural Science Foundation of China (62076023).
Abstract: The rapid evolution of unmanned aerial vehicle (UAV) technology and autonomous capabilities has positioned UAVs as a promising means of last-mile delivery. Collaborative delivery by a vehicle and its onboard UAV is introduced as a novel delivery mode. Spatiotemporal collaboration, along with energy consumption under payload and wind conditions, plays an important role in delivery route planning. This paper introduces the traveling salesman problem with time window and onboard UAV (TSPTWOUAV) and emphasizes real-world scenarios, focusing on time collaboration and energy consumption with wind and payload. To address this, a mixed integer linear programming (MILP) model is formulated to minimize the energy consumption costs of the vehicle and the UAV. Furthermore, an adaptive large neighborhood search (ALNS) algorithm is applied to identify high-quality solutions efficiently. The effectiveness of the proposed model and algorithm is validated through numerical tests on real geographic instances, and a sensitivity analysis of key parameters is conducted.
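The destroy-and-repair loop at the heart of ALNS can be sketched on a plain traveling salesman tour, as below. This is a simplified illustration under several assumptions: it ignores time windows, UAVs, and energy, it accepts only improving candidates rather than using an annealing-style criterion, and the operator names, weight update rule, and sample points are hypothetical.

```python
import math
import random


def tour_length(tour, pts):
    """Length of the closed tour visiting pts in the given order."""
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))


def random_removal(tour, k, rng, pts):
    """Destroy operator: remove k stops chosen at random."""
    removed = rng.sample(tour, k)
    return [c for c in tour if c not in removed], removed


def worst_removal(tour, k, rng, pts):
    """Destroy operator: remove the k stops that add the most detour."""
    def detour(i):
        a, b, c = tour[i - 1], tour[i], tour[(i + 1) % len(tour)]
        return (math.dist(pts[a], pts[b]) + math.dist(pts[b], pts[c])
                - math.dist(pts[a], pts[c]))
    worst = sorted(range(len(tour)), key=detour, reverse=True)[:k]
    removed = [tour[i] for i in worst]
    return [c for c in tour if c not in removed], removed


def greedy_insert(partial, removed, pts):
    """Repair operator: reinsert each stop at its cheapest position."""
    tour = partial[:]
    for city in removed:
        best_pos = min(range(len(tour) + 1),
                       key=lambda i: tour_length(tour[:i] + [city] + tour[i:], pts))
        tour.insert(best_pos, city)
    return tour


def alns(pts, iters=200, k=2, seed=0):
    """Return (best tour found, length of the initial tour)."""
    rng = random.Random(seed)
    destroyers = [random_removal, worst_removal]
    weights = [1.0, 1.0]
    current = list(range(len(pts)))
    initial_cost = tour_length(current, pts)
    for _ in range(iters):
        # Adaptive step: pick a destroy operator in proportion to its weight.
        idx = rng.choices(range(len(destroyers)), weights=weights)[0]
        partial, removed = destroyers[idx](current, k, rng, pts)
        candidate = greedy_insert(partial, removed, pts)
        if tour_length(candidate, pts) < tour_length(current, pts):
            current = candidate
            weights[idx] += 0.5  # reward the operator that helped
        weights = [max(0.1, 0.95 * w) for w in weights]  # slow decay
    return current, initial_cost


points = [(0, 0), (5, 1), (1, 4), (6, 5), (2, 1), (4, 4), (0, 3), (6, 0)]
best_tour, initial_cost = alns(points)
```

Since only improving candidates are accepted, the returned tour is never longer than the initial one; a fuller implementation would typically add more destroy and repair operators and occasionally accept worse candidates to escape local optima.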