Abstract
The team of Prof. Han Zhou at the National University of Defense Technology presented a knowledge-enhanced deep reinforcement learning (DRL) approach to multi-agent pursuit-evasion at IROS 2025. The NOKOV motion capture system supplied position and velocity data for multiple Crazyflie UAVs to support validation of the proposed algorithm.
The IROS 2025 paper, entitled “Emergent Cooperative Strategies for Pursuit-Evasion in Cluttered Environments: A Knowledge-Enhanced Multi-Agent Deep Reinforcement Learning Approach”, proposes a knowledge-enhanced DRL method for cooperative pursuit-evasion in complex environments and validates its efficiency and its superiority over baseline algorithms through extensive numerical simulations and real-world experiments. In the real-world experiments, the NOKOV optical motion capture system provided high-precision position and velocity data for the Crazyflie UAVs, enabling verification of the proposed algorithm.
Background
To enhance the autonomy and adaptability of multi-agent systems in cooperative pursuit tasks, model-free deep reinforcement learning (DRL) has emerged as a promising alternative. However, most existing DRL-based pursuit approaches still rely on individual rewards and struggle in complex scenarios.
Contributions
To foster collaborative behaviors among perceptually limited pursuers in cluttered environments, this paper proposes a team-reward-based, knowledge-enhanced multi-agent twin delayed deep deterministic policy gradient (KE-MATD3) algorithm. The main contributions are summarized as follows:
1. A team-reward-based multi-agent deep reinforcement learning (MADRL) approach is proposed for multi-agent cooperative pursuit in cluttered environments, where the task is modeled as a decentralized partially observable Markov decision process.
2. A knowledge-enhanced (KE) mechanism is introduced to leverage insights from an improved artificial potential field (IAPF) method, thereby facilitating the learning of challenging team rewards.
3. The emergence of cooperative behaviors among pursuers is verified through both simulation and physical experiments.
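To give a feel for the kind of prior knowledge an artificial potential field encodes, the following is a minimal sketch of a classical potential field in 2D: a linear attraction toward the target plus a repulsion from each nearby obstacle. It is not the improved variant (IAPF) used in the paper, and it does not show how the knowledge-enhanced mechanism feeds such guidance into training. The function name `apf_direction`, the gains `k_att` and `k_rep`, and the `influence` distance are all illustrative assumptions.

```python
import math


def apf_direction(pos, target, obstacles, k_att=1.0, k_rep=0.5, influence=1.0):
    """Return a unit heading (hx, hy) from a classical artificial potential field.

    pos, target: (x, y) tuples in metres.
    obstacles: list of (x, y, radius) circles.
    k_att, k_rep, influence: illustrative values, not taken from the paper.
    """
    # Attractive term: pulls linearly toward the target.
    fx = k_att * (target[0] - pos[0])
    fy = k_att * (target[1] - pos[1])

    for ox, oy, radius in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        centre_dist = math.hypot(dx, dy)
        # Distance to the obstacle surface, floored to avoid division by zero.
        d = max(centre_dist - radius, 1e-6)
        if d < influence and centre_dist > 0.0:
            # Classical repulsive gradient magnitude: k_rep * (1/d - 1/d0) / d^2.
            mag = k_rep * (1.0 / d - 1.0 / influence) / (d * d)
            # Push directly away from the obstacle centre.
            fx += mag * dx / centre_dist
            fy += mag * dy / centre_dist

    norm = math.hypot(fx, fy)
    if norm == 0.0:
        return (0.0, 0.0)
    return (fx / norm, fy / norm)


# With no obstacles the heading points straight at the target.
free_heading = apf_direction((0.0, 0.0), (3.0, 4.0), [])

# An obstacle slightly above the straight-line path deflects the heading downward.
deflected = apf_direction((0.0, 0.0), (2.0, 0.0), [(1.0, 0.1, 0.2)])
```

A hand-crafted rule like this can already steer an agent around obstacles, which is why it is a plausible source of guidance when a sparse team reward alone is hard to learn from.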
System Framework for Cooperative Pursuit Tasks
(a) Multi-agent pursuit-evasion environment. (b) The proposed KE-MATD3 algorithm.
Numerical Simulation Experiments
In numerical simulations, the proposed KE-MATD3 algorithm was compared with several baseline algorithms, including MATD3, MADDPG, MADDQN and their variants.
Results show that by incorporating the knowledge-enhanced mechanism, KE-MATD3 significantly improves both learning efficiency and final performance, achieving the highest capture success rate and the lowest collision rate.
Across varying obstacle densities, KE-MATD3 consistently maintained superior performance, demonstrating strong generalization capability. This indicates that the proposed approach can effectively promote cooperative behaviors in cluttered environments and achieve efficient target capture.
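As a rough illustration of how the two reported metrics could be computed from evaluation runs, the sketch below assumes each episode is logged with two boolean flags and defines both rates per episode. The paper's exact definitions are not given here, so the function name `summarize_episodes` and the `captured` and `collided` fields are hypothetical.

```python
def summarize_episodes(episodes):
    """Compute per-episode capture success and collision rates.

    episodes: list of dicts with boolean keys 'captured' and 'collided'
    (an assumed log format, not the one used by the authors).
    """
    n = len(episodes)
    if n == 0:
        raise ValueError("no episodes to summarize")
    captures = sum(1 for e in episodes if e["captured"])
    collisions = sum(1 for e in episodes if e["collided"])
    return {
        "capture_success_rate": captures / n,
        "collision_rate": collisions / n,
    }


# Four made-up evaluation episodes, for illustration only.
example_log = [
    {"captured": True, "collided": False},
    {"captured": True, "collided": False},
    {"captured": False, "collided": True},
    {"captured": True, "collided": False},
]
summary = summarize_episodes(example_log)
```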
Real-World Experiments
The experiments were conducted in a 6.4 m × 11 m × 2 m arena equipped with five Crazyflie 2.1 UAVs, a NOKOV motion capture system, twenty cylindrical obstacles (radius 20 cm, height 1 m), and an onboard computer.
Real-World Experiment Results
The NOKOV motion capture system tracked the Crazyflie UAVs with high precision and provided real-time position and velocity data, which were transmitted to the onboard computer via ROS.
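For readers wondering how velocity relates to tracked positions, one common approach is a backward finite difference over timestamped position samples. The sketch below illustrates that idea only. It is not a description of how the NOKOV system computes velocity, and the function name `finite_difference_velocity` and the sample format are assumptions.

```python
def finite_difference_velocity(samples):
    """Estimate velocity from timestamped positions by backward differences.

    samples: list of (t, (x, y, z)) with strictly increasing t in seconds.
    Returns a list of (t, (vx, vy, vz)) for every sample after the first.
    """
    velocities = []
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        velocity = tuple((b - a) / dt for a, b in zip(p0, p1))
        velocities.append((t1, velocity))
    return velocities


# Three made-up position samples, 0.5 s apart, at a constant height of 1 m.
track = [
    (0.0, (0.0, 0.0, 1.0)),
    (0.5, (1.0, 0.0, 1.0)),
    (1.0, (1.0, 2.0, 1.0)),
]
estimated = finite_difference_velocity(track)
```

In practice raw differences amplify measurement noise, so such estimates are usually smoothed with a low-pass or similar filter before being fed to a controller.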
The real-world experiments demonstrated that the proposed method safely and effectively completed the capture task while enabling emergent cooperative behaviors among the pursuers.
Real-World Experiment Results - Video
The NOKOV motion capture system provided accurate position and velocity data of multiple Crazyflie UAVs, supporting the validation of the proposed algorithm.
Authors
Yihao Sun — Ph.D. student, College of Intelligence Science and Technology, National University of Defense Technology. Research Interests: distributed decision-making for UAV swarms.
Chao Yan — Associate Researcher, College of Automation Engineering, Nanjing University of Aeronautics and Astronautics. Research Interests: deep learning, multi-agent reinforcement learning, UAV swarm cooperative control, and intelligent decision-making.
Han Zhou — Associate Professor, College of Intelligence Science and Technology, National University of Defense Technology. Research Interests: cooperative control of unmanned systems.
Xiaojia Xiang — Professor, College of Intelligence Science and Technology, National University of Defense Technology, Ph.D. supervisor. Research Interests: unmanned systems technology.
Jie Jiang — Academician of the Chinese Academy of Sciences, China Academy of Launch Vehicle Technology, Ph.D. supervisor. Research Interests: navigation, guidance, and control, as well as overall design of launch vehicles.