IEEE T-RO | Flying Co-Stereo: Enabling Long-Range Aerial Dense Mapping via Collaborative Stereo Vision of Dynamic-Baseline

Client: Shanghai Jiao Tong University and Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI)
Application: Flying Co-Stereo system, Multi-View Stereo (MVS)
Objects: UAV

Recently, the team led by Prof. Wei Dong from Shanghai Jiao Tong University, in collaboration with Prof. Xingxing Zuo from Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI), published a paper titled “Flying Co-Stereo: Enabling Long-Range Aerial Dense Mapping via Collaborative Stereo Vision of Dynamic-Baseline” in IEEE Transactions on Robotics. This work presents a flying collaborative stereo vision system in which two UAVs form a wide-baseline configuration to enable long-range dense 3D mapping. The proposed system achieves dense reconstruction at distances of up to 70 meters, with a relative error ranging from 2.3% to 9.7%.

The NOKOV motion capture system provides high-precision ground-truth pose data for validating the proposed relative pose estimation algorithm.

Background

For UAVs operating in large-scale unknown environments, long-range perception is essential for safe navigation. Compared with LiDAR systems, stereo cameras offer advantages in terms of cost-effectiveness and lightweight design. However, conventional stereo cameras are constrained by short fixed baselines, which typically limit their perception range to within 20 meters. Existing wide-baseline systems are often too large to be deployed on small UAV platforms. Meanwhile, distributing stereo cameras across two dynamically flying UAVs introduces additional challenges, including dynamically varying baselines and difficulties in cross-view feature association.
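The ~20 m limit follows from first-order error propagation in stereo triangulation: depth uncertainty grows roughly as z²·σ_d/(f·b), so at fixed disparity noise the only lever for long range is the baseline b. A minimal sketch of this relationship (illustrative camera parameters, not values from the paper):

```python
# First-order stereo depth error: sigma_z ≈ z^2 * sigma_d / (f * b).
# Shows why stretching the baseline from centimeters to meters
# extends the usable perception range. All numbers are illustrative.

def depth_std(z, baseline, focal_px=600.0, disp_std_px=0.25):
    """Propagate disparity noise (pixels) into depth uncertainty (meters)."""
    return z**2 * disp_std_px / (focal_px * baseline)

for b in (0.1, 3.0):  # conventional ~10 cm baseline vs. ~3 m inter-UAV baseline
    for z in (20.0, 70.0):
        print(f"baseline={b:4.1f} m, z={z:5.1f} m -> sigma_z={depth_std(z, b):.2f} m")
```

At any fixed depth, widening the baseline 30-fold cuts the depth uncertainty by the same factor, which is the core motivation for distributing the stereo pair across two UAVs.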

System architecture of Flying Co-Stereo within the proposed CDBSM framework

Contributions

1) A Flying Co-Stereo system is proposed, in which two collaborative UAVs form a wide-baseline, cross-agent stereo vision setup within a unified CDBSM framework, enabling long-range dense mapping in large-scale unknown environments.

2) A Dual-Spectrum Visual-Inertial-Ranging Estimator (DS-VIRE) is developed to achieve robust and accurate online estimation of the dynamic inter-UAV baseline in complex outdoor conditions.

3) A hybrid visual feature association strategy is designed, combining cross-agent deep matching with intra-agent feature tracking, to ensure real-time and persistent co-visible feature correspondences under varying viewpoints.

4) A sparse-to-dense depth recovery scheme is proposed, which refines dense monocular depth predictions using exponential fitting of long-range triangulated sparse landmarks for precise metric-scale mapping.
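The sparse-to-dense scheme in contribution 4 can be sketched as a scale-recovery fit: triangulated sparse landmarks supply metric depths, and an exponential model maps the relative monocular prediction to metric scale. The specific model form z = a·exp(b·d) and all data below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

# Hedged sketch of sparse-to-dense depth refinement: fit an exponential
# model z_metric ≈ a * exp(b * d_mono) to sparse pairs of (monocular
# depth prediction, triangulated metric depth), then apply the fitted
# model to the dense monocular map. Synthetic data throughout.

def fit_exponential(d_mono, z_tri):
    """Log-linear least squares for z = a * exp(b * d)."""
    A = np.column_stack([np.ones_like(d_mono), d_mono])
    coef, *_ = np.linalg.lstsq(A, np.log(z_tri), rcond=None)
    ln_a, b = coef
    return np.exp(ln_a), b

# Synthetic sparse landmarks: relative mono depth vs. metric depth
rng = np.random.default_rng(0)
d = rng.uniform(0.1, 1.0, 50)
z = 2.0 * np.exp(3.0 * d) * (1 + 0.01 * rng.standard_normal(50))

a, b = fit_exponential(d, z)
dense_metric = a * np.exp(b * d)  # the same mapping refines a dense map
print(f"a ≈ {a:.2f}, b ≈ {b:.2f}")
```

Fitting in log space keeps the problem linear, so the correction stays cheap enough to run per keyframe.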

Experimental Validation

1. Dynamic Baseline Estimation

Experiments are conducted to evaluate the accuracy of relative pose estimation between the two UAVs in the Flying Co-Stereo system. The two UAVs autonomously fly synchronized circular trajectories in the East-North-Up (ENU) coordinate frame, with a baseline length of 3 m. The relative pose estimates from the proposed Dual-Spectrum Visual-Inertial-Ranging Estimator (DS-VIRE) are compared against two baseline methods: (1) a visual PnP-based method relying solely on inter-UAV observations, and (2) a VIO differencing method that derives the relative pose by subtracting the individual VIO poses of the two UAVs.
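The VIO differencing baseline amounts to one line of pose algebra: with each UAV's world-frame VIO pose expressed as a 4×4 homogeneous transform, the relative pose is T_rel = T_A⁻¹·T_B. A minimal sketch with made-up planar poses (not the experiment's data):

```python
import numpy as np

# "VIO differencing": derive the inter-UAV relative pose by composing
# the inverse of UAV A's world pose with UAV B's world pose.
# Yaw-only rotations and translations here are illustrative.

def pose(yaw_deg, t):
    """4x4 homogeneous transform from a yaw angle and a translation."""
    c, s = np.cos(np.radians(yaw_deg)), np.sin(np.radians(yaw_deg))
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    T[:3, 3] = t
    return T

T_A = pose(30.0, [1.0, 2.0, 5.0])   # UAV A world pose from its VIO
T_B = pose(45.0, [3.0, 3.5, 5.0])   # UAV B world pose from its VIO
T_rel = np.linalg.inv(T_A) @ T_B    # relative pose of B in A's frame
baseline = np.linalg.norm(T_rel[:3, 3])
print(f"baseline length: {baseline:.3f} m")
```

Because each VIO drifts independently, errors in T_A and T_B add up in T_rel, which is why differencing underperforms a dedicated relative estimator such as DS-VIRE.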

The NOKOV motion capture system is employed to provide ground-truth relative poses as the evaluation benchmark.

Experiments for relative pose estimation of Flying Co-Stereo under the NOKOV motion capture system

Experimental results show that the DS-VIRE achieves a total mean absolute error (MAE) of 0.013 m for relative position estimation, significantly outperforming the visual PnP-based method (0.018 m) and the VIO differencing method (0.024 m). For relative orientation estimation, the MAE of yaw is 0.214°.
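The MAE figures above are straightforward to reproduce given a pose series: the metric is the mean of the absolute per-axis differences between the estimate and the motion-capture ground truth. A toy computation with synthetic values (not the paper's data):

```python
import numpy as np

# Mean absolute error of relative-position estimates against
# motion-capture ground truth. Values are synthetic placeholders.
est = np.array([[0.010, 2.995, 0.020],
                [0.005, 3.010, -0.015]])   # estimated relative positions (m)
gt  = np.array([[0.000, 3.000, 0.000],
                [0.000, 3.000, 0.000]])    # mocap ground-truth positions (m)
mae = np.mean(np.abs(est - gt))
print(f"MAE = {mae:.4f} m")
```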

In addition, the robustness of dynamic baseline estimation is evaluated through real-world outdoor experiments under challenging conditions, including intense sunlight, complex background clutter, severe infrared noise, and long observation distances. Results demonstrate that the proposed Dual-Spectrum Marker-Based Visual Detection and Tracking (DS-MVDT) algorithm achieves a tracking success rate of over 96% across all scenarios, significantly surpassing the baseline method (YOLOv4-tiny + MOSSE), which ranges between 17% and 70%.

Experiments of DS-MVDT under challenges from intense sunlight, cluttered backgrounds, infrared noise, and remote observation

2. Cross-Camera Feature Association Performance Evaluation

Experiments are conducted to compare the real-time performance of the proposed Guidance-Prediction SuperPoint-SuperGlue (GP-SS) algorithm against three baseline methods: the original SuperPoint-SuperGlue (SS), ORB, and SURF. Results show that GP-SS achieves a feature association frequency of nearly 30 Hz, substantially outperforming the SS baseline (13 Hz).

3. Collaborative Triangulation Accuracy Evaluation of Sparse Landmarks

Experiments are performed to evaluate the number and accuracy of reconstructed landmarks across different depth segments (0–10 m, 10–30 m, 30–50 m, 50–70 m). The results demonstrate that the proposed system maintains effective triangulation capability beyond 30 m, whereas the single-UAV approach fails to triangulate landmarks at such distances.
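Collaborative triangulation at these depths reduces to intersecting two viewing rays separated by the inter-UAV baseline. A minimal midpoint-method sketch (synthetic geometry, not the paper's solver):

```python
import numpy as np

# Midpoint triangulation: find the point midway between the closest
# points of two viewing rays o + t*d. Camera origins approximate a
# 3 m inter-UAV baseline; the landmark is ~50 m away. Illustrative only.

def triangulate_midpoint(o1, d1, o2, d2):
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    # Solve [d1, -d2] [t1, t2]^T = o2 - o1 in the least-squares sense
    A = np.column_stack([d1, -d2])
    t, *_ = np.linalg.lstsq(A, o2 - o1, rcond=None)
    p1 = o1 + t[0] * d1
    p2 = o2 + t[1] * d2
    return 0.5 * (p1 + p2)

o1 = np.array([0.0, 0.0, 0.0])
o2 = np.array([3.0, 0.0, 0.0])      # 3 m inter-UAV baseline
p  = np.array([1.5, 50.0, 2.0])     # landmark roughly 50 m away
x = triangulate_midpoint(o1, p - o1, o2, p - o2)
print(x)
```

With a centimeter-scale baseline the two rays become nearly parallel at 50 m and the intersection is ill-conditioned, which is why the single-UAV setup fails beyond 30 m while the wide-baseline pair still triangulates reliably.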

4. Long-Range Dense Mapping Performance Evaluation

Dense mapping experiments are conducted in multiple real-world and simulated environments. The proposed exponential fitting model is compared against quadratic and linear fitting models, as well as two advanced Multi-View Stereo (MVS) methods, SimpleRecon and MVSAnywhere. Experimental results show that the proposed system achieves dense mapping at distances of up to 70 m with a relative error ranging from 2.3% to 9.7%. Compared to conventional stereo cameras, the system achieves up to a 350% improvement in maximum perception range and up to a 450% increase in coverage area.

Long-range dense reconstruction experiments in outdoor environments and photorealistic simulation

Corresponding Authors

Wei Dong

Tenured associate professor at the School of Mechanical Engineering, Shanghai Jiao Tong University. His research focuses on multi-robot collaboration and active perception.

Xingxing Zuo

Tenured assistant professor in the Department of Robotics at Mohamed Bin Zayed University of Artificial Intelligence. His research interests include robotics, spatial intelligence, state estimation, and embodied intelligence.

At the upcoming ICRA 2026, Prof. Xingxing Zuo, together with international scholars, will organize the workshop titled “MM-SpatialAI: Multi-Modal Spatial AI for Robust Navigation and Open-World Understanding.”

NOKOV Motion Capture is a sponsor of this workshop. Researchers in related fields are welcome to participate and contribute to advancing multimodal spatial intelligence for robust navigation and open-world understanding.

The workshop homepage is https://xingxingzuo.github.io/MM-SpatialAI/.
