Abstract: A research team from Harbin Institute of Technology, Shenzhen (HITSZ) has proposed a novel underwater SLAM system, achieving robust and accurate 6-Degree-of-Freedom (6-DoF) localization.
The team published the paper, titled "RUSSO: Robust Underwater SLAM with Sonar Optimization against Visual Degradation", in the IEEE/ASME Transactions on Mechatronics (TMECH); the paper has also been accepted by IROS 2025.
To address visual degradation in underwater environments, the paper presents RUSSO, which the authors describe as the first robust underwater SLAM system to tightly fuse a stereo camera, an IMU, and an imaging sonar for accurate 6-DoF state estimation. In the indoor experiments, the NOKOV motion capture system (underwater) provided highly accurate ground truth poses for the underwater robot, enabling evaluation of the localization accuracy and robustness of RUSSO.

Citation
S. Pan, Z. Hong, Z. Hu, X. Xu, W. Lu and L. Hu, "RUSSO: Robust Underwater SLAM With Sonar Optimization Against Visual Degradation," in IEEE/ASME Transactions on Mechatronics, vol. 30, no. 6, pp. 5456-5467, Dec. 2025, doi: 10.1109/TMECH.2025.3550730.
Background
The underwater environment presents unique challenges to SLAM systems that are rarely encountered on land or in the air, such as the unavailability of GPS, rapid illumination variations due to light attenuation, and the lack of structures and features in open water. To address these challenges, multimodal sensor fusion strategies are widely adopted in existing underwater SLAM methods.
Contributions
1. To the best of the authors' knowledge, this work is the first SLAM research to fuse imaging sonar with a stereo camera and IMU for underwater applications.
2. A novel IMU propagation optimization method is proposed, which utilizes sonar pose estimates to provide a good prior during visual degradation, thereby enhancing the accuracy of IMU propagation and reducing localization drift.
3. To address initialization failure in challenging visual environments, a robust SLAM initialization method is proposed, which directly utilizes the relative pose estimated between two consecutive frames from the imaging sonar.
4. Extensive experiments, ranging from underwater simulators to real laboratory pools and open-sea environments, validate the robustness and accuracy of the proposed RUSSO system in visually degraded conditions.
Methodology
Building upon a visual-inertial odometry (VIO) framework, the proposed RUSSO system integrates an imaging sonar. A novel IMU propagation optimization method is introduced: when visual degradation leads to deteriorating pose estimates, sonar pose estimates are utilized to provide a good prior for IMU propagation, thereby improving reliability. Furthermore, when visual features are insufficient during SLAM initialization, this work leverages the imaging sonar to accomplish the initialization.
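The prior-selection idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the article does not say how visual degradation is detected or how the sonar prior enters the optimization, so the feature-count threshold `min_features`, the `State` type, and the position/velocity-only propagation (no orientation or IMU biases) are simplifying assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class State:
    """Hypothetical minimal state: position and velocity in the world frame."""
    position: Vec3
    velocity: Vec3


def select_prior(visual: State, sonar: State,
                 tracked_features: int, min_features: int = 20) -> State:
    """Pick the state that IMU propagation starts from.

    Assumption: visual degradation is flagged when the number of tracked
    features falls below `min_features`. In that case the sonar-derived
    state is used as the prior; otherwise the visual estimate is kept.
    """
    return sonar if tracked_features < min_features else visual


def propagate(prior: State, accel_world: Vec3, dt: float) -> State:
    """Constant-acceleration propagation over one IMU interval.

    `accel_world` is assumed to be gravity-compensated and already
    rotated into the world frame.
    """
    position = tuple(p + v * dt + 0.5 * a * dt * dt
                     for p, v, a in zip(prior.position, prior.velocity, accel_world))
    velocity = tuple(v + a * dt for v, a in zip(prior.velocity, accel_world))
    return State(position, velocity)


if __name__ == "__main__":
    visual = State((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
    sonar = State((0.1, 0.0, 0.0), (0.9, 0.0, 0.0))
    # Only 5 features tracked: the sonar estimate becomes the prior.
    prior = select_prior(visual, sonar, tracked_features=5)
    print(propagate(prior, accel_world=(0.0, 0.0, 0.0), dt=0.01))
```

The point of the sketch is only that a better starting state limits how far the integrated IMU measurements can drift before the next reliable update.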

Indoor Experiments

In the experiments, RUSSO is compared with the state-of-the-art underwater SLAM algorithm SVIn2 (using only camera and IMU) and the VIO algorithm VINS-Fusion.
In the following indoor experiments, the NOKOV motion capture system (underwater) provided highly accurate ground truth poses for the underwater robot, which made it possible to compare the algorithms quantitatively.
1. Algorithm Comparison: In underwater experiments involving visual degradation, RUSSO demonstrated the highest localization accuracy and map consistency across all sequences. It remained stable even during periods of severe visual degradation, significantly outperforming SVIn2 and VINS-Fusion.
2. Initialization Validation: In scenarios lacking visual features, RUSSO successfully initialized using sonar assistance and maintained high accuracy, whereas SVIn2 failed to initialize and VINS-Fusion exhibited significant drift due to inaccurate initial values.
3. IMU Propagation Optimization: By incorporating sonar pose estimates as a prior during visual degradation, RUSSO effectively reduced IMU propagation error and enhanced the stability of state estimation under degraded conditions. Even after visual information was removed, the system maintained basic localization capability, though errors increased for some degrees of freedom.
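Comparisons against motion capture ground truth are commonly reported as the root-mean-square absolute trajectory error (ATE). The article does not state which metric the authors used, so the Python sketch below is only an illustration of that common choice. The function name `ate_rmse` is hypothetical, and the code assumes the estimated and ground truth positions are already time-associated and expressed in the same reference frame.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def ate_rmse(estimated: Sequence[Vec3], ground_truth: Sequence[Vec3]) -> float:
    """Root-mean-square translational error between two aligned trajectories.

    Assumes both sequences have the same length and that element i of each
    refers to the same timestamp in the same coordinate frame.
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(p_est, p_gt))
        for p_est, p_gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    estimate = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    mocap = [(0.0, 0.0, 0.0), (1.0, 3.0, 4.0)]
    print(ate_rmse(estimate, mocap))
```

In practice the estimated trajectory is usually aligned to the ground truth frame with a rigid-body fit before this error is computed; that step is omitted here.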
Conclusions
The study included extensive experiments in simulation environments, laboratory pools, and shallow sea areas. The results indicate that RUSSO outperforms the two visual-inertial baselines, SVIn2 and VINS-Fusion, across all experimental scenarios. The proposed method also improves yaw angle estimation accuracy for underwater tasks at near-constant depths, such as underwater surveying and mapping.
The NOKOV motion capture system (underwater) provided highly accurate ground truth poses for the underwater robot in this study, enabling quantitative evaluation of the localization accuracy and robustness of the RUSSO system.