IMAGE AND VISION COMPUTING, vol.163, 2025 (SCI-Expanded, Scopus)
Deep learning-based monocular visual odometry (MVO) has gained importance in robotics and autonomous navigation due to its robustness in visually challenging environments and minimal sensor requirements. However, many existing deep learning-based MVO methods suffer from high computational costs and large model sizes, making them less suitable for real-time applications on resource-limited systems. In this study, we propose DeepDCT-VO, a lightweight visual odometry method that combines a three-dimensional directional coordinate transformation with a compact deep learning architecture. Unlike traditional approaches that estimate translation in a global coordinate system and are prone to drift accumulation, DeepDCT-VO uses local directional motion derived from composite rotations. This approach avoids global trajectory reconstruction, thereby improving the method's stability and reliability. The proposed model operates on input images at multiple resolutions (120 × 120, 240 × 240, 360 × 360, and 480 × 480), leveraging attention-guided residual learning to extract robust features. Additionally, it incorporates multi-modal information, specifically depth and semantic maps, to further improve the accuracy of pose estimation. Evaluations on the KITTI odometry benchmark demonstrate that DeepDCT-VO achieves competitive trajectory estimation accuracy while maintaining real-time performance (8 ms per frame on GPU and 12 ms on CPU). Compared to the existing method with the lowest translational drift (t_rel), DeepDCT-VO reduces model size by approximately 96.3% (from 37.5 million to 1.4 million parameters). In contrast, compared to the lightest model in terms of parameter count, DeepDCT-VO reduces t_rel from 8.57% to 1.69%, an 80.3% reduction in translational drift. These results underscore the effectiveness of DeepDCT-VO in delivering accurate and efficient monocular visual odometry, particularly for embedded and resource-limited applications, while the proposed transformation serves an auxiliary role in reducing translational complexity.
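To illustrate the general idea of estimating translation in a local, rotation-aligned frame rather than in global coordinates, the sketch below shows one common way such a change of frame can be computed. It is not the paper's DCT formulation: the function name local_directional_translation, the choice of the previous pose's rotation as the reference frame, and the toy poses are assumptions introduced here for illustration only.

```python
# Minimal sketch (not the authors' code): express the frame-to-frame
# translation in the previous camera's local frame instead of global XYZ.
# Names and the specific choice of reference frame are hypothetical.
import numpy as np

def local_directional_translation(R_prev: np.ndarray,
                                  t_prev: np.ndarray,
                                  t_curr: np.ndarray) -> np.ndarray:
    """Rotate the global displacement between two consecutive camera
    positions into the local frame of the previous pose (one possible
    reading of 'local directional motion')."""
    delta_global = t_curr - t_prev          # displacement in world coordinates
    delta_local = R_prev.T @ delta_global   # re-express it in the previous camera frame
    return delta_local

# Toy usage: previous pose yawed 90 degrees, then a 1 m step along world x.
theta = np.pi / 2
R_prev = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
                   [ 0.0,           1.0, 0.0          ],
                   [-np.sin(theta), 0.0, np.cos(theta)]])
t_prev = np.array([0.0, 0.0, 0.0])
t_curr = np.array([1.0, 0.0, 0.0])
print(local_directional_translation(R_prev, t_prev, t_curr))
```

In such a formulation, the network regresses motion relative to the camera's own heading, so errors do not need to be mapped back through an accumulated global trajectory at training time, which is consistent with the drift-reduction motivation stated in the abstract.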