
MonoViM: Enhancing Self-Supervised Monocular Depth Estimation via Mamba

EasyChair Preprint 14171

15 pages
Date: July 25, 2024

Abstract

In recent years, self-supervised monocular depth estimation has been widely applied in fields such as autonomous driving and robotics. While Convolutional Neural Networks (CNNs) and Transformers are predominant in this area, they struggle to handle long-range dependencies efficiently while keeping computational complexity low. To address these problems, we propose MonoViM, the first model to integrate Mamba for more efficient self-supervised monocular depth estimation. Inspired by recent advances in State Space Models (SSMs), MonoViM incorporates the SSM-based Mamba architecture into its encoder stage and employs a 2D selective scanning mechanism. This ensures that each image block acquires contextual knowledge through a compressed hidden state, maintaining a large receptive field while reducing computational complexity from quadratic to linear. Comprehensive evaluations on the KITTI dataset, together with fine-tuning and zero-shot experiments on Cityscapes and Make3D, show that MonoViM outperforms current CNN-based and Transformer-based methods, achieving state-of-the-art performance and strong generalization. Additionally, MonoViM offers advantages over Transformer-based methods in inference speed and GPU utilization, particularly with high-resolution inputs. The code is available at https://github.com/aifeixingdelv/MonoViM.
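The abstract describes the core mechanism only at a high level. As a rough illustration of why a selective-scan SSM runs in linear rather than quadratic time in the sequence length, the following NumPy sketch implements the standard discretized Mamba-style recurrence h_t = exp(Delta_t * A) * h_{t-1} + (Delta_t * B_t) * x_t, y_t = C_t h_t over a flattened patch sequence. All names, shapes, and the 1D simplification are assumptions made here for illustration; the paper's 2D selective scanning over multiple spatial traversal directions is not reproduced, and this is not the authors' implementation.

import numpy as np

# Minimal 1D selective scan (illustrative sketch only; not the authors' code).
# x:     (L, D) sequence of L flattened image patches with D channels
# A:     (D, N) state-transition parameters (negative values give stable decay)
# B, C:  (L, N) input-dependent projections (the "selective" part of Mamba)
# delta: (L, D) input-dependent step sizes
def selective_scan(x, A, B, C, delta):
    L, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))                          # compressed hidden state
    y = np.empty((L, D))
    for t in range(L):                            # one pass over the sequence: O(L)
        dA = np.exp(delta[t][:, None] * A)        # discretized decay, (D, N)
        dB = delta[t][:, None] * B[t][None, :]    # discretized input map, (D, N)
        h = dA * h + dB * x[t][:, None]           # constant-cost state update
        y[t] = h @ C[t]                           # per-channel readout
    return y

# Toy usage: 64 patches, 8 channels, state size 4.
L, D, N = 64, 8, 4
rng = np.random.default_rng(0)
y = selective_scan(rng.standard_normal((L, D)),
                   -np.abs(rng.standard_normal((D, N))),
                   rng.standard_normal((L, N)),
                   rng.standard_normal((L, N)),
                   np.abs(rng.standard_normal((L, D))) * 0.1)
print(y.shape)  # (64, 8)

Because each step touches only the fixed-size hidden state h, the cost grows linearly in L, in contrast to the O(L^2) pairwise interactions of Transformer attention. A 2D variant, as described in the abstract, would repeat such a scan along several spatial traversal orders of the feature map and merge the results.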

Keyphrases: monocular depth estimation, self-supervised learning, state-space models

BibTeX entry
BibTeX does not have a suitable entry type for preprints. The following is a workaround that produces the correct reference:
@booklet{EasyChair:14171,
  author       = {Qiang Gao and Gang Peng and Zeyuan Chen and Bingchuan Yang},
  title        = {MonoViM: Enhancing Self-Supervised Monocular Depth Estimation via Mamba},
  howpublished = {EasyChair Preprint 14171},
  year         = {EasyChair, 2024}}