A Hybrid UNet with Attention and a Perceptual Loss Function for Monocular Depth Estimation


Turkmen H., Akgun D.

Mathematics, vol. 13, no. 16, 2025 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 13, Issue: 16
  • Publication Date: 2025
  • DOI: 10.3390/math13162567
  • Journal: Mathematics
  • Journal Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, zbMATH, Directory of Open Access Journals
  • Keywords: autonomous driving, boundary-aware depth consistency loss, hybrid UNet model, monocular depth estimation, transformer attention
  • Affiliated with İstanbul Gelişim University: No

Abstract

Monocular depth estimation is a key computer vision task that estimates the depth, or distance, of objects in a scene from a single 2D image captured by a camera. UNet-based models are a fundamental architecture for monocular depth estimation because of their effective encoder–decoder structure. This study presents a depth estimation model based on a hybrid UNet architecture that incorporates ensemble features: Transformer-based attention blocks capture global context, while an encoder built on ResNet18 extracts spatial features. In addition, the study introduces a novel Boundary-Aware Depth Consistency Loss (BADCL) function to improve accuracy. BADCL combines dynamic scaling, smoothness regularization, and boundary-aware weighting, which together yield sharper edges, smoother depth transitions, and scale-consistent predictions. Evaluated on the NYU Depth V2 dataset, the proposed model achieves a Structural Similarity Index Measure (SSIM) of 99.8%, and its results indicate higher depth accuracy than state-of-the-art methods.
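The abstract names the three ingredients of BADCL but does not give their formulas, so the following is only a hypothetical sketch of how such a loss could be assembled, not the authors' implementation. It assumes least-squares scale alignment for "dynamic scaling," per-pixel weights of 1 + alpha * (local ground-truth depth gradient) for "boundary-aware weighting," and the image-edge-aware smoothness term common in self-supervised depth work for "smoothness regularization." The function name `boundary_aware_loss` and the parameters `alpha` and `beta` are illustrative. Plain Python lists keep the sketch dependency-free; a training loss would normally operate on tensors in a deep learning framework.

```python
import math


def boundary_aware_loss(pred, target, image, alpha=1.0, beta=0.1, eps=1e-8):
    """Hypothetical BADCL-style loss on H x W depth maps given as lists of lists.

    pred, target: predicted and ground-truth depth (same shape).
    image: grayscale intensities (same shape), used only by the smoothness term.
    alpha (boundary weight) and beta (smoothness weight) are illustrative values.
    """
    h, w = len(target), len(target[0])
    p = [v for row in pred for v in row]
    t = [v for row in target for v in row]

    # 1) Dynamic scaling (assumed form): least-squares factor s = <p, t> / <p, p>,
    #    so a prediction that is correct up to a global scale is not penalized.
    s = sum(a * b for a, b in zip(p, t)) / (sum(a * a for a in p) + eps)

    # 2) Boundary-aware weighting (assumed form): weighted L1 error where each pixel's
    #    weight is 1 + alpha * (|dx target| + |dy target|), emphasizing depth edges.
    data_term, weight_sum = 0.0, 0.0
    for i in range(h):
        for j in range(w):
            ex = abs(target[i][j + 1] - target[i][j]) if j + 1 < w else 0.0
            ey = abs(target[i + 1][j] - target[i][j]) if i + 1 < h else 0.0
            wgt = 1.0 + alpha * (ex + ey)
            data_term += wgt * abs(s * pred[i][j] - target[i][j])
            weight_sum += wgt
    data_term /= weight_sum

    # 3) Smoothness regularization (edge-aware variant): penalize gradients of the
    #    scaled prediction, damped by exp(-|image gradient|) so image edges may keep depth edges.
    smooth, n = 0.0, 0
    for i in range(h):
        for j in range(w):
            if j + 1 < w:
                dp = abs(s * (pred[i][j + 1] - pred[i][j]))
                smooth += dp * math.exp(-abs(image[i][j + 1] - image[i][j]))
                n += 1
            if i + 1 < h:
                dp = abs(s * (pred[i + 1][j] - pred[i][j]))
                smooth += dp * math.exp(-abs(image[i + 1][j] - image[i][j]))
                n += 1
    smooth = smooth / n if n else 0.0

    return data_term + beta * smooth


if __name__ == "__main__":
    target = [[1.0, 1.0, 3.0], [1.0, 1.0, 3.0]]
    image = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
    half_scale = [[0.5, 0.5, 1.5], [0.5, 0.5, 1.5]]  # correct up to a factor of 2
    flat = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]         # misses the depth step
    print(round(boundary_aware_loss(half_scale, target, image), 3))  # 0.021
    print(round(boundary_aware_loss(flat, target, image), 3))        # 0.8
```

Because the prediction is rescaled before comparison, multiplying `pred` by any positive constant leaves this loss essentially unchanged. The small residual for `half_scale` comes only from the smoothness term at the depth step, which the matching image edge damps. The `flat` prediction is penalized most heavily at the boundary pixels, where the weights are largest.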