Container automated guided vehicle (AGV) navigation systems that rely on magnetic pin infrastructure face serious sustainability challenges. Monocular metric depth estimation is emerging as a promising alternative because it aligns the predicted depth with the actual scale; however, its deployment in dynamic port environments (e.g., line-of-sight occlusion, texture loss) is still limited by trade-offs between precision and reliability. The proposed framework, MFMDepth, is implemented in three steps. First, it introduces a MetaFormer architecture that incorporates a global squeeze block (GSB). The GSB employs a Squeeze Former (SF) to model inter-token relationships across the global image context. Second, a bins module that incorporates wavelet transform convolution (WBM) estimates the metric depth, while the backbone network estimates the relative depth. Finally, the framework fuses the two depths to obtain a refined metric depth estimate. Extensive evaluation shows that our method improves RMSE by approximately 26.3% compared to MiDaS and improves AbsRel by approximately 21.3% compared to the state-of-the-art model ZoeDepth. Compared to the traditional baseline DS-SIDE, our approach achieves an approximately twofold improvement in depth prediction accuracy across all metrics while maintaining competitive inference performance.
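The two error metrics named in the abstract, root mean squared error (RMSE) and absolute relative error (AbsRel), can be illustrated with a minimal sketch. It uses the standard textbook definitions; the paper's exact evaluation protocol (valid-pixel masking, depth caps, datasets) is not stated in this record, so none of that is modelled. The function names are illustrative only, and reading "improvement" as a relative reduction in error against a baseline is an assumption.

```python
import math


def rmse(pred, gt):
    """Root mean squared error between predicted and ground-truth depths."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt))


def abs_rel(pred, gt):
    """Mean absolute error relative to the ground-truth depth (gt must be > 0)."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(abs(p - g) / g for p, g in zip(pred, gt)) / len(gt)


def relative_improvement(baseline_error, new_error):
    """Percentage reduction of an error metric relative to a baseline (assumed reading)."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Toy depth values in metres, made up purely for illustration.
predicted = [2.0, 4.0]
ground_truth = [1.0, 5.0]
print(rmse(predicted, ground_truth))
print(abs_rel(predicted, ground_truth))
print(relative_improvement(4.0, 3.0))
```

Both metrics are errors, so lower is better, and a positive value from `relative_improvement` means the new error is smaller than the baseline error.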

MFMDepth: MetaFormer-based monocular metric depth estimation for distance measurement in ports / Chen, Xinqiang; Ma, Fei; Wu, Yuzheng; Han, Bing; Luo, Lijuan; Biancardo, Salvatore Antonio. - In: COMPUTERS & INDUSTRIAL ENGINEERING. - ISSN 0360-8352. - 207:(2025). [10.1016/j.cie.2025.111325]

MFMDepth: MetaFormer-based monocular metric depth estimation for distance measurement in ports

Biancardo, Salvatore Antonio
2025


Use this identifier to cite or link to this document: https://hdl.handle.net/11588/1004635