Depth Prediction Evaluation



The depth completion and depth prediction evaluation is related to our work published in Sparsity Invariant CNNs (3DV 2017). The dataset
contains over 93,000 depth maps with corresponding raw LiDAR scans and RGB images, aligned with the "raw data" of the KITTI dataset.
Given the large amount of training data, this dataset should allow the training of complex deep learning models for the tasks of depth completion
and single-image depth prediction. In addition, we provide manually selected images with unpublished depth maps to serve as a benchmark for these
two challenging tasks.

The structure of all provided depth maps is aligned with the structure of our raw data, making it easy to find the corresponding left and right
images or other provided information.
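
For reference, the provided depth maps are stored as 16-bit PNG files: a raw value of 0 marks a pixel without ground truth, and valid depths in meters are obtained by dividing the raw value by 256, as described in the development kit. A minimal Python sketch for loading such a map (the example path is hypothetical and only illustrates the raw-data layout):

    import numpy as np
    from PIL import Image

    def load_depth_map(png_path):
        # KITTI depth maps are 16-bit PNGs: a raw value of 0 marks a pixel
        # without ground truth; valid depths in meters are raw / 256.
        raw = np.asarray(Image.open(png_path), dtype=np.uint16)
        depth = raw.astype(np.float32) / 256.0
        depth[raw == 0] = -1.0  # flag invalid pixels
        return depth

    # Hypothetical example path following the raw-data layout:
    # depth = load_depth_map("train/2011_09_26_drive_0001_sync/"
    #                        "proj_depth/groundtruth/image_02/0000000005.png")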


Note: On 12.04.2018 we fixed a small error in the file data_depth_velodyne.zip. Please download this file again if you have an old version.


All methods providing less than 100% density have been interpolated using simple background interpolation, as explained in the corresponding header file in the development kit.
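
The official interpolation routine ships with the development kit; purely as a rough Python sketch of the idea (function and variable names are ours, not the devkit's), each run of missing pixels in an image row can be filled with the farther, i.e. background, of its two valid neighbors, and the borders extended with the nearest valid depth:

    import numpy as np

    def interpolate_background(depth, invalid=0.0):
        # Row-wise "background" interpolation sketch: every gap between
        # two valid pixels is filled with the larger (more distant) of
        # the two depths, and the borders are extended with the nearest
        # valid value, so filled pixels never move closer to the camera.
        out = depth.copy()
        for row in out:
            valid = np.flatnonzero(row != invalid)
            if valid.size == 0:
                continue
            row[:valid[0]] = row[valid[0]]        # extend left border
            row[valid[-1] + 1:] = row[valid[-1]]  # extend right border
            for left, right in zip(valid[:-1], valid[1:]):
                if right > left + 1:              # fill interior gap
                    row[left + 1:right] = max(row[left], row[right])
        return out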

Our evaluation table ranks all methods according to the square root of the scale invariant logarithmic error (SILog). However, we also provide other metrics:
  • SILog: Scale invariant logarithmic error [log(m)*100]
  • sqErrorRel: Relative squared error (percent)
  • absErrorRel: Relative absolute error (percent)
  • iRMSE: Root mean squared error of the inverse depth [1/km]
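
For orientation, the following is a minimal Python sketch of how these metrics can be computed over the valid ground-truth pixels, using the common definitions; in particular, the relative squared error is assumed here to be normalized by the squared ground truth (consistent with reporting it in percent), and the official development kit remains the authoritative implementation:

    import numpy as np

    def evaluate_depth(pred, gt):
        # Evaluate only where ground truth exists (gt > 0).
        mask = gt > 0
        pred, gt = pred[mask], gt[mask]

        # SILog: scale invariant logarithmic error, shown on the
        # leaderboard as sqrt(.) * 100, i.e. in units of log(m)*100.
        d = np.log(pred) - np.log(gt)
        silog = np.sqrt(np.mean(d ** 2) - np.mean(d) ** 2) * 100.0

        # Relative errors in percent; the squared variant is assumed
        # to be normalized by the squared ground truth.
        abs_rel = np.mean(np.abs(gt - pred) / gt) * 100.0
        sq_rel = np.mean((gt - pred) ** 2 / gt ** 2) * 100.0

        # iRMSE: RMSE of the inverse depth in 1/km (depths in meters).
        irmse = np.sqrt(np.mean((1000.0 / gt - 1000.0 / pred) ** 2))

        return {"SILog": silog, "sqErrorRel": sq_rel,
                "absErrorRel": abs_rel, "iRMSE": irmse}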


Important Policy Update: As more and more non-published work and re-implementations of existing work are submitted to KITTI, we have established a new policy: from now on, only submissions with significant novelty that lead to a peer-reviewed paper in a conference or journal are allowed. Minor modifications of existing algorithms and student research projects are not allowed; such work must be evaluated on a split of the training set. To ensure that this policy is adopted, new users must detail their status, describe their work and specify the targeted venue during registration. Furthermore, we will regularly delete all entries that are six months old but are still anonymous or do not have a paper associated with them. For conferences, six months are usually sufficient to determine whether a paper has been accepted and to add the bibliography information. For longer review cycles, you need to resubmit your results.
Additional information used by the methods
  • Additional training data: Use of additional data sources for training (see details)

Rank Method Setting Code SILog sqErrorRel absErrorRel iRMSE Runtime Environment
1 UniDepth code 8.13 1.09 6.54 8.24 0.1 s GPU @ 2.5 GHz (Python)
L. Piccinelli, Y. Yang, C. Sakaridis, M. Segu, S. Li, L. Van Gool and F. Yu: UniDepth: Universal Monocular Metric Depth Estimation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2024.
2 1PNet 9.46 1.45 7.62 9.75 0.1 s 1 core @ 2.5 GHz (C/C++)
3 MSFusion 9.59 1.56 7.81 10.32 0.1 s 1 core @ 2.5 GHz (Python)
4 AssFusionNet 9.62 1.57 7.82 10.33 0.1 s 1 core @ 2.5 GHz (Python)
5 NDDepth code 9.62 1.59 7.75 10.62 0.1 s 1 core @ 2.5 GHz (C/C++)
S. Shao, Z. Pei, W. Chen, X. Wu and Z. Li: NDDepth: Normal-Distance Assisted Monocular Depth Estimation. International Conference on Computer Vision (ICCV) 2023.
6 IEBins code 9.63 1.60 7.82 10.68 0.1 s 1 core @ 2.5 GHz (C/C++)
S. Shao, Z. Pei, X. Wu, Z. Liu, W. Chen and Z. Li: IEBins: Iterative Elastic Bins for Monocular Depth Estimation. Advances in Neural Information Processing Systems (NeurIPS) 2023.
7 VMDepth 9.69 1.68 7.23 9.60 0.1 s 1 core @ 2.5 GHz (Python)
8 VA-DepthNet code 9.84 1.66 7.96 10.44 0.1 s 1 core @ 2.5 GHz (Python)
C. Liu, S. Kumar, S. Gu, R. Timofte and L. Van Gool: VA-DepthNet: A Variational Approach to Single Image Depth Prediction. International Conference on Learning Representations (ICLR) 2023.
9 DiffusionDepth-I code 9.85 1.64 8.06 10.58 0.2 s 1 core @ 2.5 GHz (C/C++)
Y. Duan, X. Guo and Z. Zhu: DiffusionDepth: Diffusion denoising approach for monocular depth estimation. arXiv preprint arXiv:2303.05021 2023.
10 iDisc code 9.89 1.77 8.11 10.73 0.1 s 1 core @ 2.5 GHz (C/C++)
L. Piccinelli, C. Sakaridis and F. Yu: iDisc: Internal Discretization for Monocular Depth Estimation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2023.
11 MG code 9.93 1.68 7.99 10.63 0.1 s 1 core @ 2.5 GHz (C/C++)
C. Liu, S. Kumar, S. Gu, R. Timofte and L. Van Gool: Single Image Depth Prediction Made Better: A Multivariate Gaussian Take. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023.
12 URCDC-Depth code 10.03 1.74 8.24 10.71 0.1 s 1 core @ 2.5 GHz (C/C++)
S. Shao, Z. Pei, W. Chen, R. Li, Z. Liu and Z. Li: URCDC-Depth: Uncertainty Rectified Cross-Distillation with CutFlip for Monocular Depth Estimation. IEEE Transactions on Multimedia (TMM) 2023.
13 BinsFormer code 10.14 1.69 8.23 10.90 0.1 s 1 core @ 2.5 GHz (C/C++)
Z. Li, X. Wang, X. Liu and J. Jiang: BinsFormer: Revisiting Adaptive Bins for Monocular Depth Estimation. arXiv preprint arXiv:2204.00987 2022.
14 TrapNet 10.15 1.66 7.92 10.45 0.1 s 1 core @ 2.5 GHz (Python)
C. Ning and H. Gan: Trap Attention: Monocular Depth Estimation with Manual Traps. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023.
15 PixelFormer 10.28 1.82 8.16 10.84 0.1 s 1 core @ 2.5 GHz (Python)
A. Agarwal and C. Arora: Attention Attention Everywhere: Monocular Depth Prediction with Skip Attention. WACV 2023.
16 glformer 10.28 1.73 8.19 11.09 0.05 s 1 core @ 2.5 GHz (C/C++)
17 RED-T 10.36 1.92 8.11 10.82 0.1 s GPU @ 2.5 GHz (Python)
K. Shim, J. Kim, G. Lee and B. Shim: Depth-Relative Self Attention for Monocular Depth Estimation. 2023.
18 ZDepth 10.36 1.89 8.53 11.23 0.1 s GPU @ 2.5 GHz (Python)
19 NeWCRFs 10.39 1.83 8.37 11.03 0.1 s 1 core @ 2.5 GHz (Python)
W. Yuan, X. Gu, Z. Dai, S. Zhu and P. Tan: NeWCRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2022.
20 DepthFormer code 10.69 1.84 8.68 11.39 0.1 s 1 core @ 2.5 GHz (Python)
Z. Li, Z. Chen, X. Liu and J. Jiang: Depthformer: Exploiting long-range correlation and local information for accurate monocular depth estimation. arXiv preprint arXiv:2203.14211 2022.
21 ViP-DeepLab 10.80 2.19 8.94 11.77 0.1 s GPU @ 2.5 GHz (Python)
S. Qiao, Y. Zhu, H. Adam, A. Yuille and L. Chen: ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2021.
22 CoGF-Depth 10.99 2.04 8.82 11.23 1 s 1 core @ 2.5 GHz (C/C++)
23 SideRT 11.42 2.25 9.28 11.88 0.02 s GPU @ 1.5 GHz (Python)
C. Shu, Z. Chen, L. Chen, K. Ma, M. Wang and H. Ren: SideRT: A Real-time Pure Transformer Architecture for Single Image Depth Estimation. 2022.
24 PWA 11.45 2.30 9.05 12.32 0.06 s GPU @ 2.5 GHz (Python)
S. Lee, J. Lee, B. Kim, E. Yi and J. Kim: Patch-Wise Attention Network for Monocular Depth Estimation. Proceedings of the AAAI Conference on Artificial Intelligence 2021.
25 BANet 11.55 2.31 9.34 12.17 0.04 s GPU @ 1.5 GHz (Python + C/C++)
S. Aich, J. Vianney, M. Islam, M. Kaur and B. Liu: Bidirectional Attention Network for Monocular Depth Estimation. IEEE International Conference on Robotics and Automation (ICRA) 2021.
26 BTS code 11.67 2.21 9.04 12.23 0.06 s GPU @ 2.5 GHz (Python + C/C++)
J. Lee, M. Han, D. Ko and I. Suh: From Big to Small: Multi-Scale Local Planar Guidance for Monocular Depth Estimation. 2019.
27 DL_61 (DORN) code 11.77 2.23 8.78 12.98 0.5 s GPU @ 2.5 GHz (Python + C/C++)
H. Fu, M. Gong, C. Wang, K. Batmanghelich and D. Tao: Deep Ordinal Regression Network for Monocular Depth Estimation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018.
28 RefinedMPL 11.80 2.31 10.09 13.39 0.05 s GPU @ 2.5 GHz (Python + C/C++)
J. Vianney, S. Aich and B. Liu: RefinedMPL: Refined Monocular PseudoLiDAR for 3D Object Detection in Autonomous Driving. arXiv preprint arXiv:1911.09712 2019.
29 DLE code 11.81 2.22 9.09 12.49 0.09 s NVIDIA Tesla V100
C. Liu, S. Gu, L. Van Gool and R. Timofte: Deep Line Encoding for Monocular 3D Object Detection and Depth Prediction. Proceedings of the British Machine Vision Conference (BMVC) 2021.
30 PFANet 11.84 2.46 9.23 12.63 0.1 s GPU @ 2.5 GHz (Python)
Y. Xu, C. Peng, M. Li, Y. Li and S. Du: Pyramid Feature Attention Network for Monocular Depth Prediction. IEEE International Conference on Multimedia and Expo (ICME) 2021.
31 GAC code 12.13 2.61 9.41 12.65 0.05 s GPU @ 2.5 GHz (Python)
Y. Liu, Y. Yuan and M. Liu: Ground-aware Monocular 3D Object Detection for Autonomous Driving. IEEE Robotics and Automation Letters 2021.
32 Cascade Depth 12.19 2.86 10.04 12.35 0.1 s 1 core @ 2.5 GHz (Python)
33 DL_SORD_SL 12.39 2.49 10.10 13.48 0.8 s GPU @ 2.5 GHz (Python + C/C++)
R. Diaz and A. Marathe: Soft Labels for Ordinal Regression. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2019.
34 VNL code 12.65 2.46 10.15 13.02 0.5 s 1 core @ 2.5 GHz (C/C++)
W. Yin, Y. Liu, C. Shen and Y. Yan: Enforcing geometric constraints of virtual normal for depth prediction. 2019.
35 P3Depth code 12.82 2.53 9.92 13.71 0.1 s GPU @ 2.5 GHz (Python)
V. Patil, C. Sakaridis, A. Liniger and L. Van Gool: P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022.
36 MS-DPT code 12.83 3.62 11.01 13.43 0.1 s GPU @ 2.5 GHz (Python)
J. Song and S. Lee: Knowledge Distillation of Multi-scale Dense Prediction Transformer for Self-supervised Depth Estimation. 2023.
37 DS-SIDENet_ROB 12.86 2.87 10.03 14.40 0.35 s GPU @ 2.5 GHz (Python)
H. Ren, M. El-Khamy and J. Lee: Deep Robust Single Image Depth Estimation Neural Network Using Scene Understanding. IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2019.
38 DL_SORD_SQ 13.00 2.95 10.38 13.78 0.88 s GPU @ 2.5 GHz (Python + C/C++)
R. Diaz and A. Marathe: Soft Labels for Ordinal Regression. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2019.
39 PAP 13.08 2.72 10.27 13.95 0.18 s GPU @ 2.5 GHz (Python + C/C++)
Z. Zhang, Z. Cui, C. Xu, Y. Yan, N. Sebe and J. Yang: Pattern-Affinitive Propagation Across Depth, Surface Normal and Semantic Segmentation. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2019.
40 CADepth-Net code 13.34 3.33 10.67 13.61 0.08 s 1 core @ 2.5 GHz (Python)
J. Yan, H. Zhao, P. Bu and Y. Jin: Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation. 2021.
41 VGG16-UNet 13.41 2.86 10.60 15.06 0.16 s GPU @ 2.5 GHz (Python + C/C++)
X. Guo, H. Li, S. Yi, J. Ren and X. Wang: Learning monocular depth by distilling cross-domain stereo networks. Proceedings of the European Conference on Computer Vision (ECCV) 2018.
42 DORN_ROB 13.53 3.06 10.35 15.96 2 s GPU @ 2.5 GHz (Python)
H. Fu, M. Gong, C. Wang, K. Batmanghelich and D. Tao: Deep Ordinal Regression Network for Monocular Depth Estimation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018.
43 g2s code 14.16 3.65 11.40 15.53 0.04 s GPU @ 1.5 GHz (Python)
H. Chawla, A. Varma, E. Arani and B. Zonooz: Multimodal Scale Consistency and Awareness for Monocular Self-Supervised Depth Estimation. IEEE International Conference on Robotics and Automation (ICRA) 2021.
44 MT-SfMLearner 14.25 3.72 12.52 15.83 0.04 s GPU @ 1.5 GHz (Python)
A. Varma, H. Chawla, B. Zonooz and E. Arani: Transformers in Self-Supervised Monocular Depth Estimation with Unknown Camera Intrinsics. Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP 2022.
45 MLDA-Net 14.42 3.41 11.67 16.12 0.2 s 1 core @ 2.5 GHz (Python)
X. Song, W. Li, D. Zhou, Y. Dai, J. Fang, H. Li and L. Zhang: MLDA-Net: Multi-Level Dual Attention-Based Network for Self-Supervised Monocular Depth Estimation. IEEE Transactions on Image Processing 2021.
46 DABC_ROB 14.49 4.08 12.72 15.53 0.7 s GPU @ 2.0 GHz (Matlab)
R. Li, K. Xian, C. Shen, Z. Cao, H. Lu and L. Hang: Deep attention-based classification network for robust depth prediction. Proceedings of the Asian Conference on Computer Vision (ACCV) 2018.
47 BTSREF_RVC code 14.67 3.12 12.42 16.84 0.1 s 1 core @ >3.5 GHz (Python)
J. Lee, M. Han, D. Ko and I. Suh: From big to small: Multi-scale local planar guidance for monocular depth estimation. arXiv preprint arXiv:1907.10326 2019.
48 SDNet code 14.68 3.90 12.31 15.96 0.2 s GPU @ 2.5 GHz (C/C++)
M. Ochs, A. Kretz and R. Mester: SDNet: Semantic Guided Depth Estimation Network. German Conference on Pattern Recognition (GCPR) 2019.
49 APMoE_base_ROB code 14.74 3.88 11.74 15.63 0.2 s GPU @ 3.5 GHz (Matlab), GeForce Titan X
S. Kong and C. Fowlkes: Pixel-wise Attentional Gating for Parsimonious Pixel Labeling. arXiv preprint arXiv:1805.01556 2018.
50 DiPE 14.84 4.04 12.28 15.69 0.01 s GPU @ 2.5 GHz (Python)
H. Jiang, L. Ding, Z. Sun and R. Huang: DiPE: Deeper into Photometric Errors for Unsupervised Learning of Depth and Ego-motion from Monocular Videos. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2020.
51 CSWS_E_ROB 14.85 3.48 11.84 16.38 0.2 s 1 core @ 2.5 GHz (C/C++), Titan GTX 108
B. Li, Y. Dai and M. He: Monocular Depth Estimation with Hierarchical Fusion of Dilated CNNs and Soft-Weighted-Sum Inference. 2018.
52 HBC 15.18 3.79 12.33 17.86 0.05 s GPU @ 2.5 GHz (Python)
H. Jiang and R. Huang: Hierarchical Binary Classification for Monocular Depth Estimation. IEEE International Conference on Robotics and Biomimetics 2019.
53 SGDepth code 15.30 5.00 13.29 15.80 0.1 s GPU @ 2.5 GHz (Python)
M. Klingner, J. Termöhlen, J. Mikolajczyk and T. Fingscheidt: Self-Supervised Monocular Depth Estimation: Solving the Dynamic Object Problem by Semantic Guidance. ECCV 2020.
54 DHGRL 15.47 4.04 12.52 15.72 0.2 s GPU @ 2.5 GHz (Python)
Z. Zhang, C. Xu, J. Yang, Y. Tai and L. Chen: Deep hierarchical guidance and regularization learning for end-to-end depth estimation. Pattern Recognition 2018.
55 GCNDepth code 15.54 4.26 12.75 15.99 0.05 s GPU @ 2.5 GHz (Python)
A. Masoumian, H. Rashwan, S. Abdulwahab, J. Cristiano and D. Puig: GCNDepth: Self-supervised Monocular Depth Estimation based on Graph Convolutional Network. arXiv preprint arXiv:2112.06782 2021.
56 packnSFMHR_RVC code 15.80 4.73 12.28 17.96 0.5 s GPU @ 2.5 GHz (Python)
V. Guizilini, R. Ambrus, S. Pillai, A. Raventos and A. Gaidon: 3D Packing for Self-Supervised Monocular Depth Estimation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2020.
57 MultiDepth code 16.05 3.89 13.82 18.21 0.01 s GPU @ 1.5 GHz (Python)
L. Liebel and M. Körner: MultiDepth: Single-Image Depth Estimation via Multi-Task Regression and Classification. IEEE Intelligent Transportation Systems Conference (ITSC) 2019.
58 LSIM 17.92 6.88 14.04 17.62 0.08 s GPU @ 2.5 GHz (Python)
M. Goldman, T. Hassner and S. Avidan: Learn Stereo, Infer Mono: Siamese Networks for Self-Supervised, Monocular, Depth Estimation. Computer Vision and Pattern Recognition Workshops (CVPRW) 2019.




Related Datasets

  • SYNTHIA Dataset: SYNTHIA is a collection of photo-realistic frames rendered from a virtual city and comes with precise pixel-level semantic annotations as well as pixel-wise depth information. The dataset consists of over 200,000 HD images from video streams and over 20,000 HD images from independent snapshots.
  • Middlebury Stereo Evaluation: The classic stereo evaluation benchmark, featuring four test images in version 2 of the benchmark, with very accurate ground truth from a structured light system. 38 image pairs are provided in total.
  • Make3D Range Image Data: Images with small-resolution ground truth used to learn and evaluate depth from single monocular images.
  • Virtual KITTI Dataset: Virtual KITTI contains 50 high-resolution monocular videos (21,260 frames) generated from five different virtual worlds in urban settings under different imaging and weather conditions.
  • Scene Flow Dataset: The Freiburg Scene Flow Dataset collection has been used to train convolutional networks for disparity, optical flow, and scene flow estimation. The collection contains more than 39,000 stereo frames at 960x540 pixel resolution, rendered from various synthetic sequences.

Citation

When using this dataset in your research, we will be happy if you cite us:
@inproceedings{Uhrig2017THREEDV,
  author = {Jonas Uhrig and Nick Schneider and Lukas Schneider and Uwe Franke and Thomas Brox and Andreas Geiger},
  title = {Sparsity Invariant CNNs},
  booktitle = {International Conference on 3D Vision (3DV)},
  year = {2017}
}


