Method

MT-SfMLearner [MT-SfMLearner]


Submitted on 12 Oct. 2021 11:09 by
Hemang Chawla (Navinfo Europe)

Running time: 0.04 s
Environment: GPU @ 1.5 GHz (Python)

Method Description:
The advent of autonomous driving and advanced driver assistance systems necessitates continuous developments in computer vision for 3D scene understanding. Self-supervised monocular depth estimation, which estimates the pixel-wise distance of objects from a single camera without ground-truth labels, is an important task in 3D scene understanding. However, existing methods for this task are limited to convolutional neural network (CNN) architectures. In contrast to CNNs, which use localized linear operations and lose feature resolution across layers, vision transformers process features at constant resolution with a global receptive field at every stage. While recent works have compared transformers against their CNN counterparts for tasks such as image classification, no study has investigated their impact on self-supervised monocular depth estimation. Here, we first demonstrate how to adapt vision transformers for self-supervised monocular depth estimation.
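Self-supervised monocular depth training of this kind (SfMLearner-style) replaces ground-truth depth with a view-synthesis objective: predicted depth and relative camera pose are used to warp a neighbouring source frame into the target view, and the photometric difference supervises both networks. The sketch below illustrates that objective in numpy; it is not MT-SfMLearner's implementation, and the function names, nearest-neighbour sampling, and plain L1 error are simplifications (real systems use bilinear sampling and an SSIM term).

```python
import numpy as np

def reproject(depth, K, K_inv, T):
    """Back-project target pixels with predicted depth, transform by the
    predicted relative pose T (4x4), and project into the source view."""
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=0).reshape(3, -1)  # homogeneous pixels
    cam = (K_inv @ pix) * depth.reshape(1, -1)         # 3D points in the target camera
    cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])
    src = (T @ cam_h)[:3]                              # points in the source camera
    proj = K @ src
    proj = proj[:2] / np.clip(proj[2:], 1e-6, None)    # source pixel coordinates
    return proj.reshape(2, h, w)

def photometric_loss(target, source, depth, K, T):
    """L1 photometric error between the target frame and the source frame
    warped into the target view (nearest-neighbour sampling for brevity)."""
    h, w = depth.shape
    proj = reproject(depth, K, np.linalg.inv(K), T)
    u = np.round(proj[0]).astype(int)
    v = np.round(proj[1]).astype(int)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)    # pixels that land inside the source image
    warped = np.zeros_like(target)
    warped[valid] = source[v[valid], u[valid]]
    return np.abs(target - warped)[valid].mean()
```

With an identity pose and identical frames the warp is the identity and the loss is zero; during training, minimizing this loss over real frame pairs forces the depth (and pose, and here even intrinsics) predictions to be geometrically consistent.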
Parameters:
See paper for details
Latex Bibtex:
@conference{mtsfmlearner,
author={Arnav Varma and Hemang Chawla and Bahram Zonooz and Elahe Arani},
title={Transformers in Self-Supervised Monocular Depth Estimation with Unknown Camera Intrinsics},
booktitle={Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP},
year={2022},
pages={758-769},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010884000003124},
isbn={978-989-758-555-5},
}

Detailed Results

This page provides detailed results for the method(s) selected. For the first 20 test images, each table reports the depth-prediction error metrics: the scale-invariant logarithmic error (SILog), the squared relative error (sqErrorRel), the absolute relative error (absErrorRel), and the root-mean-square error of the inverse depth (iRMSE), computed over pixels with valid ground truth. Underneath each table, the left input image, the estimated depth map and the corresponding error map are shown. The error map uses the log-color scale described in Sparsity Invariant CNNs (THREEDV 2017), depicting small errors in blue and large errors in red color tones.
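The four reported metrics (SILog, sqErrorRel, absErrorRel, iRMSE) can be reproduced roughly as in the sketch below. This is a plain-numpy re-implementation, not the official KITTI devkit; the masking details and unit conventions (percentages for the relative errors, 1/km for iRMSE) are assumptions based on how these metrics are commonly defined.

```python
import numpy as np

def kitti_depth_errors(gt, pred):
    """Approximate KITTI depth-prediction metrics for one image.

    gt, pred: depth maps in metres; only pixels with valid
    ground truth (gt > 0) are evaluated.
    """
    gt = np.asarray(gt, dtype=np.float64)
    pred = np.asarray(pred, dtype=np.float64)
    mask = gt > 0
    gt, pred = gt[mask], pred[mask]

    # Scale-invariant logarithmic error (leaderboard reports it x100).
    d = np.log(pred) - np.log(gt)
    var = np.mean(d ** 2) - np.mean(d) ** 2
    silog = np.sqrt(max(var, 0.0)) * 100.0   # clamp tiny negative float noise

    # Relative errors, reported as percentages.
    abs_rel = np.mean(np.abs(pred - gt) / gt) * 100.0
    sq_rel = np.mean(((pred - gt) ** 2) / gt) * 100.0

    # RMSE of the inverse depth, converted from 1/m to 1/km.
    irmse = np.sqrt(np.mean((1.0 / pred - 1.0 / gt) ** 2)) * 1000.0

    return {"SILog": silog, "sqErrorRel": sq_rel,
            "absErrorRel": abs_rel, "iRMSE": irmse}
```

Note that SILog is invariant to a global scaling of the prediction (a constant shift in log-depth cancels in the variance), which is why it is the primary ranking metric for monocular methods whose absolute scale is ambiguous.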

Test Set Average

SILog sqErrorRel absErrorRel iRMSE
Error 14.25 3.72 12.52 15.83

Test Image 0

SILog sqErrorRel absErrorRel iRMSE
Error 7.09 1.36 8.70 10.15
[The input image, estimated depth (D1 Result) and error map (D1 Error) are shown for each test image on the benchmark page.]


Test Image 1

SILog sqErrorRel absErrorRel iRMSE
Error 19.26 5.99 13.65 25.04


Test Image 2

SILog sqErrorRel absErrorRel iRMSE
Error 20.23 3.45 15.21 31.54


Test Image 3

SILog sqErrorRel absErrorRel iRMSE
Error 9.80 1.69 8.62 13.33


Test Image 4

SILog sqErrorRel absErrorRel iRMSE
Error 19.71 4.57 16.40 23.95


Test Image 5

SILog sqErrorRel absErrorRel iRMSE
Error 17.59 3.11 14.66 21.43


Test Image 6

SILog sqErrorRel absErrorRel iRMSE
Error 13.95 2.98 11.28 14.62


Test Image 7

SILog sqErrorRel absErrorRel iRMSE
Error 11.22 1.90 8.59 12.92


Test Image 8

SILog sqErrorRel absErrorRel iRMSE
Error 17.45 4.86 17.03 19.61


Test Image 9

SILog sqErrorRel absErrorRel iRMSE
Error 22.14 8.66 16.59 19.02


Test Image 10

SILog sqErrorRel absErrorRel iRMSE
Error 8.42 2.97 14.66 12.01


Test Image 11

SILog sqErrorRel absErrorRel iRMSE
Error 19.57 3.97 13.91 18.65


Test Image 12

SILog sqErrorRel absErrorRel iRMSE
Error 10.92 2.91 13.68 9.75


Test Image 13

SILog sqErrorRel absErrorRel iRMSE
Error 12.53 3.93 7.59 8.02


Test Image 14

SILog sqErrorRel absErrorRel iRMSE
Error 11.74 3.23 14.35 14.92


Test Image 15

SILog sqErrorRel absErrorRel iRMSE
Error 9.33 3.09 14.05 19.36


Test Image 16

SILog sqErrorRel absErrorRel iRMSE
Error 13.13 2.92 9.82 11.28


Test Image 17

SILog sqErrorRel absErrorRel iRMSE
Error 19.37 5.93 13.16 28.59


Test Image 18

SILog sqErrorRel absErrorRel iRMSE
Error 30.03 8.33 22.88 35.51


Test Image 19

SILog sqErrorRel absErrorRel iRMSE
Error 18.94 4.34 16.37 26.52



