Method

A Real-time Pure Transformer for Single Image Depth Estimation [SideRT]


Submitted on 4 Jan. 2022 13:24 by
Chen Ziming (Jilin University)

Running time: 0.02 s
Environment: GPU @ 1.5 GHz (Python)

Method Description:
We propose a pure transformer architecture that achieves state-of-the-art performance in real time. Swin transformers serve as the encoder, and the decoder is built on a novel attention mechanism named cross-scale attention (CSA).
Parameters:
learning rate=0.0001
LaTeX BibTeX:
@misc{https://doi.org/10.48550/arxiv.2204.13892,
  doi = {10.48550/ARXIV.2204.13892},
  url = {https://arxiv.org/abs/2204.13892},
  author = {Shu, Chang and Chen, Ziming and Chen, Lei and Ma, Kuan and Wang, Minghui and Ren, Haibing},
  keywords = {Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences},
  title = {SideRT: A Real-time Pure Transformer Architecture for Single Image Depth Estimation},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}

Detailed Results

This page provides detailed results for the method(s) selected. For the first 20 test images, the percentage of erroneous pixels is depicted in the table. We use the error metric described in Sparsity Invariant CNNs (THREEDV 2017), which considers a pixel to be correctly estimated if the disparity or flow end-point error is <3px or <5% (for scene flow this criterion needs to be fulfilled for both disparity maps and the flow map). Underneath, the left input image, the estimated results and the error maps are shown (for disp_0/disp_1/flow/scene_flow, respectively). The error map uses the log-color scale described in Sparsity Invariant CNNs (THREEDV 2017), depicting correct estimates (<3px or <5% error) in blue and wrong estimates in red color tones. Dark regions in the error images denote the occluded pixels which fall outside the image boundaries. The false color maps of the results are scaled to the largest ground truth disparity values / flow magnitudes.
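The correctness criterion quoted above ("<3px or <5%") can be read literally as a per-pixel test. The sketch below is one such reading, not the benchmark's evaluation code: the function names `is_correct` and `outlier_percentage` are hypothetical, and treating the 5% threshold as relative to the ground-truth value is an assumption.

```python
def is_correct(est, gt, abs_thresh=3.0, rel_thresh=0.05):
    """Literal reading of '<3px or <5%': a pixel counts as correct if its
    absolute error is below abs_thresh OR below rel_thresh of the ground truth.
    (Assumption: the percentage is taken relative to the ground-truth value.)"""
    err = abs(est - gt)
    return err < abs_thresh or err < rel_thresh * abs(gt)


def outlier_percentage(estimates, ground_truth):
    """Percentage of erroneous pixels among pixels that have ground truth.
    Pixels whose ground truth is None are skipped (hypothetical convention)."""
    pairs = [(e, g) for e, g in zip(estimates, ground_truth) if g is not None]
    if not pairs:
        raise ValueError("no pixels with ground truth")
    bad = sum(1 for e, g in pairs if not is_correct(e, g))
    return 100.0 * bad / len(pairs)
```

For example, an estimate of 100 against a ground truth of 104 has an error of 4 px, which fails the 3 px test but passes the 5% test (5.2 px), so the pixel still counts as correct under this reading.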

Test Set Average

SILog sqErrorRel absErrorRel iRMSE
Error 11.42 2.25 9.28 11.88
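The four columns in these tables are standard depth-estimation error measures. As a rough guide to how such numbers are usually computed, the sketch below follows the commonly used definitions. It is not the benchmark's own evaluation code: the function name `depth_metrics`, the normalisation of the squared relative error by the squared ground truth, and the scaling (SILog times 100, relative errors in percent, iRMSE in 1/km) are assumptions.

```python
import math


def depth_metrics(pred, gt):
    """Compute SILog, sqErrorRel, absErrorRel and iRMSE for two equal-length
    sequences of depths in metres. Pixels with non-positive ground truth or
    prediction are skipped (assumed convention for missing data)."""
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid pixels")
    n = len(pairs)

    # Scale-invariant log error: standard deviation of the log-depth difference.
    log_diff = [math.log(p) - math.log(g) for p, g in pairs]
    mean_log = sum(log_diff) / n
    mean_log_sq = sum(d * d for d in log_diff) / n
    silog = math.sqrt(max(mean_log_sq - mean_log ** 2, 0.0)) * 100.0

    # Relative errors, reported in percent (assumed scaling).
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n * 100.0
    sq_rel = sum((p - g) ** 2 / (g * g) for p, g in pairs) / n * 100.0

    # RMSE of inverse depth; 1000 / metres gives units of 1/km (assumed).
    irmse = math.sqrt(sum((1000.0 / p - 1000.0 / g) ** 2 for p, g in pairs) / n)

    return {"SILog": silog, "sqErrorRel": sq_rel,
            "absErrorRel": abs_rel, "iRMSE": irmse}
```

Under these definitions a prediction that is wrong only by a constant scale factor, for example `depth_metrics([20, 40], [10, 20])`, gives a SILog of essentially zero while the relative errors are both 100%, which is why SILog is reported alongside the other three measures.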

Test Image 0

SILog sqErrorRel absErrorRel iRMSE
Error 7.78 1.24 7.06 6.86

Input Image

D1 Result

D1 Error


Test Image 1

SILog sqErrorRel absErrorRel iRMSE
Error 14.69 3.57 9.89 19.65

Input Image

D1 Result

D1 Error


Test Image 2

SILog sqErrorRel absErrorRel iRMSE
Error 19.91 3.24 14.30 32.25

Input Image

D1 Result

D1 Error


Test Image 3

SILog sqErrorRel absErrorRel iRMSE
Error 9.06 1.73 9.74 14.55

Input Image

D1 Result

D1 Error


Test Image 4

SILog sqErrorRel absErrorRel iRMSE
Error 15.20 2.80 10.91 19.98

Input Image

D1 Result

D1 Error


Test Image 5

SILog sqErrorRel absErrorRel iRMSE
Error 13.79 2.57 12.02 18.06

Input Image

D1 Result

D1 Error


Test Image 6

SILog sqErrorRel absErrorRel iRMSE
Error 9.91 1.04 7.03 8.80

Input Image

D1 Result

D1 Error


Test Image 7

SILog sqErrorRel absErrorRel iRMSE
Error 8.04 1.53 9.25 13.12

Input Image

D1 Result

D1 Error


Test Image 8

SILog sqErrorRel absErrorRel iRMSE
Error 14.32 3.84 14.57 16.23

Input Image

D1 Result

D1 Error


Test Image 9

SILog sqErrorRel absErrorRel iRMSE
Error 16.48 4.35 12.38 15.49

Input Image

D1 Result

D1 Error


Test Image 10

SILog sqErrorRel absErrorRel iRMSE
Error 4.81 0.49 5.84 6.45

Input Image

D1 Result

D1 Error


Test Image 11

SILog sqErrorRel absErrorRel iRMSE
Error 16.68 4.08 16.52 17.08

Input Image

D1 Result

D1 Error


Test Image 12

SILog sqErrorRel absErrorRel iRMSE
Error 5.99 0.86 7.12 5.89

Input Image

D1 Result

D1 Error


Test Image 13

SILog sqErrorRel absErrorRel iRMSE
Error 8.17 0.92 5.35 5.43

Input Image

D1 Result

D1 Error


Test Image 14

SILog sqErrorRel absErrorRel iRMSE
Error 6.91 0.47 5.03 6.07

Input Image

D1 Result

D1 Error


Test Image 15

SILog sqErrorRel absErrorRel iRMSE
Error 8.56 0.91 7.34 16.65

Input Image

D1 Result

D1 Error


Test Image 16

SILog sqErrorRel absErrorRel iRMSE
Error 12.51 2.76 7.57 9.39

Input Image

D1 Result

D1 Error


Test Image 17

SILog sqErrorRel absErrorRel iRMSE
Error 14.70 2.16 10.03 20.50

Input Image

D1 Result

D1 Error


Test Image 18

SILog sqErrorRel absErrorRel iRMSE
Error 22.52 7.61 22.24 24.94

Input Image

D1 Result

D1 Error


Test Image 19

SILog sqErrorRel absErrorRel iRMSE
Error 18.90 4.41 16.76 27.44

Input Image

D1 Result

D1 Error



