## Method

A Self-Supervised Permutation Approach to the Stereo Matching Problem [Permutation SM]
Submitted on 31 Jul. 2021 04:17 by
Running time: 30 s. Environment: GPU @ 2.5 GHz (Matlab)

Stereo matching is an important task in 3D computer vision. To be robust and useful, a stereo method should generalize across datasets and be trainable without ground truth. To achieve this, we propose a permutation-based self-supervised approach which is dataset agnostic. Instead of relying on the usual stereo disparity map, this method relies on a permutation volume, which provides a natural model for occlusions, as well as for disparity smoothness and sharpness across depth discontinuities. The stereo matching network, despite its very small size of 0.148M parameters, performs well on out-of-distribution inference tasks, regardless of the training dataset. Experiments are presented for the KITTI and SINTEL datasets. We consider that this permutation method, with its simple architecture, is an important step toward truly general self-supervised deep stereo matching.

Parameters: The implementation uses Mathematica 12.2 and its neural network framework. The $\lambda$ and $\gamma$ are set to $10$ and $0.05$ for the first $10$ rounds, after which they are changed to $20$ and $0.1$ for $5$ additional rounds. The parameter $\gamma$ is further incremented by $10$ every $5$ rounds until $50$. Networks are therefore all trained for $30$ rounds with the Adam optimizer and a learning rate of $1\times10^{-3}$. The $\alpha$ is set to $0.85$, as is customary, and the $\tau$ is set to $0.8$. The model has $0.148$M trainable parameters and was trained on an Amazon EC2 p3.8xlarge instance for approximately $3$ hours.
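The round count in the parameter description can be checked with a short sketch. This is an illustration, not the authors' training code. The text attributes the increment of $10$ every $5$ rounds to $\gamma$, but steps of $10$ up to $50$ match the scale of $\lambda$ rather than $\gamma$, so the sketch assumes the increment applies to $\lambda$ and that $\gamma$ stays at $0.1$ after round $10$. The function name `weight_schedule` is hypothetical.

```python
def weight_schedule(total_rounds=30):
    """Return a list of (lambda, gamma) pairs, one per training round.

    Assumption: the "+10 every 5 rounds until 50" step applies to lambda,
    and gamma stays at 0.1 once it has been raised. The source text
    attributes the increment to gamma, so treat this as one possible reading.
    """
    schedule = []
    for r in range(total_rounds):
        if r < 10:
            # First 10 rounds: initial weights.
            lam, gamma = 10, 0.05
        else:
            # From round 10 on: start at 20 and add 10 every 5 rounds, capped at 50.
            lam = min(20 + 10 * ((r - 10) // 5), 50)
            gamma = 0.1
        schedule.append((lam, gamma))
    return schedule


if __name__ == "__main__":
    for r, (lam, gamma) in enumerate(weight_schedule()):
        print(r, lam, gamma)
```

Under this reading, $\lambda$ takes the values $10, 20, 30, 40, 50$ over blocks of $10, 5, 5, 5, 5$ rounds, which adds up to the $30$ rounds stated above.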

## Detailed Results

This page provides detailed results for the method(s) selected. For the first 20 test images, the percentage of erroneous pixels is depicted in the table. We use the error metric described in Object Scene Flow for Autonomous Vehicles (CVPR 2015), which considers a pixel to be correctly estimated if the disparity or flow end-point error is <3px or <5% (for scene flow this criterion needs to be fulfilled for both disparity maps and the flow map). Underneath, the left input image, the estimated results and the error maps are shown (for disp_0/disp_1/flow/scene_flow, respectively). The error map uses the log-color scale described in Object Scene Flow for Autonomous Vehicles (CVPR 2015), depicting correct estimates (<3px or <5% error) in blue and wrong estimates in red color tones. Dark regions in the error images denote the occluded pixels which fall outside the image boundaries. The false color maps of the results are scaled to the largest ground truth disparity values / flow magnitudes.
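The correctness criterion above can be sketched as follows for a single disparity map. This is a minimal illustration, not the benchmark's official evaluation code. It assumes flat sequences of per-pixel disparities, and it treats a ground-truth value of zero or less as "no ground truth", which is an assumption of this sketch. The function name `d1_error` is hypothetical.

```python
def d1_error(disp_est, disp_gt, abs_thresh=3.0, rel_thresh=0.05):
    """Percentage of erroneous pixels among pixels with ground truth.

    A pixel counts as correct if its absolute disparity error is below
    abs_thresh pixels OR below rel_thresh times the ground-truth disparity.
    Assumption: ground-truth values <= 0 mark pixels without ground truth.
    """
    n_valid = 0
    n_bad = 0
    for est, gt in zip(disp_est, disp_gt):
        if gt <= 0:
            continue  # skip pixels without ground truth
        n_valid += 1
        err = abs(est - gt)
        correct = err < abs_thresh or err < rel_thresh * gt
        if not correct:
            n_bad += 1
    if n_valid == 0:
        raise ValueError("no valid ground-truth pixels")
    return 100.0 * n_bad / n_valid


if __name__ == "__main__":
    gt = [10.0, 100.0, 100.0, 0.0]
    est = [12.0, 104.0, 106.0, 50.0]
    # Pixel 1 passes the 3 px test, pixel 2 passes the 5% test,
    # pixel 3 fails both, pixel 4 has no ground truth.
    print(d1_error(est, gt))
```

In the tables below, the D1-bg, D1-fg and D1-all columns would correspond to applying such a computation over background pixels, foreground pixels, and all ground-truth pixels, respectively.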

## Test Set Average

| Error | D1-bg | D1-fg | D1-all |
|---|---|---|---|
| All / All | 7.97 | 17.04 | 9.48 |
| All / Est | 7.82 | 16.98 | 9.34 |
| Noc / All | 7.12 | 15.37 | 8.48 |
| Noc / Est | 6.99 | 15.35 | 8.37 |

## Test Image 0

| Error | D1-bg | D1-fg | D1-all |
|---|---|---|---|
| All / All | 11.12 | 2.45 | 9.93 |
| All / Est | 11.01 | 2.45 | 9.83 |
| Noc / All | 10.37 | 2.45 | 9.26 |
| Noc / Est | 10.25 | 2.45 | 9.16 |

## Test Image 1

| Error | D1-bg | D1-fg | D1-all |
|---|---|---|---|
| All / All | 6.01 | 6.83 | 6.10 |
| All / Est | 5.88 | 6.83 | 5.98 |
| Noc / All | 5.26 | 6.83 | 5.44 |
| Noc / Est | 5.12 | 6.83 | 5.31 |

## Test Image 2

| Error | D1-bg | D1-fg | D1-all |
|---|---|---|---|
| All / All | 6.92 | 9.29 | 7.03 |
| All / Est | 6.81 | 9.29 | 6.93 |
| Noc / All | 5.95 | 9.29 | 6.11 |
| Noc / Est | 5.84 | 9.29 | 6.01 |

## Test Image 3

| Error | D1-bg | D1-fg | D1-all |
|---|---|---|---|
| All / All | 9.50 | 13.24 | 9.84 |
| All / Est | 9.48 | 13.21 | 9.82 |
| Noc / All | 8.89 | 13.24 | 9.29 |
| Noc / Est | 8.87 | 13.21 | 9.28 |

## Test Image 4

| Error | D1-bg | D1-fg | D1-all |
|---|---|---|---|
| All / All | 9.81 | 7.64 | 9.45 |
| All / Est | 9.78 | 7.64 | 9.42 |
| Noc / All | 8.73 | 7.64 | 8.55 |
| Noc / Est | 8.73 | 7.64 | 8.54 |

## Test Image 5

| Error | D1-bg | D1-fg | D1-all |
|---|---|---|---|
| All / All | 16.24 | 10.67 | 15.74 |
| All / Est | 16.20 | 10.62 | 15.70 |
| Noc / All | 14.64 | 10.67 | 14.27 |
| Noc / Est | 14.62 | 10.62 | 14.25 |

## Test Image 6

| Error | D1-bg | D1-fg | D1-all |
|---|---|---|---|
| All / All | 18.16 | 8.10 | 17.10 |
| All / Est | 17.88 | 8.07 | 16.84 |
| Noc / All | 17.30 | 8.10 | 16.31 |
| Noc / Est | 17.10 | 8.07 | 16.13 |

## Test Image 7

| Error | D1-bg | D1-fg | D1-all |
|---|---|---|---|
| All / All | 5.37 | 13.03 | 6.87 |
| All / Est | 5.36 | 13.03 | 6.86 |
| Noc / All | 4.82 | 13.03 | 6.45 |
| Noc / Est | 4.81 | 13.03 | 6.44 |

## Test Image 8

| Error | D1-bg | D1-fg | D1-all |
|---|---|---|---|
| All / All | 6.08 | 3.06 | 5.52 |
| All / Est | 6.03 | 3.06 | 5.49 |
| Noc / All | 6.06 | 3.06 | 5.51 |
| Noc / Est | 6.02 | 3.06 | 5.47 |

## Test Image 9

| Error | D1-bg | D1-fg | D1-all |
|---|---|---|---|
| All / All | 6.58 | 4.25 | 5.99 |
| All / Est | 6.57 | 4.25 | 5.98 |
| Noc / All | 6.08 | 4.45 | 5.67 |
| Noc / Est | 6.07 | 4.45 | 5.66 |

## Test Image 10

| Error | D1-bg | D1-fg | D1-all |
|---|---|---|---|
| All / All | 7.26 | 6.65 | 7.12 |
| All / Est | 7.17 | 6.65 | 7.05 |
| Noc / All | 6.71 | 6.65 | 6.70 |
| Noc / Est | 6.62 | 6.65 | 6.63 |

## Test Image 11

| Error | D1-bg | D1-fg | D1-all |
|---|---|---|---|
| All / All | 4.27 | 1.61 | 3.79 |
| All / Est | 4.25 | 1.61 | 3.78 |
| Noc / All | 3.79 | 1.61 | 3.40 |
| Noc / Est | 3.77 | 1.61 | 3.38 |

## Test Image 12

| Error | D1-bg | D1-fg | D1-all |
|---|---|---|---|
| All / All | 2.97 | 2.78 | 2.96 |
| All / Est | 2.96 | 2.78 | 2.95 |
| Noc / All | 2.59 | 2.78 | 2.60 |
| Noc / Est | 2.58 | 2.78 | 2.59 |

## Test Image 13

| Error | D1-bg | D1-fg | D1-all |
|---|---|---|---|
| All / All | 3.47 | 3.89 | 3.52 |
| All / Est | 3.44 | 3.89 | 3.50 |
| Noc / All | 3.04 | 3.89 | 3.15 |
| Noc / Est | 3.01 | 3.89 | 3.12 |

## Test Image 14

| Error | D1-bg | D1-fg | D1-all |
|---|---|---|---|
| All / All | 2.48 | 9.78 | 2.61 |
| All / Est | 2.48 | 9.78 | 2.61 |
| Noc / All | 2.03 | 9.78 | 2.17 |
| Noc / Est | 2.03 | 9.78 | 2.17 |

## Test Image 15

| Error | D1-bg | D1-fg | D1-all |
|---|---|---|---|
| All / All | 6.67 | 6.63 | 6.67 |
| All / Est | 6.66 | 6.49 | 6.64 |
| Noc / All | 5.94 | 6.63 | 6.01 |
| Noc / Est | 5.93 | 6.49 | 5.99 |

## Test Image 16

| Error | D1-bg | D1-fg | D1-all |
|---|---|---|---|
| All / All | 8.04 | 7.43 | 7.95 |
| All / Est | 7.82 | 7.27 | 7.74 |
| Noc / All | 7.49 | 7.43 | 7.48 |
| Noc / Est | 7.26 | 7.27 | 7.26 |

## Test Image 17

| Error | D1-bg | D1-fg | D1-all |
|---|---|---|---|
| All / All | 4.66 | 5.10 | 4.71 |
| All / Est | 4.63 | 5.10 | 4.68 |
| Noc / All | 3.68 | 5.10 | 3.83 |
| Noc / Est | 3.68 | 5.10 | 3.83 |

## Test Image 18

| Error | D1-bg | D1-fg | D1-all |
|---|---|---|---|
| All / All | 9.06 | 16.61 | 12.65 |
| All / Est | 9.02 | 16.61 | 12.62 |
| Noc / All | 8.35 | 16.61 | 12.31 |
| Noc / Est | 8.31 | 16.61 | 12.29 |

## Test Image 19

| Error | D1-bg | D1-fg | D1-all |
|---|---|---|---|
| All / All | 3.44 | 5.09 | 3.63 |
| All / Est | 3.41 | 5.09 | 3.60 |
| Noc / All | 2.69 | 5.09 | 2.96 |
| Noc / Est | 2.66 | 5.09 | 2.94 |
