Andreas Geiger

Publications of Carolin Schmitt

SMD-Nets: Stereo Mixture Density Networks
F. Tosi, Y. Liao, C. Schmitt and A. Geiger
Conference on Computer Vision and Pattern Recognition (CVPR), 2021
Abstract: Despite stereo matching accuracy has greatly improved by deep learning in the last few years, recovering sharp boundaries and high-resolution outputs efficiently remains challenging. In this paper, we propose Stereo Mixture Density Networks (SMD-Nets), a simple yet effective learning framework compatible with a wide class of 2D and 3D architectures which ameliorates both issues. Specifically, we exploit bimodal mixture densities as output representation and show that this allows for sharp and precise disparity estimates near discontinuities while explicitly modeling the aleatoric uncertainty inherent in the observations. Moreover, we formulate disparity estimation as a continuous problem in the image domain, allowing our model to query disparities at arbitrary spatial precision. We carry out comprehensive experiments on a new high-resolution and highly realistic synthetic stereo dataset, consisting of stereo pairs at 8Mpx resolution, as well as on real-world stereo datasets. Our experiments demonstrate increased depth accuracy near object boundaries and prediction of ultra high-resolution disparity maps on standard GPUs. We demonstrate the flexibility of our technique by improving the performance of a variety of stereo backbones.
Latex Bibtex Citation:
  author = {Fabio Tosi and Yiyi Liao and Carolin Schmitt and Andreas Geiger},
  title = {SMD-Nets: Stereo Mixture Density Networks},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2021}
On Joint Estimation of Pose, Geometry and svBRDF from a Handheld Scanner
C. Schmitt, S. Donne, G. Riegler, V. Koltun and A. Geiger
Conference on Computer Vision and Pattern Recognition (CVPR), 2020
Abstract: We propose a novel formulation for joint recovery of camera pose, object geometry and spatially-varying BRDF. The input to our approach is a sequence of RGB-D images captured by a mobile, hand-held scanner that actively illuminates the scene with point light sources. Compared to previous works that jointly estimate geometry and materials from a hand-held scanner, we formulate this problem using a single objective function that can be minimized using off-the-shelf gradient-based solvers. By integrating material clustering as a differentiable operation into the optimization process, we avoid pre-processing heuristics and demonstrate that our model is able to determine the correct number of specular materials independently. We provide a study on the importance of each component in our formulation and on the requirements of the initial geometry. We show that optimizing over the poses is crucial for accurately recovering fine details and that our approach naturally results in a semantically meaningful material segmentation.
Latex Bibtex Citation:
  author = {Carolin Schmitt and Simon Donne and Gernot Riegler and Vladlen Koltun and Andreas Geiger},
  title = {On Joint Estimation of Pose, Geometry and svBRDF from a Handheld Scanner},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2020}
RayNet: Learning Volumetric 3D Reconstruction with Ray Potentials (spotlight)
D. Paschalidou, A. Ulusoy, C. Schmitt, L. Gool and A. Geiger
Conference on Computer Vision and Pattern Recognition (CVPR), 2018
Abstract: In this paper, we consider the problem of reconstructing a dense 3D model using images captured from different views. Recent methods based on convolutional neural networks (CNN) allow learning the entire task from data. However, they do not incorporate the physics of image formation such as perspective geometry and occlusion. Instead, classical approaches based on Markov Random Fields (MRF) with ray-potentials explicitly model these physical processes, but they cannot cope with large surface appearance variations across different viewpoints. In this paper, we propose RayNet, which combines the strengths of both frameworks. RayNet integrates a CNN that learns view-invariant feature representations with an MRF that explicitly encodes the physics of perspective projection and occlusion. We train RayNet end-to-end using empirical risk minimization. We thoroughly evaluate our approach on challenging real-world datasets and demonstrate its benefits over a piece-wise trained baseline, hand-crafted models as well as other learning-based approaches.
Latex Bibtex Citation:
  author = {Despoina Paschalidou and Ali Osman Ulusoy and Carolin Schmitt and Luc Gool and Andreas Geiger},
  title = {RayNet: Learning Volumetric 3D Reconstruction with Ray Potentials},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2018}

eXTReMe Tracker