Andreas Geiger

Publications of Yiyi Liao

SMD-Nets: Stereo Mixture Density Networks
F. Tosi, Y. Liao, C. Schmitt and A. Geiger
Conference on Computer Vision and Pattern Recognition (CVPR), 2021
Abstract: Although stereo matching accuracy has greatly improved through deep learning in the last few years, recovering sharp boundaries and high-resolution outputs efficiently remains challenging. In this paper, we propose Stereo Mixture Density Networks (SMD-Nets), a simple yet effective learning framework compatible with a wide class of 2D and 3D architectures which ameliorates both issues. Specifically, we exploit bimodal mixture densities as output representation and show that this allows for sharp and precise disparity estimates near discontinuities while explicitly modeling the aleatoric uncertainty inherent in the observations. Moreover, we formulate disparity estimation as a continuous problem in the image domain, allowing our model to query disparities at arbitrary spatial precision. We carry out comprehensive experiments on a new high-resolution and highly realistic synthetic stereo dataset, consisting of stereo pairs at 8Mpx resolution, as well as on real-world stereo datasets. Our experiments demonstrate increased depth accuracy near object boundaries and prediction of ultra high-resolution disparity maps on standard GPUs. We demonstrate the flexibility of our technique by improving the performance of a variety of stereo backbones.
Latex Bibtex Citation:
  author = {Fabio Tosi and Yiyi Liao and Carolin Schmitt and Andreas Geiger},
  title = {SMD-Nets: Stereo Mixture Density Networks},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2021}
Learning Steering Kernels for Guided Depth Completion
L. Liu, Y. Liao, Y. Wang, A. Geiger and Y. Liu
Transactions on Image Processing (TIP), 2021
Abstract: This paper addresses the guided depth completion task in which the goal is to predict a dense depth map given a guidance RGB image and sparse depth measurements. Recent advances on this problem nurture hopes that one day we can acquire accurate and dense depth at a very low cost. A major challenge of guided depth completion is to effectively make use of extremely sparse measurements, e.g., measurements covering less than 1% of the image pixels. In this paper, we propose a fully differentiable model that avoids convolving on sparse tensors by jointly learning depth interpolation and refinement. More specifically, we propose a differentiable kernel regression layer that interpolates the sparse depth measurements via learned kernels. We further refine the interpolated depth map using a residual depth refinement layer which leads to improved performance compared to learning absolute depth prediction using a vanilla network. We provide experimental evidence that our differentiable kernel regression layer not only enables end-to-end training from very sparse measurements using standard convolutional network architectures, but also leads to better depth interpolation results compared to existing heuristically motivated methods. We demonstrate that our method outperforms many state-of-the-art guided depth completion techniques on both NYUv2 and KITTI. We further show the generalization ability of our method with respect to the density and spatial statistics of the sparse depth measurements.
Latex Bibtex Citation:
  author = {Lina Liu and Yiyi Liao and Yue Wang and Andreas Geiger and Yong Liu},
  title = {Learning Steering Kernels for Guided Depth Completion},
  journal = {Transactions on Image Processing (TIP)},
  year = {2021}
GRAF: Generative Radiance Fields for 3D-Aware Image Synthesis
K. Schwarz, Y. Liao, M. Niemeyer and A. Geiger
Advances in Neural Information Processing Systems (NeurIPS), 2020
Abstract: While 2D generative adversarial networks have enabled high-resolution image synthesis, they largely lack an understanding of the 3D world and the image formation process. Thus, they do not provide precise control over camera viewpoint or object pose. To address this problem, several recent approaches leverage intermediate voxel-based representations in combination with differentiable rendering. However, existing methods either produce low image resolution or fall short in disentangling camera and scene properties, e.g., the object identity may vary with the viewpoint. In this paper, we propose a generative model for radiance fields, which have recently proven successful for novel view synthesis of a single scene. In contrast to voxel-based representations, radiance fields are not confined to a coarse discretization of the 3D space, yet allow for disentangling camera and scene properties while degrading gracefully in the presence of reconstruction ambiguity. By introducing a multi-scale patch-based discriminator, we demonstrate synthesis of high-resolution images while training our model from unposed 2D images alone. We systematically analyze our approach on several challenging synthetic and real-world datasets. Our experiments reveal that radiance fields are a powerful representation for generative image synthesis, leading to 3D consistent models that render with high fidelity.
Latex Bibtex Citation:
  author = {Katja Schwarz and Yiyi Liao and Michael Niemeyer and Andreas Geiger},
  title = {GRAF: Generative Radiance Fields for 3D-Aware Image Synthesis},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  year = {2020}
Towards Unsupervised Learning of Generative Models for 3D Controllable Image Synthesis
Y. Liao, K. Schwarz, L. Mescheder and A. Geiger
Conference on Computer Vision and Pattern Recognition (CVPR), 2020
Abstract: In recent years, Generative Adversarial Networks have achieved impressive results in photorealistic image synthesis. This progress nurtures hopes that one day the classical rendering pipeline can be replaced by efficient models that are learned directly from images. However, current image synthesis models operate in the 2D domain where disentangling 3D properties such as camera viewpoint or object pose is challenging. Furthermore, they lack an interpretable and controllable representation. Our key hypothesis is that the image generation process should be modeled in 3D space as the physical world surrounding us is intrinsically three-dimensional. We define the new task of 3D controllable image synthesis and propose an approach for solving it by reasoning both in 3D space and in the 2D image domain. We demonstrate that our model is able to disentangle latent 3D factors of simple multi-object scenes in an unsupervised fashion from raw images. Compared to pure 2D baselines, it allows for synthesizing scenes that are consistent with respect to changes in viewpoint or object pose. We further evaluate various 3D representations in terms of their usefulness for this challenging task.
Latex Bibtex Citation:
  author = {Yiyi Liao and Katja Schwarz and Lars Mescheder and Andreas Geiger},
  title = {Towards Unsupervised Learning of Generative Models for 3D Controllable Image Synthesis},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2020}
Connecting the Dots: Learning Representations for Active Monocular Depth Estimation
G. Riegler, Y. Liao, S. Donne, V. Koltun and A. Geiger
Conference on Computer Vision and Pattern Recognition (CVPR), 2019
Abstract: We propose a technique for depth estimation with a monocular structured-light camera, i.e., a calibrated stereo set-up with one camera and one laser projector. Instead of formulating the depth estimation via a correspondence search problem, we show that a simple convolutional architecture is sufficient for high-quality disparity estimates in this setting. As accurate ground-truth is hard to obtain, we train our model in a self-supervised fashion with a combination of photometric and geometric losses. Further, we demonstrate that the projected pattern of the structured light sensor can be reliably separated from the ambient information. This can then be used to improve depth boundaries in a weakly supervised fashion by modeling the joint statistics of image and depth edges. The model trained in this fashion compares favorably to the state-of-the-art on challenging synthetic and real-world datasets. In addition, we contribute a novel simulator, which allows benchmarking active depth prediction algorithms in controlled conditions.
Latex Bibtex Citation:
  author = {Gernot Riegler and Yiyi Liao and Simon Donne and Vladlen Koltun and Andreas Geiger},
  title = {Connecting the Dots: Learning Representations for Active Monocular Depth Estimation},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2019}
On the Integration of Optical Flow and Action Recognition (oral)
L. Sevilla-Lara, Y. Liao, F. Güney, V. Jampani, A. Geiger and M. Black
German Conference on Pattern Recognition (GCPR), 2018
Abstract: Most of the top performing action recognition methods use optical flow as a black box input. Here we take a deeper look at the combination of flow and action recognition, and investigate why optical flow is helpful, what makes a flow method good for action recognition, and how we can make it better. In particular, we investigate the impact of different flow algorithms and input transformations to better understand how these affect a state-of-the-art action recognition method. Furthermore, we fine-tune two neural-network flow methods end-to-end on the most widely used action recognition dataset (UCF101). Based on these experiments, we make the following five observations: 1) optical flow is useful for action recognition because it is invariant to appearance, 2) optical flow methods are optimized to minimize end-point-error (EPE), but the EPE of current methods is not well correlated with action recognition performance, 3) for the flow methods tested, accuracy at boundaries and at small displacements is most correlated with action recognition performance, 4) training optical flow to minimize classification error instead of minimizing EPE improves recognition performance, and 5) optical flow learned for the task of action recognition differs from traditional optical flow especially inside the human body and at the boundary of the body. These observations may encourage optical flow researchers to look beyond EPE as a goal and guide action recognition researchers to seek better motion cues, leading to a tighter integration of the optical flow and action recognition communities.
Latex Bibtex Citation:
  author = {Laura Sevilla-Lara and Yiyi Liao and Fatma Güney and Varun Jampani and Andreas Geiger and Michael Black},
  title = {On the Integration of Optical Flow and Action Recognition},
  booktitle = {German Conference on Pattern Recognition (GCPR)},
  year = {2018}
Deep Marching Cubes: Learning Explicit Surface Representations
Y. Liao, S. Donne and A. Geiger
Conference on Computer Vision and Pattern Recognition (CVPR), 2018
Abstract: Existing learning-based solutions to 3D surface prediction cannot be trained end-to-end as they operate on intermediate representations (e.g., TSDF) from which 3D surface meshes must be extracted in a post-processing step (e.g., via the marching cubes algorithm). In this paper, we investigate the problem of end-to-end 3D surface prediction. We first demonstrate that the marching cubes algorithm is not differentiable and propose an alternative differentiable formulation which we insert as a final layer into a 3D convolutional neural network. We further propose a set of loss functions which allow for training our model with sparse point supervision. Our experiments demonstrate that the model allows for predicting sub-voxel accurate 3D shapes of arbitrary topology. Additionally, it learns to complete shapes and to separate an object's inside from its outside even in the presence of sparse and incomplete ground truth. We investigate the benefits of our approach on the task of inferring shapes from 3D point clouds. Our model is flexible and can be combined with a variety of shape encoder and shape inference techniques.
Latex Bibtex Citation:
  author = {Yiyi Liao and Simon Donne and Andreas Geiger},
  title = {Deep Marching Cubes: Learning Explicit Surface Representations},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2018}
