Andreas Geiger

Publications of Xu Chen

gDNA: Towards Generative Detailed Neural Avatars
X. Chen, T. Jiang, J. Song, J. Yang, M. Black, A. Geiger and O. Hilliges
Conference on Computer Vision and Pattern Recognition (CVPR), 2022
Abstract: To make 3D human avatars widely available, we must be able to generate a variety of 3D virtual humans with varied identities and shapes in arbitrary poses. This task is challenging due to the diversity of clothed body shapes, their complex articulations, and the resulting rich, yet stochastic geometric detail in clothing. Hence, current methods that represent 3D people do not provide a full generative model of people in clothing. In this paper, we propose a novel method that learns to generate detailed 3D shapes of people in a variety of garments with corresponding skinning weights. Specifically, we devise a multi-subject forward skinning module that is learned from only a few posed, un-rigged scans per subject. To capture the stochastic nature of high-frequency details in garments, we leverage an adversarial loss formulation that encourages the model to capture the underlying statistics. We provide empirical evidence that this leads to realistic generation of local details such as wrinkles. We show that our model is able to generate natural human avatars wearing diverse and detailed clothing. Furthermore, we show that our method can be used on the task of fitting human models to raw scans, outperforming the previous state-of-the-art.
Latex Bibtex Citation:
@INPROCEEDINGS{Chen2022CVPR,
  author = {Xu Chen and Tianjian Jiang and Jie Song and Jinlong Yang and Michael Black and Andreas Geiger and Otmar Hilliges},
  title = {gDNA: Towards Generative Detailed Neural Avatars},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2022}
}
PINA: Learning a Personalized Implicit Neural Avatar from a Single RGB-D Video Sequence
Z. Dong, C. Guo, J. Song, X. Chen, A. Geiger and O. Hilliges
Conference on Computer Vision and Pattern Recognition (CVPR), 2022
Abstract: We present a novel method to learn Personalized Implicit Neural Avatars (PINA) from a short RGB-D sequence. This allows non-expert users to create a detailed and personalized virtual copy of themselves, which can be animated with realistic clothing deformations. PINA does not require complete scans, nor does it require a prior learned from large datasets of clothed humans. Learning a complete avatar in this setting is challenging, since only few depth observations are available, which are noisy and incomplete (i.e.only partial visibility of the body per frame). We propose a method to learn the shape and non-rigid deformations via a pose-conditioned implicit surface and a deformation field, defined in canonical space. This allows us to fuse all partial observations into a single consistent canonical representation. Fusion is formulated as a global optimization problem over the pose, shape and skinning parameters. The method can learn neural avatars from real noisy RGB-D sequences for a diverse set of people and clothing styles and these avatars can be animated given unseen motion sequences.
Latex Bibtex Citation:
@INPROCEEDINGS{Dong2022CVPR,
  author = {Zijian Dong and Chen Guo and Jie Song and Xu Chen and Andreas Geiger and Otmar Hilliges},
  title = {PINA: Learning a Personalized Implicit Neural Avatar from a Single RGB-D Video Sequence},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2022}
}
SNARF: Differentiable Forward Skinning for Animating Non-Rigid Neural Implicit Shapes
X. Chen, Y. Zheng, M. Black, O. Hilliges and A. Geiger
International Conference on Computer Vision (ICCV), 2021
Abstract: Neural implicit surface representations have emerged as a promising paradigm to capture 3D shapes in a continuous and resolution-independent manner. However, adapting them to articulated shapes is non-trivial. Existing approaches learn a backward warp field that maps deformed to canonical points. However, this is problematic since the backward warp field is pose dependent and thus requires large amounts of data to learn. To address this, we introduce SNARF, which combines the advantages of linear blend skinning (LBS) for polygonal meshes with those of neural implicit surfaces by learning a forward deformation field without direct supervision. This deformation field is defined in canonical, pose-independent space, allowing for generalization to unseen poses. Learning the deformation field from posed meshes alone is challenging since the correspondences of deformed points are defined implicitly and may not be unique under changes of topology. We propose a forward skinning model that finds all canonical correspondences of any deformed point using iterative root finding. We derive analytical gradients via implicit differentiation, enabling end-to-end training from 3D meshes with bone transformations. Compared to state-of-the-art neural implicit representations, our approach generalizes better to unseen poses while preserving accuracy. We demonstrate our method in challenging scenarios on (clothed) 3D humans in diverse and unseen poses.
Latex Bibtex Citation:
@INPROCEEDINGS{Chen2021ICCV,
  author = {Xu Chen and Yufeng Zheng and Michael Black and Otmar Hilliges and Andreas Geiger},
  title = {SNARF: Differentiable Forward Skinning for Animating Non-Rigid Neural Implicit Shapes},
  booktitle = {International Conference on Computer Vision (ICCV)},
  year = {2021}
}
Category Level Object Pose Estimation via Neural Analysis-by-Synthesis
X. Chen, Z. Dong, J. Song, A. Geiger and O. Hilliges
European Conference on Computer Vision (ECCV), 2020
Abstract: Many object pose estimation algorithms rely on the analysis-by-synthesis framework which requires explicit representations of individual object instances. In this paper we combine a gradient-based fitting procedure with a parametric neural image synthesis module that is capable of implicitly representing the appearance, shape and pose of entire object categories, thus rendering the need for explicit CAD models per object instance unnecessary. The image synthesis network is designed to efficiently span the pose configuration space so that model capacity can be used to capture the shape and local appearance (i.e., texture) variations jointly. At inference time the synthesized images are compared to the target via an appearance based loss and the error signal is backpropagated through the network to the input parameters. Keeping the network parameters fixed, this allows for iterative optimization of the object pose, shape and appearance in a joint manner and we experimentally show that the method can recover orientation of objects with high accuracy from 2D images alone. When provided with depth measurements, to overcome scale ambiguities, the method can accurately recover the full 6DOF pose successfully.
Latex Bibtex Citation:
@INPROCEEDINGS{Chen2020ECCV,
  author = {Xu Chen and Zijian Dong and Jie Song and Andreas Geiger and Otmar Hilliges},
  title = {Category Level Object Pose Estimation via Neural Analysis-by-Synthesis},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year = {2020}
}


eXTReMe Tracker