
Publications of Xu Chen

AG3D: Learning to Generate 3D Avatars from 2D Image Collections
Z. Dong, X. Chen, J. Yang, M. Black, O. Hilliges and A. Geiger
International Conference on Computer Vision (ICCV), 2023
Abstract: While progress in 2D generative models of human appearance has been rapid, many applications require 3D avatars that can be animated and rendered. Unfortunately, most existing methods for learning generative models of 3D humans with diverse shape and appearance require 3D training data, which is limited and expensive to acquire. The key to progress is hence to learn generative models of 3D avatars from abundant unstructured 2D image collections. However, learning realistic and complete 3D appearance and geometry in this under-constrained setting remains challenging, especially in the presence of loose clothing such as dresses. In this paper, we propose a new adversarial generative model of realistic 3D people from 2D images. Our method captures shape and deformation of the body and loose clothing by adopting a holistic 3D generator and integrating an efficient and flexible articulation module. To improve realism, we train our model using multiple discriminators while also integrating geometric cues in the form of predicted 2D normal maps. We experimentally find that our method outperforms previous 3D- and articulation-aware methods in terms of geometry and appearance. We validate the effectiveness of our model and the importance of each component via systematic ablation studies.
LaTeX BibTeX Citation:
@inproceedings{Dong2023ICCV,
  author = {Zijian Dong and Xu Chen and Jinlong Yang and Michael J. Black and Otmar Hilliges and Andreas Geiger},
  title = {AG3D: Learning to Generate 3D Avatars from 2D Image Collections},
  booktitle = {International Conference on Computer Vision (ICCV)},
  year = {2023}
}
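The multi-discriminator training described in the AG3D abstract can be illustrated with a short sketch. The following is a hypothetical PyTorch fragment, not the authors' code: it assumes a generator that returns both a rendered RGB image and a predicted 2D normal map, and applies the standard non-saturating GAN loss to an RGB discriminator and a normal-map discriminator.

import torch
import torch.nn.functional as F

def generator_loss(G, D_rgb, D_normal, z, camera, pose):
    # Hypothetical interface: G renders an RGB image and a 2D normal map
    # from a latent code, camera and body pose.
    rgb, normals = G(z, camera, pose)
    loss_rgb = F.softplus(-D_rgb(rgb)).mean()            # non-saturating GAN loss
    loss_normal = F.softplus(-D_normal(normals)).mean()  # geometric cue via normals
    return loss_rgb + loss_normal

def discriminator_loss(D, real, fake):
    # Standard logistic discriminator loss; regularizers (e.g. R1) omitted.
    return (F.softplus(D(fake.detach())) + F.softplus(-D(real))).mean()
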
Fast-SNARF: A Fast Deformer for Articulated Neural Fields
X. Chen, T. Jiang, J. Song, M. Rietmann, A. Geiger, M. Black and O. Hilliges
Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Abstract: Neural fields have revolutionized the area of 3D reconstruction and novel view synthesis of rigid scenes. A key challenge in making such methods applicable to articulated objects, such as the human body, is to model the deformation of 3D locations between the rest pose (a canonical space) and the deformed space. We propose a new articulation module for neural fields, Fast-SNARF, which finds accurate correspondences between canonical space and posed space via iterative root finding. Fast-SNARF is a drop-in functional replacement for our previous work, SNARF, while significantly improving its computational efficiency. We contribute several algorithmic and implementation improvements over SNARF, yielding a speed-up of 150×. These improvements include voxel-based correspondence search, pre-computing the linear blend skinning function, and an efficient software implementation with CUDA kernels. Fast-SNARF enables efficient and simultaneous optimization of shape and skinning weights given deformed observations without correspondences (e.g. 3D meshes). Because learning deformation maps is a crucial component of many 3D human avatar methods and because Fast-SNARF provides a computationally efficient solution, we believe that this work represents a significant step towards the practical creation of 3D virtual humans.
LaTeX BibTeX Citation:
@article{Chen2023PAMI,
  author = {Xu Chen and Tianjian Jiang and Jie Song and Max Rietmann and Andreas Geiger and Michael J. Black and Otmar Hilliges},
  title = {Fast-SNARF: A Fast Deformer for Articulated Neural Fields},
  journal = {Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
  year = {2023}
}
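As a rough illustration of the ingredients named in the Fast-SNARF abstract (voxel-based correspondence search and precomputed skinning weights), the sketch below stores skinning weights on a canonical voxel grid and runs a simple iterative search for the canonical point that maps to a given deformed point. All tensor shapes and the update rule are assumptions for illustration; the released implementation uses CUDA kernels and a Newton-type solver.

import torch
import torch.nn.functional as F

def query_weights(weight_grid, x_c):
    # weight_grid: (1, n_bones, D, H, W) skinning weights precomputed on a voxel grid
    # x_c: (B, N, 3) canonical points, normalized to [-1, 1] for grid_sample
    grid = x_c.view(x_c.shape[0], 1, 1, -1, 3)
    w = F.grid_sample(weight_grid.expand(x_c.shape[0], -1, -1, -1, -1),
                      grid, align_corners=True)          # (B, n_bones, 1, 1, N)
    return w.squeeze(2).squeeze(2).permute(0, 2, 1)      # (B, N, n_bones)

def lbs(x_c, w, bone_transforms):
    # Linear blend skinning; bone_transforms is (B, n_bones, 4, 4)
    T = torch.einsum('bnk,bkij->bnij', w, bone_transforms)
    x_h = torch.cat([x_c, torch.ones_like(x_c[..., :1])], dim=-1)
    return torch.einsum('bnij,bnj->bni', T, x_h)[..., :3]

def canonical_correspondence(x_d, weight_grid, bone_transforms, n_iter=10):
    # Iteratively search for x_c with lbs(x_c) ~= x_d, initialized at x_d.
    x_c = x_d.clone()
    for _ in range(n_iter):
        residual = lbs(x_c, query_weights(weight_grid, x_c), bone_transforms) - x_d
        x_c = x_c - residual   # crude fixed-point update; the paper uses Newton steps
    return x_c
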
gDNA: Towards Generative Detailed Neural Avatars
X. Chen, T. Jiang, J. Song, J. Yang, M. Black, A. Geiger and O. Hilliges
Conference on Computer Vision and Pattern Recognition (CVPR), 2022
Abstract: To make 3D human avatars widely available, we must be able to generate a variety of 3D virtual humans with varied identities and shapes in arbitrary poses. This task is challenging due to the diversity of clothed body shapes, their complex articulations, and the resulting rich, yet stochastic geometric detail in clothing. Hence, current methods that represent 3D people do not provide a full generative model of people in clothing. In this paper, we propose a novel method that learns to generate detailed 3D shapes of people in a variety of garments with corresponding skinning weights. Specifically, we devise a multi-subject forward skinning module that is learned from only a few posed, un-rigged scans per subject. To capture the stochastic nature of high-frequency details in garments, we leverage an adversarial loss formulation that encourages the model to capture the underlying statistics. We provide empirical evidence that this leads to realistic generation of local details such as wrinkles. We show that our model is able to generate natural human avatars wearing diverse and detailed clothing. Furthermore, we show that our method can be used on the task of fitting human models to raw scans, outperforming the previous state-of-the-art.
LaTeX BibTeX Citation:
@inproceedings{Chen2022CVPR,
  author = {Xu Chen and Tianjian Jiang and Jie Song and Jinlong Yang and Michael J. Black and Andreas Geiger and Otmar Hilliges},
  title = {gDNA: Towards Generative Detailed Neural Avatars},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2022}
}
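To make the generation pipeline described in the gDNA abstract concrete, here is a hypothetical sketch (placeholder module names and latent dimensions, not the released gDNA API): a coarse shape code and a detail noise code are decoded into canonical occupancy, and predicted skinning weights pose the canonical geometry via forward linear blend skinning.

import torch

@torch.no_grad()
def sample_avatar(shape_decoder, skinning_net, bone_transforms, query_points):
    z_shape = torch.randn(1, 64)    # identity / coarse shape code (dimension assumed)
    z_detail = torch.randn(1, 32)   # noise driving stochastic wrinkle detail
    # Occupancy of the canonical, un-posed avatar at the query points.
    occ = shape_decoder(query_points, z_shape, z_detail)            # (1, N)
    # Skinning weights predicted in canonical space.
    w = torch.softmax(skinning_net(query_points, z_shape), dim=-1)  # (1, N, n_bones)
    # Forward LBS poses the canonical points with the given bone transforms.
    T = torch.einsum('bnk,bkij->bnij', w, bone_transforms)
    x_h = torch.cat([query_points, torch.ones_like(query_points[..., :1])], dim=-1)
    posed = torch.einsum('bnij,bnj->bni', T, x_h)[..., :3]
    return occ, posed
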
PINA: Learning a Personalized Implicit Neural Avatar from a Single RGB-D Video Sequence
Z. Dong, C. Guo, J. Song, X. Chen, A. Geiger and O. Hilliges
Conference on Computer Vision and Pattern Recognition (CVPR), 2022
Abstract: We present a novel method to learn Personalized Implicit Neural Avatars (PINA) from a short RGB-D sequence. This allows non-expert users to create a detailed and personalized virtual copy of themselves, which can be animated with realistic clothing deformations. PINA does not require complete scans, nor does it require a prior learned from large datasets of clothed humans. Learning a complete avatar in this setting is challenging, since only a few depth observations are available, which are noisy and incomplete (i.e. only partial visibility of the body per frame). We propose a method to learn the shape and non-rigid deformations via a pose-conditioned implicit surface and a deformation field, defined in canonical space. This allows us to fuse all partial observations into a single consistent canonical representation. Fusion is formulated as a global optimization problem over the pose, shape and skinning parameters. The method can learn neural avatars from real, noisy RGB-D sequences for a diverse set of people and clothing styles, and these avatars can be animated given unseen motion sequences.
LaTeX BibTeX Citation:
@inproceedings{Dong2022CVPR,
  author = {Zijian Dong and Chen Guo and Jie Song and Xu Chen and Andreas Geiger and Otmar Hilliges},
  title = {PINA: Learning a Personalized Implicit Neural Avatar from a Single RGB-D Video Sequence},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2022}
}
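The global fusion optimization described in the PINA abstract can be sketched as a joint optimization over per-frame poses and the canonical shape and skinning networks. The snippet below is a hypothetical outline, not the authors' implementation: warp_to_canonical is a placeholder for the deformation model, and the loss keeps only the simplest term (observed surface points should lie on the zero level set of the canonical SDF).

import torch

def fuse_sequence(frames, shape_sdf, skinning_net, warp_to_canonical, poses,
                  n_steps=2000, lr=1e-3):
    # frames: list of dicts with 'points' back-projected from the depth maps
    # poses:  per-frame pose parameters, optimized jointly with the networks
    params = list(shape_sdf.parameters()) + list(skinning_net.parameters()) + [poses]
    opt = torch.optim.Adam(params, lr=lr)
    for step in range(n_steps):
        i = step % len(frames)
        pts = frames[i]['points']
        # Warp observed surface points into canonical space with the current
        # pose and skinning estimate, then ask the canonical SDF to explain them.
        x_c = warp_to_canonical(pts, poses[i], skinning_net)
        loss = shape_sdf(x_c).abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return shape_sdf, skinning_net, poses
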
SNARF: Differentiable Forward Skinning for Animating Non-Rigid Neural Implicit Shapes
X. Chen, Y. Zheng, M. Black, O. Hilliges and A. Geiger
International Conference on Computer Vision (ICCV), 2021
Abstract: Neural implicit surface representations have emerged as a promising paradigm to capture 3D shapes in a continuous and resolution-independent manner. However, adapting them to articulated shapes is non-trivial. Existing approaches learn a backward warp field that maps deformed to canonical points. However, this is problematic since the backward warp field is pose dependent and thus requires large amounts of data to learn. To address this, we introduce SNARF, which combines the advantages of linear blend skinning (LBS) for polygonal meshes with those of neural implicit surfaces by learning a forward deformation field without direct supervision. This deformation field is defined in canonical, pose-independent space, allowing for generalization to unseen poses. Learning the deformation field from posed meshes alone is challenging since the correspondences of deformed points are defined implicitly and may not be unique under changes of topology. We propose a forward skinning model that finds all canonical correspondences of any deformed point using iterative root finding. We derive analytical gradients via implicit differentiation, enabling end-to-end training from 3D meshes with bone transformations. Compared to state-of-the-art neural implicit representations, our approach generalizes better to unseen poses while preserving accuracy. We demonstrate our method in challenging scenarios on (clothed) 3D humans in diverse and unseen poses.
LaTeX BibTeX Citation:
@inproceedings{Chen2021ICCV,
  author = {Xu Chen and Yufeng Zheng and Michael J. Black and Otmar Hilliges and Andreas Geiger},
  title = {SNARF: Differentiable Forward Skinning for Animating Non-Rigid Neural Implicit Shapes},
  booktitle = {International Conference on Computer Vision (ICCV)},
  year = {2021}
}
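The key technical point of SNARF, differentiating through an iterative root-finding correspondence search, can be illustrated with a schematic PyTorch sketch (not the released code; requires torch >= 2.0 for torch.func, and it tracks only a single correspondence rather than all of them): the solver runs without autograd, and a single Newton-style correction at the solution re-attaches gradients via the implicit function theorem.

import torch
from torch.func import jacrev, vmap

def forward_skinning(x_c, weight_net, bone_transforms):
    # Linear blend skinning with weights predicted by a network in canonical space.
    w = torch.softmax(weight_net(x_c), dim=-1)                    # (N, n_bones)
    T = torch.einsum('nk,kij->nij', w, bone_transforms)           # (N, 4, 4)
    x_h = torch.cat([x_c, torch.ones_like(x_c[..., :1])], dim=-1)
    return torch.einsum('nij,nj->ni', T, x_h)[..., :3]

def differentiable_correspondence(x_d, weight_net, bone_transforms, n_iter=20):
    f = lambda x: forward_skinning(x, weight_net, bone_transforms)
    # 1) Solve f(x_c) = x_d without building an autograd graph.
    with torch.no_grad():
        x_c = x_d.clone()
        for _ in range(n_iter):
            x_c = x_c - (f(x_c) - x_d)   # stand-in for the Broyden updates used in the paper
    # 2) One corrective step with gradients enabled: since the residual is ~0,
    #    the value is unchanged, but d x_c / d(network params) now follows the
    #    implicit function theorem, -J^{-1} df/d(params).
    f_single = lambda x: f(x.unsqueeze(0)).squeeze(0)
    J = vmap(jacrev(f_single))(x_c).detach()                      # per-point (3, 3) Jacobians
    residual = f(x_c) - x_d
    return x_c - torch.linalg.solve(J, residual.unsqueeze(-1)).squeeze(-1)
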
Category Level Object Pose Estimation via Neural Analysis-by-Synthesis
X. Chen, Z. Dong, J. Song, A. Geiger and O. Hilliges
European Conference on Computer Vision (ECCV), 2020
Abstract: Many object pose estimation algorithms rely on the analysis-by-synthesis framework, which requires explicit representations of individual object instances. In this paper we combine a gradient-based fitting procedure with a parametric neural image synthesis module that is capable of implicitly representing the appearance, shape and pose of entire object categories, thus rendering explicit CAD models per object instance unnecessary. The image synthesis network is designed to efficiently span the pose configuration space so that model capacity can be used to capture the shape and local appearance (i.e., texture) variations jointly. At inference time the synthesized images are compared to the target via an appearance-based loss and the error signal is backpropagated through the network to the input parameters. Keeping the network parameters fixed, this allows for iterative optimization of the object pose, shape and appearance in a joint manner, and we experimentally show that the method can recover the orientation of objects with high accuracy from 2D images alone. When provided with depth measurements to overcome scale ambiguities, the method can successfully recover the full 6DOF pose.
LaTeX BibTeX Citation:
@inproceedings{Chen2020ECCV,
  author = {Xu Chen and Zijian Dong and Jie Song and Andreas Geiger and Otmar Hilliges},
  title = {Category Level Object Pose Estimation via Neural Analysis-by-Synthesis},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year = {2020}
}
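The inference-time fitting described in this abstract amounts to optimizing pose (and latent shape/appearance) parameters through a frozen synthesis network. The sketch below is a hypothetical outline with assumed parameter dimensions and call signature, not the paper's implementation.

import torch
import torch.nn.functional as F

def fit_pose(synthesis_net, target_image, n_steps=200, lr=1e-2):
    synthesis_net.eval()
    for p in synthesis_net.parameters():
        p.requires_grad_(False)                       # keep the network fixed
    pose = torch.zeros(1, 6, requires_grad=True)      # e.g. rotation + translation (assumed)
    latent = torch.zeros(1, 128, requires_grad=True)  # shape / appearance code (assumed)
    opt = torch.optim.Adam([pose, latent], lr=lr)
    for _ in range(n_steps):
        rendered = synthesis_net(latent, pose)        # assumed call signature
        loss = F.l1_loss(rendered, target_image)      # appearance-based loss
        opt.zero_grad()
        loss.backward()                               # error signal flows to the inputs only
        opt.step()
    return pose.detach(), latent.detach()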

