Publications of Zijian Dong

AG3D: Learning to Generate 3D Avatars from 2D Image Collections
Z. Dong, X. Chen, J. Yang, M. Black, O. Hilliges and A. Geiger
International Conference on Computer Vision (ICCV), 2023

Abstract: While progress in 2D generative models of human appearance has been rapid, many applications require 3D avatars that can be animated and rendered. Unfortunately, most existing methods for learning generative models of 3D humans with diverse shape and appearance require 3D training data, which is limited and expensive to acquire. The key to progress is hence to learn generative models of 3D avatars from abundant unstructured 2D image collections. However, learning realistic and complete 3D appearance and geometry in this under-constrained setting remains challenging, especially in the presence of loose clothing such as dresses. In this paper, we propose a new adversarial generative model of realistic 3D people from 2D images. Our method captures shape and deformation of the body and loose clothing by adopting a holistic 3D generator and integrating an efficient and flexible articulation module. To improve realism, we train our model using multiple discriminators while also integrating geometric cues in the form of predicted 2D normal maps. We experimentally find that our method outperforms previous 3D- and articulation-aware methods in terms of geometry and appearance. We validate the effectiveness of our model and the importance of each component via systematic ablation studies.

LaTeX/BibTeX citation:
@inproceedings{Dong2023ICCV,
author = {Zijian Dong and Xu Chen and Jinlong Yang and Michael Black and Otmar Hilliges and Andreas Geiger},
title = {AG3D: Learning to Generate 3D Avatars from 2D Image Collections},
booktitle = {International Conference on Computer Vision (ICCV)},
year = {2023}
}

Paper

Supplementary Material

Poster

Video

Project Page

PINA: Learning a Personalized Implicit Neural Avatar from a Single RGB-D Video Sequence
Z. Dong, C. Guo, J. Song, X. Chen, A. Geiger and O. Hilliges
Conference on Computer Vision and Pattern Recognition (CVPR), 2022

Abstract: We present a novel method to learn Personalized Implicit Neural Avatars (PINA) from a short RGB-D sequence. This allows non-expert users to create a detailed and personalized virtual copy of themselves, which can be animated with realistic clothing deformations. PINA does not require complete scans, nor does it require a prior learned from large datasets of clothed humans. Learning a complete avatar in this setting is challenging, since only a few depth observations are available, and these are noisy and incomplete (i.e., only part of the body is visible in each frame). We propose a method to learn the shape and non-rigid deformations via a pose-conditioned implicit surface and a deformation field, defined in canonical space. This allows us to fuse all partial observations into a single consistent canonical representation. Fusion is formulated as a global optimization problem over the pose, shape and skinning parameters. The method can learn neural avatars from real, noisy RGB-D sequences for a diverse set of people and clothing styles, and these avatars can be animated given unseen motion sequences.
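Skinning, listed above as one of the optimized parameter groups, is what carries points from canonical space into a given pose. The sketch below shows generic linear blend skinning (LBS), in which a canonical point x is mapped to the weighted sum over bones of w_i (R_i x + t_i), using per-bone rotations R_i, translations t_i and skinning weights w_i. It is a textbook illustration under our own assumptions, not PINA's implementation; the helper name `lbs` and the array layout are hypothetical.

```python
import numpy as np

def lbs(points, weights, rotations, translations):
    """Generic linear blend skinning (hypothetical helper, not PINA's code).

    points:       (N, 3) canonical-space points
    weights:      (N, B) skinning weights, each row expected to sum to 1
    rotations:    (B, 3, 3) per-bone rotation matrices
    translations: (B, 3) per-bone translations
    returns:      (N, 3) posed-space points
    """
    # Apply every bone's rigid transform to every point -> shape (B, N, 3).
    per_bone = np.einsum("bij,nj->bni", rotations, points) + translations[:, None, :]
    # Blend the per-bone results with each point's skinning weights -> (N, 3).
    return np.einsum("nb,bni->ni", weights, per_bone)

# Example: one point, two bones (identity, and a 90-degree rotation about z).
point = np.array([[1.0, 0.0, 0.0]])
weights = np.array([[0.5, 0.5]])
rot_z_90 = np.array([[0.0, -1.0, 0.0],
                     [1.0,  0.0, 0.0],
                     [0.0,  0.0, 1.0]])
rotations = np.stack([np.eye(3), rot_z_90])
translations = np.zeros((2, 3))
posed = lbs(point, weights, rotations, translations)
```

Here the point at (1, 0, 0), weighted equally between the two bones, lands at (0.5, 0.5, 0). Its distance from the origin shrinks to about 0.71, which is the well-known volume-loss artifact of plain LBS.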

LaTeX/BibTeX citation:
@inproceedings{Dong2022CVPR,
author = {Zijian Dong and Chen Guo and Jie Song and Xu Chen and Andreas Geiger and Otmar Hilliges},
title = {PINA: Learning a Personalized Implicit Neural Avatar from a Single RGB-D Video Sequence},
booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2022}
}

Paper

Supplementary Material

Poster

Video

Project Page

Category Level Object Pose Estimation via Neural Analysis-by-Synthesis
X. Chen, Z. Dong, J. Song, A. Geiger and O. Hilliges
European Conference on Computer Vision (ECCV), 2020

Abstract: Many object pose estimation algorithms rely on the analysis-by-synthesis framework, which requires explicit representations of individual object instances. In this paper we combine a gradient-based fitting procedure with a parametric neural image synthesis module that is capable of implicitly representing the appearance, shape and pose of entire object categories, thus removing the need for explicit CAD models per object instance. The image synthesis network is designed to efficiently span the pose configuration space so that model capacity can be used to capture the shape and local appearance (i.e., texture) variations jointly. At inference time, the synthesized images are compared to the target via an appearance-based loss, and the error signal is backpropagated through the network to the input parameters. Because the network parameters are kept fixed, this allows iterative, joint optimization of the object pose, shape and appearance. We experimentally show that the method can recover the orientation of objects with high accuracy from 2D images alone. When depth measurements are provided to overcome scale ambiguities, the method can accurately recover the full 6DoF pose.
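The fitting loop described above can be illustrated with a toy example. In this sketch, which is our own simplification and not the paper's code, a closed-form function `synthesize` stands in for the trained synthesis network: it renders a 1-D "image" from an appearance parameter `a` (amplitude) and a pose parameter `theta` (rotation). The renderer itself is never modified; only its inputs are updated by gradient descent on an L2 appearance loss. All names, the learning rate and the step count are hypothetical choices.

```python
import numpy as np

# Fixed sample grid for a toy 1-D "image" (hypothetical stand-in for pixels).
X = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)

def synthesize(a, theta):
    """Toy differentiable renderer: appearance `a` (amplitude), pose `theta` (rotation).

    Stands in for a fixed, pre-trained synthesis network; nothing here is learned.
    """
    return a * np.cos(X - theta)

def fit(target, a=1.0, theta=0.0, lr=0.1, steps=500):
    """Analysis-by-synthesis: gradient descent on the renderer's inputs only."""
    for _ in range(steps):
        residual = synthesize(a, theta) - target  # appearance-based (L2) error signal
        # Analytic gradients of mean(residual**2) with respect to the input parameters.
        grad_a = np.mean(2.0 * residual * np.cos(X - theta))
        grad_theta = np.mean(2.0 * residual * a * np.sin(X - theta))
        a -= lr * grad_a
        theta -= lr * grad_theta
    return a, theta

# Observation rendered from parameters the fitting loop does not know.
target = synthesize(2.0, 0.8)
a_hat, theta_hat = fit(target)
```

Starting from a = 1 and theta = 0, the loop recovers the parameters used to render the target (a = 2, theta = 0.8). Like any local, gradient-based fitting method, it relies on a reasonable initialization; note also that (a, theta) and (-a, theta + π) render the same image, so a poor start can land on that equivalent solution.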

LaTeX/BibTeX citation:
@inproceedings{Chen2020ECCV,
author = {Xu Chen and Zijian Dong and Jie Song and Andreas Geiger and Otmar Hilliges},
title = {Category Level Object Pose Estimation via Neural Analysis-by-Synthesis},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2020}
}

Paper

Supplementary Material

Project Page