
Publications of Anpei Chen

GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs
G. Gao, W. Liu, A. Chen, A. Geiger and B. Schölkopf
Conference on Computer Vision and Pattern Recognition (CVPR), 2024
Abstract: As pretrained text-to-image diffusion models become increasingly powerful, recent efforts have been made to distill knowledge from these text-to-image pretrained models for optimizing a text-guided 3D model. Most of the existing methods generate a holistic 3D model from a plain text input. This can be problematic when the text describes a complex scene with multiple objects, because the vectorized text embeddings are inherently unable to capture a complex description with multiple entities and relationships. Holistic 3D modeling of the entire scene further prevents accurate grounding of text entities and concepts. To address this limitation, we propose GraphDreamer, a novel framework to generate compositional 3D scenes from scene graphs, where objects are represented as nodes and their interactions as edges. By exploiting node and edge information in scene graphs, our method makes better use of the pretrained text-to-image diffusion model and is able to fully disentangle different objects without image-level supervision. To facilitate modeling of object-wise relationships, we use signed distance fields as representation and impose a constraint to avoid inter-penetration of objects. To avoid manual scene graph creation, we design a text prompt for ChatGPT to generate scene graphs based on text inputs. We conduct both qualitative and quantitative experiments to validate the effectiveness of GraphDreamer in generating high-fidelity compositional 3D scenes with disentangled object entities.
LaTeX BibTeX Citation:
@inproceedings{Gao2024CVPR,
  author = {Gege Gao and Weiyang Liu and Anpei Chen and Andreas Geiger and Bernhard Schölkopf},
  title = {GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2024}
}
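
Illustrative code sketch: the abstract above mentions representing objects with signed distance fields and imposing a constraint against inter-penetration. The following is a minimal sketch of one way such a penalty could be computed, assuming hypothetical names and a simple product-of-penetration-depths formulation rather than the paper's exact loss:

    import numpy as np

    def interpenetration_penalty(sdf_a, sdf_b):
        # sdf_a, sdf_b: signed distances (negative = inside) of two objects,
        # evaluated at the same 3D sample points.
        inside_a = np.maximum(-sdf_a, 0.0)   # penetration depth into object A
        inside_b = np.maximum(-sdf_b, 0.0)   # penetration depth into object B
        return np.mean(inside_a * inside_b)  # > 0 only where a point lies inside both

    # toy usage: two unit spheres whose centers are 1.5 apart overlap slightly
    pts = np.random.uniform(-2.0, 3.0, size=(10000, 3))
    sdf_a = np.linalg.norm(pts, axis=1) - 1.0
    sdf_b = np.linalg.norm(pts - np.array([1.5, 0.0, 0.0]), axis=1) - 1.0
    print(interpenetration_penalty(sdf_a, sdf_b))
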
MuRF: Multi-Baseline Radiance Fields
H. Xu, A. Chen, Y. Chen, C. Sakaridis, Y. Zhang, M. Pollefeys, A. Geiger and F. Yu
Conference on Computer Vision and Pattern Recognition (CVPR), 2024
Abstract: We present Multi-Baseline Radiance Fields (MuRF), a general feed-forward approach to sparse view synthesis under multiple different baseline settings (small and large baselines, and different numbers of input views). To render a target novel view, we discretize the 3D space into planes parallel to the target image plane, and accordingly construct a target view frustum volume. Such a target volume representation is spatially aligned with the target view, which effectively aggregates relevant information from the input views for high-quality rendering. It also facilitates subsequent radiance field regression with a convolutional network thanks to its axis-aligned nature. The 3D context modeled by the convolutional network enables our method to synthesize sharper scene structures than prior works. Our MuRF achieves state-of-the-art performance across multiple different baseline settings and diverse scenarios ranging from simple objects (DTU) to complex indoor and outdoor scenes (RealEstate10K and LLFF). We also show promising zero-shot generalization abilities on the Mip-NeRF 360 dataset, demonstrating the general applicability of MuRF.
LaTeX BibTeX Citation:
@inproceedings{Xu2024CVPR,
  author = {Haofei Xu and Anpei Chen and Yuedong Chen and Christos Sakaridis and Yulun Zhang and Marc Pollefeys and Andreas Geiger and Fisher Yu},
  title = {MuRF: Multi-Baseline Radiance Fields},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2024}
}
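
Illustrative code sketch: the abstract above describes discretizing 3D space into planes parallel to the target image plane to form a target view frustum volume. The sketch below back-projects every target pixel at a set of depth candidates into world space, yielding the kind of grid into which source-view features could then be aggregated; the names and sampling scheme are assumptions for illustration, not the paper's implementation:

    import numpy as np

    def target_frustum_points(K, c2w, H, W, depths):
        # K: 3x3 target intrinsics, c2w: 4x4 camera-to-world pose of the target view,
        # depths: (D,) depth candidates defining planes parallel to the image plane.
        u, v = np.meshgrid(np.arange(W), np.arange(H))                # pixel coordinates
        pix = np.stack([u, v, np.ones_like(u)], -1).reshape(-1, 3)    # (H*W, 3) homogeneous pixels
        rays = pix @ np.linalg.inv(K).T                               # camera-space points at z = 1
        pts = rays[None] * depths[:, None, None]                      # (D, H*W, 3), one slice per plane
        pts_h = np.concatenate([pts, np.ones_like(pts[..., :1])], -1)
        world = pts_h @ c2w.T                                         # transform to world coordinates
        return world[..., :3].reshape(len(depths), H, W, 3)

    # toy usage: 16 depth planes for a 64x64 target view
    K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 32.0], [0.0, 0.0, 1.0]])
    grid = target_frustum_points(K, np.eye(4), 64, 64, np.linspace(1.0, 5.0, 16))
    print(grid.shape)  # (16, 64, 64, 3)
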
Mip-Splatting: Alias-free 3D Gaussian Splatting
Z. Yu, A. Chen, B. Huang, T. Sattler and A. Geiger
Conference on Computer Vision and Pattern Recognition (CVPR), 2024
Abstract: Recently, 3D Gaussian Splatting has demonstrated impressive novel view synthesis results, reaching high fidelity and efficiency. However, strong artifacts can be observed when changing the sampling rate, e.g., by changing the focal length or camera distance. We find that the source of this phenomenon can be attributed to the lack of 3D frequency constraints and the usage of a 2D dilation filter. To address this problem, we introduce a 3D smoothing filter which constrains the size of the 3D Gaussian primitives based on the maximal sampling frequency induced by the input views, eliminating high-frequency artifacts when zooming in. Moreover, replacing 2D dilation with a 2D Mip filter, which simulates a 2D box filter, effectively mitigates aliasing and dilation issues. Our evaluation, including scenarios such as training on single-scale images and testing on multiple scales, validates the effectiveness of our approach.
LaTeX BibTeX Citation:
@inproceedings{Yu2024CVPR,
  author = {Zehao Yu and Anpei Chen and Binbin Huang and Torsten Sattler and Andreas Geiger},
  title = {Mip-Splatting: Alias-free 3D Gaussian Splatting},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2024}
}
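
Illustrative code sketch: the abstract above describes a 3D smoothing filter that bounds each Gaussian primitive's size by the maximal sampling frequency induced by the input views. One common reading is that a primitive seen at depth d with focal length f is sampled at a world-space interval of roughly d/f, and that convolving the Gaussian with an isotropic low-pass Gaussian simply adds to its covariance. The scale constant and names below are assumptions, not the paper's exact values:

    import numpy as np

    def smooth_covariance(cov3d, center, cam_centers, focals, scale=0.2):
        # cov3d: (3, 3) covariance of one primitive, center: (3,) its mean,
        # cam_centers: (N, 3) camera positions, focals: (N,) focal lengths in pixels.
        depths = np.linalg.norm(cam_centers - center, axis=1)
        sampling_interval = np.min(depths / focals)   # finest world-space sampling interval
        sigma = scale * sampling_interval             # filter width (illustrative factor)
        return cov3d + (sigma ** 2) * np.eye(3)       # convolving Gaussians adds covariances

    # toy usage: one small isotropic Gaussian observed by two cameras
    cams = np.array([[0.0, 0.0, 2.0], [0.0, 0.0, 4.0]])
    print(smooth_covariance(0.01 * np.eye(3), np.zeros(3), cams, np.array([500.0, 500.0])))
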
NeLF-Pro: Neural Light Field Probes for Multi-Scale Novel View Synthesis
Z. You, A. Geiger and A. Chen
Conference on Computer Vision and Pattern Recognition (CVPR), 2024
Abstract: We present NeLF-Pro, a novel representation to model and reconstruct light fields in diverse natural scenes that vary in extent and spatial granularity. In contrast to previous fast reconstruction methods that represent the 3D scene globally, we model the light field of a scene as a set of local light field feature probes, parameterized with position and multi-channel 2D feature maps. Our central idea is to bake the scene's light field into spatially varying learnable representations and to query point features by weighted blending of probes close to the camera - allowing for mipmap representation and rendering. We introduce a novel vector-matrix-matrix (VMM) factorization technique that effectively represents the light field feature probes as products of core factors (i.e., VM) shared among local feature probes, and a basis factor (i.e., M) - efficiently encoding internal relationships and patterns within the scene. Experimentally, we demonstrate that NeLF-Pro significantly boosts the performance of feature grid-based representations, and achieves fast reconstruction with better rendering quality while maintaining compact modeling.
LaTeX BibTeX Citation:
@inproceedings{You2024CVPR,
  author = {Zinuo You and Andreas Geiger and Anpei Chen},
  title = {NeLF-Pro: Neural Light Field Probes for Multi-Scale Novel View Synthesis},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2024}
}
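
Illustrative code sketch: the abstract above queries point features by weighted blending of probes close to the camera. The sketch below blends the k nearest probes with inverse-distance weights; the weighting scheme and names are assumptions for illustration, and the VMM factorization of the probe features themselves is not reproduced here:

    import numpy as np

    def blend_probe_features(query, probe_centers, probe_features, k=3, eps=1e-6):
        # query: (3,), probe_centers: (P, 3), probe_features: (P, C); returns (C,).
        dists = np.linalg.norm(probe_centers - query, axis=1)
        nearest = np.argsort(dists)[:k]          # indices of the k closest probes
        w = 1.0 / (dists[nearest] + eps)         # inverse-distance weights
        w = w / w.sum()
        return w @ probe_features[nearest]       # weighted blend of probe features

    # toy usage: 8 probes carrying 4-channel features
    centers = np.random.randn(8, 3)
    feats = np.random.randn(8, 4)
    print(blend_probe_features(np.zeros(3), centers, feats))
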
Dictionary Fields: Learning a Neural Basis Decomposition
A. Chen, Z. Xu, X. Wei, S. Tang, H. Su and A. Geiger
International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH), 2023
Abstract: We present Dictionary Fields, a novel neural representation which decomposes a signal into a product of factors, each represented by a classical or neural field representation, operating on transformed input coordinates. More specifically, we factorize a signal into a coefficient field and a basis field, and exploit periodic coordinate transformations to apply the same basis functions across multiple locations and scales. Our experiments show that Dictionary Fields lead to improvements in approximation quality, compactness, and training time when compared to previous fast reconstruction methods. Experimentally, our representation achieves better image approximation quality on 2D image regression tasks, higher geometric quality when reconstructing 3D signed distance fields, and higher compactness for radiance field reconstruction tasks. Furthermore, Dictionary Fields enable generalization to unseen images/3D scenes by sharing bases across signals during training which greatly benefits use cases such as image regression from partial observations and few-shot radiance field reconstruction.
LaTeX BibTeX Citation:
@inproceedings{Chen2023SIGGRAPH,
  author = {Anpei Chen and Zexiang Xu and Xinyue Wei and Siyu Tang and Hao Su and Andreas Geiger},
  title = {Dictionary Fields: Learning a Neural Basis Decomposition},
  booktitle = {International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH)},
  year = {2023}
}
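
Illustrative code sketch: the abstract above factorizes a signal into a coefficient field and a basis field, with periodic coordinate transformations reusing the same basis functions across locations and scales. A minimal 1D sketch with nearest-neighbour lookups (real models would interpolate and use learned, multi-level grids; all names are illustrative):

    import numpy as np

    def dif_eval(x, coeff_grid, basis_grid, period):
        # x: positions in [0, 1); coeff_grid: (Nc, R) coefficients; basis_grid: (Nb, R)
        # shared basis; the signal is the per-sample dot product of the two factors.
        ci = np.clip((x * len(coeff_grid)).astype(int), 0, len(coeff_grid) - 1)
        xb = (x / period) % 1.0                   # periodic coordinate transform
        bi = np.clip((xb * len(basis_grid)).astype(int), 0, len(basis_grid) - 1)
        return np.sum(coeff_grid[ci] * basis_grid[bi], axis=-1)

    # toy usage: rank-4 factorization whose basis repeats every 1/8 of the domain
    x = np.linspace(0.0, 1.0, 100, endpoint=False)
    print(dif_eval(x, np.random.randn(32, 4), np.random.randn(64, 4), 1.0 / 8.0).shape)
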
Factor Fields: A Unified Framework for Neural Fields and Beyond
A. Chen, Z. Xu, X. Wei, S. Tang, H. Su and A. Geiger
arXiv, 2023
Abstract: We present Factor Fields, a novel framework for modeling and representing signals. Factor Fields decomposes a signal into a product of factors, each represented by a classical or neural field representation which operates on transformed input coordinates. This decomposition results in a unified framework that accommodates several recent signal representations including NeRF, Plenoxels, EG3D, Instant-NGP, and TensoRF. Additionally, our framework allows for the creation of powerful new signal representations, such as the "Dictionary Field" (DiF) which is a second contribution of this paper. Our experiments show that DiF leads to improvements in approximation quality, compactness, and training time when compared to previous fast reconstruction methods. Experimentally, our representation achieves better image approximation quality on 2D image regression tasks, higher geometric quality when reconstructing 3D signed distance fields, and higher compactness for radiance field reconstruction tasks. Furthermore, DiF enables generalization to unseen images/3D scenes by sharing bases across signals during training which greatly benefits use cases such as image regression from sparse observations and few-shot radiance field reconstruction.
LaTeX BibTeX Citation:
@article{Chen2023ARXIV,
  author = {Anpei Chen and Zexiang Xu and Xinyue Wei and Siyu Tang and Hao Su and Andreas Geiger},
  title = {Factor Fields: A Unified Framework for Neural Fields and Beyond},
  journal = {arXiv},
  year = {2023}
}
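
Worked equation: the abstract above describes decomposing a signal into a product of factors, each a classical or neural field evaluated on transformed coordinates. One compact way to write this (the notation is illustrative, not taken from the paper) is

    s(\mathbf{x}) \approx \mathcal{P}\Big( \prod_{i=1}^{N} f_i\big(\gamma_i(\mathbf{x})\big) \Big),

where each f_i is a field, each \gamma_i a coordinate transformation, and \mathcal{P} a projection or decoder. Under this reading, a single factor with a frequency-encoding transform and an MLP decoder resembles a NeRF-style model, vector and matrix factors on axis-aligned projections resemble a TensoRF-style model, and the Dictionary Field (DiF) corresponds to a coefficient factor times a basis factor on periodically transformed coordinates.
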
NeRFPlayer: Streamable Dynamic Scene Representation with Decomposed Neural Radiance Fields
L. Song, A. Chen, Z. Li, Z. Chen, L. Chen, J. Yuan, Y. Xu and A. Geiger
IEEE Transactions on Visualization and Computer Graphics (TVCG), 2023
Abstract: Freely exploring a real-world 4D spatiotemporal space in VR has been a long-standing quest. The task is especially appealing when only a few or even a single RGB camera is used for capturing the dynamic scene. To this end, we present an efficient framework capable of fast reconstruction, compact modeling, and streamable rendering. First, we propose to decompose the 4D spatiotemporal space according to temporal characteristics. Points in the 4D space are associated with probabilities of belonging to three categories: static, deforming, and new areas. Each area is represented and regularized by a separate neural field. Second, we propose a hybrid-representation-based feature streaming scheme for efficiently modeling the neural fields. Our approach, coined NeRFPlayer, is evaluated on dynamic scenes captured by single hand-held cameras and multi-camera arrays, achieving rendering quality and speed comparable or superior to recent state-of-the-art methods, with reconstruction in 10 seconds per frame and real-time rendering.
LaTeX BibTeX Citation:
@article{Song2023TVCG,
  author = {Liangchen Song and Anpei Chen and Zhong Li and Zhang Chen and Lele Chen and Junsong Yuan and Yi Xu and Andreas Geiger},
  title = {NeRFPlayer: Streamable Dynamic Scene Representation with Decomposed Neural Radiance Fields},
  journal = {IEEE Transactions on Visualization and Computer Graphics (TVCG)},
  year = {2023}
}
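
Illustrative code sketch: the abstract above associates every 4D point with probabilities of being static, deforming, or newly appearing, each category handled by its own neural field. The sketch below blends three placeholder fields by softmax probabilities from a decomposition predictor; all callables are stand-ins, not the paper's networks:

    import numpy as np

    def softmax(z, axis=-1):
        z = z - z.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def blended_field(xyzt, decomp_field, static_f, deform_f, new_f):
        # xyzt: (N, 4) spatiotemporal points; decomp_field returns (N, 3) category logits;
        # the three per-category fields each return (N, C) outputs.
        p = softmax(decomp_field(xyzt))                                          # (N, 3) probabilities
        outs = np.stack([static_f(xyzt), deform_f(xyzt), new_f(xyzt)], axis=1)   # (N, 3, C)
        return np.einsum('nk,nkc->nc', p, outs)                                  # probability-weighted blend

    # toy usage with random stand-in fields
    field = lambda x: np.random.randn(len(x), 4)
    pts = np.random.randn(10, 4)
    print(blended_field(pts, lambda x: np.random.randn(len(x), 3), field, field, field).shape)
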
TensoRF: Tensorial Radiance Fields
A. Chen, Z. Xu, A. Geiger, J. Yu and H. Su
European Conference on Computer Vision (ECCV), 2022
Abstract: We present TensoRF, a novel approach to model and reconstruct radiance fields. Unlike NeRF, which purely uses MLPs, we model the radiance field of a scene as a 4D tensor, which represents a 3D voxel grid with per-voxel multi-channel features. Our central idea is to factorize the 4D scene tensor into multiple compact low-rank tensor components. We demonstrate that applying traditional CP decomposition, which factorizes tensors into rank-one components with compact vectors, in our framework leads to improvements over vanilla NeRF. To further boost performance, we introduce a novel vector-matrix (VM) decomposition that relaxes the low-rank constraints for two modes of a tensor and factorizes tensors into compact vector and matrix factors. Beyond superior rendering quality, our models with CP and VM decompositions lead to a significantly lower memory footprint in comparison to previous and concurrent works that directly optimize per-voxel features. Experimentally, we demonstrate that TensoRF with CP decomposition achieves fast reconstruction (<30 min) with better rendering quality and even a smaller model size (<4 MB) compared to NeRF. Moreover, TensoRF with VM decomposition further boosts rendering quality and outperforms previous state-of-the-art methods, while reducing the reconstruction time (<10 min) and retaining a compact model size (<75 MB).
LaTeX BibTeX Citation:
@inproceedings{Chen2022ECCV,
  author = {Anpei Chen and Zexiang Xu and Andreas Geiger and Jingyi Yu and Hao Su},
  title = {TensoRF: Tensorial Radiance Fields},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year = {2022}
}
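
Illustrative code sketch: the abstract above introduces the vector-matrix (VM) decomposition, in which each rank contributes a vector along one axis times a matrix over the complementary plane. The sketch below reconstructs a dense scalar grid from such factors (the appearance feature basis and trilinear interpolation are omitted; names are illustrative):

    import numpy as np

    def vm_reconstruct(vx, Myz, vy, Mxz, vz, Mxy):
        # vx: (R, X) vectors along X with matching (R, Y, Z) plane matrices Myz,
        # and analogously for the Y and Z modes; returns a dense (X, Y, Z) grid.
        t = np.einsum('rx,ryz->xyz', vx, Myz)
        t += np.einsum('ry,rxz->xyz', vy, Mxz)
        t += np.einsum('rz,rxy->xyz', vz, Mxy)
        return t

    # toy usage: rank-2 factors reconstructing an 8x8x8 grid
    R, N = 2, 8
    grid = vm_reconstruct(np.random.randn(R, N), np.random.randn(R, N, N),
                          np.random.randn(R, N), np.random.randn(R, N, N),
                          np.random.randn(R, N), np.random.randn(R, N, N))
    print(grid.shape)  # (8, 8, 8)
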

