Andreas Geiger

Publications of Haofei Xu

LaRa: Efficient Large-Baseline Radiance Fields
A. Chen, H. Xu, S. Esposito, S. Tang and A. Geiger
European Conference on Computer Vision (ECCV), 2024
Abstract: Radiance field methods have achieved photorealistic novel view synthesis and geometry reconstruction, but they are mostly applied in per-scene optimization or small-baseline settings. While several recent works investigate feed-forward reconstruction with large baselines by utilizing transformers, they all operate with a standard global attention mechanism and hence ignore the local nature of 3D reconstruction. We propose a method that unifies local and global reasoning in transformer layers, resulting in improved quality and faster convergence. Our model represents scenes as Gaussian Volumes and combines this with an image encoder and Group Attention Layers for efficient feed-forward reconstruction. Experimental results show that our model, trained for two days on four GPUs, achieves high fidelity in reconstructing 360° radiance fields and is robust to zero-shot and out-of-domain testing.
LaTeX BibTeX Citation:
@inproceedings{Chen2024ECCV,
  author = {Anpei Chen and Haofei Xu and Stefano Esposito and Siyu Tang and Andreas Geiger},
  title = {LaRa: Efficient Large-Baseline Radiance Fields},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year = {2024}
}
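
The Group Attention Layers mentioned in the abstract can be illustrated with a short sketch: tokens are partitioned into fixed-size groups and attention is computed within each group, restricting interactions to local neighborhoods rather than the full token set. This is a minimal PyTorch sketch under assumed shapes; the class and parameter names (GroupAttention, group_size) are illustrative, not the authors' implementation.

import torch
import torch.nn as nn

class GroupAttention(nn.Module):
    # Attention restricted to fixed-size token groups (local reasoning);
    # a companion global layer would attend across groups.
    def __init__(self, dim: int, num_heads: int, group_size: int):
        super().__init__()
        self.group_size = group_size
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, dim); num_tokens divisible by group_size.
        b, n, d = x.shape
        g = self.group_size
        x = x.reshape(b * n // g, g, d)  # fold groups into the batch axis
        out, _ = self.attn(x, x, x)      # attention stays within each group
        return out.reshape(b, n, d)

# Toy usage: 4096 volume tokens attended in local groups of 64.
tokens = torch.randn(2, 4096, 128)
layer = GroupAttention(dim=128, num_heads=4, group_size=64)
print(layer(tokens).shape)  # torch.Size([2, 4096, 128])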
MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
Y. Chen, H. Xu, C. Zheng, B. Zhuang, M. Pollefeys, A. Geiger, T. Cham and J. Cai
European Conference on Computer Vision (ECCV), 2024
Abstract: We propose MVSplat, an efficient feed-forward 3D Gaussian Splatting model learned from sparse multi-view images. To accurately localize the Gaussian centers, we propose to build a cost volume representation via plane sweeping in 3D space, where the cross-view feature similarities stored in the cost volume provide valuable geometry cues for depth estimation. We learn the Gaussian primitives' opacities, covariances, and spherical harmonics coefficients jointly with the Gaussian centers, relying only on photometric supervision. Extensive experimental evaluations demonstrate the importance of the cost volume representation for learning feed-forward Gaussian Splatting models. On the large-scale RealEstate10K and ACID benchmarks, our model achieves state-of-the-art performance with the fastest feed-forward inference speed (22 fps). Compared to the latest state-of-the-art method pixelSplat, our model uses 10x fewer parameters and infers more than 2x faster while providing higher appearance and geometry quality as well as better cross-dataset generalization.
LaTeX BibTeX Citation:
@inproceedings{Chen2024ECCVb,
  author = {Yuedong Chen and Haofei Xu and Chuanxia Zheng and Bohan Zhuang and Marc Pollefeys and Andreas Geiger and Tat-Jen Cham and Jianfei Cai},
  title = {MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year = {2024}
}
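
The plane-sweep cost volume described in the MVSplat abstract can be sketched compactly: source-view features are warped onto a set of fronto-parallel depth planes of the reference view, and per-plane feature similarity forms a cost volume whose soft argmax over depth localizes geometry. The sketch below assumes the warping has already been done (warped_src_feat is given) and uses illustrative names; it is not the paper's code.

import torch
import torch.nn.functional as F

def plane_sweep_cost_volume(ref_feat, warped_src_feat):
    # ref_feat:        (B, C, H, W)    reference-view features
    # warped_src_feat: (B, D, C, H, W) source features warped to D depth planes
    # Dot-product correlation over channels, scaled by the feature dimension.
    c = ref_feat.shape[1]
    return (ref_feat.unsqueeze(1) * warped_src_feat).sum(dim=2) / c ** 0.5

# Toy usage: 32 depth planes; a softmax over depth gives a matching
# distribution, and its expectation yields a soft-argmax depth map.
ref = torch.randn(1, 64, 32, 32)
warped = torch.randn(1, 32, 64, 32, 32)
prob = F.softmax(plane_sweep_cost_volume(ref, warped), dim=1)  # (1, 32, 32, 32)
depth_values = torch.linspace(1.0, 10.0, 32).view(1, 32, 1, 1)
depth = (prob * depth_values).sum(dim=1)  # (1, 32, 32) depth estimate
print(depth.shape)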
MuRF: Multi-Baseline Radiance Fields
H. Xu, A. Chen, Y. Chen, C. Sakaridis, Y. Zhang, M. Pollefeys, A. Geiger and F. Yu
Conference on Computer Vision and Pattern Recognition (CVPR), 2024
Abstract: We present Multi-Baseline Radiance Fields (MuRF), a general feed-forward approach to sparse view synthesis under multiple different baseline settings (small and large baselines, and different numbers of input views). To render a target novel view, we discretize the 3D space into planes parallel to the target image plane and accordingly construct a target view frustum volume. Such a target volume representation is spatially aligned with the target view, which effectively aggregates relevant information from the input views for high-quality rendering. It also facilitates subsequent radiance field regression with a convolutional network thanks to its axis-aligned nature. The 3D context modeled by the convolutional network enables our method to synthesize sharper scene structures than prior works. MuRF achieves state-of-the-art performance across multiple different baseline settings and diverse scenarios ranging from simple objects (DTU) to complex indoor and outdoor scenes (RealEstate10K and LLFF). We also show promising zero-shot generalization on the Mip-NeRF 360 dataset, demonstrating the general applicability of MuRF.
LaTeX BibTeX Citation:
@inproceedings{Xu2024CVPR,
  author = {Haofei Xu and Anpei Chen and Yuedong Chen and Christos Sakaridis and Yulun Zhang and Marc Pollefeys and Andreas Geiger and Fisher Yu},
  title = {MuRF: Multi-Baseline Radiance Fields},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2024}
}
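
The target view frustum volume at the core of MuRF can be illustrated by unprojecting the target camera's pixel grid at a set of depth planes parallel to the target image plane, yielding an axis-aligned sample volume into which input-view features can be gathered. A minimal sketch, assuming known target intrinsics K; variable names are illustrative, not the authors' implementation.

import torch

def target_frustum_points(K, depths, h, w):
    # K: (3, 3) target-view intrinsics; depths: (D,) plane depths.
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32),
                            indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1)  # (H, W, 3)
    rays = pix @ torch.linalg.inv(K).T  # unproject pixels to camera-space rays
    # Scale every ray by every depth plane: (D, H, W, 3) camera-space points.
    return rays.unsqueeze(0) * depths.view(-1, 1, 1, 1)

# Toy usage: a 32x32 target image discretized into 16 depth planes.
K = torch.tensor([[100.0, 0.0, 16.0],
                  [0.0, 100.0, 16.0],
                  [0.0, 0.0, 1.0]])
pts = target_frustum_points(K, torch.linspace(1.0, 8.0, 16), 32, 32)
print(pts.shape)  # torch.Size([16, 32, 32, 3])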
Unifying Flow, Stereo and Depth Estimation
H. Xu, J. Zhang, J. Cai, H. Rezatofighi, F. Yu, D. Tao and A. Geiger
Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Abstract: We present a unified formulation and model for three motion and 3D perception tasks: optical flow, rectified stereo matching, and unrectified stereo depth estimation from posed images. Unlike previous specialized architectures for each specific task, we formulate all three tasks as a unified dense correspondence matching problem, which can be solved with a single model by directly comparing feature similarities. Such a formulation calls for discriminative feature representations, which we achieve using a Transformer, in particular its cross-attention mechanism. We demonstrate that cross-attention enables the integration of knowledge from another image via cross-view interactions, which greatly improves the quality of the extracted features. Our unified model naturally enables cross-task transfer since the model architecture and parameters are shared across tasks. Our unified model outperforms RAFT on the challenging Sintel dataset, and our final model, which uses a few additional task-specific refinement steps, outperforms or compares favorably to recent state-of-the-art methods on 10 popular flow, stereo, and depth datasets while being simpler and more efficient in model design and inference speed.
LaTeX BibTeX Citation:
@article{Xu2023PAMI,
  author = {Haofei Xu and Jing Zhang and Jianfei Cai and Hamid Rezatofighi and Fisher Yu and Dacheng Tao and Andreas Geiger},
  title = {Unifying Flow, Stereo and Depth Estimation},
  journal = {Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
  year = {2023}
}
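
The unified dense-correspondence formulation above reduces, in its simplest form, to comparing feature similarities: correspondence is read off a softmax-normalized correlation between two feature maps. The sketch below shows this reduction for the rectified-stereo case (matching along one image row); it is an illustrative simplification under assumed shapes, not the paper's full model with cross-attention features and refinement.

import torch
import torch.nn.functional as F

def match_by_similarity(feat0, feat1):
    # feat0, feat1: (B, W, C) features of one rectified image row per view.
    corr = feat0 @ feat1.transpose(1, 2) / feat0.shape[-1] ** 0.5  # (B, W, W)
    prob = F.softmax(corr, dim=-1)      # matching distribution per pixel
    pos = torch.arange(feat1.shape[1], dtype=feat0.dtype)
    match = (prob @ pos.unsqueeze(-1)).squeeze(-1)  # expected match position
    # Disparity = pixel coordinate minus its expected match in the other view.
    return torch.arange(feat0.shape[1], dtype=feat0.dtype) - match

# Toy usage: one row of 64 pixels with 32-dimensional features per view.
f0, f1 = torch.randn(1, 64, 32), torch.randn(1, 64, 32)
print(match_by_similarity(f0, f1).shape)  # torch.Size([1, 64])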

