Andreas Geiger

Publications of Haofei Xu

MuRF: Multi-Baseline Radiance Fields
H. Xu, A. Chen, Y. Chen, C. Sakaridis, Y. Zhang, M. Pollefeys, A. Geiger and F. Yu
Conference on Computer Vision and Pattern Recognition (CVPR), 2024
Abstract: We present Multi-Baseline Radiance Fields (MuRF), a general feed-forward approach to solving sparse view synthesis under multiple different baseline settings (small and large baselines, and different numbers of input views). To render a target novel view, we discretize the 3D space into planes parallel to the target image plane, and accordingly construct a target view frustum volume. Such a target volume representation is spatially aligned with the target view, which effectively aggregates relevant information from the input views for high-quality rendering. It also facilitates subsequent radiance field regression with a convolutional network thanks to its axis-aligned nature. The 3D context modeled by the convolutional network enables our method to synthesize sharper scene structures than prior works. Our MuRF achieves state-of-the-art performance across multiple different baseline settings and diverse scenarios ranging from simple objects (DTU) to complex indoor and outdoor scenes (RealEstate10K and LLFF). We also show promising zero-shot generalization abilities on the Mip-NeRF 360 dataset, demonstrating the general applicability of MuRF.
LaTeX BibTeX Citation:
@inproceedings{Xu2024CVPR,
  author = {Haofei Xu and Anpei Chen and Yuedong Chen and Christos Sakaridis and Yulun Zhang and Marc Pollefeys and Andreas Geiger and Fisher Yu},
  title = {MuRF: Multi-Baseline Radiance Fields},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2024}
}
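The core construction in the abstract, discretizing 3D space into depth planes parallel to the target image plane and gathering input-view features into a target-view frustum volume, can be sketched as follows. This is a minimal illustration, not the authors' code: the single input view, shared intrinsics K, relative pose (R, t), and nearest-neighbor sampling are all simplifying assumptions.

```python
import numpy as np

def frustum_volume(src_feat, K, R, t, depths):
    """Build a target-view frustum volume from one input view.

    src_feat: (H, W, C) features of the input view.
    K: (3, 3) camera intrinsics (assumed shared by both views).
    R, t: rotation (3, 3) and translation (3,) from target to input view.
    depths: list of D depth values for planes parallel to the target image plane.
    Returns a (D, H, W, C) volume spatially aligned with the target view.
    """
    H, W, C = src_feat.shape
    K_inv = np.linalg.inv(K)
    ys, xs = np.mgrid[0:H, 0:W]
    # Homogeneous pixel coordinates of the target view, shape (3, H*W).
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T
    volume = np.zeros((len(depths), H, W, C))
    for i, d in enumerate(depths):
        # Back-project target pixels onto the plane at depth d, then
        # transform into the input view and project to its image plane.
        pts = K_inv @ (pix * d)
        proj = K @ (R @ pts + t[:, None])
        uv = proj[:2] / np.clip(proj[2], 1e-6, None)
        # Nearest-neighbor sampling of input-view features.
        u = np.clip(np.round(uv[0]).astype(int), 0, W - 1)
        v = np.clip(np.round(uv[1]).astype(int), 0, H - 1)
        volume[i] = src_feat[v, u].reshape(H, W, C)
    return volume
```

In the paper, such a volume aggregates multiple input views and is decoded by a convolutional network into radiance field parameters; here only the geometric warping step is shown.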
Unifying Flow, Stereo and Depth Estimation
H. Xu, J. Zhang, J. Cai, H. Rezatofighi, F. Yu, D. Tao and A. Geiger
Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Abstract: We present a unified formulation and model for three motion and 3D perception tasks: optical flow, rectified stereo matching and unrectified stereo depth estimation from posed images. Unlike previous specialized architectures for each specific task, we formulate all three tasks as a unified dense correspondence matching problem, which can be solved with a single model by directly comparing feature similarities. Such a formulation calls for discriminative feature representations, which we achieve using a Transformer, in particular the cross-attention mechanism. We demonstrate that cross-attention enables integration of knowledge from another image via cross-view interactions, which greatly improves the quality of the extracted features. Our unified model naturally enables cross-task transfer since the model architecture and parameters are shared across tasks. We outperform RAFT with our unified model on the challenging Sintel dataset, and our final model that uses a few additional task-specific refinement steps outperforms or compares favorably to recent state-of-the-art methods on 10 popular flow, stereo and depth datasets, while being simpler and more efficient in terms of model design and inference speed.
LaTeX BibTeX Citation:
@article{Xu2023PAMI,
  author = {Haofei Xu and Jing Zhang and Jianfei Cai and Hamid Rezatofighi and Fisher Yu and Dacheng Tao and Andreas Geiger},
  title = {Unifying Flow, Stereo and Depth Estimation},
  journal = {Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
  year = {2023}
}