Publications

Publications of Zehao Yu

Gaussian Opacity Fields: Efficient Adaptive Surface Reconstruction in Unbounded Scenes
Z. Yu, T. Sattler and A. Geiger
Conference and Exhibition on Computer Graphics and Interactive Techniques in Asia (SIGGRAPH ASIA), 2024

Abstract: Recently, 3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis results, while allowing the rendering of high-resolution images in real-time. However, leveraging 3D Gaussians for surface reconstruction poses significant challenges due to the explicit and disconnected nature of 3D Gaussians. In this work, we present Gaussian Opacity Fields (GOF), a novel approach for efficient, high-quality, and adaptive surface reconstruction in unbounded scenes. Our GOF is derived from ray-tracing-based volume rendering of 3D Gaussians, enabling direct geometry extraction from 3D Gaussians by identifying its levelset, without resorting to Poisson reconstruction or TSDF fusion as in previous work. We approximate the surface normal of Gaussians as the normal of the ray-Gaussian intersection plane, enabling the application of regularization that significantly enhances geometry. Furthermore, we develop an efficient geometry extraction method utilizing Marching Tetrahedra, where the tetrahedral grids are induced from 3D Gaussians and thus adapt to the scene's complexity. Our evaluations reveal that GOF surpasses existing 3DGS-based methods in surface reconstruction and novel view synthesis. Further, it compares favorably to or even outperforms, neural implicit methods in both quality and speed.

Latex Bibtex Citation:
@inproceedings{Yu2024SIGGRAPHASIA,
author = {Zehao Yu and Torsten Sattler and Andreas Geiger},
title = {Gaussian Opacity Fields: Efficient Adaptive Surface Reconstruction in Unbounded Scenes},
booktitle = {Conference and Exhibition on Computer Graphics and Interactive Techniques in Asia (SIGGRAPH ASIA)},
year = {2024}
}

Paper

Project Page

2D Gaussian Splatting for Geometrically Accurate Radiance Fields
B. Huang, Z. Yu, A. Chen, A. Geiger and S. Gao
International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH), 2024

Abstract: 3D Gaussian Splatting (3DGS) has recently revolutionized radiance field reconstruction, achieving high quality novel view synthesis and fast rendering speed without baking. However, 3DGS fails to accurately represent surfaces due to the multi-view inconsistent nature of 3D Gaussians. We present 2D Gaussian Splatting (2DGS), a novel approach to model and reconstruct geometrically accurate radiance fields from multi-view images. Our key idea is to collapse the 3D volume into a set of 2D oriented planar Gaussian disks. Unlike 3D Gaussians, 2D Gaussians provide view-consistent geometry while modeling surfaces intrinsically. To accurately recover thin surfaces and achieve stable optimization, we introduce a perspective-correct 2D splatting process utilizing ray-splat intersection and rasterization. Additionally, we incorporate depth distortion and normal consistency terms to further enhance the quality of the reconstructions. We demonstrate that our differentiable renderer allows for noise-free and detailed geometry reconstruction while maintaining competitive appearance quality, fast training speed, and real-time rendering.

Latex Bibtex Citation:
@inproceedings{Huang2024SIGGRAPH,
author = {Binbin Huang and Zehao Yu and Anpei Chen and Andreas Geiger and Shenghua Gao},
title = {2D Gaussian Splatting for Geometrically Accurate Radiance Fields},
booktitle = {International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH)},
year = {2024}
}

Paper

Video

Project Page

Mip-Splatting: Alias-free 3D Gaussian Splatting (oral, best student paper award)
Z. Yu, A. Chen, B. Huang, T. Sattler and A. Geiger
Conference on Computer Vision and Pattern Recognition (CVPR), 2024

Abstract: Recently, 3D Gaussian Splatting has demonstrated impressive novel view synthesis results, reaching high fidelity and efficiency. However, strong artifacts can be observed when changing the sampling rate, e.g., by changing focal length or camera distance. We find that the source for this phenomenon can be attributed to the lack of 3D frequency constraints and the usage of a 2D dilation filter. To address this problem, we introduce a 3D smoothing filter which constrains the size of the 3D Gaussian primitives based on the maximal sampling frequency induced by the input views, eliminating high-frequency artifacts when zooming in. Moreover, replacing 2D dilation with a 2D Mip filter, which simulates a 2D box filter, effectively mitigates aliasing and dilation issues. Our evaluation, including scenarios such a training on single-scale images and testing on multiple scales, validates the effectiveness of our approach.

Latex Bibtex Citation:
@inproceedings{Yu2024CVPR,
author = {Zehao Yu and Anpei Chen and Binbin Huang and Torsten Sattler and Andreas Geiger},
title = {Mip-Splatting: Alias-free 3D Gaussian Splatting},
booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2024}
}

Paper

Video

Project Page

Efficient End-to-End Detection of 6-DoF Grasps for Robotic Bin Picking
Y. Liu, A. Qualmann, Z. Yu, M. Gabriel, P. Schillinger, M. Spies, N. Vien and A. Geiger
International Conference on Robotics and Automation (ICRA), 2024

Abstract: Bin picking is an important building block for many robotic systems, in logistics, production or in household use-cases. In recent years, machine learning methods for the prediction of 6-DoF grasps on diverse and unknown objects have shown promising progress. However, existing approaches only consider a single ground truth grasp orientation at a grasp location during training and therefore can only predict limited grasp orientations which leads to a reduced number of feasible grasps in bin picking with restricted reachability. In this paper, we propose a novel approach for learning dense and diverse 6-DoF grasps for parallel-jaw grippers in robotic bin picking. We introduce a parameterized grasp distribution model based on Power-Spherical distributions that enables a training based on all possible ground truth samples. Thereby, we also consider the grasp uncertainty enhancing the model’s robustness to noisy inputs. As a result, given a single top-down view depth image, our model can generate diverse grasps with multiple collision-free grasp orientations. Experimental evaluations in simulation and on a real robotic bin picking setup demonstrate the model’s ability to generalize across various object categories achieving an object clearing rate of around 90% in simulation and real-world experiments. We also outperform state of the art approaches. Moreover, the proposed approach exhibits its usability in real robot experiments without any refinement steps, even when only trained on a synthetic dataset, due to the probabilistic grasp distribution modeling.

Latex Bibtex Citation:
@inproceedings{Liu2024ICRA,
author = {Yushi Liu and Alexander Qualmann and Zehao Yu and Miroslav Gabriel and Philipp Schillinger and Markus Spies and Ngo Anh Vien and Andreas Geiger},
title = {Efficient End-to-End Detection of 6-DoF Grasps for Robotic Bin Picking},
booktitle = {International Conference on Robotics and Automation (ICRA)},
year = {2024}
}

Paper

Video

TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving
K. Chitta, A. Prakash, B. Jaeger, Z. Yu, K. Renz and A. Geiger
Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023

Abstract: How should we integrate representations from complementary sensors for autonomous driving? Geometry-based fusion has shown promise for perception (e.g. object detection, motion forecasting). However, in the context of end-to-end driving, we find that imitation learning based on existing sensor fusion methods underperforms in complex driving scenarios with a high density of dynamic agents. Therefore, we propose TransFuser, a mechanism to integrate image and LiDAR representations using self-attention. Our approach uses transformer modules at multiple resolutions to fuse perspective view and bird's eye view feature maps. We experimentally validate its efficacy on a challenging new benchmark with long routes and dense traffic, as well as the official leaderboard of the CARLA urban driving simulator. At the time of submission, TransFuser outperforms all prior work on the CARLA leaderboard in terms of driving score by a large margin. Compared to geometry-based fusion, TransFuser reduces the average collisions per kilometer by 48%.

Latex Bibtex Citation:
@article{Chitta2022PAMI,
author = {Kashyap Chitta and Aditya Prakash and Bernhard Jaeger and Zehao Yu and Katrin Renz and Andreas Geiger},
title = {TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving},
journal = {Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
year = {2023}
}

Paper

Supplementary Material

MonoSDF: Exploring Monocular Geometric Cues for Neural Implicit Surface Reconstruction
Z. Yu, S. Peng, M. Niemeyer, T. Sattler and A. Geiger
Advances in Neural Information Processing Systems (NeurIPS), 2022

Abstract: In recent years, neural implicit surface reconstruction methods have become popular for multi-view 3D reconstruction. In contrast to traditional multi-view stereo methods, these approaches tend to produce smoother and more complete reconstructions due to the inductive smoothness bias of neural networks. State-of-the-art neural implicit methods allow for high-quality reconstructions of simple scenes from many input views. Yet, their performance drops significantly for larger and more complex scenes and scenes captured from sparse viewpoints. This is caused primarily by the inherent ambiguity in the RGB reconstruction loss that does not provide enough constraints, in particular in less-observed and textureless areas. Motivated by recent advances in the area of monocular geometry prediction, we systematically explore the utility these cues provide for improving neural implicit surface reconstruction. We demonstrate that depth and normal cues, predicted by general-purpose monocular estimators, significantly improve reconstruction quality and optimization time. Further, we analyse and investigate multiple design choices for representing neural implicit surfaces, ranging from monolithic MLP models over single-grid to multi-resolution grid representations. We observe that geometric monocular priors improve performance both for small-scale single-object as well as large-scale multi-object scenes, independent of the choice of representation.

Latex Bibtex Citation:
@inproceedings{Yu2022NEURIPS,
author = {Zehao Yu and Songyou Peng and Michael Niemeyer and Torsten Sattler and Andreas Geiger},
title = {MonoSDF: Exploring Monocular Geometric Cues for Neural Implicit Surface Reconstruction},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2022}
}

Paper

Arxiv

Project Page