Andreas Geiger

Publications of Zehao Yu

Mip-Splatting: Alias-free 3D Gaussian Splatting
Z. Yu, A. Chen, B. Huang, T. Sattler and A. Geiger
Conference on Computer Vision and Pattern Recognition (CVPR), 2024
Abstract: Recently, 3D Gaussian Splatting has demonstrated impressive novel view synthesis results, reaching high fidelity and efficiency. However, strong artifacts can be observed when changing the sampling rate, e.g., by changing focal length or camera distance. We find that this phenomenon can be attributed to the lack of 3D frequency constraints and the usage of a 2D dilation filter. To address this problem, we introduce a 3D smoothing filter which constrains the size of the 3D Gaussian primitives based on the maximal sampling frequency induced by the input views, eliminating high-frequency artifacts when zooming in. Moreover, replacing 2D dilation with a 2D Mip filter, which simulates a 2D box filter, effectively mitigates aliasing and dilation issues. Our evaluation, including scenarios such as training on single-scale images and testing on multiple scales, validates the effectiveness of our approach.
LaTeX BibTeX Citation:
@inproceedings{Yu2024CVPR,
  author = {Zehao Yu and Anpei Chen and Binbin Huang and Torsten Sattler and Andreas Geiger},
  title = {Mip-Splatting: Alias-free 3D Gaussian Splatting},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2024}
}
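The 3D smoothing filter described in the abstract can be illustrated with a minimal NumPy sketch: convolving a Gaussian with an isotropic low-pass Gaussian simply adds variance on the diagonal of its covariance, which bounds how narrow a primitive can appear regardless of zoom. The filter-scale constant `s` and the opacity rescaling below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def smooth_gaussian_3d(cov3d, max_sampling_rate, s=0.2):
    """Convolve a 3D Gaussian with an isotropic low-pass Gaussian.

    Variances of Gaussians add under convolution, so the low-pass
    filter widens the covariance; its width shrinks as the maximal
    sampling rate induced by the input views grows.
    """
    sigma2 = (s / max_sampling_rate) ** 2
    cov_smoothed = cov3d + sigma2 * np.eye(3)
    # Rescale the peak amplitude so the Gaussian's total mass is preserved.
    amp_scale = np.sqrt(np.linalg.det(cov3d) / np.linalg.det(cov_smoothed))
    return cov_smoothed, amp_scale

cov = np.diag([1e-4, 1e-4, 1e-4])  # a tiny, sub-pixel Gaussian primitive
cov_s, alpha_scale = smooth_gaussian_3d(cov, max_sampling_rate=100.0)
```

Because variances add under convolution, the smoothed primitive can never fall below the low-pass width, which is what suppresses high-frequency artifacts when zooming in.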
Efficient End-to-End Detection of 6-DoF Grasps for Robotic Bin Picking
Y. Liu, A. Qualmann, Z. Yu, M. Gabriel, P. Schillinger, M. Spies, N. Vien and A. Geiger
International Conference on Robotics and Automation (ICRA), 2024
Abstract: Bin picking is an important building block for many robotic systems, in logistics, production or in household use-cases. In recent years, machine learning methods for the prediction of 6-DoF grasps on diverse and unknown objects have shown promising progress. However, existing approaches only consider a single ground-truth grasp orientation at a grasp location during training and can therefore only predict limited grasp orientations, which leads to a reduced number of feasible grasps in bin picking with restricted reachability. In this paper, we propose a novel approach for learning dense and diverse 6-DoF grasps for parallel-jaw grippers in robotic bin picking. We introduce a parameterized grasp distribution model based on Power-Spherical distributions that enables training on all possible ground-truth samples. We also model the grasp uncertainty, enhancing the model's robustness to noisy inputs. As a result, given a single top-down-view depth image, our model can generate diverse grasps with multiple collision-free grasp orientations. Experimental evaluations in simulation and on a real robotic bin picking setup demonstrate the model's ability to generalize across various object categories, achieving an object clearing rate of around 90% in simulation and real-world experiments. We also outperform state-of-the-art approaches. Moreover, the proposed approach exhibits its usability in real robot experiments without any refinement steps, even when only trained on a synthetic dataset, due to the probabilistic grasp distribution modeling.
LaTeX BibTeX Citation:
@inproceedings{Liu2024ICRA,
  author = {Yushi Liu and Alexander Qualmann and Zehao Yu and Miroslav Gabriel and Philipp Schillinger and Markus Spies and Ngo Anh Vien and Andreas Geiger},
  title = {Efficient End-to-End Detection of 6-DoF Grasps for Robotic Bin Picking},
  booktitle = {International Conference on Robotics and Automation (ICRA)},
  year = {2024}
}
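The grasp-orientation model above is built on Power-Spherical distributions, whose density on the unit sphere is proportional to (1 + mu·x)^kappa (De Cao & Aziz, 2020). As a rough illustration only (not the paper's implementation), scoring candidate grasp orientations against a predicted mode `mu` and concentration `kappa` might look like:

```python
import numpy as np

def power_spherical_logpdf_unnorm(x, mu, kappa):
    # Unnormalized log-density of the Power-Spherical distribution:
    # p(x; mu, kappa) ∝ (1 + mu·x)^kappa for unit vectors x and mu.
    # The clip guards the log against floating-point rounding at x = -mu.
    return kappa * np.log1p(np.clip(x @ mu, -1.0 + 1e-9, 1.0))

mu = np.array([0.0, 0.0, 1.0])            # predicted grasp axis (mode)
candidates = np.array([[0.0, 0.0, 1.0],   # aligned with the mode
                       [0.0, 1.0, 0.0],   # orthogonal
                       [0.0, 0.0, -1.0]]) # opposite direction
scores = power_spherical_logpdf_unnorm(candidates, mu, kappa=20.0)
best = int(np.argmax(scores))
```

Because the density decays smoothly away from the mode, a network predicting `mu` and `kappa` can be trained against every feasible ground-truth orientation at once while `kappa` expresses the grasp uncertainty.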
TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving
K. Chitta, A. Prakash, B. Jaeger, Z. Yu, K. Renz and A. Geiger
Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Abstract: How should we integrate representations from complementary sensors for autonomous driving? Geometry-based fusion has shown promise for perception (e.g. object detection, motion forecasting). However, in the context of end-to-end driving, we find that imitation learning based on existing sensor fusion methods underperforms in complex driving scenarios with a high density of dynamic agents. Therefore, we propose TransFuser, a mechanism to integrate image and LiDAR representations using self-attention. Our approach uses transformer modules at multiple resolutions to fuse perspective view and bird's eye view feature maps. We experimentally validate its efficacy on a challenging new benchmark with long routes and dense traffic, as well as the official leaderboard of the CARLA urban driving simulator. At the time of submission, TransFuser outperforms all prior work on the CARLA leaderboard in terms of driving score by a large margin. Compared to geometry-based fusion, TransFuser reduces the average collisions per kilometer by 48%.
LaTeX BibTeX Citation:
@article{Chitta2022PAMI,
  author = {Kashyap Chitta and Aditya Prakash and Bernhard Jaeger and Zehao Yu and Katrin Renz and Andreas Geiger},
  title = {TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving},
  journal = {Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
  year = {2023}
}
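The core fusion idea, letting image and LiDAR features attend to each other via self-attention, can be sketched in a few lines of NumPy. The real TransFuser uses learned query/key/value projections and transformer blocks at multiple feature-map resolutions; the single unprojected attention step below is purely illustrative.

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse(img_tokens, lidar_tokens):
    # Concatenate perspective-view and BEV feature tokens, then let every
    # token attend to every other one (scaled dot-product self-attention).
    x = np.concatenate([img_tokens, lidar_tokens], axis=0)  # (N, d)
    attn = softmax(x @ x.T / np.sqrt(x.shape[1]))           # (N, N)
    return attn @ x                                         # (N, d)

rng = np.random.default_rng(0)
fused = fuse(rng.normal(size=(16, 8)), rng.normal(size=(16, 8)))
```

In contrast to geometry-based fusion, no explicit projection between the two views is required: attention learns which image tokens are relevant to which BEV tokens directly from data.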
MonoSDF: Exploring Monocular Geometric Cues for Neural Implicit Surface Reconstruction
Z. Yu, S. Peng, M. Niemeyer, T. Sattler and A. Geiger
Advances in Neural Information Processing Systems (NeurIPS), 2022
Abstract: In recent years, neural implicit surface reconstruction methods have become popular for multi-view 3D reconstruction. In contrast to traditional multi-view stereo methods, these approaches tend to produce smoother and more complete reconstructions due to the inductive smoothness bias of neural networks. State-of-the-art neural implicit methods allow for high-quality reconstructions of simple scenes from many input views. Yet, their performance drops significantly for larger and more complex scenes and scenes captured from sparse viewpoints. This is caused primarily by the inherent ambiguity in the RGB reconstruction loss that does not provide enough constraints, in particular in less-observed and textureless areas. Motivated by recent advances in the area of monocular geometry prediction, we systematically explore the utility these cues provide for improving neural implicit surface reconstruction. We demonstrate that depth and normal cues, predicted by general-purpose monocular estimators, significantly improve reconstruction quality and optimization time. Further, we analyse multiple design choices for representing neural implicit surfaces, ranging from monolithic MLP models to single- and multi-resolution grid representations. We observe that geometric monocular priors improve performance both for small-scale single-object as well as large-scale multi-object scenes, independent of the choice of representation.
LaTeX BibTeX Citation:
@inproceedings{Yu2022NEURIPS,
  author = {Zehao Yu and Songyou Peng and Michael Niemeyer and Torsten Sattler and Andreas Geiger},
  title = {MonoSDF: Exploring Monocular Geometric Cues for Neural Implicit Surface Reconstruction},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  year = {2022}
}
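Depth predicted by a general-purpose monocular estimator is only defined up to an affine ambiguity, so supervising a neural implicit surface with it requires aligning it to the rendered depth first. A minimal sketch of that idea (an assumption-laden illustration, not the authors' code): solve for a least-squares scale and shift per image, then penalize the residual.

```python
import numpy as np

def mono_depth_loss(d_rendered, d_mono):
    # Monocular depth is defined only up to scale and shift, so solve a
    # least-squares scale w and shift q aligning it to the rendered depth,
    # then penalize the remaining residual.
    A = np.stack([d_mono, np.ones_like(d_mono)], axis=1)  # (N, 2)
    (w, q), *_ = np.linalg.lstsq(A, d_rendered, rcond=None)
    return float(np.mean((w * d_mono + q - d_rendered) ** 2))

d_mono = np.linspace(0.1, 1.0, 50)           # toy monocular depth cue
loss = mono_depth_loss(2.0 * d_mono + 0.5, d_mono)  # affinely related depths
```

When the rendered depth is an exact affine transform of the monocular cue, the loss vanishes, so the cue constrains geometry in textureless regions without fixing an absolute scale.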

