
Publications of Bernhard Jaeger

Geometric Transform Attention
T. Miyato, B. Jaeger, M. Welling and A. Geiger
International Conference on Learning Representations (ICLR), 2024
Abstract: As transformers are equivariant to the permutation of input tokens, encoding the positional information of tokens is necessary for many tasks. However, since existing positional encoding schemes have been initially designed for NLP tasks, their suitability for vision tasks, which typically exhibit different structural properties in their data, is questionable. We argue that existing positional encoding schemes are suboptimal for 3D vision tasks, as they do not respect their underlying 3D geometric structure. Based on this hypothesis, we propose a geometry-aware attention mechanism that encodes the geometric structure of tokens as relative transformation determined by the geometric relationship between queries and key-value pairs. By evaluating on multiple novel view synthesis (NVS) datasets in the sparse wide-baseline multi-view setting, we show that our attention, called Geometric Transform Attention (GTA), improves learning efficiency and performance of state-of-the-art transformer-based NVS models without any additional learned parameters and only minor computational overhead.
LaTeX BibTeX Citation:
@inproceedings{Miyato2024ICLR,
  author = {Takeru Miyato and Bernhard Jaeger and Max Welling and Andreas Geiger},
  title = {Geometric Transform Attention},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year = {2024}
}
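A minimal sketch of the geometry-aware attention idea described in the abstract above, written as an illustrative reimplementation rather than the authors' exact GTA formulation: keys and values are rotated into each query token's camera frame via a relative transformation before standard dot-product attention. The function name, the block-wise feature layout, and the use of only the rotation part of the pose are assumptions made for this toy example.

# Simplified, hypothetical sketch of geometry-aware attention in the spirit of GTA.
# Keys/values of token j are mapped into token i's frame with the relative rotation
# R_i R_j^T before computing attention; no learned parameters are added.
import torch

def relative_transform_attention(q, k, v, poses):
    # q, k, v : (N, D) token features, D divisible by 3 so a 3x3 rotation can
    #           act block-wise on each feature (illustrative design choice).
    # poses   : (N, 3, 3) rotation part of each token's camera pose.
    N, D = q.shape
    blocks = D // 3
    rel = torch.einsum('iab,jcb->ijac', poses, poses)            # (N, N, 3, 3), R_i R_j^T
    k_blocks = k.reshape(N, blocks, 3)
    v_blocks = v.reshape(N, blocks, 3)
    # Apply the relative transform of every (i, j) pair to k_j and v_j.
    k_rel = torch.einsum('ijab,jmb->ijma', rel, k_blocks).reshape(N, N, D)
    v_rel = torch.einsum('ijab,jmb->ijma', rel, v_blocks).reshape(N, N, D)
    scores = torch.einsum('id,ijd->ij', q, k_rel) / D ** 0.5
    attn = scores.softmax(dim=-1)
    return torch.einsum('ij,ijd->id', attn, v_rel)

# Toy usage: 4 tokens, 12-dim features, random orthonormal poses.
poses, _ = torch.linalg.qr(torch.randn(4, 3, 3))
out = relative_transform_attention(torch.randn(4, 12), torch.randn(4, 12),
                                   torch.randn(4, 12), poses)
print(out.shape)  # torch.Size([4, 12])
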
Hidden Biases of End-to-End Driving Models
B. Jaeger, K. Chitta and A. Geiger
International Conference on Computer Vision (ICCV), 2023
Abstract: End-to-end driving systems have recently made rapid progress, in particular on CARLA. Independent of their major contribution, they introduce changes to minor system components. Consequently, the source of improvements is unclear. We identify two biases that recur in nearly all state-of-the-art methods and are critical for the observed progress on CARLA: (1) lateral recovery via a strong inductive bias towards target point following, and (2) longitudinal averaging of multimodal waypoint predictions for slowing down. We investigate the drawbacks of these biases and identify principled alternatives. By incorporating our insights, we develop TF++, a simple end-to-end method that ranks first on the Longest6 and LAV benchmarks, gaining 11 driving score over the best prior work on Longest6.
LaTeX BibTeX Citation:
@inproceedings{Jaeger2023ICCV,
  author = {Bernhard Jaeger and Kashyap Chitta and Andreas Geiger},
  title = {Hidden Biases of End-to-End Driving Models},
  booktitle = {International Conference on Computer Vision (ICCV)},
  year = {2023}
}
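A toy numerical illustration of the second bias discussed in the abstract above, i.e. how averaging multimodal waypoint predictions implicitly slows the vehicle down. The numbers, time gap, and speed read-out are my own assumptions, not values from the paper.

# Hypothetical example: the mean of a "continue" mode and a "stop" mode has
# shorter waypoint spacing, hence a lower implied target speed under uncertainty.
import numpy as np

dt = 0.5  # assumed time gap between predicted waypoints (seconds)
continue_mode = np.array([[0.0, 2.0], [0.0, 4.0], [0.0, 6.0]])  # ~4 m/s ahead
stop_mode     = np.array([[0.0, 0.5], [0.0, 0.5], [0.0, 0.5]])  # braking to a halt

averaged = 0.5 * (continue_mode + stop_mode)

def target_speed(waypoints, dt):
    # Speed implied by the distance between the first two waypoints.
    return np.linalg.norm(waypoints[1] - waypoints[0]) / dt

print(target_speed(continue_mode, dt))  # 4.0 m/s
print(target_speed(stop_mode, dt))      # 0.0 m/s
print(target_speed(averaged, dt))       # 2.0 m/s -> averaging the modes slows the car
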
TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving
K. Chitta, A. Prakash, B. Jaeger, Z. Yu, K. Renz and A. Geiger
Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Abstract: How should we integrate representations from complementary sensors for autonomous driving? Geometry-based fusion has shown promise for perception (e.g. object detection, motion forecasting). However, in the context of end-to-end driving, we find that imitation learning based on existing sensor fusion methods underperforms in complex driving scenarios with a high density of dynamic agents. Therefore, we propose TransFuser, a mechanism to integrate image and LiDAR representations using self-attention. Our approach uses transformer modules at multiple resolutions to fuse perspective view and bird's eye view feature maps. We experimentally validate its efficacy on a challenging new benchmark with long routes and dense traffic, as well as the official leaderboard of the CARLA urban driving simulator. At the time of submission, TransFuser outperforms all prior work on the CARLA leaderboard in terms of driving score by a large margin. Compared to geometry-based fusion, TransFuser reduces the average collisions per kilometer by 48%.
LaTeX BibTeX Citation:
@article{Chitta2022PAMI,
  author = {Kashyap Chitta and Aditya Prakash and Bernhard Jaeger and Zehao Yu and Katrin Renz and Andreas Geiger},
  title = {TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving},
  journal = {Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
  year = {2023}
}
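A minimal sketch of attention-based sensor fusion in the spirit of the TransFuser abstract above: perspective-view and bird's eye view feature maps are flattened into tokens, concatenated, and processed jointly with self-attention so each modality can attend to the other. This is an assumed, single-resolution illustration (class name, channel sizes, and the use of a stock PyTorch encoder layer are my choices), not the authors' implementation, which fuses at multiple resolutions.

# Hypothetical single-resolution fusion block using joint self-attention.
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    def __init__(self, channels=64, heads=4):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(
            d_model=channels, nhead=heads, dim_feedforward=4 * channels,
            batch_first=True)

    def forward(self, img_feat, bev_feat):
        # img_feat: (B, C, Hi, Wi) perspective-view features
        # bev_feat: (B, C, Hb, Wb) bird's eye view features
        B, C, Hi, Wi = img_feat.shape
        Hb, Wb = bev_feat.shape[-2:]
        tokens = torch.cat([img_feat.flatten(2).transpose(1, 2),
                            bev_feat.flatten(2).transpose(1, 2)], dim=1)
        tokens = self.layer(tokens)  # joint self-attention over both modalities
        img_out, bev_out = tokens.split([Hi * Wi, Hb * Wb], dim=1)
        return (img_out.transpose(1, 2).reshape(B, C, Hi, Wi),
                bev_out.transpose(1, 2).reshape(B, C, Hb, Wb))

# Toy usage with downsampled feature maps.
block = FusionBlock()
img, bev = block(torch.randn(2, 64, 8, 16), torch.randn(2, 64, 16, 16))
print(img.shape, bev.shape)
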
End-to-end Autonomous Driving: Challenges and Frontiers
L. Chen, P. Wu, K. Chitta, B. Jaeger, A. Geiger and H. Li
arXiv, 2023
Abstract: The autonomous driving community has witnessed a rapid growth in approaches that embrace an end-to-end algorithm framework, utilizing raw sensor input to generate vehicle motion plans, instead of concentrating on individual tasks such as detection and motion prediction. End-to-end systems, in comparison to modular pipelines, benefit from joint feature optimization for perception and planning. This field has flourished due to the availability of large-scale datasets, closed-loop evaluation, and the increasing need for autonomous driving algorithms to perform effectively in challenging scenarios. In this survey, we provide a comprehensive analysis of more than 250 papers, covering the motivation, roadmap, methodology, challenges, and future trends in end-to-end autonomous driving. We delve into several critical challenges, including multi-modality, interpretability, causal confusion, robustness, and world models, amongst others. Additionally, we discuss current advancements in foundation models and visual pre-training, as well as how to incorporate these techniques within the end-to-end driving framework.
LaTeX BibTeX Citation:
@article{Chen2023ARXIVa,
  author = {Li Chen and Penghao Wu and Kashyap Chitta and Bernhard Jaeger and Andreas Geiger and Hongyang Li},
  title = {End-to-end Autonomous Driving: Challenges and Frontiers},
  journal = {arXiv},
  year = {2023}
}

