In this project, we investigate probabilistic generative models for multi-object traffic scene understanding from movable platforms, which reason jointly about the 3D scene layout and the location and orientation of objects in the scene. In particular, we are interested in inferring the scene topology, geometry and traffic activities from short video sequences. Inspired by the impressive driving capabilities of humans, our models do not rely on GPS, lidar or map knowledge. Instead, they exploit a diverse set of visual cues in the form of vehicle tracklets, vanishing points, semantic scene labels, scene flow and occupancy grids. For each of these cues we propose likelihood functions that are integrated into a probabilistic generative model. We learn all model parameters from training data using contrastive divergence. Experiments conducted on videos of 113 representative intersections show that we are able to infer the correct layout in a variety of very challenging scenarios. To evaluate the importance of each feature cue, we conduct experiments with different feature combinations. Furthermore, we show that by exploiting scene context we improve over the state of the art in object detection and object orientation estimation in challenging and cluttered urban environments.
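To illustrate the kind of learning rule contrastive divergence provides, the sketch below runs CD-1 on a small binary restricted Boltzmann machine. This is a generic, minimal example of the technique, not the scene model or likelihood functions used in this project; all names and sizes here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, b, c, v0, lr=0.1):
    """One CD-1 parameter update for a tiny binary RBM (illustrative only).

    W: (n_visible, n_hidden) weight matrix; b, c: visible/hidden biases.
    v0: batch of binary visible vectors, shape (batch, n_visible).
    """
    # Positive phase: hidden activations driven by the data
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: a single Gibbs step (the "contrastive" sample)
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    # Update: data statistics minus one-step model statistics
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c

# Toy usage: 6 visible and 3 hidden units fit to a fixed random batch
W = 0.01 * rng.standard_normal((6, 3))
b = np.zeros(6)
c = np.zeros(3)
data = (rng.random((20, 6)) < 0.5).astype(float)
for _ in range(100):
    W, b, c = cd1_update(W, b, c, data)
```

The key idea, which carries over to the structured scene model, is that the intractable model expectation in the likelihood gradient is approximated by samples obtained after only a few Markov chain steps started at the data.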
Here is a short overview of the relevant publications. For BibTeX citations, please scroll further down this page.
3D Traffic Scene Understanding from Movable Platforms (PAMI 2014): This is a summary of the NIPS 2011 and CVPR 2011 papers, combining monocular and stereo cues for intersection understanding and learning the model parameters via contrastive divergence. The code provided on this page was used to generate the results in this paper.