The object tracking benchmark consists of 21 training sequences and 29 test sequences. Although we have labeled 8 different classes, only the classes 'Car' and 'Pedestrian' are evaluated in our benchmark, as only these classes have enough labeled instances for a comprehensive evaluation. The labeling process was performed in two steps: First, we hired a set of annotators to label 3D bounding boxes as tracklets in point clouds. Since a single 3D bounding box tracklet with fixed dimensions often fits a pedestrian badly, we additionally labeled the left/right boundaries of each object using Mechanical Turk. We also collected labels of each object's occlusion state and computed its truncation by backprojecting a car/pedestrian model into the image plane. We evaluate submitted results using the HOTA, CLEAR MOT and MT/PT/ML metrics and rank methods by HOTA. Our development kit and the evaluation code on GitHub provide details about the data format as well as utility functions for reading and writing the label files.
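As a quick orientation (the development kit remains the authoritative reference), the following minimal Python sketch parses one tracking label file. It assumes the standard whitespace-separated layout with one object per line (frame, track id, type, truncation, occlusion, alpha, 2D bounding box, 3D dimensions, 3D location, rotation_y, and an optional score in result files); the class and field names below are illustrative, not taken from the devkit.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class TrackLabel:
    frame: int               # 0-based frame index within the sequence
    track_id: int            # tracklet id (-1 for 'DontCare' regions)
    obj_type: str            # e.g. 'Car', 'Pedestrian', 'Van', 'DontCare'
    truncated: float         # truncation level/fraction; see devkit for exact semantics
    occluded: int            # occlusion state (0 = visible, higher = more occluded)
    alpha: float             # observation angle [-pi..pi]
    bbox: List[float]        # 2D bbox [left, top, right, bottom], 0-based pixels
    dimensions: List[float]  # 3D dimensions [height, width, length] in meters
    location: List[float]    # 3D location [x, y, z] in camera coordinates
    rotation_y: float        # rotation around the camera Y axis [-pi..pi]
    score: Optional[float] = None  # confidence, present in result files only

def read_label_file(path: str) -> List[TrackLabel]:
    """Read all labels of one sequence from a KITTI-style tracking label file."""
    labels = []
    with open(path) as f:
        for line in f:
            v = line.split()
            if not v:
                continue
            labels.append(TrackLabel(
                frame=int(v[0]), track_id=int(v[1]), obj_type=v[2],
                truncated=float(v[3]), occluded=int(v[4]), alpha=float(v[5]),
                bbox=[float(x) for x in v[6:10]],
                dimensions=[float(x) for x in v[10:13]],
                location=[float(x) for x in v[13:16]],
                rotation_y=float(v[16]),
                score=float(v[17]) if len(v) > 17 else None))
    return labels
```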
The goal in the object tracking task is to estimate object tracklets for the classes 'Car' and 'Pedestrian'. We evaluate 2D 0-based bounding boxes in each image. We encourage submissions to include a confidence measure for every bounding box in every frame. For evaluation we only consider detections/objects larger than 25 pixels in height in the image, and we do not count Vans as false positives for Cars or Sitting Persons as false positives for Pedestrians due to their similarity in appearance. As evaluation criterion we follow the HOTA metrics [1], while also evaluating the CLEARMOT [2] and Mostly-Tracked/Partly-Tracked/Mostly-Lost [3] metrics. Methods are ranked overall by HOTA, and bold numbers indicate the best method for each particular metric. To make the methods comparable, the time for object detection is not included in the specified runtime.
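The hedged Python sketch below illustrates how the minimum-height rule and the neighboring-class rule described above could be applied when deciding which boxes count towards a given class; it is a simplified illustration, not the official evaluation code, and the function and constant names are invented.

```python
# Classes that are ignored (not counted as false positives) when evaluating
# a given class, due to their similarity in appearance.
IGNORED_NEIGHBORS = {
    'Car': {'Van'},
    'Pedestrian': {'Person_sitting'},
}
MIN_HEIGHT_PX = 25  # detections/objects smaller than this are not evaluated

def is_evaluated(obj_type: str, bbox, eval_class: str) -> bool:
    """Return True if a box of obj_type counts towards the evaluation of eval_class."""
    height = bbox[3] - bbox[1]  # bottom - top, 0-based pixel coordinates
    return obj_type == eval_class and height >= MIN_HEIGHT_PX

def is_ignored(obj_type: str, eval_class: str) -> bool:
    """Boxes of a similar-looking class are neither true nor false positives."""
    return obj_type in IGNORED_NEIGHBORS.get(eval_class, set())
```

Typically, predictions matched to ignored or undersized ground-truth boxes are removed before the metrics are computed rather than being counted as false positives.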
Note: On 25.02.2021 we updated the evaluation to use the HOTA metrics as the main evaluation metrics and to show results as plots to enable better comparison over various aspects of tracking. Furthermore, the definitions of previously used evaluation metrics such as MOTA have been updated to match modern definitions (such as those used in MOTChallenge) in order to unify metrics across benchmarks. ID-switches are now counted for cases where the ID changes after a gap in either ground-truth or predicted tracks, and when assigning IDs the algorithm has a preference for extending current tracks (minimizing the number of ID-switches) whenever possible. We have re-calculated the results for all methods. Please download the new evaluation code and report these new numbers for all future submissions. The previous leaderboards from before the changes will remain live for now and can be found here, but will stop being updated after some time.
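To make the updated ID-switch definition concrete, here is a small hedged Python sketch: for each ground-truth track we remember the last predicted ID it was matched to, and count a switch whenever a later match (possibly after a gap of unmatched frames) uses a different ID. This is a simplified illustration of the MOTChallenge-style definition described above; the data structures are assumptions for the example.

```python
def count_id_switches(matches_per_frame):
    """Count ID switches under the gap-aware definition.

    matches_per_frame: iterable over frames, each a dict mapping
    ground-truth track id -> matched predicted track id (matched pairs only).
    A switch is counted when a gt track is matched to a different predicted id
    than in its previous matched frame, even if unmatched frames lie in between.
    """
    last_pred_id = {}  # gt track id -> predicted id of its most recent match
    id_switches = 0
    for frame_matches in matches_per_frame:
        for gt_id, pred_id in frame_matches.items():
            if gt_id in last_pred_id and last_pred_id[gt_id] != pred_id:
                id_switches += 1
            last_pred_id[gt_id] = pred_id
    return id_switches

# Example: gt track 0 is matched to prediction 5, unmatched for one frame,
# then matched to prediction 7 -> one ID switch is counted despite the gap.
print(count_id_switches([{0: 5}, {}, {0: 7}]))  # -> 1
```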
Important Policy Update: As more and more non-published work and re-implementations of existing work are submitted to KITTI, we have established a new policy: from now on, only submissions with significant novelty that lead to a peer-reviewed paper in a conference or journal are allowed. Minor modifications of existing algorithms or student research projects are not allowed; such work must be evaluated on a split of the training set. To ensure that our policy is adopted, new users must detail their status, describe their work and specify the targeted venue during registration. Furthermore, we will regularly delete all entries that are 6 months old but are still anonymous or do not have a paper associated with them. For conferences, 6 months are enough to determine whether a paper has been accepted and to add the bibliography information; for longer review cycles, you need to resubmit your results.
Additional information used by the methods
Stereo: Method uses left and right (stereo) images
Laser Points: Method uses point clouds from Velodyne laser scanner
GPS: Method uses GPS information
Online: Online method (frame-by-frame processing, no latency)
Additional training data: Use of additional data sources for training (see details)
ETHZ Sequences: Inner City Sequences from Mobile Platforms.
Citation
When using this dataset in your research, we would appreciate it if you cite us:
@INPROCEEDINGS{Geiger2012CVPR,
author = {Andreas Geiger and Philip Lenz and Raquel Urtasun},
title = {Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite},
booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2012}
}
@ARTICLE{Luiten2020IJCV,
author = {Jonathon Luiten and Aljosa Osep and Patrick Dendorfer and Philip Torr and Andreas Geiger and Laura Leal-Taixe and Bastian Leibe},
title = {HOTA: A Higher Order Metric for Evaluating Multi-Object Tracking},
journal = {International Journal of Computer Vision (IJCV)},
year = {2020}
}