The object tracking benchmark consists of 21 training sequences and 29 test sequences. Although we have labeled 8 different classes, only the classes 'Car' and 'Pedestrian' are evaluated in our benchmark, since only these classes have enough labeled instances for a comprehensive evaluation. The labeling process was performed in two steps: First, we hired a set of annotators to label 3D bounding boxes as tracklets in point clouds. Since a single 3D bounding box tracklet with fixed dimensions often fits a pedestrian badly, we additionally labeled the left/right boundaries of each object using Mechanical Turk. We also collected labels of the object's occlusion state, and computed the object's truncation by backprojecting a car/pedestrian model into the image plane. We evaluate submitted results using the common metrics CLEAR MOT and MT/PT/ML. Since there is no single ranking criterion, we do not rank methods. Our development kit provides details about the data format as well as utility functions for reading and writing the label files.
- Download left color images of tracking data set (15 GB)
- Download right color images, if you want to use stereo information (15 GB)
- Download Velodyne point clouds, if you want to use laser information (35 GB)
- Download GPS/IMU data, if you want to use map information (8 MB)
- Download camera calibration matrices of tracking data set (1 MB)
- Download training labels of tracking data set (9 MB)
- Download L-SVM reference detections for training and test set (L-SVM, 108 MB)
- Download Regionlet reference detections for training and test set (Regionlets, 33 MB)
- Download tracking development kit (1 MB)
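As a rough illustration of reading the label files, the following Python sketch parses one line of a tracking label file. It is not the devkit's own reader. The column order used here (frame, track id, object type, truncation, occlusion, observation angle, 2D box as left/top/right/bottom, 3D dimensions, 3D location, rotation, and an optional trailing score for result files) is an assumption that should be checked against the readme shipped with the development kit. The names `TrackLabel` and `parse_label_line` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TrackLabel:
    """One object in one frame (hypothetical container, not a devkit type)."""
    frame: int
    track_id: int
    obj_type: str
    truncated: float
    occluded: int
    bbox: Tuple[float, ...]        # (left, top, right, bottom), 0-based pixels
    score: Optional[float] = None  # only present in result files


def parse_label_line(line: str) -> TrackLabel:
    """Parse one whitespace-separated label line.

    Assumed column layout (verify against the devkit readme):
    0 frame, 1 track id, 2 type, 3 truncated, 4 occluded, 5 alpha,
    6-9 2D box, 10-12 3D dimensions, 13-15 3D location, 16 rotation,
    17 optional confidence score.
    """
    fields = line.split()
    if len(fields) < 10:
        raise ValueError(f"expected at least 10 fields, got {len(fields)}")
    return TrackLabel(
        frame=int(fields[0]),
        track_id=int(fields[1]),
        obj_type=fields[2],
        truncated=float(fields[3]),
        occluded=int(float(fields[4])),
        bbox=tuple(float(v) for v in fields[6:10]),
        score=float(fields[17]) if len(fields) > 17 else None,
    )
```

Only the fields needed for the 2D evaluation are kept; the 3D columns are skipped to keep the sketch short.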
The goal of the object tracking task is to estimate object tracklets for the classes 'Car' and 'Pedestrian'. We evaluate 2D 0-based bounding boxes in each image. We would like to encourage people to add a confidence measure for every frame of a track. For evaluation we only consider detections/objects larger than 25 pixels (height) in the image and do not count Vans as false positives for Cars or Sitting Persons as false positives for Pedestrians due to their similarity in appearance. As evaluation criteria we follow the CLEAR MOT [1] and Mostly-Tracked/Partly-Tracked/Mostly-Lost [2] metrics. We do not rank methods by a single criterion, but bold numbers indicate the best method for a particular metric. To make the methods comparable, the time for object detection is not included in the specified runtime.
[1] K. Bernardin, R. Stiefelhagen: Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics. JIVP 2008.
[2] Y. Li, C. Huang, R. Nevatia: Learning to associate: HybridBoosted multi-target tracker for crowded scene. CVPR 2009.
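To make the two metric families concrete, here is a minimal Python sketch, not the official evaluation script. It computes MOTA from aggregate error counts using the definition in [1], MOTA = 1 - (FN + FP + IDSW) / GT, and sorts ground truth trajectories into Mostly-Tracked, Partly-Tracked and Mostly-Lost by the fraction of their frames that were tracked. The 0.8 and 0.2 thresholds follow the usual reading of [2] and are an assumption here, as are the function names.

```python
from typing import Dict, List


def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_gt: int) -> float:
    """Multiple Object Tracking Accuracy from counts summed over all frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def classify_trajectories(tracked_ratios: List[float],
                          mt_thresh: float = 0.8,
                          ml_thresh: float = 0.2) -> Dict[str, float]:
    """Return the share of MT/PT/ML ground truth trajectories.

    tracked_ratios holds, per ground truth trajectory, the fraction of
    its frames in which it was correctly tracked. Thresholds are assumed.
    """
    if not tracked_ratios:
        raise ValueError("need at least one trajectory")
    counts = {"MT": 0, "PT": 0, "ML": 0}
    for ratio in tracked_ratios:
        if ratio > mt_thresh:
            counts["MT"] += 1
        elif ratio < ml_thresh:
            counts["ML"] += 1
        else:
            counts["PT"] += 1
    total = len(tracked_ratios)
    return {name: count / total for name, count in counts.items()}
```

Because MOTA sums three error types against the number of ground truth objects, it can become negative when a tracker produces more errors than there are objects, which is one reason no single number is used for ranking.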
Note 1: On 01.06.2015 we fixed several bugs in the evaluation script and in the calculation of the CLEAR MOT metrics. We also fixed some problems in the annotations of the training and test set (almost completely occluded objects are no longer counted as false negatives). Furthermore, from now on vans are not counted as false positives for cars and sitting persons are not counted as false positives for pedestrians. We have also improved the devkit with new illustrations and re-calculated the results for all methods. Please download the devkit and the annotations/labels with the improved ground truth for training again if you downloaded the files prior to 20.05.2015. Please consider reporting these new numbers for all future submissions. The last leaderboards right before the changes can be found here!
Note 2: On 27.11.2015 we fixed a bug in the evaluation script which prevented van labels from being loaded and led to don't care areas being evaluated. Please download the devkit with the corrected evaluation script (if you want to evaluate on the training set) and consider reporting the new numbers for all future submissions. The leaderboard has been updated. The last leaderboards right before the changes can be found here!
Note 3: On 25.05.2016 we fixed a bug in the evaluation script with respect to overcounting of ignored detections. Thanks to Adrien Gaidon for reporting this bug. Please download the devkit with the corrected evaluation script (if you want to evaluate on the training set) and consider reporting the new numbers for all future submissions. The leaderboard has been updated. The last leaderboards right before the changes can be found here!
Note 4: On 25.04.2017 a major update of the evaluation script introduced the following changes: the counting of ignored detections was corrected; occlusion, truncation and minimum height handling was corrected; and the evaluation summary includes additional statistics. In detail, submitted detections are ignored (i.e. not considered) if they are classified as a "neighboring class" (i.e. 'Van' for 'Car' or 'Cyclist' for 'Pedestrian'), if they do not exceed the minimum height of 25px or if there is an overlap of 0.5 or greater with a 'Don't Care' area. In contrast, ground truth objects are ignored if the occlusion exceeds occlusion level 2, if the truncation exceeds the maximum truncation of 0 or if they belong to a neighboring class (i.e. 'Van' for 'Car' or 'Cyclist' for 'Pedestrian'). We made sure that true positives, false positives, true negatives and false negatives are counted correctly.
Finally, the evaluation summary now includes information about the number of ignored detections. We would like to thank the following researchers for detailed feedback: Adrien Gaidon, Jonathan D. Kuck and Jose M. Buenaposada. The last leaderboards right before the changes can be found here!
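The ignore rules of Note 4 can be sketched in Python as follows. This is an illustration under stated assumptions, not the devkit's evaluation code: all names are hypothetical, "classified as a neighboring class" is read as "matched to a ground truth object of the neighboring class", and the overlap with a 'Don't Care' area is taken as the fraction of the detection's own area covered by that region. Thresholds and the neighboring-class pairs are taken from the note.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom)

MIN_HEIGHT = 25.0          # detections must exceed this height in pixels
MAX_OCCLUSION = 2          # ground truth above this occlusion level is ignored
MAX_TRUNCATION = 0         # ground truth above this truncation is ignored
DONT_CARE_OVERLAP = 0.5    # overlap at or above this value is ignored
NEIGHBOR_CLASS = {"Car": "Van", "Pedestrian": "Cyclist"}  # as listed in Note 4


def box_area(box: Box) -> float:
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def overlap_fraction(det: Box, region: Box) -> float:
    """Fraction of the detection area covered by region (assumed criterion)."""
    inter = (max(det[0], region[0]), max(det[1], region[1]),
             min(det[2], region[2]), min(det[3], region[3]))
    area = box_area(det)
    return box_area(inter) / area if area > 0 else 0.0


def ignore_ground_truth(evaluated_class: str, gt_type: str,
                        occlusion: int, truncation: float) -> bool:
    return (gt_type == NEIGHBOR_CLASS[evaluated_class]
            or occlusion > MAX_OCCLUSION
            or truncation > MAX_TRUNCATION)


def ignore_detection(det: Box, matched_gt_type: Optional[str],
                     evaluated_class: str,
                     dont_care: Sequence[Box]) -> bool:
    if det[3] - det[1] <= MIN_HEIGHT:  # does not exceed the minimum height
        return True
    if matched_gt_type == NEIGHBOR_CLASS[evaluated_class]:
        return True
    return any(overlap_fraction(det, dc) >= DONT_CARE_OVERLAP
               for dc in dont_care)
```

Ignored detections and ignored ground truth objects are neither rewarded nor penalized, which is why the note reports their count separately in the evaluation summary.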
Stereo: Method uses left and right (stereo) images
Laser Points: Method uses point clouds from Velodyne laser scanner
GPS: Method uses GPS information
Online: Online method (frame-by-frame processing, no latency)
Additional training data: Use of additional data sources for training (see details)


