Object Tracking Evaluation 2012


The object tracking benchmark consists of 21 training sequences and 29 test sequences. Although we have labeled 8 different classes, only the classes 'Car' and 'Pedestrian' are evaluated in our benchmark, since only those classes have enough labeled instances for a comprehensive evaluation. The labeling process was performed in two steps: First, we hired a set of annotators to label 3D bounding boxes as tracklets in point clouds. Since a single 3D bounding box tracklet (with fixed dimensions) often fits a pedestrian badly, we additionally labeled the left/right boundaries of each object using Mechanical Turk. We also collected labels of the object's occlusion state, and computed the object's truncation by backprojecting a car/pedestrian model into the image plane. We evaluate submitted results using the common metrics CLEAR MOT and MT/PT/ML. Since there is no single ranking criterion, we do not rank methods. Our development kit provides details about the data format as well as utility functions for reading and writing the label files.

The goal in the object tracking task is to estimate object tracklets for the classes 'Car' and 'Pedestrian'. We evaluate 2D 0-based bounding boxes in each image. We encourage participants to add a confidence measure for every frame in this track. For evaluation we only consider detections/objects larger than 25 pixels (height) in the image. Because of their similarity in appearance, we do not count Vans as false positives for Cars or Sitting Persons as false positives for Pedestrians. As evaluation criteria we follow the CLEAR MOT [1] and Mostly-Tracked/Partly-Tracked/Mostly-Lost [2] metrics. We do not rank methods by a single criterion, but bold numbers indicate the best method for a particular metric. To make the methods comparable, the time for object detection is not included in the specified runtime.
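
The two filtering rules above (minimum height, and not penalizing detections on visually similar classes) can be sketched as follows. This is a minimal illustration, not the devkit's evaluation code: the `Box` type, the label strings "Van" and "Person_sitting", the 0.5 overlap threshold, and the strict "larger than" comparison at exactly 25 pixels are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable

MIN_HEIGHT_PX = 25.0  # height threshold stated in the text

# Visually similar classes that should not produce false positives.
# The exact label strings are assumed for illustration.
NEIGHBOUR_CLASS = {"Car": "Van", "Pedestrian": "Person_sitting"}


@dataclass(frozen=True)
class Box:
    """Hypothetical 2D box in 0-based pixel coordinates."""
    label: str
    left: float
    top: float
    right: float
    bottom: float


def height(box: Box) -> float:
    return box.bottom - box.top


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = min(a.right, b.right) - max(a.left, b.left)
    ih = min(a.bottom, b.bottom) - max(a.top, b.top)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a.right - a.left) * (a.bottom - a.top)
    area_b = (b.right - b.left) * (b.bottom - b.top)
    return inter / (area_a + area_b - inter)


def is_considered(box: Box) -> bool:
    """Only boxes taller than the minimum height take part in evaluation."""
    return height(box) > MIN_HEIGHT_PX


def is_false_positive(det: Box, ground_truth: Iterable[Box],
                      target: str, min_overlap: float = 0.5) -> bool:
    """Decide whether an unmatched detection of class `target` is penalized.

    It is ignored if it is too small or if it overlaps a ground-truth box
    of the neighbouring class (min_overlap is an assumed threshold).
    """
    if not is_considered(det):
        return False
    neighbour = NEIGHBOUR_CLASS[target]
    return not any(gt.label == neighbour and iou(det, gt) >= min_overlap
                   for gt in ground_truth)
```

With these definitions, a car detection that lands on a labeled van is ignored rather than counted as an error, while the same detection with no overlapping van would be counted as a false positive.
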
Note: On 01.06.2015 we fixed several bugs in the evaluation script and in the calculation of the CLEAR MOT metrics. We also fixed some problems in the annotations of the training and test sets (almost completely occluded objects are no longer counted as false negatives). Furthermore, vans are no longer counted as false positives for cars, nor sitting persons as false positives for pedestrians. We have also improved the devkit with new illustrations and re-calculated the results for all methods. Please download the devkit and the annotations/labels with the improved ground truth for training again if you have downloaded the files prior to 20.05.2015. Please consider reporting these new numbers for all future submissions. The last leaderboards right before the changes can be found here!
Second Note: On 27.11.2015 we have fixed a bug in the evaluation script which prevented van labels from being loaded and led to don't care areas being evaluated. Please download the devkit with the corrected evaluation script (if you want to evaluate on the training set) and consider reporting the new numbers for all future submissions. The leaderboard has been updated. The last leaderboards right before the changes can be found here!
[1] K. Bernardin, R. Stiefelhagen: Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics. JIVP 2008.
[2] Y. Li, C. Huang, R. Nevatia: Learning to associate: HybridBoosted multi-target tracker for crowded scene. CVPR 2009.
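
As a rough illustration of the metrics reported in the tables, the sketch below computes MOTA from accumulated error counts and classifies a ground-truth trajectory as MT, PT or ML. It is not the devkit's implementation. The function names are hypothetical, the 80 % / 20 % thresholds follow the usual convention from [2], and treating MOTP as the mean bounding-box overlap of matched pairs is an assumption that fits the percentages shown here ([1] defines it via a generic localization error).

```python
from typing import Sequence


def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_ground_truth: int) -> float:
    """MOTA = 1 - (FN + FP + IDS) / GT, with counts summed over all frames."""
    if num_ground_truth <= 0:
        raise ValueError("need at least one ground-truth object")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_ground_truth


def motp(matched_overlaps: Sequence[float]) -> float:
    """Mean overlap of matched detection/ground-truth pairs (assumed definition)."""
    if not matched_overlaps:
        raise ValueError("need at least one matched pair")
    return sum(matched_overlaps) / len(matched_overlaps)


def track_status(tracked_frames: int, total_frames: int,
                 mt_ratio: float = 0.8, ml_ratio: float = 0.2) -> str:
    """Classify one ground-truth trajectory by the fraction of frames covered.

    Thresholds are the conventional 80 % / 20 % and are assumptions here.
    """
    if total_frames <= 0:
        raise ValueError("trajectory must span at least one frame")
    ratio = tracked_frames / total_frames
    if ratio > mt_ratio:
        return "MT"  # mostly tracked
    if ratio < ml_ratio:
        return "ML"  # mostly lost
    return "PT"      # partly tracked
```

For example, 10 misses, 5 false positives and 5 identity switches over 100 ground-truth boxes give a MOTA of 0.80. MOTA can become negative when the number of errors exceeds the number of ground-truth objects.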

Additional information used by the methods
  • Stereo: Method uses left and right (stereo) images
  • Laser Points: Method uses point clouds from Velodyne laser scanner
  • GPS: Method uses GPS information
  • Online: Online method (frame-by-frame processing, no latency)
  • Additional training data: Use of additional data sources for training (see details)

CAR


Method | Setting | Code | MOTA | MOTP | MT | ML | IDS | FRAG | Runtime | Environment
DP_MCF | - | code | 37.90 % | 78.41 % | 11.13 % | 39.18 % | 2738 | 3239 | 0.01 s | 1 core @ 2.5 GHz (Matlab)
  H. Pirsiavash, D. Ramanan and C. Fowlkes: Globally-Optimal Greedy Algorithms for Tracking a Variable Number of Objects. IEEE conference on Computer Vision and Pattern Recognition (CVPR) 2011.
HM | Online | - | 43.40 % | 78.34 % | 7.77 % | 41.92 % | 12 | 576 | 0.01 s | 1 core @ 2.5 GHz (Python)
  A. Geiger: Probabilistic Models for 3D Urban Scene Understanding from Movable Platforms. 2013.
MCF | - | - | 45.45 % | 78.25 % | 10.98 % | 39.94 % | 23 | 589 | 0.01 s | 1 core @ 2.5 GHz (Python + C/C++)
  L. Zhang, Y. Li and R. Nevatia: Global data association for multi-object tracking using network flows. CVPR.
TBD | - | code | 54.24 % | 78.35 % | 13.87 % | 34.30 % | 31 | 535 | 10 s | 1 core @ 2.5 GHz (Matlab + C/C++)
  A. Geiger, M. Lauer, C. Wojek, C. Stiller and R. Urtasun: 3D Traffic Scene Understanding from Movable Platforms. Pattern Analysis and Machine Intelligence (PAMI) 2014.
  H. Zhang, A. Geiger and R. Urtasun: Understanding High-Level Semantics by Modeling Traffic Patterns. International Conference on Computer Vision (ICCV) 2013.
SSP | - | code | 56.59 % | 77.64 % | 21.34 % | 27.13 % | 7 | 714 | 0.6 s | 1 core @ 2.7 GHz (Python)
  P. Lenz, A. Geiger and R. Urtasun: FollowMe: Efficient Online Min-Cost Flow Tracking with Bounded Memory and Computation. International Conference on Computer Vision (ICCV) 2015.
mbodSSP | Online | code | 54.34 % | 77.52 % | 15.09 % | 29.73 % | 0 | 704 | 0.01 s | 1 core @ 2.7 GHz (Python)
  P. Lenz, A. Geiger and R. Urtasun: FollowMe: Efficient Online Min-Cost Flow Tracking with Bounded Memory and Computation. International Conference on Computer Vision (ICCV) 2015.
DCO | - | code | 37.10 % | 74.36 % | 10.67 % | 33.84 % | 223 | 622 | 0.03 s | 1 core @ >3.5 GHz (Matlab + C/C++)
  A. Andriyenko, K. Schindler and S. Roth: Discrete-Continuous Optimization for Multi-Target Tracking. CVPR 2012.
CEM | - | code | 50.17 % | 77.11 % | 14.48 % | 33.99 % | 125 | 398 | 0.09 s | 1 core @ >3.5 GHz (Matlab + C/C++)
  A. Milan, S. Roth and K. Schindler: Continuous Energy Minimization for Multitarget Tracking. IEEE TPAMI 2014.
NOMT | - | - | 65.24 % | 78.17 % | 31.55 % | 27.90 % | 13 | 154 | 0.09 s | 16 cores @ 2.5 GHz (C++)
  W. Choi: Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor. ICCV 2015.
NOMT-HM | Online | - | 60.18 % | 78.65 % | 26.98 % | 30.34 % | 28 | 250 | 0.09 s | 8 cores @ 2.5 GHz (Matlab + C/C++)
  W. Choi: Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor. ICCV 2015.
SSP* | - | code | 72.12 % | 78.55 % | 40.70 % | 8.99 % | 191 | 966 | 0.6 s | 1 core @ 2.7 GHz (Python)
  P. Lenz, A. Geiger and R. Urtasun: FollowMe: Efficient Online Min-Cost Flow Tracking with Bounded Memory and Computation. International Conference on Computer Vision (ICCV) 2015.
mbodSSP* | Online | code | 72.18 % | 78.75 % | 34.45 % | 10.52 % | 116 | 884 | 0.01 s | 1 core @ 2.7 GHz (Python)
  P. Lenz, A. Geiger and R. Urtasun: FollowMe: Efficient Online Min-Cost Flow Tracking with Bounded Memory and Computation. International Conference on Computer Vision (ICCV) 2015.
NOMT-HM* | Online | - | 74.84 % | 80.02 % | 38.72 % | 15.24 % | 109 | 371 | 0.09 s | 8 cores @ 2.5 GHz (Matlab + C/C++)
  W. Choi: Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor. ICCV 2015.
SCEA* | Online | - | 75.23 % | 79.39 % | 38.72 % | 12.65 % | 106 | 466 | 0.06 s | 1 core @ 4.0 GHz (Matlab + C/C++)
  J. Yoon, C. Lee, M. Yang and K. Yoon: Online Multi-object Tracking via Structural Constraint Event Aggregation. IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2016.
SCEA | Online | - | 56.31 % | 78.84 % | 19.97 % | 29.27 % | 17 | 468 | 0.05 s | 1 core @ 4.0 GHz (Matlab + C/C++)
  J. Yoon, C. Lee, M. Yang and K. Yoon: Online Multi-object Tracking via Structural Constraint Event Aggregation. IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2016.
NOMT* | - | - | 77.83 % | 79.46 % | 43.14 % | 14.63 % | 36 | 225 | 0.09 s | 16 cores @ 2.5 GHz (C++)
  W. Choi: Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor. ICCV 2015.
RMOT | Online | - | 51.54 % | 75.18 % | 15.24 % | 33.54 % | 51 | 382 | 0.01 s | 1 core @ 3.5 GHz (Matlab)
  J. Yoon, M. Yang, J. Lim and K. Yoon: Bayesian Multi-Object Tracking Using Motion Context from Multiple Objects. IEEE Winter Conference on Applications of Computer Vision (WACV) 2015.
DCO_X* | - | code | 67.30 % | 78.85 % | 26.22 % | 15.40 % | 323 | 984 | 0.9 s | 1 core @ >3.5 GHz (Matlab + C/C++)
  A. Milan, K. Schindler and S. Roth: Detection- and Trajectory-Level Exclusion in Multiple Object Tracking. CVPR 2013.
RMOT* | Online | - | 65.29 % | 75.42 % | 26.83 % | 11.43 % | 215 | 742 | 0.02 s | 1 core @ 3.5 GHz (Matlab)
  J. Yoon, M. Yang, J. Lim and K. Yoon: Bayesian Multi-Object Tracking Using Motion Context from Multiple Objects. IEEE Winter Conference on Applications of Computer Vision (WACV) 2015.
ODAMOT | Online | - | 58.84 % | 75.45 % | 16.77 % | 18.90 % | 403 | 1298 | 1 s | 1 core @ 2.5 GHz (Python)
  A. Gaidon and E. Vig: Online Domain Adaptation for Multi-Object Tracking. British Machine Vision Conference (BMVC) 2015.
LP_SSVM | - | - | 60.47 % | 76.93 % | 27.74 % | 23.78 % | 16 | 430 | 0.06 s | 1 core @ 2.5 GHz (Matlab + C/C++)
  S. Wang and C. Fowlkes: Learning Optimal Parameters For Multi-target Tracking. British Machine Vision Conference (BMVC) 2015.
LP_SSVM* | - | - | 77.20 % | 77.80 % | 43.14 % | 8.99 % | 63 | 558 | 0.01 s | 1 core @ 2.5 GHz (C/C++)
  S. Wang and C. Fowlkes: Learning Optimal Parameters For Multi-target Tracking. British Machine Vision Conference (BMVC) 2015.
FMMOVT | - | - | 31.72 % | 77.68 % | 11.74 % | 36.43 % | 514 | 940 | 0.05 s | 1 core @ 2.5 GHz (C/C++)
  F. Alencar, C. Massera, D. Ridel and D. Wolf: Fast Metric Multi-Object Vehicle Tracking for Dynamical Environment Comprehension. Latin American Robotics Symposium (LARS) 2015.
DP_MCF_RCNN | - | - | 57.05 % | 81.32 % | 15.40 % | 26.52 % | 153 | 843 | 0.01 s | 1 core @ 2.5 GHz (Python)
  Anonymous submission
RCMOT_COR* | Online | - | 75.15 % | 77.78 % | 36.59 % | 16.31 % | 124 | 401 | 0.83 s | 1 core @ 2.5 GHz (Matlab)
  Anonymous submission
MDP | Online | code | 76.24 % | 82.10 % | 37.80 % | 14.18 % | 135 | 401 | 0.9 s | 8 cores @ 3.5 GHz (Matlab + C/C++)
  Y. Xiang, A. Alahi and S. Savarese: Learning to Track: Online Multi-Object Tracking by Decision Making. International Conference on Computer Vision (ICCV) 2015.
TDCS | Online | - | 54.68 % | 75.20 % | 13.72 % | 24.85 % | 126 | 991 | 0.06 s | 1 core @ 2.0 GHz (Matlab + C/C++)
  Anonymous submission
FMMOVT V2 | Online | - | 39.23 % | 80.05 % | 12.35 % | 33.23 % | 588 | 1132 | 0.05 s | 1 core @ 2.5 GHz (Python)
  Anonymous submission
CNN-OCC | Online | - | 76.63 % | 78.36 % | 40.70 % | 14.02 % | 72 | 408 | 1.1 s | 1 core @ 3.6 GHz (MATLAB)
  Anonymous submission

PEDESTRIAN


Method | Setting | Code | MOTA | MOTP | MT | ML | IDS | FRAG | Runtime | Environment
CEM | - | code | 27.44 % | 68.48 % | 7.90 % | 52.92 % | 96 | 610 | 0.09 s | 1 core @ >3.5 GHz (Matlab + C/C++)
  A. Milan, S. Roth and K. Schindler: Continuous Energy Minimization for Multitarget Tracking. IEEE TPAMI 2014.
NOMT | - | - | 36.89 % | 67.75 % | 14.43 % | 42.61 % | 34 | 800 | 0.09 s | 16 cores @ 2.5 GHz (C++)
  W. Choi: Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor. ICCV 2015.
NOMT-HM | Online | - | 27.49 % | 67.99 % | 11.34 % | 51.55 % | 73 | 743 | 0.09 s | 8 cores @ 2.5 GHz (Matlab + C/C++)
  W. Choi: Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor. ICCV 2015.
NOMT-HM* | Online | - | 39.31 % | 71.14 % | 17.18 % | 42.27 % | 186 | 870 | 0.09 s | 8 cores @ 2.5 GHz (Matlab + C/C++)
  W. Choi: Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor. ICCV 2015.
SCEA* | Online | - | 43.88 % | 71.86 % | 14.09 % | 43.30 % | 56 | 649 | 0.06 s | 1 core @ 4.0 GHz (Matlab + C/C++)
  J. Yoon, C. Lee, M. Yang and K. Yoon: Online Multi-object Tracking via Structural Constraint Event Aggregation. IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2016.
SCEA | Online | - | 33.09 % | 68.45 % | 8.59 % | 47.42 % | 16 | 724 | 0.05 s | 1 core @ 4.0 GHz (Matlab + C/C++)
  J. Yoon, C. Lee, M. Yang and K. Yoon: Online Multi-object Tracking via Structural Constraint Event Aggregation. IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2016.
NOMT* | - | - | 46.64 % | 71.45 % | 23.37 % | 34.71 % | 63 | 672 | 0.09 s | 16 cores @ 2.5 GHz (C++)
  W. Choi: Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor. ICCV 2015.
RMOT | Online | - | 34.49 % | 68.06 % | 9.97 % | 47.42 % | 81 | 692 | 0.01 s | 1 core @ 3.5 GHz (Matlab)
  J. Yoon, M. Yang, J. Lim and K. Yoon: Bayesian Multi-Object Tracking Using Motion Context from Multiple Objects. IEEE Winter Conference on Applications of Computer Vision (WACV) 2015.
RMOT* | Online | - | 43.73 % | 71.02 % | 16.84 % | 41.24 % | 156 | 760 | 0.02 s | 1 core @ 3.5 GHz (Matlab)
  J. Yoon, M. Yang, J. Lim and K. Yoon: Bayesian Multi-Object Tracking Using Motion Context from Multiple Objects. IEEE Winter Conference on Applications of Computer Vision (WACV) 2015.
LP_SSVM | - | - | 33.30 % | 67.38 % | 9.62 % | 45.02 % | 72 | 825 | 0.06 s | 1 core @ 2.5 GHz (Matlab + C/C++)
  S. Wang and C. Fowlkes: Learning Optimal Parameters For Multi-target Tracking. British Machine Vision Conference (BMVC) 2015.
LP_SSVM* | - | - | 43.76 % | 70.48 % | 16.84 % | 34.71 % | 73 | 814 | 0.01 s | 1 core @ 2.5 GHz (C/C++)
  S. Wang and C. Fowlkes: Learning Optimal Parameters For Multi-target Tracking. British Machine Vision Conference (BMVC) 2015.
RCMOT_COR* | Online | - | 42.75 % | 70.55 % | 17.18 % | 45.70 % | 147 | 830 | 0.83 s | 1 core @ 2.5 GHz (Matlab)
  Anonymous submission
MDP | Online | code | 47.18 % | 70.36 % | 20.27 % | 28.18 % | 88 | 830 | 0.9 s | 8 cores @ 3.5 GHz (Matlab + C/C++)
  Y. Xiang, A. Alahi and S. Savarese: Learning to Track: Online Multi-Object Tracking by Decision Making. International Conference on Computer Vision (ICCV) 2015.
CNN-OCC | Online | - | 44.52 % | 68.38 % | 21.99 % | 37.46 % | 213 | 987 | 1.1 s | 1 core @ 3.6 GHz (MATLAB)
  Anonymous submission

Related Datasets

  • TUD Datasets: "TUD Multiview Pedestrians" and "TUD Stadtmitte" Datasets.
  • PETS 2009: The Datasets for the "Performance Evaluation of Tracking and Surveillance" Workshop.
  • EPFL Terrace: Multi-camera pedestrian videos.
  • ETHZ Sequences: Inner City Sequences from Mobile Platforms.

Citation

When using this dataset in your research, we will be happy if you cite us:
@INPROCEEDINGS{Geiger2012CVPR,
  author = {Andreas Geiger and Philip Lenz and Raquel Urtasun},
  title = {Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2012}
}


