Object Tracking Evaluation 2012


The object tracking benchmark consists of 21 training sequences and 29 test sequences. Although we have labeled 8 different classes, only the classes 'Car' and 'Pedestrian' are evaluated in our benchmark, as only those classes have enough labeled instances for a comprehensive evaluation. The labeling process was performed in two steps: First, we hired a set of annotators to label 3D bounding boxes as tracklets in point clouds. Since a single 3D bounding box tracklet with fixed dimensions often fits a pedestrian badly, we additionally labeled the left/right boundaries of each object using Mechanical Turk. We also collected labels of each object's occlusion state and computed each object's truncation by backprojecting a car/pedestrian model into the image plane. We evaluate submitted results using the common metrics CLEAR MOT and MT/PT/ML. Since there is no single ranking criterion, we do not rank methods. Our development kit provides details about the data format as well as utility functions for reading and writing the label files.
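As a rough illustration of reading a label file, the sketch below parses a single whitespace-separated label line. The column order used here (frame, track id, type, truncation, occlusion, alpha, 2D box left/top/right/bottom, 3D dimensions, 3D location, rotation, and an optional trailing score) is an assumption and is not stated on this page; the development kit remains the authoritative description. The names `TrackLabel` and `parse_label_line` are hypothetical and are not part of the devkit.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TrackLabel:
    frame: int
    track_id: int
    obj_type: str
    truncated: float
    occluded: int
    bbox: Tuple[float, float, float, float]  # (left, top, right, bottom), 0-based pixels
    score: Optional[float] = None            # only present in result files (assumed)


def parse_label_line(line: str) -> TrackLabel:
    """Parse one label line under the ASSUMED column order described above."""
    fields = line.split()
    if len(fields) < 10:
        raise ValueError(f"expected at least 10 columns, got {len(fields)}")
    return TrackLabel(
        frame=int(fields[0]),
        track_id=int(fields[1]),
        obj_type=fields[2],
        truncated=float(fields[3]),
        occluded=int(float(fields[4])),
        # fields[5] is assumed to be the observation angle and is skipped here
        bbox=(float(fields[6]), float(fields[7]), float(fields[8]), float(fields[9])),
        # an 18th column, if present, is assumed to be the confidence score
        score=float(fields[17]) if len(fields) > 17 else None,
    )
```

Keeping the parsed record small (frame, identity, class, 2D box, score) is a deliberate choice for this sketch, since those are the fields the 2D tracking evaluation described on this page depends on.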

The goal of the object tracking task is to estimate object tracklets for the classes 'Car' and 'Pedestrian'. We evaluate 2D 0-based bounding boxes in each image. We encourage participants to provide a confidence measure for every frame of each track. For evaluation, we only consider detections/objects larger than 25 pixels (height) in the image, and we do not count Vans as false positives for Cars or Sitting Persons as false positives for Pedestrians, due to their similarity in appearance. As evaluation criteria we follow the CLEAR MOT [1] and Mostly-Tracked/Partly-Tracked/Mostly-Lost [2] metrics. We do not rank methods by a single criterion, but bold numbers indicate the best method for a particular metric. To make the methods comparable, the time for object detection is not included in the specified runtime.
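The minimum-height rule can be pictured with a short sketch. It assumes boxes are given as (left, top, right, bottom) in 0-based pixel coordinates, that height is computed as bottom minus top, and that the comparison is strict; the official evaluation script may differ in these details. The names `box_height` and `keep_evaluated` are hypothetical.

```python
from typing import Iterable, List, Tuple

# (left, top, right, bottom) in 0-based pixel coordinates (assumed layout)
Box = Tuple[float, float, float, float]


def box_height(box: Box) -> float:
    """Height of a 2D box in pixels, taken as bottom minus top (an assumption)."""
    return box[3] - box[1]


def keep_evaluated(boxes: Iterable[Box], min_height: float = 25.0) -> List[Box]:
    """Keep only boxes strictly taller than min_height pixels."""
    return [b for b in boxes if box_height(b) > min_height]
```

For example, `keep_evaluated([(0, 0, 10, 30), (0, 10, 10, 20)])` keeps only the first box, since the second is 10 pixels tall.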

[1] K. Bernardin, R. Stiefelhagen: Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics. JIVP 2008.
[2] Y. Li, C. Huang, R. Nevatia: Learning to associate: HybridBoosted multi-target tracker for crowded scene. CVPR 2009.
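For readers unfamiliar with these metrics, the sketch below shows how MOTA is obtained from accumulated error counts, following the definition in [1], and how a ground-truth trajectory is assigned to Mostly-Tracked, Partly-Tracked or Mostly-Lost. The 80% and 20% thresholds are the ones commonly used with [2] and are treated here as an assumption about this benchmark's evaluation script. The function names are hypothetical.

```python
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_gt_objects: int) -> float:
    """MOTA = 1 - (FN + FP + IDS) / GT, with counts summed over all frames."""
    if num_gt_objects <= 0:
        raise ValueError("num_gt_objects must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_gt_objects


def classify_trajectory(tracked_frames: int, total_frames: int,
                        mt_threshold: float = 0.8,
                        ml_threshold: float = 0.2) -> str:
    """Return 'MT', 'PT' or 'ML' for one ground-truth trajectory.

    The thresholds are assumptions: a trajectory covered for more than 80%
    of its length counts as mostly tracked, less than 20% as mostly lost.
    """
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    ratio = tracked_frames / total_frames
    if ratio > mt_threshold:
        return "MT"
    if ratio < ml_threshold:
        return "ML"
    return "PT"
```

Note that MOTA can become negative when the number of errors exceeds the number of ground-truth objects, which is why it is not a plain percentage of correctly tracked objects.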

Note 1: On 01.06.2015 we fixed several bugs in the evaluation script and in the calculation of the CLEAR MOT metrics. We also fixed some problems in the annotations of the training and test sets (almost completely occluded objects are no longer counted as false negatives). Furthermore, from now on vans are not counted as false positives for cars, and sitting persons are not counted as false positives for pedestrians. We have also improved the devkit with new illustrations and re-calculated the results for all methods. Please download the devkit and the annotations/labels with the improved ground truth for training again if you downloaded the files prior to 20.05.2015. Please consider reporting these new numbers for all future submissions. The last leaderboards right before the changes can be found here!

Note 2: On 27.11.2015 we fixed a bug in the evaluation script which prevented van labels from being loaded and led to don't care areas being evaluated. Please download the devkit with the corrected evaluation script (if you want to evaluate on the training set) and consider reporting the new numbers for all future submissions. The leaderboard has been updated. The last leaderboards right before the changes can be found here!

Note 3: On 25.05.2016 we fixed a bug in the evaluation script that caused ignored detections to be overcounted. Thanks to Adrien Gaidon for reporting this bug. Please download the devkit with the corrected evaluation script (if you want to evaluate on the training set) and consider reporting the new numbers for all future submissions. The leaderboard has been updated. The last leaderboards right before the changes can be found here!

Additional information used by the methods
  • Stereo: Method uses left and right (stereo) images
  • Laser Points: Method uses point clouds from Velodyne laser scanner
  • GPS: Method uses GPS information
  • Online: Online method (frame-by-frame processing, no latency)
  • Additional training data: Use of additional data sources for training (see details)

CAR


Method Setting Code MOTA MOTP MT ML IDS FRAG Runtime Environment
1 IMMDP
This is an online method (no batch processing).
75.37 % 82.74 % 60.37 % 10.82 % 178 382 0.19 s 4 cores @ >3.5 Ghz (Matlab + C/C++)
2 TuSimple
This is an online method (no batch processing).
73.20 % 83.97 % 71.65 % 7.01 % 300 515 0.6 s 1 core @ 2.5 Ghz (Matlab + C/C++)
3 wan
This is an online method (no batch processing).
72.99 % 82.83 % 50.61 % 12.20 % 24 248 0.1 s 1 core @ 2.5 Ghz (C/C++)
4 MCMOT-CPD 72.11 % 82.13 % 52.13 % 11.43 % 233 547 0.01 s 1 core @ 3.5 Ghz (Python)
B. Lee, E. Erdenee, S. Jin, M. Nam, Y. Jung and P. Rhee: Multi-class Multi-object Tracking Using Changing Point Detection. ECCVWORK 2016.
5 RBPF 71.44 % 82.25 % 63.87 % 5.49 % 284 673 1 s 1 core @ 2.5 Ghz (Python)
6 DuEye
This is an online method (no batch processing).
70.15 % 83.52 % 60.98 % 5.49 % 402 1043 0.15 s 1 core @ >3.5 Ghz (C/C++)
7 NOMT* 69.73 % 79.46 % 56.25 % 12.96 % 36 225 0.09 s 16 cores @ 2.5 Ghz (C++)
W. Choi: Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor. ICCV 2015.
8 CCF-MOT
This is an online method (no batch processing).
69.46 % 78.36 % 52.29 % 13.11 % 72 408 1.1 s 1 core @ 3.6 Ghz (MATLAB)
9 MDP
This is an online method (no batch processing).
code 69.35 % 82.10 % 51.37 % 13.11 % 135 401 0.9 s 8 cores @ 3.5 Ghz (Matlab + C/C++)
Y. Xiang, A. Alahi and S. Savarese: Learning to Track: Online Multi-Object Tracking by Decision Making. International Conference on Computer Vision (ICCV) 2015.
Y. Xiang, W. Choi, Y. Lin and S. Savarese: Subcategory-aware Convolutional Neural Networks for Object Proposals and Detection. arXiv:1604.04693 2016.
10 DSM 68.66 % 84.39 % 41.46 % 15.40 % 29 418 0.1 s GPU @ 1.0 Ghz (Python)
11 NOMT-HM*
This is an online method (no batch processing).
67.92 % 80.02 % 49.24 % 13.11 % 109 371 0.09 s 8 cores @ 2.5 Ghz (Matlab + C/C++)
W. Choi: Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor. ICCV 2015.
12 SLP* 67.36 % 78.79 % 53.81 % 9.45 % 65 574 0.1 s 1 core @ 2.5 Ghz (Python + C/C++)
13 SCEA*
This is an online method (no batch processing).
67.11 % 79.39 % 52.13 % 10.98 % 106 466 0.06 s 1 core @ 4.0 Ghz (Matlab + C/C++)
J. Yoon, C. Lee, M. Yang and K. Yoon: Online Multi-object Tracking via Structural Constraint Event Aggregation. IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2016.
14 LP-SSVM* 66.35 % 77.80 % 55.95 % 8.23 % 63 558 0.02 s 1 core @ 2.5 Ghz (Matlab + C/C++)
S. Wang and C. Fowlkes: Learning Optimal Parameters for Multi-target Tracking with Contextual Interactions. International Journal of Computer Vision 2016.
15 mbodSSP*
This is an online method (no batch processing).
code 62.64 % 78.75 % 48.02 % 8.69 % 116 884 0.01 s 1 core @ 2.7 Ghz (Python)
P. Lenz, A. Geiger and R. Urtasun: FollowMe: Efficient Online Min-Cost Flow Tracking with Bounded Memory and Computation. International Conference on Computer Vision (ICCV) 2015.
16 SSP* code 60.84 % 78.55 % 53.81 % 7.93 % 191 966 0.6 s 1 core @ 2.7 Ghz (Python)
P. Lenz, A. Geiger and R. Urtasun: FollowMe: Efficient Online Min-Cost Flow Tracking with Bounded Memory and Computation. International Conference on Computer Vision (ICCV) 2015.
17 NOMT 55.87 % 78.17 % 39.94 % 25.46 % 13 154 0.09 s 16 cores @ 2.5 Ghz (C++)
W. Choi: Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor. ICCV 2015.
18 DCO-X* code 55.49 % 78.85 % 36.74 % 14.02 % 323 984 0.9 s 1 core @ >3.5 Ghz (Matlab + C/C++)
A. Milan, K. Schindler and S. Roth: Detection- and Trajectory-Level Exclusion in Multiple Object Tracking. CVPR 2013.
19 ODAMOT
This is an online method (no batch processing).
54.87 % 75.45 % 26.37 % 15.09 % 403 1298 1 s 1 core @ 2.5 Ghz (Python)
A. Gaidon and E. Vig: Online Domain Adaptation for Multi-Object Tracking. British Machine Vision Conference (BMVC) 2015.
20 NOMT-HM
This is an online method (no batch processing).
53.03 % 78.65 % 33.23 % 27.13 % 28 250 0.09 s 8 cores @ 2.5 Ghz (Matlab + C/C++)
W. Choi: Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor. ICCV 2015.
21 RMOT*
This is an online method (no batch processing).
53.03 % 75.42 % 39.48 % 10.06 % 215 742 0.02 s 1 core @ 3.5 Ghz (Matlab)
J. Yoon, M. Yang, J. Lim and K. Yoon: Bayesian Multi-Object Tracking Using Motion Context from Multiple Objects. IEEE Winter Conference on Applications of Computer Vision (WACV) 2015.
22 LP-SSVM 51.80 % 76.93 % 35.06 % 21.49 % 16 430 0.05 s 1 core @ 2.5 Ghz (Matlab + C/C++)
S. Wang and C. Fowlkes: Learning Optimal Parameters for Multi-target Tracking with Contextual Interactions. International Journal of Computer Vision 2016.
23 SCEA
This is an online method (no batch processing).
51.30 % 78.84 % 26.22 % 26.22 % 17 468 0.05 s 1 core @ 4.0 Ghz (Matlab + C/C++)
J. Yoon, C. Lee, M. Yang and K. Yoon: Online Multi-object Tracking via Structural Constraint Event Aggregation. IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2016.
24 SSP code 50.42 % 77.64 % 28.66 % 24.09 % 7 714 0.6 s 1 core @ 2.7 Ghz (Python)
P. Lenz, A. Geiger and R. Urtasun: FollowMe: Efficient Online Min-Cost Flow Tracking with Bounded Memory and Computation. International Conference on Computer Vision (ICCV) 2015.
25 DBHM*
This is an online method (no batch processing).
49.63 % 77.72 % 52.13 % 7.47 % 1632 2144 0.15 s 4 cores @ 2.5 Ghz (C/C++)
26 TBD code 49.52 % 78.35 % 20.27 % 32.16 % 31 535 10 s 1 core @ 2.5 Ghz (Matlab + C/C++)
A. Geiger, M. Lauer, C. Wojek, C. Stiller and R. Urtasun: 3D Traffic Scene Understanding from Movable Platforms. Pattern Analysis and Machine Intelligence (PAMI) 2014.
H. Zhang, A. Geiger and R. Urtasun: Understanding High-Level Semantics by Modeling Traffic Patterns. International Conference on Computer Vision (ICCV) 2013.
27 TDCS
This is an online method (no batch processing).
49.25 % 75.20 % 23.02 % 21.34 % 126 991 0.06 s 1 core @ 2.0 Ghz (Matlab + C/C++)
28 mbodSSP
This is an online method (no batch processing).
code 48.00 % 77.52 % 22.10 % 27.44 % 0 704 0.01 s 1 core @ 2.7 Ghz (Python)
P. Lenz, A. Geiger and R. Urtasun: FollowMe: Efficient Online Min-Cost Flow Tracking with Bounded Memory and Computation. International Conference on Computer Vision (ICCV) 2015.
29 RMOT
This is an online method (no batch processing).
46.63 % 75.18 % 20.43 % 31.40 % 51 382 0.01 s 1 core @ 3.5 Ghz (Matlab)
J. Yoon, M. Yang, J. Lim and K. Yoon: Bayesian Multi-Object Tracking Using Motion Context from Multiple Objects. IEEE Winter Conference on Applications of Computer Vision (WACV) 2015.
30 CEM code 44.31 % 77.11 % 19.51 % 31.40 % 125 398 0.09 s 1 core @ >3.5 Ghz (Matlab + C/C++)
A. Milan, S. Roth and K. Schindler: Continuous Energy Minimization for Multitarget Tracking. IEEE TPAMI 2014.
31 MCF 43.17 % 78.25 % 14.33 % 37.04 % 23 589 0.01 s 1 core @ 2.5 Ghz (Python + C/C++)
L. Zhang, Y. Li and R. Nevatia: Global data association for multi-object tracking using network flows. CVPR.
32 HM
This is an online method (no batch processing).
41.47 % 78.34 % 11.59 % 39.33 % 12 576 0.01 s 1 core @ 2.5 Ghz (Python)
A. Geiger: Probabilistic Models for 3D Urban Scene Understanding from Movable Platforms. 2013.
33 FMMOVT V2
This is an online method (no batch processing).
37.46 % 80.05 % 20.27 % 30.79 % 588 1132 0.05 s 1 core @ 2.5 Ghz (Python)
34 DP-MCF code 35.72 % 78.41 % 16.92 % 35.67 % 2738 3239 0.01 s 1 core @ 2.5 Ghz (Matlab)
H. Pirsiavash, D. Ramanan and C. Fowlkes: Globally-Optimal Greedy Algorithms for Tracking a Variable Number of Objects. IEEE conference on Computer Vision and Pattern Recognition (CVPR) 2011.
35 FMMOVT 29.11 % 77.68 % 21.19 % 34.60 % 514 940 0.05 s 1 core @ 2.5 Ghz (C/C++)
F. Alencar, C. Massera, D. Ridel and D. Wolf: Fast Metric Multi-Object Vehicle Tracking for Dynamical Environment Comprehension. Latin American Robotics Symposium (LARS) 2015.
36 DCO code 28.72 % 74.36 % 15.24 % 30.79 % 223 622 0.03 s 1 core @ >3.5 Ghz (Matlab + C/C++)
A. Andriyenko, K. Schindler and S. Roth: Discrete-Continuous Optimization for Multi-Target Tracking. CVPR 2012.

PEDESTRIAN


Method Setting Code MOTA MOTP MT ML IDS FRAG Runtime Environment
1 TuSimple
This is an online method (no batch processing).
47.96 % 71.93 % 30.93 % 24.05 % 139 829 0.6 s 1 core @ 2.5 Ghz (Matlab + C/C++)
2 MCMOT-CPD 40.50 % 72.44 % 20.62 % 34.36 % 144 775 0.01 s 1 core @ 3.5 Ghz (Python)
B. Lee, E. Erdenee, S. Jin, M. Nam, Y. Jung and P. Rhee: Multi-class Multi-object Tracking Using Changing Point Detection. ECCVWORK 2016.
3 SCEA*
This is an online method (no batch processing).
39.34 % 71.86 % 16.15 % 43.30 % 56 649 0.06 s 1 core @ 4.0 Ghz (Matlab + C/C++)
J. Yoon, C. Lee, M. Yang and K. Yoon: Online Multi-object Tracking via Structural Constraint Event Aggregation. IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2016.
4 NOMT* 38.98 % 71.45 % 26.12 % 34.02 % 63 672 0.09 s 16 cores @ 2.5 Ghz (C++)
W. Choi: Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor. ICCV 2015.
5 RMOT*
This is an online method (no batch processing).
36.42 % 71.02 % 19.59 % 41.24 % 156 760 0.02 s 1 core @ 3.5 Ghz (Matlab)
J. Yoon, M. Yang, J. Lim and K. Yoon: Bayesian Multi-Object Tracking Using Motion Context from Multiple Objects. IEEE Winter Conference on Applications of Computer Vision (WACV) 2015.
6 MDP
This is an online method (no batch processing).
code 35.91 % 70.36 % 23.02 % 27.84 % 88 830 0.9 s 8 cores @ 3.5 Ghz (Matlab + C/C++)
Y. Xiang, A. Alahi and S. Savarese: Learning to Track: Online Multi-Object Tracking by Decision Making. International Conference on Computer Vision (ICCV) 2015.
Y. Xiang, W. Choi, Y. Lin and S. Savarese: Subcategory-aware Convolutional Neural Networks for Object Proposals and Detection. arXiv:1604.04693 2016.
7 CCF-MOT
This is an online method (no batch processing).
35.87 % 68.38 % 24.40 % 37.11 % 213 987 1.1 s 1 core @ 3.6 Ghz (MATLAB)
8 LP-SSVM* 34.97 % 70.48 % 20.27 % 34.36 % 73 814 0.02 s 1 core @ 2.5 Ghz (Matlab + C/C++)
S. Wang and C. Fowlkes: Learning Optimal Parameters for Multi-target Tracking with Contextual Interactions. International Journal of Computer Vision 2016.
9 NOMT-HM*
This is an online method (no batch processing).
31.43 % 71.14 % 21.31 % 41.92 % 186 870 0.09 s 8 cores @ 2.5 Ghz (Matlab + C/C++)
W. Choi: Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor. ICCV 2015.
10 SCEA
This is an online method (no batch processing).
26.02 % 68.45 % 9.62 % 47.08 % 16 724 0.05 s 1 core @ 4.0 Ghz (Matlab + C/C++)
J. Yoon, C. Lee, M. Yang and K. Yoon: Online Multi-object Tracking via Structural Constraint Event Aggregation. IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2016.
11 NOMT 25.55 % 67.75 % 17.53 % 42.61 % 34 800 0.09 s 16 cores @ 2.5 Ghz (C++)
W. Choi: Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor. ICCV 2015.
12 RMOT
This is an online method (no batch processing).
25.47 % 68.06 % 13.06 % 47.42 % 81 692 0.01 s 1 core @ 3.5 Ghz (Matlab)
J. Yoon, M. Yang, J. Lim and K. Yoon: Bayesian Multi-Object Tracking Using Motion Context from Multiple Objects. IEEE Winter Conference on Applications of Computer Vision (WACV) 2015.
13 LP-SSVM 23.37 % 67.38 % 12.03 % 45.02 % 72 825 0.05 s 1 core @ 2.5 Ghz (Matlab + C/C++)
S. Wang and C. Fowlkes: Learning Optimal Parameters for Multi-target Tracking with Contextual Interactions. International Journal of Computer Vision 2016.
14 CEM code 18.18 % 68.48 % 8.93 % 51.89 % 96 610 0.09 s 1 core @ >3.5 Ghz (Matlab + C/C++)
A. Milan, S. Roth and K. Schindler: Continuous Energy Minimization for Multitarget Tracking. IEEE TPAMI 2014.
15 NOMT-HM
This is an online method (no batch processing).
17.26 % 67.99 % 14.09 % 50.52 % 73 743 0.09 s 8 cores @ 2.5 Ghz (Matlab + C/C++)
W. Choi: Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor. ICCV 2015.

Related Datasets

  • TUD Datasets: "TUD Multiview Pedestrians" and "TUD Stadtmitte" Datasets.
  • PETS 2009: The Datasets for the "Performance Evaluation of Tracking and Surveillance" Workshop.
  • EPFL Terrace: Multi-camera pedestrian videos.
  • ETHZ Sequences: Inner City Sequences from Mobile Platforms.

Citation

When using this dataset in your research, we will be happy if you cite us:
@INPROCEEDINGS{Geiger2012CVPR,
  author = {Andreas Geiger and Philip Lenz and Raquel Urtasun},
  title = {Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2012}
}


