Semantic Instance Segmentation Evaluation

This is the KITTI semantic instance segmentation benchmark. It consists of 200 semantically annotated training images and 200 test images corresponding to the KITTI Stereo and Flow Benchmark 2015. The data format and metrics conform to those of The Cityscapes Dataset.

The data can be downloaded here:

The instance segmentation task focuses on detecting, segmenting, and classifying object instances. To assess instance-level performance, we compute the region-level average precision (AP) for each class and average it across a range of overlap thresholds to avoid a bias towards a specific value. As described in The Cityscapes Dataset, we use 10 overlap thresholds ranging from 0.5 to 0.95 in steps of 0.05. The overlap is computed at the region level, making it equivalent to the IoU of a single instance. Multiple predictions of the same ground-truth instance are penalized as false positives. To obtain a single, easy-to-compare score, we report the mean average precision (AP), obtained by additionally averaging over the class label set. As a secondary score, we report AP50%, the average precision at an overlap threshold of 50 %.

  • AP: Average precision as described above.
  • AP50%: Average precision at an overlap of 50 %.
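
The scoring described above can be sketched in Python as follows. This is a simplified illustration, not the official KITTI or Cityscapes evaluation script. The function names (`mask_iou`, `average_precision`, `benchmark_scores`), the input layout, and the use of non-interpolated AP are assumptions made for this example. Instances are taken to be boolean NumPy masks, and each prediction carries a confidence score.

```python
import numpy as np

# The 10 overlap thresholds named in the text: 0.50, 0.55, ..., 0.95.
THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def mask_iou(a, b):
    """Region-level overlap: IoU of two boolean masks of equal shape."""
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum()) / union if union else 0.0


def average_precision(preds, gts, threshold):
    """AP for one class at one threshold (non-interpolated; an assumption).

    preds: list of (confidence, mask) pairs; gts: list of masks.
    """
    if not gts:
        return 0.0
    matched = set()  # ground-truth instances already claimed
    hits = []        # 1 = true positive, 0 = false positive, by confidence
    for _, mask in sorted(preds, key=lambda p: -p[0]):
        ious = [mask_iou(mask, gt) for gt in gts]
        best = int(np.argmax(ious))
        if ious[best] >= threshold and best not in matched:
            matched.add(best)
            hits.append(1)
        else:
            # Too little overlap, or a duplicate of a matched instance.
            hits.append(0)
    hits = np.array(hits, dtype=float)
    precision = np.cumsum(hits) / (np.arange(len(hits)) + 1)
    return float((precision * hits).sum() / len(gts))


def benchmark_scores(per_class):
    """Return (AP, AP50%) from a dict {class_name: (preds, gts)}."""
    ap_all = [np.mean([average_precision(p, g, t) for t in THRESHOLDS])
              for p, g in per_class.values()]
    ap_50 = [average_precision(p, g, 0.5) for p, g in per_class.values()]
    return float(np.mean(ap_all)), float(np.mean(ap_50))
```

In this sketch, a second prediction that lands on an already matched ground-truth instance is counted as a false positive, which lowers the precision of every later true positive. The official evaluation may differ in details such as the treatment of ignore regions and the exact integration of the precision-recall curve.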

Important Policy Update: As more and more unpublished work and re-implementations of existing work are submitted to KITTI, we have established a new policy: from now on, only submissions with significant novelty that are leading to a peer-reviewed paper in a conference or journal are allowed. Minor modifications of existing algorithms or student research projects are not allowed. Such work must be evaluated on a split of the training set. To ensure that our policy is adopted, new users must detail their status, describe their work and specify the targeted venue during registration. Furthermore, we will regularly delete all entries that are 6 months old but are still anonymous or do not have a paper associated with them. For conferences, 6 months is enough to determine whether a paper has been accepted and to add the bibliography information. For longer review cycles, you need to resubmit your results.

Additional information used by the methods
  • Laser Points: Method uses point clouds from a Velodyne laser scanner.
  • Depth: Method uses depth from stereo.
  • Video: Method uses 2 or more temporally adjacent images.
  • Additional training data: Method uses additional data sources for training (see details).

Rank | Method      | Setting | Code | AP    | AP50% | Runtime | Environment
1    | UDeer_DIS++ |         | code | 16.36 | 28.78 | 0.1 s   | 1 core @ 2.5 GHz (C/C++)
2    | UDeer_DIS   |         | code | 14.11 | 24.80 | 0.1 s   | 1 core @ 2.5 GHz (C/C++)
3    | CenterPoly  |         | code | 8.73  | 26.74 | 0.045 s | GPU @ 2.5 GHz (Python)
4    | BAMRCNN_ROB |         | code | 0.68  | 1.81  | 1 s     | 4 cores @ 2.5 GHz (Python)

  • UDeer_DIS++: Z. Dong, H. Ji, X. Huang, W. Zhang, X. Zhan and J. Chen: PeP: a Point enhanced Painting method for unified point cloud tasks. 2023.
  • CenterPoly: H. Perreault, G. Bilodeau, N. Saunier and M. Héritier: CenterPoly: real-time instance segmentation using bounding polygons. Proceedings of the IEEE/CVF International Conference on Computer Vision 2021.
  • BAMRCNN_ROB: R. Girshick, I. Radosavovic, G. Gkioxari, P. Dollár and K. He: Detectron. 2018.

Related Datasets

  • The Cityscapes Dataset: The Cityscapes dataset was recorded in 50 German cities and offers high-quality pixel-level annotations of 5 000 frames in addition to a larger set of 20 000 weakly annotated frames.
  • Wilddash: Wilddash is a benchmark for semantic and instance segmentation. It aims to improve the expressiveness of performance evaluation for computer vision algorithms with regard to their robustness under real-world conditions.


When using this dataset in your research, we will be happy if you cite us:
  author = {Alhaija, Hassan and Mustikovela, Siva and Mescheder, Lars and Geiger, Andreas and Rother, Carsten},
  title = {Augmented Reality Meets Computer Vision: Efficient Data Generation for Urban Driving Scenes},
  journal = {International Journal of Computer Vision (IJCV)},
  year = {2018}
