The KITTI Vision Benchmark Suite

Segmenting and Tracking Every Pixel (STEP) Evaluation

This benchmark is part of the ICCV21-Workshop: Segmenting and Tracking Every Point and Pixel.

The Segmenting and Tracking Every Pixel (STEP) benchmark consists of 21 training sequences and 29 test sequences. It is based on the KITTI Tracking Evaluation and the Multi-Object Tracking and Segmentation (MOTS) benchmark. This benchmark extends the annotations to the Segmenting and Tracking Every Pixel (STEP) task. To this end, we added dense pixelwise segmentation labels for every pixel. In this benchmark, every pixel has a semantic label and all pixels belonging to the most salient object classes, car and pedestrian, have a unique tracking ID. We evaluate submitted results using the Segmentation and Tracking Quality (STQ) metric:

STQ: The combined segmentation and tracking quality given by the geometric mean of AQ and SQ.
AQ: The class-agnostic association quality. Please refer to the above link for details.
SQ (IoU): The track-agnostic segmentation quality given by the mean IoU of all classes.

More details and downloads can be found here:

The submission instructions can be found on the submit results page. Please address any questions or feedback about KITTI-STEP and its evaluation to Mark Weber.

Important Policy Update: As more and more non-published work and re-implementations of existing work is submitted to KITTI, we have established a new policy: from now on, only submissions with significant novelty that are leading to a peer-reviewed paper in a conference or journal are allowed. Minor modifications of existing algorithms or student research projects are not allowed. Such work must be evaluated on a split of the training set. To ensure that our policy is adopted, new users must detail their status, describe their work and specify the targeted venue during registration. Furthermore, we will regularly delete all entries that are 6 months old but are still anonymous or do not have a paper associated with them. For conferences, 6 month is enough to determine if a paper has been accepted and to add the bibliography information. For longer review cycles, you need to resubmit your results.

Additional information used by the methods

Online: Online method (frame-by-frame processing, no latency)
Additional training data: Use of additional data sources for training

	Method	Setting	Code	STQ	AQ	SQ (IoU)
1	Video-kMaX			68.47 %	67.20 %	69.77 %
I. Shin, D. Kim, Q. Yu, J. Xie, H. Kim, B. Green, I. Kweon, K. Yoon and L. Chen: Video-kMaX: A simple unified approach for online and near-online video panoptic segmentation. arXiv preprint arXiv:2304.04694 2023.
2	MAPT		code	67.38 %	66.54 %	68.23 %
Juana Valeria Hurtado and A. Valada: Learning Appearance and Motion Cues for Panoptic Tracking. 2024.
3	TubeFormer-DeepLab			65.25 %	60.59 %	70.27 %
D. Kim, J. Xie, H. Wang, S. Qiao, Q. Yu, H. Kim, H. Adam, I. Kweon and L. Chen: TubeFormer-DeepLab: Video Mask Transformer. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2022.
4	MAPT		code	58.97 %	56.57 %	61.47 %
ERROR: Wrong syntax in BIBTEX file.
5	siain			57.87 %	55.16 %	60.71 %
J. Ryu and K. Yoon: An End-to-End Trainable Video Panoptic Segmentation Method usingTransformers. 2021.
6	Motion-DeepLab		code	52.19 %	45.55 %	59.81 %
M. Weber, J. Xie, M. Collins, Y. Zhu, P. Voigtlaender, H. Adam, B. Green, A. Geiger, B. Leibe, D. Cremers, A. Osep, L. Leal-Taixe and L. Chen: STEP: Segmenting and Tracking Every Pixel. arXiv:2102.11859 2021.

Table as LaTeX | Only published Methods

Citation

When using this dataset in your research, we will be happy if you cite us:
@inproceedings{Weber2021NEURIPSDATA,
author = {Mark Weber and Jun Xie and Maxwell Collins and Yukun Zhu and Paul Voigtlaender and Hartwig Adam and Bradley Green and Andreas Geiger and Bastian Leibe and Daniel Cremers and Aljosa Osep and Laura Leal-Taixe and Liang-Chieh Chen},
title = {STEP: Segmenting and Tracking Every Pixel},
booktitle = {Neural Information Processing Systems (NeurIPS) Track on Datasets and Benchmarks},
year = {2021}
}

The KITTI Vision Benchmark Suite

A project of Karlsruhe Institute of Technologyand Toyota Technological Institute at Chicago

Segmenting and Tracking Every Pixel (STEP) Evaluation

Citation

A project of Karlsruhe Institute of Technology
and Toyota Technological Institute at Chicago