The KITTI Vision Benchmark Suite

Method

SAM2-based Multi-object Tracking and Segmentation using Zero-shot Learning [Seg2Track-SAM2]
https://github.com/hcmr-lab/Seg2Track-SAM2

Submitted on 9 Sep. 2025 18:15 by
Diogo Mendonça (Universidade de Coimbra)

Running time:		1 s
Environment:		GPU @ 1.5 GHz (Python)

Method Description:

This method extends SAM2 to multi-object tracking
and segmentation in a zero-shot setting. Objects are
initialized with a detector and refined over time
through object reinforcement, ensuring consistent
masks across frames without extra training.

Parameters:

detection_threshold=0.5
removal_threshold=0.1

LaTeX BibTeX:

@misc{mendonça2025seg2tracksam2sam2basedmultiobjecttracking,
title={Seg2Track-SAM2: SAM2-based Multi-object Tracking and Segmentation for Zero-shot Generalization},
author={Diogo Mendonça and Tiago Barros and Cristiano Premebida and Urbano J. Nunes},
year={2025},
eprint={2509.11772},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2509.11772},
}

Detailed Results

Across all 29 test sequences, our benchmark computes the commonly used tracking metrics: CLEAR MOT, MT/PT/ML, identity switches, and fragmentations [1,2]. The tables below show all of these metrics.

Benchmark	MOTA	MOTP	MODA	MODP
CAR	61.52 %	76.66 %	62.23 %	81.01 %
PEDESTRIAN	37.48 %	69.41 %	39.10 %	90.45 %

Benchmark	recall	precision	F1	TP	FP	FN	FAR	#objects	#trajectories
CAR	79.62 %	85.75 %	82.58 %	30780	5113	7877	45.96 %	47510	989
PEDESTRIAN	65.80 %	71.60 %	68.58 %	15385	6103	7995	54.86 %	27345	417
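
The recall, precision, and F1 columns can be cross-checked against the TP, FP, and FN counts in the same table. The sketch below assumes the standard definitions (recall = TP / (TP + FN), precision = TP / (TP + FP), F1 as their harmonic mean); it is not taken from the KITTI evaluation code, and the function name `detection_scores` is a hypothetical helper introduced only for illustration.

```python
def detection_scores(tp: int, fp: int, fn: int) -> dict[str, float]:
    """Return recall, precision, and F1 as percentages.

    Assumes the standard definitions; not the benchmark's own code.
    """
    recall = tp / (tp + fn)        # fraction of ground-truth objects matched
    precision = tp / (tp + fp)     # fraction of reported objects that are correct
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {
        "recall": 100 * recall,
        "precision": 100 * precision,
        "f1": 100 * f1,
    }


# (TP, FP, FN) counts copied from the table above.
counts = {
    "CAR": (30780, 5113, 7877),
    "PEDESTRIAN": (15385, 6103, 7995),
}

for name, (tp, fp, fn) in counts.items():
    scores = detection_scores(tp, fp, fn)
    print(name, "  ".join(f"{key}={value:.2f} %" for key, value in scores.items()))
```

Rounded to two decimals, these values agree with the recall, precision, and F1 entries listed for both classes. The same counts do not reproduce the MOTA and MODA columns under the textbook formulas, which suggests the benchmark normalizes those metrics with a different ground-truth count.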

Benchmark	MT	PT	ML	IDS	FRAG
CAR	59.54 %	34.46 %	6.00 %	244	849
PEDESTRIAN	36.08 %	45.02 %	18.90 %	376	1271

[1] K. Bernardin, R. Stiefelhagen: Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics. JIVP 2008.
[2] Y. Li, C. Huang, R. Nevatia: Learning to associate: HybridBoosted multi-target tracker for crowded scene. CVPR 2009.

A project of Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago
