Method

Learned Object Descriptors [LOD]
[Anonymous Submission]

Submitted on 23 Apr. 2025 16:49 by
[Anonymous Submission]

Running time: 0.1 s
Environment: GPU @ 2.0 GHz (Python)

Method Description:
This method relies only on the visual appearance of objects. We use a YOLO model to obtain detections and a ConvNeXt model to extract reID features from each detection. Track IDs are then assigned based on the highest cosine similarity between incoming reID features and the features stored from all previous frames. If duplicate IDs are detected within a frame, only the detection with the highest similarity score is kept and all other duplicates are rejected.
Parameters:
Similarity threshold \tau = 0.82
YOLO confidence threshold = 0.5
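The association step described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function and argument names are assumptions, features are L2-normalized so a dot product gives cosine similarity, and unmatched detections are assumed to start new tracks.

```python
import numpy as np
from collections import defaultdict

def _l2_normalize(x):
    # Unit-normalize each row so a dot product equals cosine similarity.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def assign_track_ids(gallery_feats, gallery_ids, det_feats, tau=0.82, next_id=0):
    """Appearance-only ID assignment (illustrative sketch, not the submitted code).

    gallery_feats: (N, D) reID features collected from all previous frames
    gallery_ids:   track ID of each gallery feature
    det_feats:     (M, D) reID features of the current frame's detections
    Returns (ids, keep, next_id): a track ID per detection, a mask rejecting
    duplicate-ID detections with lower similarity, and the next unused ID.
    """
    sims = _l2_normalize(det_feats) @ _l2_normalize(gallery_feats).T  # (M, N)
    best = sims.argmax(axis=1)
    best_sim = sims[np.arange(len(det_feats)), best]

    ids, scores = [], []
    for j, s in zip(best, best_sim):
        if s >= tau:                 # match to an existing track
            ids.append(gallery_ids[j])
            scores.append(float(s))
        else:                        # assumed behavior: start a new track
            ids.append(next_id)
            scores.append(-1.0)
            next_id += 1

    # Duplicate resolution: within a frame, keep only the detection with the
    # highest similarity for each track ID; reject all other duplicates.
    by_id = defaultdict(list)
    for i, t in enumerate(ids):
        by_id[t].append(i)
    keep = [True] * len(ids)
    for idxs in by_id.values():
        if len(idxs) > 1:
            best_i = max(idxs, key=lambda i: scores[i])
            for i in idxs:
                if i != best_i:
                    keep[i] = False
    return ids, keep, next_id
```

For example, two current detections that both match the same stored track above \tau resolve to a single kept detection; the lower-scoring one is rejected rather than reassigned.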

Detailed Results

For all 29 test sequences, the benchmark computes the commonly used tracking metrics: the CLEAR MOT scores, MT/PT/ML, identity switches, and fragmentations [1,2]. The tables below list all of these metrics.


Benchmark    MOTA     MOTP     MODA     MODP
CAR          68.10 %  84.88 %  69.19 %  88.15 %
PEDESTRIAN   39.14 %  76.77 %  39.65 %  93.93 %

Benchmark    Recall   Precision  F1       TP     FP   FN     FAR     #objects  #trajectories
CAR          72.62 %  98.78 %    83.70 %  27212  335  10262  3.01 %  29388     1764
PEDESTRIAN   42.39 %  95.01 %    58.62 %  9897   520  13452  4.67 %  10781     451

Benchmark    MT       PT       ML       IDS  FRAG
CAR          43.85 %  40.15 %  16.00 %  375  732
PEDESTRIAN   14.78 %  40.21 %  45.02 %  118  650
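The CLEAR MOT scores in the first table are derived from the error counts in the later tables. Below is a minimal sketch of the standard definitions from [1]; the benchmark's evaluation has additional matching details, so these one-liners are illustrative and will not exactly reproduce the table values.

```python
def mota(misses, false_positives, id_switches, num_gt):
    # Multiple Object Tracking Accuracy: 1 minus the ratio of all errors
    # (missed objects, false positives, identity switches) to ground-truth objects.
    return 1.0 - (misses + false_positives + id_switches) / num_gt

def motp(total_match_overlap, num_matches):
    # Multiple Object Tracking Precision: mean bounding-box overlap
    # over all correctly matched detection/ground-truth pairs.
    return total_match_overlap / num_matches
```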



[1] K. Bernardin, R. Stiefelhagen: Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics. JIVP 2008.
[2] Y. Li, C. Huang, R. Nevatia: Learning to associate: HybridBoosted multi-target tracker for crowded scene. CVPR 2009.

