Method

MOTSFusion [MOTSFusion]
https://github.com/tobiasfshr/MOTSFusion

Submitted on 5 Dec. 2019 21:05 by
Jonathon Luiten (RWTH Aachen University)

Running time: 0.44 s
Environment: GPU @ 2.5 GHz (Python)

Method Description:
First, we build tracklets by computing a
segmentation mask for each detection and linking
these masks over time using optical flow. We then
fuse these tracklets into 3D object
reconstructions using depth and ego-motion
estimates. These 3D reconstructions are used to
estimate the 3D motion of each object, which in
turn is used to merge tracklets into long-term
tracks, bridging occlusion gaps of up to 20
frames. This also allows us to fill in missing
detections.
Parameters:
Detections = RRC
Segmentations = BB2SegNet
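
Below is a minimal, self-contained sketch (Python/NumPy) of the first step described above: warping each track's segmentation mask into the next frame with dense optical flow and linking it to the detection mask with the highest overlap. The function names, the greedy matching, and the 0.5 IoU threshold are illustrative assumptions for this sketch, not the reference MOTSFusion implementation.

import numpy as np


def warp_mask_forward(mask: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Propagate a binary mask from frame t to t+1 using dense flow (H, W, 2)."""
    h, w = mask.shape
    warped = np.zeros_like(mask)
    ys, xs = np.nonzero(mask)
    # Move every foreground pixel by its flow vector (nearest-neighbour splat).
    new_xs = np.clip(np.round(xs + flow[ys, xs, 0]).astype(int), 0, w - 1)
    new_ys = np.clip(np.round(ys + flow[ys, xs, 1]).astype(int), 0, h - 1)
    warped[new_ys, new_xs] = 1
    return warped


def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union > 0 else 0.0


def link_masks(track_masks, det_masks, flow, iou_thresh=0.5):
    """Greedily extend tracklets; returns {track_id: detection_index}."""
    warped = {tid: warp_mask_forward(m, flow) for tid, m in track_masks.items()}
    links, used = {}, set()
    # Score every (track, detection) pair by mask IoU after warping.
    pairs = [(mask_iou(wm, dm), tid, j)
             for tid, wm in warped.items()
             for j, dm in enumerate(det_masks)]
    for iou, tid, j in sorted(pairs, reverse=True):
        if iou < iou_thresh or tid in links or j in used:
            continue
        links[tid] = j
        used.add(j)
    return links

The later stages described above (fusing the resulting tracklets into 3D reconstructions and merging them across occlusion gaps using the estimated 3D object motion) operate on tracklets produced by a step like this one.
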
LaTeX BibTeX:
@article{luiten2019MOTSFusion,
title={Track to Reconstruct and Reconstruct to
Track},
author={Luiten, Jonathon and Fischer, Tobias and
Leibe, Bastian},
journal={IEEE Robotics and Automation Letters},
year={2020},
publisher={IEEE}
}

Detailed Results

From all 29 test sequences, our benchmark computes the commonly used tracking metrics (adapted for the segmentation case): CLEARMOT, MT/PT/ML, identity switches, and fragmentations [1,2]. The tables below show all of these metrics.
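
For reference, the mask-based scores in the first table below are commonly defined as follows (this restates the standard MOTS formulation of these metrics and is not stated explicitly on this page; M denotes the set of ground-truth masks and TP, FP, IDS the true positives, false positives, and identity switches):

\[
\mathrm{MOTSA} = \frac{|TP| - |FP| - |IDS|}{|M|}, \qquad
\mathrm{sMOTSA} = \frac{\widetilde{TP} - |FP| - |IDS|}{|M|}, \qquad
\mathrm{MOTSP} = \frac{\widetilde{TP}}{|TP|},
\]

where $\widetilde{TP} = \sum_{h \in TP} \mathrm{IoU}(h, c(h))$ sums the mask IoU of each true-positive hypothesis $h$ with its matched ground-truth mask $c(h)$.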


Benchmark    sMOTSA   MOTSA    MOTSP    MODSA    MODSP
CAR          75.00 %  84.10 %  89.30 %  84.70 %  91.70 %
PEDESTRIAN    0.00 %   0.00 %   0.00 %   0.00 %   0.00 %

Benchmark    recall   precision  F1       TP     FP   FN    FAR     #objects  #trajectories
CAR          85.50 %  99.10 %    91.80 %  31418  295  5342  2.70 %  36439     770
PEDESTRIAN    0.00 %   0.00 %     0.00 %  0      0    0     0.00 %  0         0

Benchmark    MT       PT       ML      IDS  FRAG
CAR          66.10 %  27.80 %  6.20 %  201  572
PEDESTRIAN    0.00 %   0.00 %  0.00 %  0    0



[1] K. Bernardin, R. Stiefelhagen: Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics. JIVP 2008.
[2] Y. Li, C. Huang, R. Nevatia: Learning to associate: HybridBoosted multi-target tracker for crowded scene. CVPR 2009.

