MOTSFusion (Pedestrians) [MOTSFusion]

Submitted on 5 Dec. 2019 22:02 by
Jonathon Luiten (RWTH Aachen University)

Running time: 0.44 s
Environment: 1 core @ 2.5 GHz (C/C++)

Method Description:
First we build tracklets by calculating a
segmentation mask for each detection and linking
these over time using optical flow. We then fuse
these tracklets into 3D object reconstructions
using depth and ego motion estimates. These 3D
reconstructions are then used to estimate the 3D
motion of objects, which is used to merge
tracklets into long-term tracks, bridging
occlusion gaps of up to 20 frames. This also
allows us to fill in missing detections.
Detections = TrackRCNN
Segmentations = BB2SegNet
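
The tracklet-merging step described above can be sketched as follows. This is a minimal illustration, not the submitted implementation: it assumes each tracklet has already been reduced to a 3D object centre and a per-frame 3D motion estimate, uses constant-velocity extrapolation, and associates greedily by distance. The names `Tracklet`, `predict`, `merge_tracklets` and the threshold `MAX_DIST` are hypothetical; only the 20-frame gap limit comes from the description.

```python
import math
from dataclasses import dataclass

MAX_GAP = 20    # frames; occlusion gap limit from the method description
MAX_DIST = 1.0  # metres; hypothetical association threshold (assumption)


@dataclass
class Tracklet:
    start: int        # first frame of the tracklet
    end: int          # last frame of the tracklet
    first_pos: tuple  # 3D object centre (x, y, z) at `start`
    last_pos: tuple   # 3D object centre (x, y, z) at `end`
    velocity: tuple   # estimated 3D motion per frame at `end`


def predict(t, frame):
    """Extrapolate the 3D position of tracklet `t` to a later frame,
    assuming constant velocity (an assumption of this sketch)."""
    dt = frame - t.end
    return tuple(p + v * dt for p, v in zip(t.last_pos, t.velocity))


def merge_tracklets(tracklets, max_gap=MAX_GAP, max_dist=MAX_DIST):
    """Greedily chain tracklets into tracks (each track is a list of tracklets)."""
    tracks = []
    for t in sorted(tracklets, key=lambda x: x.start):
        best, best_d = None, max_dist
        for track in tracks:
            last = track[-1]
            missing = t.start - last.end - 1  # frames without a detection
            if not 0 <= missing <= max_gap:
                continue
            # Compare the extrapolated position with where `t` actually starts.
            d = math.dist(predict(last, t.start), t.first_pos)
            if d <= best_d:
                best, best_d = track, d
        if best is None:
            tracks.append([t])  # no compatible track: start a new one
        else:
            best.append(t)      # bridge the gap and extend the track
    return tracks
```

In this sketch, the frames between two merged tracklets could be filled by calling `predict` for each missing frame, which loosely mirrors how the method fills in missing detections.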
Latex Bibtex:
title={Track to Reconstruct and Reconstruct to
author={Luiten, Jonathon and Fischer, Tobias and
Leibe, Bastian},
journal={IEEE Robotics and Automation Letters},

Detailed Results

From all 29 test sequences, our benchmark computes the HOTA tracking metrics (HOTA, DetA, AssA, DetRe, DetPr, AssRe, AssPr, LocA) [1] as well as the CLEAR MOT metrics, MT/PT/ML rates, identity switches, and fragmentations [2,3]. The tables below show all of these metrics.

Benchmark HOTA DetA AssA DetRe DetPr AssRe AssPr LocA
PEDESTRIAN 54.04 % 60.83 % 49.45 % 64.13 % 81.47 % 56.68 % 70.44 % 83.71 %
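
For reference, HOTA as defined in [1] is the geometric mean of DetA and AssA at each localisation threshold α, averaged over 19 thresholds. Because the averaging happens after the square root, the reported HOTA (54.04 %) need not equal the geometric mean of the averaged DetA and AssA values in the row above (about 54.85 %):

```latex
\[
\mathrm{HOTA}_{\alpha} = \sqrt{\mathrm{DetA}_{\alpha} \cdot \mathrm{AssA}_{\alpha}},
\qquad
\mathrm{HOTA} = \frac{1}{19} \sum_{\alpha \in \{0.05,\, 0.10,\, \ldots,\, 0.95\}} \mathrm{HOTA}_{\alpha}
\]
```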

Benchmark TP FP FN
PEDESTRIAN 15829 4868 463

Benchmark MOTSA MOTSP MODSA IDSW sMOTSA
PEDESTRIAN 72.89 % 81.50 % 74.24 % 279 58.75 %

Benchmark MT rate PT rate ML rate FRAG
PEDESTRIAN 47.41 % 37.04 % 15.56 % 522

Benchmark # Dets # Tracks
PEDESTRIAN 16292 293
