Learning to Track: Online Multi-Object Tracking by Decision Making [on] [MDP]

Submitted on 1 Feb. 2016 05:18 by
Yu Xiang (Stanford University)

Running time: 0.9 s
Environment: 8 cores @ 3.5 GHz (Matlab + C/C++)

Method Description:
Online Multi-Object Tracking (MOT) has wide
applications in time-critical video analysis
scenarios, such as robot navigation and
autonomous driving. In tracking-by-detection, a
major challenge of online MOT is how to
associate noisy object detections on a new
frame with previously tracked objects. In this
work, we formulate the online MOT problem as
decision making in Markov Decision Processes
(MDPs), where the lifetime of an object is
modeled with an MDP. Learning a similarity
function for data association is equivalent to
learning a policy for the MDP, and the policy
is learned in a reinforcement learning fashion
that combines the advantages of offline
learning and online learning for data
association. Moreover, our
framework can naturally handle the birth/death
and appearance/disappearance of targets by
treating them as state transitions in the MDP
while leveraging existing online single object
tracking methods.
Uses detections from the SubCNN method, evaluated on
the KITTI object detection benchmark.
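
The idea of modeling a target's lifetime as an MDP can be illustrated with a small state machine. This is only a sketch under stated assumptions: the four state names (active, tracked, lost, inactive) and the transition table are not given in the description above, and the helper names `step` and `run` are hypothetical. The learned policy that chooses each transition is not modeled here.

```python
from enum import Enum


class State(Enum):
    # Assumed state names; the description only mentions birth/death
    # and appearance/disappearance as state transitions.
    ACTIVE = "active"      # a new detection, target not yet confirmed
    TRACKED = "tracked"    # target is visible and being followed
    LOST = "lost"          # target has temporarily disappeared
    INACTIVE = "inactive"  # target is dead (terminal state)


# Assumed transition table: which states can follow which.
TRANSITIONS = {
    State.ACTIVE: {State.TRACKED, State.INACTIVE},            # birth or false alarm
    State.TRACKED: {State.TRACKED, State.LOST},               # keep tracking or disappear
    State.LOST: {State.LOST, State.TRACKED, State.INACTIVE},  # wait, reappear, or die
    State.INACTIVE: set(),                                    # no way back
}


def step(state, next_state):
    """Apply one transition, rejecting moves the table does not allow."""
    if next_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state.value} -> {next_state.value}")
    return next_state


def run(decisions, start=State.ACTIVE):
    """Replay a sequence of decisions for a single target."""
    state = start
    for next_state in decisions:
        state = step(state, next_state)
    return state


# One target: born, occluded once, re-associated, lost again, then terminated.
final = run([State.TRACKED, State.LOST, State.TRACKED, State.LOST, State.INACTIVE])
print(final.value)
```

In the framework described above, the decision at each step would come from the learned policy (for a lost target, the data-association similarity function) rather than from a fixed list.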
LaTeX BibTeX:
author = {Xiang, Yu and Alahi, Alexandre and
Savarese, Silvio},
title = {Learning to Track: Online Multi-Object
Tracking by Decision Making},
booktitle = {International Conference on
Computer Vision (ICCV)},
pages = {4705--4713},
year = {2015}

author = {Xiang, Yu and Choi, Wongun and Lin, Yuanqing
and Savarese, Silvio},
title = {Subcategory-aware Convolutional Neural
Networks for Object Proposals and Detection},
booktitle = {IEEE Winter Conference on Applications of
Computer Vision (WACV)},
year = {2017}

Detailed Results

From all 29 test sequences, our benchmark computes the commonly used tracking metrics: CLEAR MOT, MT/PT/ML, identity switches, and fragmentations [1,2]. The tables below show all of these metrics.

Benchmark MOTA MOTP MODA MODP
CAR 76.59 % 82.10 % 76.97 % 86.36 %
PEDESTRIAN 47.22 % 70.36 % 47.59 % 90.96 %
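
For orientation, the CLEAR MOT tracking accuracy is commonly written as below, following [1]. The symbols are the usual ones rather than anything defined on this page: per-frame false negatives, false positives, identity switches, and ground-truth object counts. The benchmark's evaluation code may differ in details such as how ignored regions are counted, so the counts in the tables here need not reproduce the percentages exactly.

```latex
\mathrm{MOTA} \;=\; 1 \;-\; \frac{\sum_t \left( \mathrm{FN}_t + \mathrm{FP}_t + \mathrm{IDSW}_t \right)}{\sum_t \mathrm{GT}_t}
```

The detection-only counterpart drops the identity-switch term, which is why a tracker with few identity switches scores almost the same on both.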

Benchmark recall precision F1 TP FP FN FAR #objects #trajectories
CAR 80.26 % 98.00 % 88.25 % 29747 606 7315 5.45 % 32774 831
PEDESTRIAN 59.01 % 84.12 % 69.36 % 13734 2592 9540 23.30 % 18951 446
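
The recall, precision, and F1 columns follow from the TP, FP, and FN counts in the same rows. A minimal sketch, assuming the standard definitions; the function name `detection_scores` is made up for illustration and is not part of the benchmark's tooling:

```python
def detection_scores(tp, fp, fn):
    """Return (recall, precision, F1) from raw detection counts."""
    recall = tp / (tp + fn)        # fraction of ground-truth objects found
    precision = tp / (tp + fp)     # fraction of reported objects that are correct
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return recall, precision, f1


# TP, FP, FN copied from the table rows above.
rows = {
    "CAR": (29747, 606, 7315),
    "PEDESTRIAN": (13734, 2592, 9540),
}

for name, (tp, fp, fn) in rows.items():
    r, p, f = detection_scores(tp, fp, fn)
    print(f"{name}: recall {r:.2%}, precision {p:.2%}, F1 {f:.2%}")
```

Under these definitions the computed values agree with the table to two decimal places for both classes.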

Benchmark MT PT ML IDS FRAG
CAR 52.15 % 34.46 % 13.38 % 130 387
PEDESTRIAN 24.05 % 48.11 % 27.84 % 87 825


[1] K. Bernardin, R. Stiefelhagen: Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics. JIVP 2008.
[2] Y. Li, C. Huang, R. Nevatia: Learning to associate: HybridBoosted multi-target tracker for crowded scene. CVPR 2009.
