A Baseline for 3D Multi-Object Tracking [on] [la] [AB3DMOT]

Submitted on 19 Aug. 2019 01:31 by
Xinshuo Weng (Carnegie Mellon University)

Running time: 0.0047 s
Environment: 1 core @ 2.5 GHz (Python)

Method Description:
3D multi-object tracking (MOT) is an essential
component technology for many real-time
applications such as autonomous driving or
assistive robotics. However, recent works on 3D
MOT tend to focus on developing accurate
systems, giving less regard to computational cost
and system complexity. In contrast, this work
proposes a simple yet accurate real-time baseline
3D MOT system. We use an off-the-shelf 3D object
detector to obtain oriented 3D bounding boxes
from the LiDAR point cloud. Then, a combination
of a 3D Kalman filter and the Hungarian algorithm
is used for
state estimation and data association. Although
our baseline system is a straightforward
combination of standard methods, we obtain
state-of-the-art results. To evaluate our
system, we propose a new 3D MOT extension to the
official KITTI 2D MOT evaluation along with two
new metrics. Our proposed baseline method for 3D
MOT establishes new state-of-the-art performance
on 3D MOT for KITTI, improving the 3D MOTA from
72.23 of prior art to 76.47. Surprisingly, by
projecting our 3D tracking results to the 2D
plane and comparing them against published 2D MOT
methods, our system places 2nd on the official
KITTI leaderboard. Also, our proposed 3D MOT
method runs at a rate of 214.7 FPS, 65 times
faster than the state-of-the-art 2D MOT system.
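
The data association step described above can be illustrated with a minimal sketch. It assumes that track positions have already been predicted (for example by a Kalman filter) and that the matching cost is the Euclidean distance between 3D box centers; the cost function, the gating threshold `max_dist`, and the function name `associate` are assumptions made for illustration and are not taken from the method description. To stay dependency-free, the sketch finds the minimum-cost assignment by exhaustive search, which returns the same optimum that the Hungarian algorithm computes in polynomial time and is only practical for a handful of objects.

```python
from itertools import permutations
from math import dist


def associate(tracks, detections, max_dist=2.0):
    """Match predicted track centers to detection centers.

    tracks, detections: lists of (x, y, z) centers.
    max_dist: hypothetical gating threshold; matched pairs farther apart
    than this are rejected after the optimal assignment is found.
    Returns (matches, unmatched_tracks, unmatched_detections).
    """
    n_t, n_d = len(tracks), len(detections)

    # Enumerate every one-to-one assignment of the smaller set to the larger.
    if n_t <= n_d:
        candidates = (list(zip(range(n_t), p))
                      for p in permutations(range(n_d), n_t))
    else:
        candidates = (list(zip(p, range(n_d)))
                      for p in permutations(range(n_t), n_d))

    # Keep the assignment with the lowest total distance.
    best_pairs, best_cost = [], float("inf")
    for pairs in candidates:
        cost = sum(dist(tracks[t], detections[d]) for t, d in pairs)
        if cost < best_cost:
            best_pairs, best_cost = pairs, cost

    # Gate: discard pairs that are too far apart to be the same object.
    matches = [(t, d) for t, d in best_pairs
               if dist(tracks[t], detections[d]) <= max_dist]

    matched_t = {t for t, _ in matches}
    matched_d = {d for _, d in matches}
    unmatched_tracks = [t for t in range(n_t) if t not in matched_t]
    unmatched_detections = [d for d in range(n_d) if d not in matched_d]
    return matches, unmatched_tracks, unmatched_detections


# Two predicted tracks, three detections in the current frame.
tracks = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
detections = [(10.5, 0.0, 0.0), (0.2, 0.0, 0.0), (50.0, 0.0, 0.0)]
matches, lost, new = associate(tracks, detections)
print(matches, lost, new)  # [(0, 1), (1, 0)] [] [2]
```

In a tracker built along these lines, matched detections would update their track's filter state, unmatched detections would start candidate tracks, and unmatched tracks would be kept for a few frames before being dropped.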
LaTeX BibTeX:
archivePrefix = {arXiv},
arxivId = {1907.03961},
author = {Weng, Xinshuo and Kitani, Kris},
eprint = {1907.03961},
journal = {arXiv:1907.03961},
title = {{A Baseline for 3D Multi-Object Tracking}},
url = {},
year = {2019}

Detailed Results

Object detection and orientation estimation results. Results for object detection are given in terms of average precision (AP) and results for joint object detection and orientation estimation are provided in terms of average orientation similarity (AOS).

Benchmark                       Easy      Moderate  Hard
Pedestrian (Detection)          54.55 %   43.86 %   40.99 %
Pedestrian (Orientation)        50.30 %   39.76 %   36.90 %
Pedestrian (3D Detection)       42.27 %   34.59 %   31.37 %
Pedestrian (Bird's Eye View)    47.51 %   38.79 %   35.85 %
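
The orientation scores in the table build on a per-detection similarity term. The sketch below assumes the commonly cited KITTI definition, in which each matched detection contributes (1 + cos Δθ) / 2, where Δθ is the difference between the estimated and ground-truth orientation. The function name and its input format are hypothetical, and the sketch leaves out the averaging over recall levels and the handling of unmatched detections that the full AOS metric involves.

```python
from math import cos, pi


def orientation_similarity(angle_errors):
    """Mean of (1 + cos(delta)) / 2 over matched detections.

    angle_errors: orientation differences in radians between each
    matched detection and its ground-truth box.
    Returns a value in [0, 1]; 1 means every orientation is exact.
    """
    if not angle_errors:
        return 0.0
    return sum((1.0 + cos(e)) / 2.0 for e in angle_errors) / len(angle_errors)


# A perfect estimate scores 1, an estimate flipped by 180 degrees scores 0.
print(orientation_similarity([0.0]))
print(orientation_similarity([pi]))
```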

2D object detection results.

Orientation estimation results.

3D object detection results.

Bird's eye view results.
