The KITTI Vision Benchmark Suite

Method

PointPillars [la] [PointPillars]
https://github.com/nutonomy/second.pytorch

Submitted on 16 Nov. 2018 11:46 by
Alex Lang (nuTonomy)

Running time:		16 ms
Environment:		1080ti GPU and Intel i7 CPU

Method Description:

Object detection in point clouds is an important aspect of
many robotics applications such as autonomous driving. In
this paper we consider the problem of encoding a point
cloud into a format appropriate for a downstream detection
pipeline. Recent literature suggests two types of encoders;
fixed encoders tend to be fast but sacrifice accuracy, while
encoders that are learned from data are more accurate, but
slower. In this work we propose PointPillars, a novel encoder
which utilizes PointNets to learn a representation of point
clouds organized in vertical columns (pillars). While the
encoded features can be used with any standard 2D
convolutional detection architecture, we further propose a
lean downstream network. Extensive experimentation shows
that PointPillars outperforms previous encoders with
respect to both speed and accuracy by a large margin.
Despite only using lidar, our full detection pipeline
significantly outperforms the state of the art, even among
fusion methods, with respect to both the 3D and bird's eye
view KITTI benchmarks. This detection performance is
achieved while running at 62 Hz: a 2 - 4 fold runtime
improvement. A faster version of our method matches the
state of the art at 105 Hz. These benchmarks suggest that
PointPillars is an appropriate encoding for object detection
in point clouds.

Parameters:

Latex Bibtex:

@article{lang2018pointpillars,
title={PointPillars: Fast Encoders for Object Detection from
Point Clouds},
author={Lang, Alex H and Vora, Sourabh and Caesar,
Holger and Zhou, Lubing and Yang, Jiong and Beijbom,
Oscar},
journal={CVPR},
year={2019}
}

Detailed Results

Object detection and orientation estimation results. Results for object detection are given in terms of average precision (AP) and results for joint object detection and orientation estimation are provided in terms of average orientation similarity (AOS).

Benchmark	Easy	Moderate	Hard
Car (Detection)	94.00 %	91.19 %	88.17 %
Car (Orientation)	93.84 %	90.70 %	87.47 %
Car (3D Detection)	82.58 %	74.31 %	68.99 %
Car (Bird's Eye View)	90.07 %	86.56 %	82.81 %
Pedestrian (Detection)	65.29 %	55.10 %	52.39 %
Pedestrian (Orientation)	57.47 %	48.05 %	45.40 %
Pedestrian (3D Detection)	51.45 %	41.92 %	38.89 %
Pedestrian (Bird's Eye View)	57.60 %	48.64 %	45.78 %
Cyclist (Detection)	83.97 %	68.98 %	62.17 %
Cyclist (Orientation)	83.79 %	68.55 %	61.71 %
Cyclist (3D Detection)	77.10 %	58.65 %	51.92 %
Cyclist (Bird's Eye View)	79.90 %	62.73 %	55.58 %