Juncong Fei (Karlsruhe Institute of Technology (KIT))

Running time: 0.04 s
Environment: GPU @ 2.5 GHz (Python + C/C++)

Method Description:
3D pedestrian detection is a challenging task
in automated driving because pedestrians are
relatively small, frequently occluded and
easily confused with narrow vertical objects.
LiDAR and camera are two commonly used sensor
modalities for this task and should, in
principle, provide complementary information.
Unexpectedly, LiDAR-only detection methods tend
to outperform multisensor fusion methods on
public benchmarks. Recently, PointPainting has
been presented to eliminate this performance
drop by effectively fusing the output of a
semantic segmentation network instead of the
raw image information. In this paper, we
propose a generalization of PointPainting that
can apply fusion at different levels. After the
semantic augmentation of the point cloud, we
encode raw point data in pillars to get
geometric features and semantic point data in
voxels to get semantic features, and then fuse
them in an effective way.
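
The semantic augmentation and the two grid encodings described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes NumPy, a single combined 3x4 matrix `lidar_to_image` that maps homogeneous LiDAR coordinates to pixel coordinates, and per-pixel class scores that are already computed. The function names, the cell sizes and the grid origin are hypothetical.

```python
import numpy as np


def paint_points(points, seg_scores, lidar_to_image):
    """Append per-pixel class scores to the LiDAR points that fall inside the image.

    points:         (N, 4) array of x, y, z, intensity in the LiDAR frame
    seg_scores:     (H, W, C) per-pixel class scores from a segmentation network
    lidar_to_image: (3, 4) combined calibration matrix (assumed to be given)
    Returns an (M, 4 + C) array holding only the points visible in the image.
    """
    n = points.shape[0]
    xyz1 = np.hstack([points[:, :3], np.ones((n, 1))])
    uvw = xyz1 @ lidar_to_image.T
    depth = uvw[:, 2]
    in_front = depth > 1e-6
    # Avoid dividing by zero or negative depth; those points are masked out anyway.
    safe_depth = np.where(in_front, depth, 1.0)
    u = np.floor(uvw[:, 0] / safe_depth).astype(int)
    v = np.floor(uvw[:, 1] / safe_depth).astype(int)
    height, width, _ = seg_scores.shape
    visible = in_front & (u >= 0) & (u < width) & (v >= 0) & (v < height)
    return np.hstack([points[visible], seg_scores[v[visible], u[visible]]])


def cell_indices(xyz, cell_size, origin):
    """Integer grid cell of each point.

    Pass two cell sizes (x, y) to get pillar indices and three (x, y, z)
    to get voxel indices. Sizes and origin are illustrative values only.
    """
    cell_size = np.asarray(cell_size, dtype=float)
    origin = np.asarray(origin, dtype=float)
    dims = cell_size.shape[0]
    return np.floor((xyz[:, :dims] - origin) / cell_size).astype(int)
```

In this sketch the raw columns of the painted points would be grouped by `cell_indices(..., (dx, dy), ...)` for the pillar branch, and the appended score columns by `cell_indices(..., (dx, dy, dz), ...)` for the voxel branch. How the two feature sets are learned and fused is not specified in the description above and is not modeled here.
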
Detailed Results

Object detection and orientation estimation results. Results for object detection are given in terms of average precision (AP) and results for joint object detection and orientation estimation are provided in terms of average orientation similarity (AOS).
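
As a rough illustration of the orientation term behind AOS, the sketch below computes a per-detection similarity between predicted and ground-truth angles. It assumes the normalized cosine similarity (1 + cos(delta)) / 2 from the KITTI benchmark definition. Averaging over recall levels and the handling of unmatched detections are omitted, and the function name is hypothetical.

```python
import numpy as np


def orientation_similarity(theta_pred, theta_gt):
    """Per-detection similarity in [0, 1]: 1 for equal angles, 0 for opposite ones.

    Angles are in radians; inputs may be scalars or arrays of equal shape.
    """
    delta = np.asarray(theta_pred, dtype=float) - np.asarray(theta_gt, dtype=float)
    return (1.0 + np.cos(delta)) / 2.0
```

Because every similarity value is at most 1, an orientation score built this way cannot exceed the detection AP it is paired with, which is consistent with the Orientation row being lower than the Detection row in the results.
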

Benchmark                      Easy      Moderate  Hard
Pedestrian (Detection)         67.62 %   57.22 %   54.90 %
Pedestrian (Orientation)       45.59 %   38.95 %   37.21 %
Pedestrian (3D Detection)      50.90 %   42.19 %   39.52 %
Pedestrian (Bird's Eye View)   58.91 %   49.93 %   47.31 %

[Figure: 2D object detection results.]
[Figure: Orientation estimation results.]
[Figure: 3D object detection results.]
[Figure: Bird's eye view results.]
