Method

Multi-Level Fusion for Cross-Modal 3D Object Detection [MLF-DET]


Submitted on 18 Dec. 2023 16:32 by
Zewei Lin (Xi’an Jiaotong University)

Running time: 0.09 s
Environment: 1 core @ 2.5 GHz (C/C++)

Method Description:
In this paper, we propose a novel and effective Multi-Level Fusion network, named MLF-DET, for high-performance cross-modal 3D object DETection, which integrates both feature-level fusion and decision-level fusion to fully utilize the information in the image. For feature-level fusion, we present the Multi-scale Voxel Image fusion module, which densely aligns multi-scale voxel features with image features. For decision-level fusion, we propose the lightweight Feature-cued Confidence Rectification module, which further exploits image semantics to rectify the confidence of detection candidates. Besides, we design an effective data augmentation strategy termed Occlusion-aware GT Sampling to retain more sampled objects in the training scenes and thus reduce overfitting. Extensive experiments on the KITTI dataset demonstrate the effectiveness of our method.
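
As a rough illustration of the two fusion levels described above, the sketch below is plain PyTorch and is not the released MLF-DET code; the function names, tensor shapes, and the geometric-mean rectification rule are illustrative assumptions. It projects voxel centers into the image and samples multi-scale image features (feature-level fusion), and blends a candidate's LiDAR confidence with an image-based semantic score (decision-level fusion).

import torch
import torch.nn.functional as F


def project_to_image(voxel_centers, lidar_to_cam, cam_intrinsics):
    # Project (N, 3) voxel centers from LiDAR coordinates to (N, 2) pixel
    # coordinates, assuming a 4x4 LiDAR-to-camera transform and a 3x3 intrinsic matrix.
    n = voxel_centers.shape[0]
    homo = torch.cat([voxel_centers, voxel_centers.new_ones(n, 1)], dim=1)  # (N, 4)
    cam = (lidar_to_cam @ homo.T).T[:, :3]                                  # (N, 3)
    uvw = (cam_intrinsics @ cam.T).T                                        # (N, 3)
    return uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)                         # (N, 2)


def sample_image_features(image_feat, uv, image_size):
    # Bilinearly sample one image feature per projected voxel from a (1, C, H, W) map.
    w, h = image_size
    grid = torch.stack([uv[:, 0] / (w - 1) * 2.0 - 1.0,   # normalize pixels to [-1, 1]
                        uv[:, 1] / (h - 1) * 2.0 - 1.0], dim=1)
    grid = grid.view(1, 1, -1, 2)                                           # (1, 1, N, 2)
    feat = F.grid_sample(image_feat, grid, align_corners=True)              # (1, C, 1, N)
    return feat.squeeze(0).squeeze(1).T                                     # (N, C)


def fuse_voxel_image(voxel_feat, voxel_centers, image_feats,
                     lidar_to_cam, cam_intrinsics, image_size):
    # Feature-level fusion: append image features sampled at every scale to each voxel.
    # image_feats is a list of (1, C_i, H_i, W_i) maps from an image backbone.
    uv = project_to_image(voxel_centers, lidar_to_cam, cam_intrinsics)
    sampled = [sample_image_features(f, uv, image_size) for f in image_feats]
    return torch.cat([voxel_feat] + sampled, dim=1)                         # (N, C_voxel + sum C_i)


def rectify_confidence(lidar_score, image_score, alpha=0.5):
    # Decision-level fusion: blend the LiDAR box confidence with an image-based
    # semantic score for the same candidate (geometric mean as one simple choice).
    return lidar_score.clamp(min=1e-6) ** alpha * image_score.clamp(min=1e-6) ** (1.0 - alpha)

In such a pipeline, fuse_voxel_image would be applied to the voxel features before they enter the 3D detection head, and rectify_confidence would be applied to each detection candidate's score before non-maximum suppression.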
Parameters:
TBD
Latex Bibtex:
@inproceedings{lin2023mlf,
  title={MLF-DET: Multi-Level Fusion for Cross-Modal 3D Object Detection},
  author={Lin, Zewei and Shen, Yanqing and Zhou, Sanping and Chen, Shitao and Zheng, Nanning},
  booktitle={International Conference on Artificial Neural Networks},
  pages={136--149},
  year={2023},
  organization={Springer}
}

Detailed Results

Object detection and orientation estimation results. Results for object detection are given in terms of average precision (AP) and results for joint object detection and orientation estimation are provided in terms of average orientation similarity (AOS).
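
For reference, AOS follows the definition of the original KITTI evaluation (sketched here with the 11-point recall interpolation of the 2012 benchmark paper; the current leaderboard interpolates over 40 recall positions):

\[
\mathrm{AOS} = \frac{1}{11} \sum_{r \in \{0, 0.1, \ldots, 1\}} \max_{\tilde{r} \ge r} s(\tilde{r}),
\qquad
s(r) = \frac{1}{|D(r)|} \sum_{i \in D(r)} \frac{1 + \cos \Delta\theta^{(i)}}{2}\, \delta_i ,
\]

where $D(r)$ is the set of detections at recall level $r$, $\Delta\theta^{(i)}$ is the angular difference between the estimated and ground-truth orientation of detection $i$, and $\delta_i = 1$ if detection $i$ is assigned to a ground-truth box and $0$ otherwise.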


Benchmark                      Easy     Moderate  Hard
Car (Detection)                96.89 %  96.17 %   88.90 %
Car (Orientation)              96.87 %  96.09 %   88.78 %
Car (3D Detection)             91.18 %  82.89 %   77.89 %
Car (Bird's Eye View)          93.38 %  89.82 %   84.78 %
Pedestrian (Detection)         70.25 %  63.09 %   59.23 %
Pedestrian (Orientation)       64.49 %  56.89 %   53.17 %
Pedestrian (3D Detection)      50.86 %  45.29 %   42.05 %
Pedestrian (Bird's Eye View)   56.45 %  50.88 %   47.60 %
Cyclist (Detection)            87.34 %  81.95 %   74.79 %
Cyclist (Orientation)          87.17 %  81.07 %   73.92 %
Cyclist (3D Detection)         83.31 %  70.71 %   63.71 %
Cyclist (Bird's Eye View)      86.20 %  74.88 %   66.75 %


Figures: 2D object detection results, orientation estimation results, 3D object detection results, and bird's eye view results.



