Method

3D-AWARE: Attention-guided, Weighted, and Adaptive Representation Encoding for Knowledge Distillation [3D-AWARE]


Submitted on 19 Jul. 2025 07:27 by
Fazal Ghaffar (Deakin University)

Running time: 0.1 s
Environment: 1 core @ 2.5 GHz (Python)

Method Description:
In this paper, we present 3D-AWARE, a novel teacher-student knowledge distillation framework for efficient 3D object detection that fuses LiDAR point clouds and RGB images via multi-level attention-based fusion. The teacher network leverages cross-modal and late fusion blocks to generate rich geometric-visual representations, while the lightweight student model adopts fusion dropout and streamlined attention to improve robustness and efficiency. To bridge the capacity gap, we apply multi-stage distillation losses over classification logits, bounding-box regressions, intermediate features, and cross-modal attention maps. Notably, the student is trained on a larger augmented dataset than the teacher, allowing it to surpass the teacher in 3D Average Precision (AP) while maintaining low computational overhead. We report extensive experiments on the KITTI benchmark.
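The multi-stage distillation objective described above can be sketched as a weighted sum of per-stage terms. The sketch below is a minimal NumPy illustration, not the submitted implementation: the loss weights, the temperature value, and the choice of smooth-L1 for boxes and MSE for features are assumptions labeled as such in the comments.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax over the last axis (numerically stable).
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(t_logits, s_logits, t_boxes, s_boxes, t_feat, s_feat,
                      T=2.0, w_cls=1.0, w_box=1.0, w_feat=0.2):
    """Combined multi-stage KD loss.

    T and the weights w_* are hypothetical values for illustration only.
    """
    # Stage 1 -- classification logits: KL divergence between
    # temperature-softened teacher and student distributions.
    p_t = softmax(t_logits, T)
    p_s = softmax(s_logits, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-9) - np.log(p_s + 1e-9)), axis=-1).mean()

    # Stage 2 -- bounding-box regressions: smooth-L1 between teacher
    # and student box parameters (assumed regression loss).
    d = np.abs(t_boxes - s_boxes)
    smooth_l1 = np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).mean()

    # Stage 3 -- intermediate features (and, analogously, attention
    # maps): mean-squared error between matched activations.
    feat_mse = np.mean((t_feat - s_feat) ** 2)

    # T^2 rescales the KL term so gradient magnitudes stay comparable
    # across temperatures, following standard KD practice.
    return w_cls * (T ** 2) * kl + w_box * smooth_l1 + w_feat * feat_mse
```

With identical teacher and student outputs every term vanishes, which is a quick sanity check that the terms are pure imitation losses.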
Parameters:
0.2
Detailed Results

Object detection and orientation estimation results. Results for object detection are given in terms of average precision (AP) and results for joint object detection and orientation estimation are provided in terms of average orientation similarity (AOS).


Benchmark Easy Moderate Hard
Car (Detection) 98.69 % 95.52 % 92.93 %
Car (Orientation) 98.68 % 95.40 % 92.74 %
Car (3D Detection) 91.38 % 84.85 % 80.39 %
Car (Bird's Eye View) 95.54 % 91.60 % 88.95 %
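As a reference for how the AP figures in the table are computed, the sketch below implements interpolated average precision over sampled recall positions. It assumes the 40-point sampling used by the current KITTI evaluation protocol (the original protocol sampled 11 points including recall 0); the function names are illustrative, not taken from the KITTI devkit.

```python
import numpy as np

def average_precision(recall, precision, n_points=40):
    """Interpolated AP: average of interpolated precision at n_points
    equally spaced recall positions in (0, 1], as in KITTI's 40-point
    protocol (assumed here)."""
    ap = 0.0
    for r in np.linspace(1.0 / n_points, 1.0, n_points):
        # Interpolated precision: the maximum precision achieved at
        # any recall level >= r (monotone envelope of the PR curve).
        mask = recall >= r
        p = precision[mask].max() if mask.any() else 0.0
        ap += p / n_points
    return ap
```

A detector with precision 1.0 at every recall level attains AP = 1.0 (reported as 100 %); the table's percentages follow the same scale.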


Figure: 2D object detection results.

Figure: Orientation estimation results.

Figure: 3D object detection results.

Figure: Bird's eye view results.
