The KITTI Vision Benchmark Suite

Method

LumiNet: A Multi-Modal 3-Stream Feature Fusion Network for 3D Object Detection [LumiNet]

Submitted on 16 Feb. 2026 04:31 by
Fazal Ghaffar (Deakin University)

Running time:		0.1 s
Environment:		1 core @ 2.5 GHz (Python)

Method Description:

LumiNet is a novel 3D object detection framework that combines LiDAR point clouds, RGB images, and depth data. By fusing these complementary modalities, the approach provides robust and reliable detection for applications such as autonomous driving. A dedicated fusion module integrates semantic information from the RGB images into the point features, and depth features are used to strengthen the LiDAR and RGB representations. This multi-modal fusion enables accurate 3D bounding box prediction and improves scene understanding, particularly through the reliable depth estimation that real-world environments require. The framework also incorporates a Strong Attention mechanism and a 3-Stream multi-modal loss to enhance cross-modal feature learning and fusion. LumiNet is evaluated on the KITTI and JRDB datasets, where experimental results highlight the effectiveness of its multi-modal fusion compared with state-of-the-art 3D detection methods.
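The description does not specify LumiNet's fusion internals. Purely as an illustration of the general idea, the sketch shows one common way to bring RGB and depth information into point features: project each LiDAR point into the image, look up the per-pixel features at that location, and combine the three per-point streams with softmax attention weights. Everything here is an assumption and not LumiNet's code: points are taken to already be in rectified camera coordinates, `P` is a KITTI-style 3x4 projection matrix, the feature maps are random stand-ins for CNN outputs, the vector `w` stands in for learned attention parameters, and the helper names `project_to_image`, `gather_image_features`, and `attention_fuse` are hypothetical.

```python
import numpy as np


def project_to_image(points_cam, P):
    """Project N x 3 points in rectified camera coordinates to pixels with a 3 x 4 matrix P.

    Returns (uv, depth): N x 2 pixel coordinates and the N projective depths
    (third homogeneous coordinate). Assumes points are already in the camera frame.
    """
    n = points_cam.shape[0]
    homo = np.hstack([points_cam, np.ones((n, 1))])  # N x 4 homogeneous points
    uvw = homo @ P.T                                  # N x 3
    depth = uvw[:, 2]
    # Guard against division by zero for points lying on the camera plane.
    safe_depth = np.where(np.abs(depth) > 1e-6, depth, 1e-6)
    uv = uvw[:, :2] / safe_depth[:, None]
    return uv, depth


def gather_image_features(feat_map, uv, depth):
    """Nearest-pixel lookup of H x W x C features; points behind the camera or off-image get zeros."""
    h, w, c = feat_map.shape
    cols = np.round(uv[:, 0]).astype(int)
    rows = np.round(uv[:, 1]).astype(int)
    valid = (depth > 0) & (cols >= 0) & (cols < w) & (rows >= 0) & (rows < h)
    out = np.zeros((uv.shape[0], c), dtype=feat_map.dtype)
    out[valid] = feat_map[rows[valid], cols[valid]]
    return out, valid


def attention_fuse(streams, w):
    """Fuse S per-point streams (each N x C) using per-point softmax weights over scores stream @ w."""
    stacked = np.stack(streams, axis=1)                   # N x S x C
    scores = stacked @ w                                  # N x S, one score per stream
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)         # softmax across the S streams
    fused = (weights[..., None] * stacked).sum(axis=1)    # N x C weighted sum
    return fused, weights


# Toy example with random stand-in features (hypothetical shapes and intrinsics).
rng = np.random.default_rng(0)
P = np.array([[200.0, 0.0, 160.0, 0.0],
              [0.0, 200.0, 50.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
points = rng.uniform([-5.0, -2.0, 5.0], [5.0, 2.0, 40.0], size=(100, 3))
lidar_feat = rng.normal(size=(100, 8))
rgb_map = rng.normal(size=(100, 320, 8))
depth_map = rng.normal(size=(100, 320, 8))

uv, depth = project_to_image(points, P)
rgb_feat, valid = gather_image_features(rgb_map, uv, depth)
depth_feat, _ = gather_image_features(depth_map, uv, depth)
fused, weights = attention_fuse([lidar_feat, rgb_feat, depth_feat], rng.normal(size=8))
print(fused.shape, weights.shape, int(valid.sum()))
```

Nearest-pixel lookup keeps the sketch short; bilinear sampling would give smoother features at sub-pixel locations. The per-point softmax lets each point weight the LiDAR, RGB, and depth streams differently, which is one simple reading of "attention-based fusion" and not necessarily the Strong Attention mechanism named above.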

Parameters:

0.2


Detailed Results

Object detection and orientation estimation results. Results for object detection are given in terms of average precision (AP) and results for joint object detection and orientation estimation are provided in terms of average orientation similarity (AOS).
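The caption names AP and AOS without defining them. In the original KITTI paper, the orientation similarity at a given detection threshold is s = (1/|D|) Σ_i (1 + cos Δθ_i)/2 · δ_i, where Δθ_i is the heading error of detection i and δ_i is 1 if it matches a ground-truth box, else 0. Both AP and AOS then average an interpolated curve over fixed recall levels. The sketch assumes 40 recall levels (1/40, ..., 1), taken here to match the current KITTI protocol (the original paper used 11). Treat that choice, the toy numbers, and the helper names `interpolated_average` and `orientation_similarity` as assumptions, not the official devkit code.

```python
import numpy as np


def interpolated_average(recall, values, num_points=40):
    """Average the interpolated curve at recall levels 1/num_points, 2/num_points, ..., 1.

    At each level r, take the maximum of `values` over operating points with recall >= r
    (0 if no operating point reaches r). With precision as `values` this gives AP;
    with orientation similarity it gives AOS.
    """
    recall = np.asarray(recall, dtype=float)
    values = np.asarray(values, dtype=float)
    total = 0.0
    for r in np.linspace(1.0 / num_points, 1.0, num_points):
        reached = recall >= r
        total += values[reached].max() if reached.any() else 0.0
    return total / num_points


def orientation_similarity(delta_theta, is_true_positive):
    """s = (1/|D|) * sum_i (1 + cos(delta_theta_i)) / 2 * delta_i over the detections D at one threshold.

    delta_i is 1 for a detection matched to a ground-truth box and 0 otherwise,
    so false positives lower the score exactly as they lower precision.
    """
    delta_theta = np.asarray(delta_theta, dtype=float)
    matched = np.asarray(is_true_positive, dtype=float)
    return float(np.sum((1.0 + np.cos(delta_theta)) / 2.0 * matched) / len(delta_theta))


# Toy precision-recall curve with four operating points (hypothetical numbers).
ap = interpolated_average([0.25, 0.5, 0.75, 1.0], [1.0, 1.0, 0.75, 0.5], num_points=4)
print(ap)  # 0.8125
```

Because (1 + cos Δθ)/2 ≤ 1, orientation similarity never exceeds precision at the same threshold, so AOS is bounded above by AP. This is consistent with every Orientation row in the results table being at or below the corresponding Detection row.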

Benchmark	Easy	Moderate	Hard
Car (Detection)	99.23 %	96.27 %	88.94 %
Car (Orientation)	99.09 %	95.87 %	88.47 %
Car (3D Detection)	91.76 %	83.32 %	78.29 %
Car (Bird's Eye View)	95.79 %	90.13 %	85.06 %
Pedestrian (Detection)	72.01 %	61.38 %	58.94 %
Pedestrian (Orientation)	66.85 %	55.80 %	53.17 %
Pedestrian (3D Detection)	53.54 %	45.26 %	41.55 %
Pedestrian (Bird's Eye View)	57.64 %	50.44 %	46.74 %
Cyclist (Detection)	88.45 %	74.76 %	67.89 %
Cyclist (Orientation)	87.99 %	74.03 %	67.13 %
Cyclist (3D Detection)	80.43 %	62.31 %	55.72 %
Cyclist (Bird's Eye View)	85.56 %	68.42 %	61.65 %
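Since a matched detection's contribution to AOS falls short of its contribution to AP only through heading error, the gap between each Detection row and its Orientation row gives a rough per-class view of orientation accuracy. The snippet recomputes those gaps from the reported numbers; the dictionary layout and variable names are illustrative only.

```python
# Detection AP and AOS (%) copied from the results table (Easy, Moderate, Hard).
results = {
    "Car": {"detection": [99.23, 96.27, 88.94], "orientation": [99.09, 95.87, 88.47]},
    "Pedestrian": {"detection": [72.01, 61.38, 58.94], "orientation": [66.85, 55.80, 53.17]},
    "Cyclist": {"detection": [88.45, 74.76, 67.89], "orientation": [87.99, 74.03, 67.13]},
}

# AP - AOS in percentage points; a larger gap indicates larger heading errors on detected objects.
gaps = {
    cls: [round(ap - aos, 2) for ap, aos in zip(r["detection"], r["orientation"])]
    for cls, r in results.items()
}
for cls, g in gaps.items():
    print(cls, g)
# Car [0.14, 0.4, 0.47]
# Pedestrian [5.16, 5.58, 5.77]
# Cyclist [0.46, 0.73, 0.76]
```

The Pedestrian gap is roughly five percentage points at every difficulty level, while the Car and Cyclist gaps stay below one point.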