Method

Multi-View Fusion Network [MVFusion]
[Anonymous Submission]

Submitted on 12 Mar. 2026 12:38 by
[Anonymous Submission]

Running time: 0.06 s
Environment: 1 core @ 2.5 GHz (C/C++)

Method Description:
Current multi-view fusion approaches still suffer
from inaccurate feature alignment, limited
robustness to noise, and insufficient cross-view
feature collaboration. To address these issues, we
propose a plug-and-play multi-view fusion
framework for LiDAR-based 3D object detection. A
Transformer-based Multi-View Attention Fusion
(MVAF) module performs global adaptive fusion
between bird's-eye-view (BEV) and SV features to
enhance cross-view contextual interaction.
Extensive experiments on the KITTI dataset
demonstrate that the proposed framework
significantly improves detection accuracy and
robustness.
Parameters:
none
Latex Bibtex:
none
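The MVAF implementation is not released with this entry; the following is a minimal pure-Python sketch of the cross-view scaled dot-product attention idea the description outlines, where BEV feature vectors query SV feature vectors and the attended context is added back residually. The function name and the identity query/key/value projections are illustrative assumptions — the actual module would use learned projection matrices and multi-head attention.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_view_attention(bev, sv):
    """Fuse SV features into BEV features via scaled dot-product attention.

    bev, sv: lists of feature vectors (lists of floats) of equal dimension.
    Returns one fused vector per BEV location (query), computed as the
    query plus the attention-weighted sum of SV vectors (residual add).
    """
    d = len(bev[0])
    fused = []
    for q in bev:  # one query per BEV location
        # similarity of this BEV query to every SV key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in sv]
        weights = softmax(scores)
        # attention-weighted combination of SV value vectors
        ctx = [sum(w * v[j] for w, v in zip(weights, sv)) for j in range(d)]
        fused.append([qj + cj for qj, cj in zip(q, ctx)])
    return fused
```

With learned projections and per-head splitting this becomes standard multi-head cross-attention; the sketch keeps identity projections so the fusion mechanism itself stays visible.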

Detailed Results

Object detection and orientation estimation results. Results for object detection are given in terms of average precision (AP) and results for joint object detection and orientation estimation are provided in terms of average orientation similarity (AOS).


Benchmark               Easy      Moderate  Hard
Car (Detection)         96.57 %   96.05 %   90.85 %
Car (Orientation)       96.53 %   95.86 %   90.57 %
Car (3D Detection)      91.65 %   85.20 %   78.28 %
Car (Bird's Eye View)   95.22 %   92.10 %   84.98 %


2D object detection results.



Orientation estimation results.



3D object detection results.



Bird's eye view results.



