Method

MonoCtrl: Controlled Cue Mining and Enhancement for Monocular 3D Object Detection [MonoCtrl_MonoCD]


Submitted on 14 Mar. 2026 03:48 by
Song Zhenbo (Nanjing University of Science and Technology)

Running time: 0.06 s
Environment: 1 core @ 2.5 GHz (Python)

Method Description:
MonoCtrl is built on MonoCD.

The authors describe it as the first method to treat objects as
control signals and images as controlled targets, a decomposition
intended to improve the model's ability to mine cues in monocular
scenes. They also propose a Nearest Neighbor Attention algorithm,
which they state is applicable to monocular 3D detection and to
other fields. Together, these contributions are reported to make
monocular 3D object detectors more robust, especially on
challenging objects.
Parameters:
\alpha=1

Detailed Results

Object detection and orientation estimation results. Object detection is scored by average precision (AP). Joint object detection and orientation estimation is scored by average orientation similarity (AOS).
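The two metrics can be illustrated with a minimal Python sketch. It is not the benchmark's evaluation code. It assumes that orientation similarity for a matched detection is (1 + cos Δθ) / 2, where Δθ is the difference between predicted and ground-truth orientation, and that both AP and AOS average an interpolated curve over evenly spaced recall levels. The 40-point sampling and the function names `orientation_similarity` and `interpolated_average` are assumptions made for illustration.

```python
import math


def orientation_similarity(delta_angles):
    """Mean of (1 + cos(delta)) / 2 over matched detections.

    delta_angles: orientation errors in radians (prediction minus ground truth).
    Returns 1.0 for perfect orientations and 0.0 for errors of pi.
    """
    if not delta_angles:
        return 0.0
    total = sum((1.0 + math.cos(d)) / 2.0 for d in delta_angles)
    return total / len(delta_angles)


def interpolated_average(recalls, values, num_points=40):
    """Average an interpolated curve over evenly spaced recall levels.

    recalls: recall at each operating point of the detector.
    values:  precision (for AP) or orientation similarity (for AOS)
             at the same operating points.
    num_points: number of recall levels sampled (assumed to be 40 here).
    At each sampled level r, the curve takes the maximum value among
    operating points whose recall is at least r, or 0 if there is none.
    """
    total = 0.0
    for i in range(1, num_points + 1):
        r = i / num_points
        candidates = [v for rec, v in zip(recalls, values) if rec >= r]
        total += max(candidates) if candidates else 0.0
    return total / num_points
```

Because each similarity term is at most 1, an AOS computed this way cannot exceed the corresponding AP, which matches the reported Car (Orientation) scores sitting slightly below the Car (Detection) scores.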


Benchmark                Easy       Moderate   Hard
Car (Detection)          96.55 %    93.67 %    83.86 %
Car (Orientation)        96.37 %    93.29 %    83.32 %
Car (3D Detection)       25.86 %    18.32 %    15.42 %
Car (Bird's Eye View)    33.96 %    24.85 %    21.66 %


2D object detection results.

Orientation estimation results.

3D object detection results.

Bird's eye view results.