Method

URFormer [URFormer]


Submitted on 7 Nov. 2023 10:14 by
Guoxin Zhang (Hebei University of Science and Technology)

Running time: 0.1 s
Environment: 1 core @ 2.5 GHz (Python)

Method Description:
We propose URFormer, a Unified Representation Transformer-based multi-modal 3D detector with a better representation scheme and richer cross-modality interaction, consisting of three crucial components. First, we propose the Depth-Aware Lift Module (DALM), which exploits depth information in the 2D modality and lifts the 2D representation into 3D at the pixel level, naturally unifying the otherwise inconsistent multi-modal representations. Second, we design a Sparse Transformer (SPTR) to enlarge the effective receptive field and capture long-range object semantic features for better interaction between multi-modal features. Finally, we design Unified Representation Fusion (URFusion) to integrate cross-modality features in a fine-grained manner.
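
For intuition, the sketch below shows the standard pinhole back-projection that a pixel-level 2D-to-3D lift such as DALM builds on: every pixel of a depth map is unprojected into camera-frame 3D coordinates via the camera intrinsics. This is only a minimal geometric sketch, not the authors' implementation; the function name lift_pixels_to_3d is hypothetical, and DALM lifts the full learned 2D feature representation rather than raw pixel coordinates.

import numpy as np

def lift_pixels_to_3d(depth, K):
    # Back-project every pixel of a depth map into 3D camera coordinates
    # with the pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    # depth: (H, W) per-pixel depth in metres; K: (3, 3) camera intrinsics.
    # Returns an (H, W, 3) array of camera-frame (x, y, z) points.
    H, W = depth.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(W), np.arange(H))  # pixel grid (u: column, v: row)
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)

# Usage with illustrative intrinsics (placeholder values, not a real calibration):
K = np.array([[721.5,   0.0, 609.6],
              [  0.0, 721.5, 172.9],
              [  0.0,   0.0,   1.0]])
depth = np.full((375, 1242), 10.0)    # dummy depth map, 10 m everywhere
points = lift_pixels_to_3d(depth, K)  # shape (375, 1242, 3)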
Parameters:
TBD
LaTeX BibTeX:

Detailed Results

Object detection and orientation estimation results. Results for object detection are given in terms of average precision (AP), and results for joint object detection and orientation estimation are provided in terms of average orientation similarity (AOS).
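For reference, the KITTI benchmark (Geiger et al., CVPR 2012) defines AOS by averaging an orientation similarity s(r) over recall levels (the original definition below uses 11 recall positions; current KITTI evaluation averages over 40):

\[
\mathrm{AOS} = \frac{1}{11} \sum_{r \in \{0, 0.1, \dots, 1\}} \max_{\tilde{r} \ge r} s(\tilde{r}),
\qquad
s(r) = \frac{1}{|D(r)|} \sum_{i \in D(r)} \frac{1 + \cos \Delta_{\theta}^{(i)}}{2}\, \delta_i,
\]

where D(r) is the set of detections at recall r, \Delta_{\theta}^{(i)} is the angular difference between the estimated and ground-truth orientation of detection i, and \delta_i is 1 if detection i is assigned to a ground-truth box and 0 otherwise.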


Benchmark               Easy      Moderate  Hard
Car (Detection)         98.52 %   95.81 %   93.03 %
Car (Orientation)       98.45 %   95.59 %   92.69 %
Car (3D Detection)      89.64 %   83.40 %   78.62 %
Car (Bird's Eye View)   94.40 %   91.22 %   86.35 %


Figure: 2D object detection results.
Figure: Orientation estimation results.
Figure: 3D object detection results.
Figure: Bird's eye view results.



