Method

Point Virtual Transformer Painted [PointVit P1]


Submitted on 27 Sep. 2025 08:34 by
Veerain Sood (Texas A & M)

Running time: 0.05 s
Environment: 1 core @ 2.5 GHz (C/C++)

Method Description:
This is a single-stage transformer architecture
operating directly on voxelized LiDAR points.
The painting variant of PointViT augments raw
LiDAR with RGB information by projecting points
into the image plane and assigning per-point color
values.
It combines real and virtual points with painted
features, integrates self-attention with local
depthwise convolutions, and employs a multiscale
voxel feature encoder.
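
The painting step described above can be illustrated with a minimal sketch. It assumes the points are already expressed in the camera frame and that a 3x4 projection matrix is available. The function name `paint_points`, the matrix `P`, and the nested-list image layout are hypothetical choices for illustration, not the submission's actual code. The same routine would apply to real and virtual points alike.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Pixel = Tuple[int, int, int]
PaintedPoint = Tuple[float, float, float, float, float, float]


def paint_points(
    points: Sequence[Point],
    image: Sequence[Sequence[Pixel]],
    P: Sequence[Sequence[float]],
) -> List[PaintedPoint]:
    """Attach normalized RGB values to 3D points that project into the image.

    points: (x, y, z) tuples, assumed to be in the camera frame.
    image:  rows of (r, g, b) pixels with 0..255 values, indexed image[v][u].
    P:      hypothetical 3x4 camera projection matrix.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    painted: List[PaintedPoint] = []
    for x, y, z in points:
        # Homogeneous projection of the point into the image plane.
        u_h = P[0][0] * x + P[0][1] * y + P[0][2] * z + P[0][3]
        v_h = P[1][0] * x + P[1][1] * y + P[1][2] * z + P[1][3]
        w = P[2][0] * x + P[2][1] * y + P[2][2] * z + P[2][3]
        if w <= 0:
            continue  # point lies behind the camera
        u = math.floor(u_h / w)
        v = math.floor(v_h / w)
        if 0 <= u < width and 0 <= v < height:
            r, g, b = image[v][u]
            # Keep the geometry and append the color scaled to [0, 1].
            painted.append((x, y, z, r / 255, g / 255, b / 255))
    return painted
```

This sketch drops points that fall outside the image; a real pipeline might instead keep them with a default color, which is a design choice the description does not specify.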

This is a modification of the PointVit V2 model
that integrates RGB values into both virtual and
real LiDAR points.
Parameters:
Number of transformer heads: 4

Detailed Results

Object detection and orientation estimation results. Results for object detection are given in terms of average precision (AP) and results for joint object detection and orientation estimation are provided in terms of average orientation similarity (AOS).
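
As a rough illustration of how orientation similarity relates to detection precision, the sketch below computes the similarity at a single recall level as the mean of (1 + cos Δθ) / 2 over all detections, counting only detections matched to a ground-truth object. The function name and argument layout are hypothetical, and the benchmark's averaging over recall levels is omitted.

```python
import math
from typing import Sequence


def orientation_similarity(
    angle_errors: Sequence[float], matched: Sequence[bool]
) -> float:
    """Orientation similarity at one recall level.

    angle_errors: difference in radians between predicted and true orientation
                  for each detection (ignored for unmatched detections).
    matched:      True where the detection is assigned to a ground-truth object.
    """
    if not angle_errors:
        return 0.0
    total = sum(
        (1.0 + math.cos(err)) / 2.0
        for err, is_match in zip(angle_errors, matched)
        if is_match
    )
    # Unmatched detections contribute zero but still count in the denominator.
    return total / len(angle_errors)
```

Because each matched detection contributes at most 1, this value cannot exceed the precision at the same recall level, which is consistent with the Orientation row never exceeding the Detection row in the table below.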


Benchmark               Easy      Moderate  Hard
Car (Detection)         97.05 %   96.53 %   88.96 %
Car (Orientation)       97.04 %   96.47 %   88.88 %
Car (3D Detection)      89.22 %   79.97 %   72.64 %
Car (Bird's Eye View)   93.44 %   89.57 %   82.08 %


2D object detection results.



Orientation estimation results.



3D object detection results.



Bird's eye view results.



