\begin{tabular}{l | c | c | c | c | c | p{0.4\linewidth}}
{\bf Method} & {\bf Setting} & {\bf Moderate} & {\bf Easy} & {\bf Hard} & {\bf Runtime / Environment} & {\bf Reference}\\ \hline
VPFNet & & 48.36 \% & 54.65 \% & 44.98 \% & 0.2 s / 1 core & C. Wang, H. Chen and L. Fu: VPFNet: Voxel-Pixel Fusion Network for Multi-class 3D Object Detection. 2021.\\
PV-RCNN++ & & 47.19 \% & 54.29 \% & 43.49 \% & 0.06 s / 1 core & \\
EQ-PVRCNN & & 47.02 \% & 55.84 \% & 42.94 \% & 0.2 s / GPU & \\
ADLAB & & 46.18 \% & 53.59 \% & 43.28 \% & 0.05 s / 1 core & \\
PiFeNet & & 45.89 \% & 54.84 \% & 42.02 \% & 0.03 s / 1 core & \\
ISE-RCNN & & 45.66 \% & 51.44 \% & 42.43 \% & 0.09 s / 1 core & \\
CAT-Det & & 45.44 \% & 54.26 \% & 41.94 \% & 0.3 s / GPU & \\
HotSpotNet & & 45.37 \% & 53.10 \% & 41.47 \% & 0.04 s / 1 core & Q. Chen, L. Sun, Z. Wang, K. Jia and A. Yuille: Object as Hotspots. Proceedings of the European Conference on Computer Vision (ECCV) 2020.\\
H$^2$3D R-CNN & & 45.26 \% & 52.75 \% & 41.56 \% & 0.03 s / 1 core & \\
PE-RCVN & & 45.01 \% & 50.29 \% & 41.85 \% & 0.03 s / 1 core & \\
SAA-PV-RCNN & & 45.00 \% & 52.55 \% & 41.82 \% & 0.08 s / 1 core & \\
VPN & & 44.56 \% & 54.13 \% & 41.73 \% & 0.06 s / 1 core & \\
EPNet++ & & 44.38 \% & 52.79 \% & 41.29 \% & 0.1 s / GPU & \\
TANet & & 44.34 \% & 53.72 \% & 40.49 \% & 0.035 s / GPU & Z. Liu, X. Zhao, T. Huang, R. Hu, Y. Zhou and X. Bai: TANet: Robust 3D Object Detection from Point Clouds with Triple Attention. AAAI 2020.\\
TBD & & 44.32 \% & 49.37 \% & 41.24 \% & 0.1 s / 1 core & \\
3DSSD & & 44.27 \% & 54.64 \% & 40.23 \% & 0.04 s / GPU & Z. Yang, Y. Sun, S. Liu and J. Jia: 3DSSD: Point-based 3D Single Stage Object Detector. CVPR 2020.\\
AutoAlign & & 44.08 \% & 53.99 \% & 40.82 \% & 0.1 s / 1 core & \\
ISE-RCNN-PV & & 43.78 \% & 50.03 \% & 40.50 \% & 0.1 s / 1 core & \\
Point-GNN & la & 43.77 \% & 51.92 \% & 40.14 \% & 0.6 s / GPU & W. Shi and R. Rajkumar: Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud. CVPR 2020.\\
VCT & & 43.65 \% & 50.27 \% & 41.43 \% & 0.2 s / 1 core & \\
EA-M-RCNN(BorderAtt) & & 43.44 \% & 51.81 \% & 39.85 \% & 0.08 s / 1 core & \\
F-ConvNet & la & 43.38 \% & 52.16 \% & 38.80 \% & 0.47 s / GPU & Z. Wang and K. Jia: Frustum ConvNet: Sliding Frustums to Aggregate Local Point-Wise Features for Amodal 3D Object Detection. IROS 2019.\\
MMLab-PartA$^2$ & la & 43.35 \% & 53.10 \% & 40.06 \% & 0.08 s / GPU & S. Shi, Z. Wang, J. Shi, X. Wang and H. Li: From Points to Parts: 3D Object Detection from Point Cloud with Part-aware and Part-aggregation Network. IEEE Transactions on Pattern Analysis and Machine Intelligence 2020.\\
FS-Net & la & 43.31 \% & 49.82 \% & 40.89 \% & 0.1 s / 1 core & \\
MMLab PV-RCNN & la & 43.29 \% & 52.17 \% & 40.29 \% & 0.08 s / 1 core & S. Shi, C. Guo, L. Jiang, Z. Wang, J. Shi, X. Wang and H. Li: PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection. CVPR 2020.\\
FromVoxelToPoint & & 43.28 \% & 51.80 \% & 40.71 \% & 0.1 s / 1 core & J. Li, H. Dai, L. Shao and Y. Ding: From Voxel to Point: IoU-guided 3D Object Detection for Point Cloud with Voxel-to-Point Decoder. MM '21: The 29th ACM International Conference on Multimedia (ACM MM) 2021.\\
VMVS & la & 43.27 \% & 53.44 \% & 39.51 \% & 0.25 s / GPU & J. Ku, A. Pon, S. Walsh and S. Waslander: Improving 3D object detection for pedestrians with virtual multi-view synthesis orientation estimation. IROS 2019.\\
P2V-RCNN & & 43.19 \% & 50.91 \% & 40.81 \% & 0.1 s / 2 cores & J. Li, S. Luo, Z. Zhu, H. Dai, A. Krylov, Y. Ding and L. Shao: P2V-RCNN: Point to Voxel Feature Learning for 3D Object Detection from Point Clouds. IEEE Access 2021.\\
MGAF-3DSSD & & 43.09 \% & 50.65 \% & 39.65 \% & 0.1 s / 1 core & J. Li, H. Dai, L. Shao and Y. Ding: Anchor-free 3D Single Stage Detector with Mask-Guided Attention for Point Cloud. MM '21: The 29th ACM International Conference on Multimedia (ACM MM) 2021.\\
SGNet & & 43.00 \% & 49.68 \% & 40.45 \% & 0.09 s / GPU & \\
Frustum-PointPillars & & 42.89 \% & 51.22 \% & 39.28 \% & 0.06 s / 4 cores & A. Paigwar, D. Sierra-Gonzalez, Ö. Erkent and C. Laugier: Frustum-PointPillars: A Multi-Stage Approach for 3D Object Detection using RGB Camera and LiDAR. International Conference on Computer Vision, ICCV, Workshop on Autonomous Vehicle Vision 2021.\\
Fast-CLOCs & & 42.72 \% & 52.10 \% & 39.08 \% & 0.1 s / GPU & \\
STD & & 42.47 \% & 53.29 \% & 38.35 \% & 0.08 s / GPU & Z. Yang, Y. Sun, S. Liu, X. Shen and J. Jia: STD: Sparse-to-Dense 3D Object Detector for Point Cloud. ICCV 2019.\\
AVOD-FPN & la & 42.27 \% & 50.46 \% & 39.04 \% & 0.1 s / & J. Ku, M. Mozifian, J. Lee, A. Harakeh and S. Waslander: Joint 3D Proposal Generation and Object Detection from View Aggregation. IROS 2018.\\
SemanticVoxels & & 42.19 \% & 50.90 \% & 39.52 \% & 0.04 s / GPU & J. Fei, W. Chen, P. Heidenreich, S. Wirges and C. Stiller: SemanticVoxels: Sequential Fusion for 3D Pedestrian Detection using LiDAR Point Cloud and Semantic Segmentation. MFI 2020.\\
TBD & & 42.19 \% & 49.89 \% & 39.34 \% & 0.05 s / 1 core & \\
TBD & & 42.19 \% & 49.89 \% & 39.34 \% & 0.03 s / 1 core & \\
F-PointNet & la & 42.15 \% & 50.53 \% & 38.08 \% & 0.17 s / GPU & C. Qi, W. Liu, C. Wu, H. Su and L. Guibas: Frustum PointNets for 3D Object Detection from RGB-D Data. arXiv preprint arXiv:1711.08488 2017.\\
TBD & la & 42.05 \% & 48.66 \% & 38.94 \% & 0.1 s / 1 core & \\
PointPillars & la & 41.92 \% & 51.45 \% & 38.89 \% & 16 ms / & A. Lang, S. Vora, H. Caesar, L. Zhou, J. Yang and O. Beijbom: PointPillars: Fast Encoders for Object Detection from Point Clouds. CVPR 2019.\\
TBD\_IOU1 & & 41.65 \% & 49.00 \% & 39.31 \% & 0.1 s / 1 core & \\
epBRM & la & 41.52 \% & 49.17 \% & 39.08 \% & 0.10 s / 1 core & K. Shin: Improving a Quality of 3D Object Detection by Spatial Transformation Mechanism. arXiv preprint arXiv:1910.04853 2019.\\
TBD\_IOU & & 41.45 \% & 48.25 \% & 39.17 \% & 0.1 s / 1 core & \\
GNN-RCNN & & 41.32 \% & 47.48 \% & 38.97 \% & 0.1 s / 1 core & \\
tbd & & 41.10 \% & 50.56 \% & 37.49 \% & 0.04 s / 1 core & \\
IA-SSD (single) & & 41.03 \% & 47.90 \% & 37.98 \% & 0.013 s / 1 core & \\
Generalized-SIENet & & 40.97 \% & 47.01 \% & 38.88 \% & 0.08 s / 1 core & \\
PointPainting & la & 40.97 \% & 50.32 \% & 37.87 \% & 0.4 s / GPU & S. Vora, A. Lang, B. Helou and O. Beijbom: PointPainting: Sequential Fusion for 3D Object Detection. CVPR 2020.\\
SCIR-Net & la & 40.95 \% & 49.23 \% & 38.47 \% & 0.03 s / GPU & \\
DSA-PV-RCNN & la & 40.89 \% & 46.97 \% & 38.80 \% & 0.08 s / 1 core & P. Bhattacharyya, C. Huang and K. Czarnecki: SA-Det3D: Self-Attention Based Context-Aware 3D Object Detection. 2021.\\
SARFE & & 40.79 \% & 47.29 \% & 38.25 \% & 0.03 s / 1 core & \\
MSADet & & 40.58 \% & 49.54 \% & 38.19 \% & 0.05 s / 1 core & \\
SAA-SECOND & & 40.57 \% & 48.73 \% & 37.77 \% & 38 ms / 1 core & \\
PDV & & 40.56 \% & 47.80 \% & 38.46 \% & 0.1 s / 1 core & \\
WHUT-iou\_ssd & & 40.53 \% & 46.41 \% & 38.48 \% & 0.045 s / 1 core & \\
E$^2$-PV-RCNN & & 40.47 \% & 46.61 \% & 38.60 \% & 0.08 s / 1 core & \\
SA-voxel-centernet & & 40.43 \% & 46.10 \% & 38.32 \% & 0.04 s / 1 core & \\
FusionDetv2-v3 & & 40.38 \% & 46.86 \% & 37.41 \% & 0.05 s / 1 core & \\
FPCR-CNN & & 40.32 \% & 48.33 \% & 37.66 \% & 0.05 s / 1 core & \\
P2V\_PCV1 & & 40.27 \% & 45.43 \% & 38.24 \% & 0.1 s / 1 core & \\
sa-voxel-centernet & & 40.24 \% & 46.08 \% & 38.07 \% & 0.04 s / 1 core & \\
FPC-RCNN & & 40.13 \% & 46.41 \% & 37.84 \% & 0.05 s / 1 core & \\
TBD & & 40.07 \% & 46.11 \% & 37.87 \% & 0.1 s / 1 core & \\
TPCG & & 39.97 \% & 46.35 \% & 37.66 \% & 0.1 s / 1 core & \\
M3DeTR & & 39.94 \% & 45.70 \% & 37.66 \% & n/a s / GPU & T. Guan, J. Wang, S. Lan, R. Chandra, Z. Wu, L. Davis and D. Manocha: M3DeTR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers. 2021.\\
FusionDetv2-v5 & & 39.91 \% & 47.50 \% & 37.39 \% & 0.05 s / 1 core & \\
DDet & & 39.87 \% & 45.82 \% & 38.00 \% & 0.1 s / 1 core & \\
MVOD & & 39.82 \% & 46.22 \% & 37.56 \% & 0.16 s / 1 core & \\
Point Image Fusion & & 39.79 \% & 45.04 \% & 37.62 \% & 0.2 s / 1 core & \\
anonymous & & 39.74 \% & 46.09 \% & 37.41 \% & 0.05 s / 1 core & \\
FusionDetv2-v4 & & 39.68 \% & 46.93 \% & 37.31 \% & 0.06 s / 1 core & \\
DSASNet & & 39.65 \% & 47.14 \% & 37.05 \% & 0.08 s / GPU & \\
Fast VP-RCNN & & 39.65 \% & 45.95 \% & 37.29 \% & 0.05 s / 1 core & \\
VCRCNN & & 39.64 \% & 45.19 \% & 37.55 \% & 0.1 s / 1 core & \\
FusionDetv1 & & 39.42 \% & 47.30 \% & 36.97 \% & 0.1 s / 1 core & \\
demo & & 39.38 \% & 47.69 \% & 36.06 \% & 0.04 s / 1 core & \\
MMLab-PointRCNN & la & 39.37 \% & 47.98 \% & 36.01 \% & 0.1 s / GPU & S. Shi, X. Wang and H. Li: PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2019.\\
ST-RCNN & la & 39.36 \% & 44.96 \% & 37.09 \% & 0.04 s / 1 core & \\
FusionDetv2-v2 & & 39.31 \% & 44.98 \% & 37.22 \% & 0.04 s / 1 core & \\
ARPNET & & 39.31 \% & 48.32 \% & 35.93 \% & 0.08 s / GPU & Y. Ye, C. Zhang and X. Hao: ARPNET: attention region proposal network for 3D object detection. Science China Information Sciences 2019.\\
TBD & & 39.31 \% & 46.85 \% & 36.11 \% & 0.06 s / 1 core & \\
IA-SSD (multi) & & 39.03 \% & 46.51 \% & 35.61 \% & 0.014 s / 1 core & \\
tbd & & 38.89 \% & 45.98 \% & 35.94 \% & 0.1 s / 1 core & \\
NV-RCNN & & 38.75 \% & 47.05 \% & 36.52 \% & 0.1 s / 1 core & \\
SIF & & 38.74 \% & 46.23 \% & 36.06 \% & 0.1 s / 1 core & P. An: SIF. Submitted to CVIU 2021.\\
SCNet & la & 38.66 \% & 47.83 \% & 35.70 \% & 0.04 s / GPU & Z. Wang, H. Fu, L. Wang, L. Xiao and B. Dai: SCNet: Subdivision Coding Network for Object Detection Based on 3D Point Cloud. IEEE Access 2019.\\
Faraway-Frustum & la & 38.58 \% & 46.33 \% & 35.71 \% & 0.1 s / GPU & H. Zhang, D. Yang, E. Yurtsever, K. Redmill and U. Ozguner: Faraway-Frustum: Dealing with Lidar Sparsity for 3D Object Detection using Fusion. arXiv preprint arXiv:2011.01404 2020.\\
FPV-SSD & & 38.45 \% & 45.83 \% & 36.03 \% & 0.03 s / 1 core & \\
MKFFNet & & 38.05 \% & 46.01 \% & 35.72 \% & 0.1 s / 1 core & \\
FPC3D\_all & la & 37.95 \% & 45.49 \% & 35.60 \% & 0.03 s / 1 core & \\
VGCN & & 37.60 \% & 45.28 \% & 34.96 \% & 0.09 s / 1 core & \\
DVFENet & & 37.50 \% & 43.55 \% & 35.33 \% & 0.05 s / 1 core & Y. He, G. Xia, Y. Luo, L. Su, Z. Zhang, W. Li and P. Wang: DVFENet: Dual-branch Voxel Feature Extraction Network for 3D Object Detection. Neurocomputing 2021.\\
MLOD & la & 37.47 \% & 47.58 \% & 35.07 \% & 0.12 s / GPU & J. Deng and K. Czarnecki: MLOD: A multi-view 3D object detection based on robust feature fusion method. arXiv preprint arXiv:1909.04163 2019.\\
S-AT GCN & & 37.37 \% & 44.63 \% & 34.92 \% & 0.02 s / GPU & L. Wang, C. Wang, X. Zhang, T. Lan and J. Li: S-AT GCN: Spatial-Attention Graph Convolution Network based Feature Enhancement for 3D Object Detection. CoRR 2021.\\
YF & & 36.99 \% & 44.43 \% & 34.40 \% & 0.04 s / GPU & \\
HS3D & & 36.86 \% & 45.62 \% & 33.67 \% & 0.12 s / 1 core & \\
XView & & 36.79 \% & 42.44 \% & 34.96 \% & 0.1 s / 1 core & L. Xie, G. Xu, D. Cai and X. He: X-view: Non-egocentric Multi-View 3D Object Detector. 2021.\\
MKFFNet & & 36.66 \% & 43.94 \% & 34.56 \% & 0.01 s / 1 core & \\
FusionDetv2-baseline & & 36.66 \% & 41.34 \% & 34.60 \% & 0.04 s / 1 core & \\
MKFFNet & & 36.65 \% & 44.00 \% & 34.59 \% & 0.1 s / 1 core & \\
TBD & & 36.53 \% & 44.11 \% & 34.30 \% & TBD / GPU & \\
PFF3D & la & 36.07 \% & 43.93 \% & 32.86 \% & 0.05 s / GPU & L. Wen and K. Jo: Fast and Accurate 3D Object Detection for Lidar-Camera-Based Autonomous Vehicles Using One Shared Voxel-Based Backbone. IEEE Access 2021.\\
NV2P-RCNN & & 35.98 \% & 43.18 \% & 33.88 \% & 0.1 s / GPU & \\
ASCNet & & 35.76 \% & 42.00 \% & 33.69 \% & 0.09 s / 1 core & \\
RoIFusion & & 35.14 \% & 42.22 \% & 32.92 \% & 0.22 s / 1 core & \\
BirdNet+ & la & 35.06 \% & 41.55 \% & 32.93 \% & 0.11 s / & A. Barrera, C. Guindel, J. Beltrán and F. García: BirdNet+: End-to-End 3D Object Detection in LiDAR Bird’s Eye View. 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC) 2020.\\
TBD\_BD & & 34.86 \% & 42.56 \% & 32.49 \% & 0.03 s / 1 core & \\
AB3DMOT & la on & 34.59 \% & 42.27 \% & 31.37 \% & 0.0047 s / 1 core & X. Weng and K. Kitani: A Baseline for 3D Multi-Object Tracking. arXiv:1907.03961 2019.\\
DSGN++ & & 32.74 \% & 43.05 \% & 29.54 \% & 0.4 s / 1 core & \\
PP-PCdet & & 32.04 \% & 39.23 \% & 29.79 \% & 0.01 s / 1 core & \\
Contrastive PP & & 31.64 \% & 38.47 \% & 29.30 \% & 0.01 s / 1 core & \\
BirdNet+ (legacy) & la & 31.46 \% & 37.99 \% & 29.46 \% & 0.1 s / & A. Barrera, C. Guindel, J. Beltrán and F. García: BirdNet+: End-to-End 3D Object Detection in LiDAR Bird’s Eye View. 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC) 2020.\\
SparsePool & & 30.38 \% & 37.84 \% & 26.94 \% & 0.13 s / 8 cores & Z. Wang, W. Zhan and M. Tomizuka: Fusing bird view lidar point cloud and front view camera image for deep object detection. arXiv preprint arXiv:1711.06703 2017.\\
MMLAB LIGA-Stereo & st & 30.00 \% & 40.46 \% & 27.07 \% & 0.4 s / 1 core & X. Guo, S. Shi, X. Wang and H. Li: LIGA-Stereo: Learning LiDAR Geometry Aware Representations for Stereo-based 3D Detector. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) 2021.\\
SparsePool & & 27.92 \% & 35.52 \% & 25.87 \% & 0.13 s / 8 cores & Z. Wang, W. Zhan and M. Tomizuka: Fusing bird view lidar point cloud and front view camera image for deep object detection. arXiv preprint arXiv:1711.06703 2017.\\
AVOD & la & 27.86 \% & 36.10 \% & 25.76 \% & 0.08 s / & J. Ku, M. Mozifian, J. Lee, A. Harakeh and S. Waslander: Joint 3D Proposal Generation and Object Detection from View Aggregation. IROS 2018.\\
CSW3D & la & 26.64 \% & 33.75 \% & 23.34 \% & 0.03 s / 4 cores & J. Hu, T. Wu, H. Fu, Z. Wang and K. Ding: Cascaded Sliding Window Based Real-Time 3D Region Proposal for Pedestrian Detection. ROBIO 2019.\\
PointRGBNet & & 26.40 \% & 34.77 \% & 24.03 \% & 0.08 s / 4 cores & \\
Disp R-CNN (velo) & st & 25.80 \% & 37.12 \% & 22.04 \% & 0.387 s / GPU & J. Sun, L. Chen, Y. Xie, S. Zhang, Q. Jiang, X. Zhou and H. Bao: Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance Disparity Estimation. CVPR 2020.\\
Disp R-CNN & st & 25.40 \% & 35.75 \% & 21.79 \% & 0.387 s / GPU & J. Sun, L. Chen, Y. Xie, S. Zhang, Q. Jiang, X. Zhou and H. Bao: Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance Disparity Estimation. CVPR 2020.\\
deleted & & 25.13 \% & 35.02 \% & 22.36 \% & 0.3 s / GPU & \\
FusionDetv2-v1 & & 24.55 \% & 30.58 \% & 23.64 \% & 0.04 s / 1 core & \\
CG-Stereo & st & 24.31 \% & 33.22 \% & 20.95 \% & 0.57 s / & C. Li, J. Ku and S. Waslander: Confidence Guided Stereo 3D Object Detection with Split Depth Estimation. IROS 2020.\\
NCL & & 23.33 \% & 27.75 \% & 21.66 \% & NA s / 1 core & \\
LIGA-Stereo-old & st & 23.23 \% & 30.14 \% & 20.58 \% & 0.375 s / & \\
YOLOStereo3D & st & 19.75 \% & 28.49 \% & 16.48 \% & 0.1 s / & Y. Liu, L. Wang and M. Liu: YOLOStereo3D: A Step Back to 2D for Efficient Stereo 3D Detection. 2021 International Conference on Robotics and Automation (ICRA) 2021.\\
OSE+ & & 19.67 \% & 28.30 \% & 17.17 \% & 0.1 s / 1 core & \\
AEC3D & & 19.00 \% & 24.39 \% & 17.43 \% & 18 ms / GPU & \\
BEVC & & 17.65 \% & 23.49 \% & 15.92 \% & 35 ms / GPU & \\
OC Stereo & st & 17.58 \% & 24.48 \% & 15.60 \% & 0.35 s / 1 core & A. Pon, J. Ku, C. Li and S. Waslander: Object-Centric Stereo Matching for 3D Object Detection. ICRA 2020.\\
BirdNet & la & 17.08 \% & 22.04 \% & 15.82 \% & 0.11 s / & J. Beltrán, C. Guindel, F. Moreno, D. Cruzado, F. García and A. Escalera: BirdNet: A 3D Object Detection Framework from LiDAR Information. 2018 21st International Conference on Intelligent Transportation Systems (ITSC) 2018.\\
VN3D & & 15.69 \% & 19.56 \% & 13.17 \% & 0.02 s / 1 core & \\
DSGN & st & 15.55 \% & 20.53 \% & 14.15 \% & 0.67 s / & Y. Chen, S. Liu, X. Shen and J. Jia: DSGN: Deep Stereo Geometry Network for 3D Object Detection. CVPR 2020.\\
SOD & & 14.68 \% & 21.13 \% & 12.67 \% & 0.1 s / 1 core & \\
Complexer-YOLO & la & 13.96 \% & 17.60 \% & 12.70 \% & 0.06 s / GPU & M. Simon, K. Amende, A. Kraus, J. Honer, T. Samann, H. Kaulbersch, S. Milz and H. Michael Gross: Complexer-YOLO: Real-Time 3D Object Detection and Tracking on Semantic Point Clouds. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops 2019.\\
RT3D-GMP & st & 11.41 \% & 16.23 \% & 10.12 \% & 0.06 s / GPU & H. Königshof and C. Stiller: Learning-Based Shape Estimation with Grid Map Patches for Realtime 3D Object Detection for Automated Driving. 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC) 2020.\\
PS-fld & & 10.82 \% & 16.95 \% & 9.26 \% & 0.25 s / 1 core & \\
CMKD & & 10.39 \% & 16.89 \% & 9.29 \% & 0.1 s / 1 core & \\
EGFN & st & 10.27 \% & 14.05 \% & 9.02 \% & 0.06 s / GPU & \\
MonoDTR & & 10.18 \% & 15.33 \% & 8.61 \% & 0.04 s / 1 core & \\
GUPNet & & 9.76 \% & 14.95 \% & 8.41 \% & NA s / 1 core & Y. Lu, X. Ma, L. Yang, T. Zhang, Y. Liu, Q. Chu, J. Yan and W. Ouyang: Geometry Uncertainty Projection Network for Monocular 3D Object Detection. arXiv preprint arXiv:2107.13774 2021.\\
DD3D & & 9.30 \% & 13.91 \% & 8.05 \% & n/a s / 1 core & D. Park, R. Ambrus, V. Guizilini, J. Li and A. Gaidon: Is Pseudo-Lidar needed for Monocular 3D Object detection? IEEE/CVF International Conference on Computer Vision (ICCV).\\
ZongmuMono3dV2 & & 9.20 \% & 14.53 \% & 7.82 \% & 0.08 s / 1 core & \\
ZongmuMono3d & & 9.18 \% & 14.23 \% & 7.82 \% & 0.08 s / 1 core & \\
SGM3D & & 8.81 \% & 13.99 \% & 7.26 \% & 0.03 s / 1 core & \\
SCSTSV-MonoFlex & & 8.75 \% & 13.10 \% & 7.38 \% & 0.03 s / 1 core & \\
gupnet\_se & & 8.65 \% & 13.40 \% & 7.78 \% & 0.03 s / 1 core & \\
SwinMono3D & & 8.54 \% & 12.96 \% & 7.19 \% & 0.08 s / 1 core & \\
MonoCon & & 8.41 \% & 13.10 \% & 6.94 \% & 0.02 s / GPU & \\
MonoEdge & & 8.33 \% & 12.11 \% & 7.03 \% & 0.05 s / GPU & \\
MonoFlex & & 8.16 \% & 11.89 \% & 6.81 \% & 0.03 s / 1 core & \\
CaDDN & & 8.14 \% & 12.87 \% & 6.76 \% & 0.63 s / GPU & C. Reading, A. Harakeh, J. Chae and S. Waslander: Categorical Depth Distribution Network for Monocular 3D Object Detection. CVPR 2021.\\
MonoGround & & 7.89 \% & 12.37 \% & 7.13 \% & 0.03 s / 1 core & \\
vadin-TBD & & 7.66 \% & 11.87 \% & 6.82 \% & 0.04 s / 1 core & \\
MonoLCD & & 7.62 \% & 11.21 \% & 6.47 \% & 0.04 s / 1 core & \\
K3D & & 7.60 \% & 12.58 \% & 6.73 \% & 0.3 s / 1 core & \\
ANM & & 7.54 \% & 11.92 \% & 6.37 \% & 0.02 s / 1 core & \\
SAIC\_ADC\_Mono3D & & 7.54 \% & 12.06 \% & 6.41 \% & 50 s / GPU & \\
Mix-Teaching-M3D & & 7.47 \% & 11.67 \% & 6.61 \% & 0.03 s / 1 core & \\
LPCG-Monoflex & & 7.33 \% & 10.82 \% & 6.18 \% & 0.03 s / 1 core & \\
MonoDDE & & 7.32 \% & 11.13 \% & 6.67 \% & 0.04 s / 1 core & \\
RefinedMPL & & 7.18 \% & 11.14 \% & 5.84 \% & 0.15 s / GPU & J. Vianney, S. Aich and B. Liu: RefinedMPL: Refined Monocular PseudoLiDAR for 3D Object Detection in Autonomous Driving. arXiv preprint arXiv:1911.09712 2019.\\
MonoEdge-Rotate & & 7.02 \% & 10.47 \% & 5.84 \% & 0.05 s / GPU & \\
TopNet-HighRes & la & 6.92 \% & 10.40 \% & 6.63 \% & 101 ms / & S. Wirges, T. Fischer, C. Stiller and J. Frias: Object Detection and Classification in Occupancy Grid Maps Using Deep Convolutional Networks. 2018 21st International Conference on Intelligent Transportation Systems (ITSC) 2018.\\
MonoRUn & & 6.78 \% & 10.88 \% & 5.83 \% & 0.07 s / GPU & H. Chen, Y. Huang, W. Tian, Z. Gao and L. Xiong: MonoRUn: Monocular 3D Object Detection by Reconstruction and Uncertainty Propagation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2021.\\
MonoPair & & 6.68 \% & 10.02 \% & 5.53 \% & 0.06 s / GPU & Y. Chen, L. Tai, K. Sun and M. Li: MonoPair: Monocular 3D Object Detection Using Pairwise Spatial Relationships. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2020.\\
mono3d & & 6.62 \% & 10.10 \% & 5.46 \% & 0.03 s / GPU & \\
monodle & & 6.55 \% & 9.64 \% & 5.44 \% & 0.04 s / GPU & X. Ma, Y. Zhang, D. Xu, D. Zhou, S. Yi, H. Li and W. Ouyang: Delving into Localization Errors for Monocular 3D Object Detection. CVPR 2021.\\
MonoFlex & & 6.31 \% & 9.43 \% & 5.26 \% & 0.03 s / GPU & Y. Zhang, J. Lu and J. Zhou: Objects are Different: Flexible Monocular 3D Object Detection. CVPR 2021.\\
GAC3D++ & & 6.29 \% & 9.29 \% & 5.20 \% & 0.25 s / 1 core & \\
RelationNet3D\_dla34 & & 6.22 \% & 9.28 \% & 5.23 \% & 0.04 s / 1 core & \\
E2E-DA & & 5.95 \% & 8.79 \% & 5.72 \% & 0.03 s / 1 core & \\
M3DSSD++ & & 5.65 \% & 8.10 \% & 4.72 \% & 0.16 s / 1 core & \\
MonoGeo & & 5.63 \% & 8.00 \% & 4.71 \% & 0.05 s / 1 core & \\
ICCV & & 5.25 \% & 8.34 \% & 4.72 \% & 0.04 s / GPU & \\
MonoHMOO & & 5.23 \% & 7.62 \% & 4.28 \% & 0.2 s / 1 core & \\
RelationNet3D\_res18 & & 5.19 \% & 7.95 \% & 4.21 \% & 0.04 s / 1 core & \\
Aug3D-RPN & & 4.71 \% & 6.01 \% & 3.87 \% & 0.08 s / 1 core & C. He, J. Huang, X. Hua and L. Zhang: Aug3D-RPN: Improving Monocular 3D Object Detection by Synthetic Images with Virtual Depth. 2021.\\
MM & & 4.70 \% & 7.81 \% & 4.01 \% & 1 s / 1 core & \\
Shift R-CNN (mono) & & 4.66 \% & 7.95 \% & 4.16 \% & 0.25 s / GPU & A. Naiden, V. Paunescu, G. Kim, B. Jeon and M. Leordeanu: Shift R-CNN: Deep Monocular 3D Object Detection With Closed-form Geometric Constraints. ICIP 2019.\\
Lite-FPN & & 4.38 \% & 6.57 \% & 3.56 \% & 0.01 s / 1 core & \\
COF3D & & 4.37 \% & 6.02 \% & 3.55 \% & 200 s / 1 core & \\
PLDet3d & & 4.25 \% & 6.31 \% & 3.49 \% & 0.11 s / 1 core & \\
MAOLoss & & 4.18 \% & 5.81 \% & 3.67 \% & 0.05 s / 1 core & \\
MonoPSR & & 4.00 \% & 6.12 \% & 3.30 \% & 0.2 s / GPU & J. Ku*, A. Pon* and S. Waslander: Monocular 3D Object Detection Leveraging Accurate Proposals and Shape Reconstruction. CVPR 2019.\\
MP-Mono & & 3.75 \% & 5.09 \% & 3.50 \% & 0.16 s / GPU & \\
Geo3D & & 3.65 \% & 5.74 \% & 3.01 \% & 0.04 s / GPU & \\
DFR-Net & & 3.62 \% & 6.09 \% & 3.39 \% & 0.18 s / & Z. Zou, X. Ye, L. Du, X. Cheng, X. Tan, L. Zhang, J. Feng, X. Xue and E. Ding: The devil is in the task: Exploiting reciprocal appearance-localization features for monocular 3D object detection. ICCV 2021.\\
DDMP-3D & & 3.55 \% & 4.93 \% & 3.01 \% & 0.18 s / 1 core & L. Wang, L. Du, X. Ye, Y. Fu, G. Guo, X. Xue, J. Feng and L. Zhang: Depth-conditioned Dynamic Message Propagation for Monocular 3D Object Detection. CVPR 2020.\\
E2E-DA-Lite (Res18) & & 3.51 \% & 5.82 \% & 3.42 \% & 0.01 s / GPU & \\
M3D-RPN & & 3.48 \% & 4.92 \% & 2.94 \% & 0.16 s / GPU & G. Brazil and X. Liu: M3D-RPN: Monocular 3D Region Proposal Network for Object Detection. ICCV 2019.\\
D4LCN & & 3.42 \% & 4.55 \% & 2.83 \% & 0.2 s / GPU & M. Ding, Y. Huo, H. Yi, Z. Wang, J. Shi, Z. Lu and P. Luo: Learning Depth-Guided Convolutions for Monocular 3D Object Detection. CVPR 2020.\\
QD-3DT & on & 3.37 \% & 5.53 \% & 3.02 \% & 0.03 s / GPU & H. Hu, Y. Yang, T. Fischer, F. Yu, T. Darrell and M. Sun: Monocular Quasi-Dense 3D Object Tracking. ArXiv:2103.07351 2021.\\
MonoEF & & 2.79 \% & 4.27 \% & 2.21 \% & 0.03 s / 1 core & Y. Zhou, Y. He, H. Zhu, C. Wang, H. Li and Q. Jiang: Monocular 3D Object Detection: An Extrinsic Parameter Free Approach. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021.\\
KAIST-VDCLab & & 2.47 \% & 3.27 \% & 2.43 \% & 0.04 s / 1 core & \\
RT3DStereo & st & 2.45 \% & 3.28 \% & 2.35 \% & 0.08 s / GPU & H. Königshof, N. Salscheider and C. Stiller: Realtime 3D Object Detection for Automated Driving Using Stereo Vision and Semantic Information. Proc. IEEE Intl. Conf. Intelligent Transportation Systems 2019.\\
TopNet-UncEst & la & 1.87 \% & 3.42 \% & 1.73 \% & 0.09 s / & S. Wirges, M. Braun, M. Lauer and C. Stiller: Capturing Object Detection Uncertainty in Multi-Layer Grid Maps. 2019.\\
PPTrans & & 1.85 \% & 2.68 \% & 1.44 \% & 0.2 s / GPU & \\
TBD & & 1.81 \% & 3.00 \% & 1.59 \% & 0.3 s / 1 core & \\
SS3D & & 1.78 \% & 2.31 \% & 1.48 \% & 48 ms / & E. Jörgensen, C. Zach and F. Kahl: Monocular 3D Object Detection and Box Fitting Trained End-to-End Using Intersection-over-Union Loss. CoRR 2019.\\
PGD-FCOS3D & & 1.49 \% & 2.28 \% & 1.38 \% & 0.03 s / 1 core & T. Wang, X. Zhu, J. Pang and D. Lin: Probabilistic and Geometric Depth: Detecting Objects in Perspective. Conference on Robot Learning (CoRL) 2021.\\
SparVox3D & & 1.35 \% & 1.93 \% & 1.04 \% & 0.05 s / GPU & E. Balatkan and F. Kıraç: Improving Regression Performance on Monocular 3D Object Detection Using Bin-Mixing and Sparse Voxel Data. 2021 6th International Conference on Computer Science and Engineering (UBMK) 2021.\\
EM & & 1.18 \% & 1.09 \% & 0.80 \% & 0.05 s / 1 core & \\
CDTrack3D & & 1.07 \% & 1.49 \% & 0.71 \% & 0.0106 s / & \\
EW & & 0.81 \% & 0.79 \% & 0.74 \% & 0.05 s / 1 core & \\
mBoW & la & 0.00 \% & 0.00 \% & 0.00 \% & 10 s / 1 core & J. Behley, V. Steinhage and A. Cremers: Laser-based Segment Classification Using a Mixture of Bag-of-Words. Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2013.
\end{tabular}