KITTI-360

Semantic SLAM

Trajectory Estimation

We use the standard Absolute Pose Error (APE) and Relative Pose Error (RPE) as metrics for evaluating pose estimation. To compute the APE, we first align the predicted trajectory to the ground truth with a rigid transformation. The RPE is evaluated between pairs of frames that are 1 meter apart.

APE: Absolute Pose Error
RPE: Relative Pose Error
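
As an illustration of the APE computation described above, the following Python sketch aligns predicted positions to ground truth positions with a least-squares rigid transformation and then measures the remaining position error. It is not the benchmark's evaluation code. The function names `align_rigid` and `ape_rmse`, the use of NumPy, the Kabsch method for the alignment, the restriction to translational error, and the choice of RMSE as the summary statistic are all assumptions made for this example.

```python
import numpy as np


def align_rigid(pred, gt):
    """Least-squares rigid alignment (rotation + translation, no scale).

    Returns R (3x3) and t (3,) such that R @ pred[i] + t best matches gt[i].
    Uses the Kabsch method (an assumption; the source does not name one).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    # 3x3 cross-covariance between the centered point sets
    H = (pred - mu_p).T @ (gt - mu_g)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that det(R) = +1
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_g - R @ mu_p
    return R, t


def ape_rmse(pred, gt):
    """RMSE of position errors after rigid alignment (hypothetical helper)."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    R, t = align_rigid(pred, gt)
    aligned = pred @ R.T + t
    errors = np.linalg.norm(aligned - gt, axis=1)
    return float(np.sqrt(np.mean(errors ** 2)))
```

Both inputs are expected as N x 3 arrays of positions with one row per frame, in the same frame order. A trajectory that differs from the ground truth only by a rotation and a translation yields an APE of approximately zero under this sketch.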

	Method	Setting	Code	APE	RPE	Runtime	Environment
1	CT-ICP2		code	0.50	1.00 %	0.06 s	1 core @ 3.5 GHz (C/C++)
P. Dellenbach, J. Deschaud, B. Jacquet and F. Goulette: CT-ICP: Real-time Elastic LiDAR Odometry with Loop Closure. International Conference on Robotics and Automation (ICRA) 2022.
2	SOFT2			0.70	0.84 %	0.1 s	4 cores @ 2.5 GHz (C/C++)
I. Cvišić, I. Marković and I. Petrović: SOFT2: Stereo Visual Odometry for Road Vehicles Based on a Point-to-Epipolar-Line Metric. IEEE Transactions on Robotics 2022.
3	ORB-SLAM2			1.92	2.03 %		NVIDIA V100
R. Mur-Artal and J. Tardós: ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras. TRO 2017.
4	SUMA++			3.13	2.72 %		NVIDIA V100
X. Chen, A. Milioto, E. Palazzolo, P. Giguère, J. Behley and C. Stachniss: SuMa++: Efficient LiDAR-based Semantic SLAM. IROS 2019.

Geometric and Semantic Mapping

We evaluate geometric completion and semantic estimation, and rank the methods by the confidence-weighted mean intersection-over-union (mIoU). Geometric completion is evaluated via completeness and accuracy at a threshold of 20 cm. Completeness is the fraction of ground truth points whose distance to the closest reconstructed point is below the threshold. Accuracy, conversely, is the percentage of reconstructed points that lie within the threshold distance of the ground truth points. Because our ground truth reconstruction may be incomplete, we avoid unfairly penalizing reconstructed points by dividing the space into observed and unobserved regions, with the unobserved volume taken from a 3D occupancy map built with OctoMap. We further report the F1 score as the harmonic mean of completeness and accuracy.

Accuracy: Percentage of reconstructed points that are within a distance threshold to the ground truth points
Completeness: Percentage of ground truth points that are within a distance threshold to the reconstructed points
F1: Harmonic mean of the accuracy and completeness
mIoU Class: Confidence-weighted mean intersection-over-union over object classes
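
The accuracy, completeness and F1 definitions above can be sketched in a few lines of Python. This is a minimal illustration and not the benchmark's evaluation code. The function names are hypothetical, the brute-force nearest-neighbor search is chosen for clarity rather than speed, the default threshold of 0.2 assumes coordinates in meters, and the masking of unobserved regions via the OctoMap occupancy map is omitted.

```python
import math


def fraction_within(src, dst, threshold):
    """Fraction of points in src whose nearest point in dst is closer than threshold."""
    if not src or not dst:
        return 0.0
    hits = sum(
        1 for p in src
        if min(math.dist(p, q) for q in dst) < threshold
    )
    return hits / len(src)


def reconstruction_scores(reconstructed, ground_truth, threshold=0.2):
    """Return (accuracy, completeness, f1) as fractions in [0, 1].

    accuracy:     reconstructed points close to the ground truth
    completeness: ground truth points close to the reconstruction
    f1:           harmonic mean of the two
    """
    accuracy = fraction_within(reconstructed, ground_truth, threshold)
    completeness = fraction_within(ground_truth, reconstructed, threshold)
    total = accuracy + completeness
    f1 = 2.0 * accuracy * completeness / total if total > 0 else 0.0
    return accuracy, completeness, f1
```

Points are given as sequences of (x, y, z) tuples. Multiplying the returned fractions by 100 gives percentages on the same scale as the table columns. For real point clouds, a spatial index such as a k-d tree would replace the quadratic search.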

	Method	Setting	Code	Accuracy	Completeness	F1	mIoU Class	Runtime	Environment
1	S-DSP		code	79.15	72.45	75.64	37.59	3 s	1 core @ 3.5 GHz (C/C++)
G. Chen, Z. Wang, W. Dong and J. Alonso-Mora: Particle-based Instance-aware Semantic Occupancy Mapping in Dynamic Environments. 2024.
2	ORB-SLAM2 + PSPNet			81.77	74.89	78.15	32.48		NVIDIA V100
R. Mur-Artal and J. Tardós: ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras. TRO 2017. H. Zhao, J. Shi, X. Qi, X. Wang and J. Jia: Pyramid Scene Parsing Network. CVPR 2017.
3	SUMA++			90.98	64.19	75.27	19.40		NVIDIA V100
X. Chen, A. Milioto, E. Palazzolo, P. Giguère, J. Behley and C. Stachniss: SuMa++: Efficient LiDAR-based Semantic SLAM. IROS 2019.
