We adopt the standard Absolute Pose Error (APE) and Relative Pose Error (RPE) as metrics for evaluating pose estimation. To evaluate the APE, we first align the predicted trajectory to the ground truth using a rigid transformation. The RPE is evaluated between pairs of frames that are 1 meter apart.

**APE:** Absolute Pose Error

**RPE:** Relative Pose Error

We evaluate geometric completion and semantic estimation and rank the methods according to the confidence-weighted mean intersection-over-union (mIoU). Geometric completion is evaluated via completeness and accuracy at a distance threshold of 20 cm. Completeness is the percentage of ground truth points whose distance to the closest reconstructed point is below the threshold. Accuracy, conversely, is the percentage of reconstructed points whose distance to the closest ground truth point is below the threshold. As our ground truth reconstruction may not be complete, we avoid penalizing reconstructed points that fall outside the observed space by dividing the space into observed and unobserved regions, with the unobserved volume determined from a 3D occupancy map obtained using OctoMap. We further report the F1 score, the harmonic mean of completeness and accuracy.

**Accuracy:** Percentage of reconstructed points that are within a distance threshold to the ground truth points

**Completeness:** Percentage of ground truth points that are within a distance threshold to the reconstructed points

**F1:** Harmonic mean of the accuracy and completeness

**mIoU Class:** Confidence-weighted mean intersection-over-union over object classes
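As a rough illustration of the geometric metrics, the sketch below computes accuracy, completeness, and F1 for two small point sets with a brute-force nearest-neighbor search. The function names and the `is_observed` callback are hypothetical. In particular, modelling the observed/unobserved split as a filter applied to reconstructed points before scoring is an assumption, not necessarily how the benchmark applies the OctoMap occupancy map. For real point clouds a spatial index such as a KD-tree would replace the brute-force search.

```python
import math


def nearest_distance(point, cloud):
    """Distance from `point` to its closest neighbor in `cloud` (brute force)."""
    return min(math.dist(point, other) for other in cloud)


def completion_metrics(recon, gt, threshold=0.20, is_observed=None):
    """Return (accuracy, completeness, f1) as fractions in [0, 1].

    recon, gt    : sequences of (x, y, z) points in meters
    threshold    : distance threshold in meters (20 cm in the text)
    is_observed  : hypothetical predicate; when given, reconstructed points
                   for which it returns False are ignored (assumed handling
                   of unobserved space)
    """
    if is_observed is not None:
        recon = [p for p in recon if is_observed(p)]
    if not recon or not gt:
        return 0.0, 0.0, 0.0

    # Accuracy: share of reconstructed points close to the ground truth.
    accuracy = sum(nearest_distance(p, gt) < threshold for p in recon) / len(recon)
    # Completeness: share of ground truth points close to the reconstruction.
    completeness = sum(nearest_distance(q, recon) < threshold for q in gt) / len(gt)

    total = accuracy + completeness
    f1 = 0.0 if total == 0 else 2 * accuracy * completeness / total
    return accuracy, completeness, f1


# Toy example: four ground truth points on a line, three reconstructed points.
gt_points = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
recon_points = [(0, 0.1, 0), (1, 0.1, 0), (1, 1, 0)]

acc, comp, f1 = completion_metrics(recon_points, gt_points)
print(acc, comp, f1)

# Same data, treating everything with y >= 0.5 as unobserved space.
acc_m, comp_m, f1_m = completion_metrics(
    recon_points, gt_points, is_observed=lambda p: p[1] < 0.5
)
print(acc_m, comp_m, f1_m)
```

In the toy example two of the three reconstructed points lie within 20 cm of the ground truth (accuracy 2/3) and two of the four ground truth points are covered (completeness 1/2), giving an F1 of 4/7. With the outlier masked as unobserved, accuracy rises to 1 while completeness is unchanged.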