KITTI-360


Novel View Synthesis

Novel View Appearance Synthesis (50% Drop Rate)

We select 5 static scenes, each covering a driving distance of ∼ 50 meters, for evaluating NVS at a 50% drop rate. To avoid redundant frames when the vehicle moves slowly, we keep one frame every ∼ 0.8 meters of driving distance (the overall average distance between frames). We release 50% of the selected frames for training and retain the other 50% for evaluation. Our evaluation table ranks all methods by peak signal-to-noise ratio (PSNR); we also report the structural similarity index (SSIM) and perceptual similarity (LPIPS).
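
The distance-based frame selection and the 50/50 split can be sketched as follows. This is a minimal illustration, not the benchmark's actual tooling: the function names `select_frames` and `split_half`, the use of 2D positions in meters, and the choice to alternate kept frames between training and evaluation are all assumptions made here.

```python
import math


def select_frames(positions, spacing=0.8):
    """Keep roughly one frame every `spacing` meters of driving distance.

    positions: hypothetical list of (x, y) vehicle positions in meters, one per raw frame.
    Returns the indices of the kept frames. Frames recorded while the vehicle
    is slow or stopped add little distance and are therefore skipped.
    """
    if not positions:
        return []
    kept = [0]
    travelled = 0.0  # distance driven since the last kept frame
    for i in range(1, len(positions)):
        travelled += math.dist(positions[i - 1], positions[i])
        if travelled >= spacing:
            kept.append(i)
            travelled = 0.0
    return kept


def split_half(indices):
    """Assumed split: alternate kept frames between training and evaluation."""
    return indices[0::2], indices[1::2]


positions = [(0.5 * i, 0.0) for i in range(101)]  # 50 m driven at 0.5 m per raw frame
kept = select_frames(positions)
train, test = split_half(kept)
print(len(kept), len(train), len(test))  # 51 26 25
```

Resetting the accumulated distance after each kept frame is a simplification; it makes the spacing approximate rather than exact, which matches the "∼ 0.8 meters" wording.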

PSNR: Peak signal-to-noise ratio
SSIM: Structural similarity index
LPIPS: Perceptual Similarity using AlexNet
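
For reference, PSNR is \(10 \log_{10}(\text{MAX}^2 / \text{MSE})\) in dB, where MSE is the mean squared error between the rendered and ground-truth image. The sketch below computes it for a single image stored as a flat list of pixel values; the function name and the flat-list representation are illustrative assumptions, and this page does not state whether the benchmark averages PSNR per image or pools errors over all test frames.

```python
import math


def psnr(pred, target, max_val=1.0):
    """PSNR in dB between two equally sized images given as flat lists of values in [0, max_val]."""
    if len(pred) != len(target):
        raise ValueError("images must have the same number of values")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


pred = [100, 100, 100, 100]  # toy 2x2 grayscale image, 8-bit values
target = [110, 90, 110, 90]
print(psnr(pred, target, max_val=255))
```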


Novel View Semantic Synthesis (50% Drop Rate)

Our evaluation table ranks all methods by the confidence-weighted mean intersection-over-union (mIoU). The weighted IoU of a single class is defined as \(\text{IoU} = \frac{\sum_{i\in{\{\text{TP}\}}}c_{i}}{\sum_{i\in{\{\text{TP, FP, FN}\}}}c_{i}}\), where \(\{\text{TP}\}\) and \(\{\text{TP, FP, FN}\}\) are the sets of image pixels in the intersection and the union of the predicted and ground-truth regions of that class, respectively, and \(c_i \in [0, 1]\) denotes the confidence value at pixel \(i\). In contrast to standard evaluation, where \(c_i=1\) for all pixels, we adopt confidence-weighted metrics that use this uncertainty to account for ambiguity in our automatically generated annotations.
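
The weighted IoU formula can be sketched directly: the numerator sums \(c_i\) over true-positive pixels and the denominator sums \(c_i\) over true-positive, false-positive and false-negative pixels. The function name, the flat-list inputs, and returning NaN for a class absent from both prediction and ground truth are assumptions of this sketch.

```python
def weighted_iou(pred, gt, conf, cls):
    """Confidence-weighted IoU of one class.

    pred, gt: per-pixel class labels (flat lists); conf: per-pixel confidence c_i in [0, 1].
    """
    inter = union = 0.0
    for p, g, c in zip(pred, gt, conf):
        in_pred, in_gt = p == cls, g == cls
        if in_pred and in_gt:  # TP pixel: counts in numerator and denominator
            inter += c
        if in_pred or in_gt:  # TP, FP or FN pixel: counts in denominator
            union += c
    return inter / union if union > 0 else float("nan")  # NaN: class absent (assumed convention)


pred = [1, 1, 0, 0]
gt = [1, 0, 1, 0]
conf = [1.0, 0.5, 0.5, 1.0]
print(weighted_iou(pred, gt, conf, 1))  # 0.5
print(weighted_iou(pred, gt, [1.0] * 4, 1))  # standard IoU with c_i = 1
```

With all confidences set to 1 the same function reduces to the standard IoU (here 1/3), while down-weighting the uncertain false-positive and false-negative pixels raises the score to 0.5.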

mIoU class: mean Intersection over Union over classes
mIoU category: mean Intersection over Union over categories
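
The difference between the two averages can be illustrated on a toy image: class-level mIoU averages IoU over fine classes, while category-level mIoU first maps every class to its coarser category and then averages over categories, so confusions between classes of the same category are no longer penalized. The class names and the `CATEGORY` mapping below are hypothetical and chosen only for illustration; this page also does not describe how IoU is accumulated over multiple images.

```python
def weighted_iou(pred, gt, conf, label):
    # Confidence-weighted IoU of one label (TP weights over TP+FP+FN weights).
    inter = sum(c for p, g, c in zip(pred, gt, conf) if p == label and g == label)
    union = sum(c for p, g, c in zip(pred, gt, conf) if p == label or g == label)
    return inter / union


def mean_iou(pred, gt, conf, labels):
    # Average IoU over labels occurring in the prediction or the ground truth.
    present = [label for label in labels if label in pred or label in gt]
    return sum(weighted_iou(pred, gt, conf, label) for label in present) / len(present)


# Hypothetical class-to-category grouping, for illustration only.
CATEGORY = {"road": "flat", "sidewalk": "flat", "car": "vehicle", "truck": "vehicle"}

pred = ["road", "sidewalk", "car", "truck"]
gt = ["road", "road", "truck", "truck"]
conf = [1.0, 1.0, 1.0, 1.0]

miou_class = mean_iou(pred, gt, conf, sorted(CATEGORY))
miou_category = mean_iou(
    [CATEGORY[p] for p in pred],
    [CATEGORY[g] for g in gt],
    conf,
    sorted(set(CATEGORY.values())),
)
print(miou_class, miou_category)  # 0.25 1.0
```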

	Method	Setting	Code	mIoU Class	mIoU Category	Runtime	Environment
1	PNF			73.06	84.97	15 s	GPU @ 2.5 GHz (Python)
A. Kundu, K. Genova, X. Yin, A. Fathi, C. Pantofaru, L. Guibas, A. Tagliasacchi, F. Dellaert and T. Funkhouser: Panoptic Neural Fields: A Semantic Object-Aware Neural Scene Representation. CVPR 2022.
2	HUGS			72.65	85.64	0.02 s	1 core @ 2.5 GHz (C/C++)

3	GT Image + PSPNet			63.82	78.25	0.2 s	1 core @ 2.5 GHz (C/C++)
Y. Liao, J. Xie and A. Geiger: KITTI-360: A Novel Dataset and Benchmarks for Urban Scene Understanding in 2D and 3D. ARXIV 2021. H. Zhao, J. Shi, X. Qi, X. Wang and J. Jia: Pyramid Scene Parsing Network. CVPR 2017.
4	FVS + PSPNet			60.86	74.61	0.4 s	1 core @ 2.5 GHz (C/C++)
5	PBNR + PSPNet			58.43	71.99	1 s	1 core @ 2.5 GHz (C/C++)
G. Kopanas, J. Philip, T. Leimkühler and G. Drettakis: Point-Based Neural Rendering with Per-View Optimization. Computer Graphics Forum (Proceedings of the Eurographics Symposium on Rendering) 2021. H. Zhao, J. Shi, X. Qi, X. Wang and J. Jia: Pyramid Scene Parsing Network. CVPR 2017.
6	NeRF + PSPNet			49.57	69.14	15 s	GPU @ 2.5 GHz (Python)
B. Mildenhall, P. Srinivasan, M. Tancik, J. Barron, R. Ramamoorthi and R. Ng: NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. ECCV 2020. H. Zhao, J. Shi, X. Qi, X. Wang and J. Jia: Pyramid Scene Parsing Network. CVPR 2017.
7	mip-NeRF + PSPNet			48.25	67.47	15 s	GPU @ 2.5 GHz (Python)
J. Barron, B. Mildenhall, M. Tancik, P. Hedman, R. Martin-Brualla and P. Srinivasan: Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields. ICCV 2021. H. Zhao, J. Shi, X. Qi, X. Wang and J. Jia: Pyramid Scene Parsing Network. CVPR 2017.
8	PCL + PSPNet		code	37.21	44.55	0.4 s	1 core @ 2.5 GHz (C/C++)
Y. Liao, J. Xie and A. Geiger: KITTI-360: A Novel Dataset and Benchmarks for Urban Scene Understanding in 2D and 3D. ARXIV 2021. H. Zhao, J. Shi, X. Qi, X. Wang and J. Jia: Pyramid Scene Parsing Network. CVPR 2017.


Novel View Appearance Synthesis (90% Drop Rate)

We select 10 static scenes, each covering a driving distance of ∼ 50 meters, for evaluating NVS at a 90% drop rate. To avoid redundant frames when the vehicle moves slowly, we keep one frame every ∼ 4.0 meters of driving distance (corresponding to the overall average distance between frames). We release 50% of the selected frames for training and retain the other 50% for evaluation. Our evaluation table ranks all methods by peak signal-to-noise ratio (PSNR); we also report the structural similarity index (SSIM) and perceptual similarity (LPIPS).

PSNR: Peak signal-to-noise ratio
SSIM: Structural similarity index
LPIPS: Perceptual Similarity using AlexNet
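
To make the SSIM definition concrete, the sketch below evaluates the SSIM formula once over the whole image, using the standard constants \(C_1 = (0.01\,L)^2\) and \(C_2 = (0.03\,L)^2\). This is a simplification: standard SSIM computes the same expression inside a sliding (typically Gaussian, 11 × 11) window and averages the local values, and this page does not specify the exact implementation the benchmark uses. The function name and flat-list input are assumptions.

```python
def global_ssim(x, y, max_val=1.0):
    """Single-window SSIM over two equally sized images given as flat lists (simplified)."""
    n = len(x)
    c1 = (0.01 * max_val) ** 2  # stabilizes the luminance term
    c2 = (0.03 * max_val) ** 2  # stabilizes the contrast/structure term
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


x = [0.1, 0.5, 0.9]
print(global_ssim(x, x))        # identical images score 1
print(global_ssim(x, x[::-1]))  # inverted structure scores much lower
```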

	Method	Setting	Code	PSNR	SSIM	LPIPS	Runtime	Environment
1	DGNerf		code	17.33	0.714	0.397	1 s	1 core @ 2.5 GHz (C/C++)

2	MVSRegNeRF			17.20	0.702	0.424	2 s	1 core @ 2.5 GHz (C/C++)
F. Bian, S. Xiong, R. Yi and L. Ma: Multi-view stereo-regulated NeRF for urban scene novel view synthesis. The Visual Computer 2024.
3	NeRF			15.74	0.648	0.590	10 s	1 core @ 2.5 GHz (C/C++)
B. Mildenhall, P. Srinivasan, M. Tancik, J. Barron, R. Ramamoorthi and R. Ng: NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. ECCV 2020.
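
As a small illustration of the ranking rule, the snippet below re-sorts the three rows of the 90% drop-rate appearance table by PSNR, using only the values listed there; the tuple layout is an arbitrary choice for this sketch. Higher is better for PSNR and SSIM, whereas LPIPS is a learned distance, so lower is better.

```python
# (method, PSNR, SSIM, LPIPS) as listed in the 90% drop-rate appearance table
results = [
    ("NeRF", 15.74, 0.648, 0.590),
    ("DGNerf", 17.33, 0.714, 0.397),
    ("MVSRegNeRF", 17.20, 0.702, 0.424),
]

# The leaderboard ranks by PSNR (higher is better).
ranking = sorted(results, key=lambda row: row[1], reverse=True)
for rank, (name, psnr, ssim, lpips) in enumerate(ranking, start=1):
    print(f"{rank}\t{name}\t{psnr:.2f}\t{ssim:.3f}\t{lpips:.3f}")

# LPIPS is a distance, so the best method has the lowest value.
best_lpips = min(results, key=lambda row: row[3])[0]
print(best_lpips)  # DGNerf
```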


Novel View Semantic Synthesis (90% Drop Rate)

mIoU class: mean Intersection over Union over classes
mIoU category: mean Intersection over Union over categories