KITTI-360

Novel View Synthesis

Novel View Appearance Synthesis (50% Drop Rate)

We select 5 static scenes with a driving distance of ∼ 50 meters each for evaluating NVS at a 50% drop rate. We select one frame every ∼ 0.8 meters of driving distance (corresponding to the overall average distance between frames) to avoid redundancy when the vehicle is slow. We release 50% of the frames for training and retain 50% for evaluation. Our evaluation table ranks all methods according to the peak signal-to-noise ratio (PSNR). We also report the structural similarity index (SSIM) and perceptual similarity (LPIPS).
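
The frame selection and split described above can be sketched as follows. This is a minimal illustration, not the benchmark's actual tooling: the function names, the accumulate-and-reset selection rule, and the alternating train/held-out assignment are all assumptions, since the text only states the target spacing and the 50/50 ratio.

```python
import math

def select_frames_by_distance(positions, spacing_m=0.8):
    """Return indices of frames spaced at least `spacing_m` apart along the path.

    `positions` is a list of (x, y) vehicle positions in meters, one per frame.
    Hypothetical helper: the accumulate-and-reset rule is an assumption.
    """
    if not positions:
        return []
    selected = [0]          # always keep the first frame
    travelled = 0.0         # distance driven since the last selected frame
    for i in range(1, len(positions)):
        travelled += math.dist(positions[i - 1], positions[i])
        if travelled >= spacing_m:
            selected.append(i)
            travelled = 0.0
    return selected

def split_train_test(indices):
    """Assign selected frames alternately to training and held-out sets.

    Alternation is an assumption; the text only gives the 50/50 ratio.
    """
    return indices[0::2], indices[1::2]

# Toy trajectory: slow at first (0.25 m per frame), then faster (1.0 m per frame).
positions = [(0.0, 0.0), (0.25, 0.0), (0.5, 0.0), (0.75, 0.0),
             (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
selected = select_frames_by_distance(positions)
train, test = split_train_test(selected)
```

In the toy trajectory the slow frames near the start are skipped until the vehicle has covered the target spacing, which is the redundancy the selection is meant to avoid.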

PSNR: Peak signal-to-noise ratio
SSIM: Structural similarity index
LPIPS: Perceptual Similarity using AlexNet
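
As a rough illustration of the ranking metric, PSNR can be computed from the mean squared error between a rendered image and its reference as 10 · log10(MAX² / MSE). The sketch below uses the standard definition on flat lists of pixel values and assumes 8-bit intensities (a peak value of 255); the function name is hypothetical and the benchmark's own evaluation code may differ in details such as value range or color handling. SSIM and LPIPS are not covered here, since LPIPS in particular requires a pretrained AlexNet.

```python
import math

def psnr(reference, rendered, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists.

    `max_value` is the peak intensity (255 for 8-bit images, an assumption here).
    """
    if len(reference) != len(rendered) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((r - s) ** 2 for r, s in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Example: every pixel is off by one intensity level.
value = psnr([10, 20, 30], [11, 21, 31])
```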

Novel View Semantic Synthesis (50% Drop Rate)

Our evaluation table ranks all methods according to the confidence-weighted mean intersection-over-union (mIoU). The weighted IoU of one class is defined as $IoU = \frac{\sum_{i \in \{TP\}} c_{i}}{\sum_{i \in \{TP, FP, FN\}} c_{i}}$, where $\{TP\}$ and $\{TP, FP, FN\}$ are the sets of image pixels in the intersection and the union of the predicted and ground-truth labels for that class, respectively, and $c_{i} \in [0, 1]$ denotes the confidence value at pixel $i$. In contrast to standard evaluation, where $c_{i} = 1$ for all pixels, we adopt confidence-weighted evaluation metrics that use this uncertainty to account for the ambiguity in our automatically generated annotations.

mIoU class: mean Intersection over Union over classes
mIoU category: mean Intersection over Union over categories
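
The confidence-weighted IoU defined above can be sketched in a few lines. This is an illustrative implementation on flat per-pixel lists, not the benchmark's evaluation script: the function names are hypothetical, and skipping classes that appear in neither prediction nor ground truth when averaging is an assumption.

```python
def confidence_weighted_iou(pred, gt, conf, class_id):
    """Weighted IoU of one class: sum of c_i over TP / sum of c_i over TP, FP, FN."""
    numerator = 0.0    # confidence mass in the intersection (TP)
    denominator = 0.0  # confidence mass in the union (TP, FP, FN)
    for p, g, c in zip(pred, gt, conf):
        in_pred = p == class_id
        in_gt = g == class_id
        if in_pred and in_gt:
            numerator += c
        if in_pred or in_gt:
            denominator += c
    # None marks a class with an empty union (assumption: such classes are skipped).
    return numerator / denominator if denominator > 0 else None

def mean_iou(pred, gt, conf, class_ids):
    """Mean of the per-class weighted IoUs over classes with a non-empty union."""
    ious = [confidence_weighted_iou(pred, gt, conf, k) for k in class_ids]
    valid = [v for v in ious if v is not None]
    return sum(valid) / len(valid) if valid else None

# Toy example with four pixels and two classes.
pred = [0, 0, 1, 1]
gt = [0, 1, 1, 1]
conf = [1.0, 0.5, 1.0, 0.5]
iou_class0 = confidence_weighted_iou(pred, gt, conf, 0)
iou_class1 = confidence_weighted_iou(pred, gt, conf, 1)
miou = mean_iou(pred, gt, conf, [0, 1])
```

Setting every confidence to 1 recovers the standard IoU. In the toy example the mislabeled pixel carries a confidence of 0.5, so it is penalized less than it would be under unweighted evaluation.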

	Method	Setting	Code	mIoU Class	mIoU Category	Runtime	Environment
1	PNF			73.06	84.97	15 s	GPU @ 2.5 Ghz (Python)
A. Kundu, K. Genova, X. Yin, A. Fathi, C. Pantofaru, L. Guibas, A. Tagliasacchi, F. Dellaert and T. Funkhouser: Panoptic Neural Fields: A Semantic Object-Aware Neural Scene Representation. CVPR 2022.
2	GT Image + PSPNet			63.82	78.25	0.2 s	1 core @ 2.5 Ghz (C/C++)
Y. Liao, J. Xie and A. Geiger: KITTI-360: A Novel Dataset and Benchmarks for Urban Scene Understanding in 2D and 3D. ARXIV 2021. H. Zhao, J. Shi, X. Qi, X. Wang and J. Jia: Pyramid Scene Parsing Network. CVPR 2017.
3	FVS + PSPNet			60.86	74.61	0.4 s	1 core @ 2.5 Ghz (C/C++)
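G. Riegler and V. Koltun: Free View Synthesis. ECCV 2020. H. Zhao, J. Shi, X. Qi, X. Wang and J. Jia: Pyramid Scene Parsing Network. CVPR 2017.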
4	PBNR + PSPNet			58.43	71.99	1 s	1 core @ 2.5 Ghz (C/C++)
G. Kopanas, J. Philip, T. Leimkühler and G. Drettakis: Point-Based Neural Rendering with Per-View Optimization. Computer Graphics Forum (Proceedings of the Eurographics Symposium on Rendering) 2021. H. Zhao, J. Shi, X. Qi, X. Wang and J. Jia: Pyramid Scene Parsing Network. CVPR 2017.
5	NeRF + PSPNet			49.57	69.14	15 s	GPU @ 2.5 Ghz (Python)
B. Mildenhall, P. Srinivasan, M. Tancik, J. Barron, R. Ramamoorthi and R. Ng: NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. ECCV 2020. H. Zhao, J. Shi, X. Qi, X. Wang and J. Jia: Pyramid Scene Parsing Network. CVPR 2017.
6	mip-NeRF + PSPNet			48.25	67.47	15 s	GPU @ 2.5 Ghz (Python)
J. Barron, B. Mildenhall, M. Tancik, P. Hedman, R. Martin-Brualla and P. Srinivasan: Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields. ICCV 2021. H. Zhao, J. Shi, X. Qi, X. Wang and J. Jia: Pyramid Scene Parsing Network. CVPR 2017.
7	PCL + PSPNet		code	37.21	44.55	0.4 s	1 core @ 2.5 Ghz (C/C++)
Y. Liao, J. Xie and A. Geiger: KITTI-360: A Novel Dataset and Benchmarks for Urban Scene Understanding in 2D and 3D. ARXIV 2021. H. Zhao, J. Shi, X. Qi, X. Wang and J. Jia: Pyramid Scene Parsing Network. CVPR 2017.

Novel View Appearance Synthesis (90% Drop Rate)

We select 10 static scenes with a driving distance of ∼ 50 meters each for evaluating NVS at a 90% drop rate. We select one frame every ∼ 4.0 meters of driving distance (corresponding to the overall average distance between frames) to avoid redundancy when the vehicle is slow. We release 50% of the frames for training and retain 50% for evaluation. Our evaluation table ranks all methods according to the peak signal-to-noise ratio (PSNR). We also report the structural similarity index (SSIM) and perceptual similarity (LPIPS).

PSNR: Peak signal-to-noise ratio
SSIM: Structural similarity index
LPIPS: Perceptual Similarity using AlexNet

Novel View Semantic Synthesis (90% Drop Rate)

mIoU class: mean Intersection over Union over classes
mIoU category: mean Intersection over Union over categories

	Method	Setting	Code	PSNR	SSIM	LPIPS	Runtime	Environment
1	UdeerGS			23.60	0.880	0.156	0.01 s	GPU @ 2.5 Ghz (Python)

2	ExtraGS			23.58	0.868	0.148	0.01 s	GPU @ 2.5 Ghz (Python)

3	MVSRegNeRF			22.48	0.829	0.256	2 s	1 core @ 2.5 Ghz (C/C++)
F. Bian, S. Xiong, R. Yi and L. Ma: Multi-view stereo-regulated NeRF for urban scene novel view synthesis. The Visual Computer 2024.
4	PointNeRF++		code	22.44	0.828	0.212	20 s	1 core @ 2.5 Ghz (C/C++)
W. Sun, E. Trulls, Y. Tseng, S. Sambandam, G. Sharma, A. Tagliasacchi and K. Yi: PointNeRF++: A multi-scale, point-based Neural Radiance Field. European Conference on Computer Vision 2024.
5	PNF			22.07	0.820	0.221	15 s	GPU @ 2.5 Ghz (Python)
A. Kundu, K. Genova, X. Yin, A. Fathi, C. Pantofaru, L. Guibas, A. Tagliasacchi, F. Dellaert and T. Funkhouser: Panoptic Neural Fields: A Semantic Object-Aware Neural Scene Representation. CVPR 2022.
6	mip-NeRF		code	21.54	0.778	0.365	10 s	1 core @ 2.5 Ghz (Python)
J. Barron, B. Mildenhall, M. Tancik, P. Hedman, R. Martin-Brualla and P. Srinivasan: Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields. ICCV 2021.
7	NeRF		code	21.18	0.779	0.343	10 s	1 core @ 2.5 Ghz (Python)
B. Mildenhall, P. Srinivasan, M. Tancik, J. Barron, R. Ramamoorthi and R. Ng: NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. ECCV 2020.
8	FVS		code	20.00	0.790	0.193	0.2 s	1 core @ 2.5 Ghz (C/C++)
G. Riegler and V. Koltun: Free View Synthesis. ECCV 2020.
9	PBNR		code	19.91	0.811	0.191	0.1 s	1 core @ 2.5 Ghz (C/C++)
G. Kopanas, J. Philip, T. Leimkühler and G. Drettakis: Point-Based Neural Rendering with Per-View Optimization. Computer Graphics Forum (Proceedings of the Eurographics Symposium on Rendering) 2021.
10	Point-NeRF		code	19.44	0.796	0.266	1 s	1 core @ 2.5 Ghz (C/C++)
Q. Xu, Z. Xu, J. Philip, S. Bi, Z. Shu, K. Sunkavalli and U. Neumann: Point-nerf: Point-based neural radiance fields. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2022.
11	PCL			12.81	0.576	0.549	0.2 s	1 core @ 2.5 Ghz (C/C++)
Y. Liao, J. Xie and A. Geiger: KITTI-360: A Novel Dataset and Benchmarks for Urban Scene Understanding in 2D and 3D. ARXIV 2021.