We select 5 static scenes with a driving distance of ∼ 50 meters each for evaluating NVS at a 50% drop rate. We select one frame every ∼ 0.8 meters of driving distance (corresponding to the overall average distance between frames) to avoid redundant frames when the vehicle moves slowly. We release 50% of the frames for training and retain 50% for evaluation. Our evaluation table ranks all methods according to the peak signal-to-noise ratio (PSNR). We also evaluate the structural similarity index (SSIM) and perceptual similarity (LPIPS).

- **PSNR:** Peak signal-to-noise ratio
- **SSIM:** Structural similarity index
- **LPIPS:** Perceptual similarity using AlexNet
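As a rough illustration of the ranking metric, PSNR between a rendered view and its held-out ground-truth frame follows directly from the mean squared error. The sketch below assumes float images scaled to [0, 1] (so `max_val=1.0`); the function name `psnr` and this normalization are our assumptions, not the benchmark's evaluation code. SSIM and LPIPS need dedicated implementations and are not shown.

```python
import numpy as np


def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of equal shape.

    Assumes pixel values lie in [0, max_val]; the benchmark's exact
    image normalization is an assumption here.
    """
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    # Mean squared error over all pixels and channels, in float64.
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


# Toy example: a prediction off by 0.1 at every pixel.
target = np.zeros((4, 4, 3))
pred = np.full((4, 4, 3), 0.1)
print(psnr(pred, target))
```

Here every pixel is off by 0.1, so the MSE is 0.01 and the PSNR is 10 · log10(1 / 0.01) = 20 dB.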

Our evaluation table ranks all methods according to the confidence-weighted mean intersection-over-union (mIoU). The weighted IoU of one class is defined as \(\text{IoU} = \frac{\sum_{i\in{\{\text{TP}\}}}c_{i}}{\sum_{i\in{\{\text{TP, FP, FN}\}}}c_{i}}\), where \(\{\text{TP}\}\) and \(\{\text{TP, FP, FN}\}\) are the sets of image pixels in the intersection and the union, respectively, of the predicted and ground-truth regions of that class. \(c_i \in [0, 1]\) denotes the confidence value at pixel \(i\). In contrast to standard evaluation, where \(c_i=1\) for all pixels, we adopt confidence-weighted evaluation metrics that use the per-pixel uncertainty to account for the ambiguity in our automatically generated annotations.

- **mIoU class:** mean Intersection over Union over classes
- **mIoU category:** mean Intersection over Union over categories
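The confidence-weighted IoU defined above can be sketched in a few lines of NumPy. Here `pred` and `gt` are integer label maps, `conf` holds the per-pixel confidences \(c_i\), and the helper names `weighted_iou` and `weighted_miou` are hypothetical. How the benchmark treats classes that appear in neither map, or ignored labels, is not stated here; the sketch simply skips classes with an empty union (an assumption).

```python
import numpy as np


def weighted_iou(pred: np.ndarray, gt: np.ndarray, conf: np.ndarray, cls: int) -> float:
    """Confidence-weighted IoU of one class (hypothetical helper)."""
    pred_mask = pred == cls
    gt_mask = gt == cls
    inter = pred_mask & gt_mask  # pixels in {TP}
    union = pred_mask | gt_mask  # pixels in {TP, FP, FN}
    denom = conf[union].sum()
    if denom == 0:
        return float("nan")  # assumption: class absent from both maps is skipped
    return float(conf[inter].sum() / denom)


def weighted_miou(pred, gt, conf, classes) -> float:
    """Mean of the per-class weighted IoUs, ignoring skipped (NaN) classes."""
    ious = [weighted_iou(pred, gt, conf, c) for c in classes]
    return float(np.nanmean(ious))


# Toy 2x2 example: only pixel (1, 0) is mislabeled, and it has low confidence.
pred = np.array([[0, 1], [1, 1]])
gt = np.array([[0, 1], [0, 1]])
conf = np.array([[1.0, 0.5], [0.2, 1.0]])
print(weighted_iou(pred, gt, conf, 0), weighted_iou(pred, gt, conf, 1))
print(weighted_miou(pred, gt, conf, [0, 1]))
```

In the toy example the only mislabeled pixel has confidence 0.2, so the weighted IoUs (1.0/1.2 ≈ 0.83 and 1.5/1.7 ≈ 0.88) are higher than the unweighted ones (0.5 and ≈ 0.67): the disagreement is discounted because the annotation there is uncertain.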

We select 10 static scenes with a driving distance of ∼ 50 meters each for evaluating NVS at a 90% drop rate. We select one frame every ∼ 4.0 meters of driving distance (corresponding to the overall average distance between frames) to avoid redundant frames when the vehicle moves slowly. We release 50% of the frames for training and retain 50% for evaluation. Our evaluation table ranks all methods according to the peak signal-to-noise ratio (PSNR). We also evaluate the structural similarity index (SSIM) and perceptual similarity (LPIPS).

- **PSNR:** Peak signal-to-noise ratio
- **SSIM:** Structural similarity index
- **LPIPS:** Perceptual similarity using AlexNet
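Both NVS settings sample frames by driving distance rather than by time (about 0.8 m apart at the 50% drop rate, about 4.0 m at the 90% drop rate). One way to implement such sampling, sketched below under our own assumptions, is to accumulate the distance between consecutive vehicle positions and keep a frame whenever the distance travelled since the last kept frame reaches the target spacing. The function `subsample_by_distance` and this greedy rule are hypothetical, not the benchmark's released tooling.

```python
import numpy as np


def subsample_by_distance(positions: np.ndarray, spacing: float) -> list:
    """Return indices of frames roughly `spacing` metres apart along the path.

    positions: (N, 3) array of vehicle positions in metres.
    Greedy rule (an assumption): keep the first frame, then keep each frame
    whose travelled distance since the last kept frame reaches `spacing`.
    """
    positions = np.asarray(positions, dtype=np.float64)
    if len(positions) == 0:
        return []
    # Distance driven between consecutive frames, then cumulative distance.
    step = np.linalg.norm(np.diff(positions, axis=0), axis=1)
    cumdist = np.concatenate([[0.0], np.cumsum(step)])
    kept = [0]
    for i in range(1, len(positions)):
        if cumdist[i] - cumdist[kept[-1]] >= spacing:
            kept.append(i)
    return kept


# Toy trajectory along x; the vehicle stands still over frames 2-4.
xs = [0, 1, 2, 2, 2, 3, 4, 5, 8, 12]
pos = np.array([[x, 0.0, 0.0] for x in xs])
print(subsample_by_distance(pos, 4.0))
```

With a 4.0 m spacing this keeps frames 0, 6, 8 and 9; the repeated positions recorded while the vehicle was stationary are never selected, which is the redundancy that distance-based sampling is meant to avoid.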

Our evaluation table ranks all methods according to the confidence-weighted mean intersection-over-union (mIoU). The weighted IoU of one class is defined as \(\text{IoU} = \frac{\sum_{i\in{\{\text{TP}\}}}c_{i}}{\sum_{i\in{\{\text{TP, FP, FN}\}}}c_{i}}\), where \(\{\text{TP}\}\) and \(\{\text{TP, FP, FN}\}\) are the sets of image pixels in the intersection and the union, respectively, of the predicted and ground-truth regions of that class. \(c_i \in [0, 1]\) denotes the confidence value at pixel \(i\). In contrast to standard evaluation, where \(c_i=1\) for all pixels, we adopt confidence-weighted evaluation metrics that use the per-pixel uncertainty to account for the ambiguity in our automatically generated annotations.

- **mIoU class:** mean Intersection over Union over classes
- **mIoU category:** mean Intersection over Union over categories
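The category-level score can be obtained by first mapping each class label to its category and then applying the same confidence-weighted mIoU. The `CLASS_TO_CATEGORY` table below is a hypothetical four-class, two-category grouping for illustration only; the benchmark's actual class and category definitions are not reproduced here, and skipping labels with an empty union is likewise an assumption.

```python
import numpy as np

# Hypothetical grouping: classes 0, 1 (e.g. road, sidewalk) -> category 0;
# classes 2, 3 (e.g. car, truck) -> category 1.
CLASS_TO_CATEGORY = np.array([0, 0, 1, 1])


def weighted_miou(pred, gt, conf, labels) -> float:
    """Confidence-weighted mIoU over `labels`; empty-union labels are skipped."""
    ious = []
    for k in labels:
        p, g = pred == k, gt == k
        denom = conf[p | g].sum()  # weighted {TP, FP, FN}
        if denom > 0:
            ious.append(conf[p & g].sum() / denom)  # weighted {TP} / union
    return float(np.mean(ious)) if ious else float("nan")


def category_miou(pred, gt, conf) -> float:
    """Remap class labels to categories, then compute the weighted mIoU."""
    pred_cat = CLASS_TO_CATEGORY[pred]
    gt_cat = CLASS_TO_CATEGORY[gt]
    return weighted_miou(pred_cat, gt_cat, conf, np.unique(CLASS_TO_CATEGORY))


# Toy example: every error confuses two classes of the same category.
pred = np.array([[0, 2], [1, 3]])
gt = np.array([[1, 3], [1, 2]])
conf = np.ones((2, 2))
print(weighted_miou(pred, gt, conf, range(4)))  # class level
print(category_miou(pred, gt, conf))            # category level
```

Because every prediction in the toy example confuses classes only within the same category, the class-level mIoU is 0.125 while the category-level mIoU is 1.0.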