KITTI-360

Method

Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation [DeepViewAggregation]
https://github.com/drprojects/DeepViewAgg

Submitted on 7 Feb. 2022 17:19 by
Damien Robert (Institut Géographique National)

Running time: -
Environment: NVIDIA V100

Method Description:
Recent work on 3D semantic segmentation proposes to
exploit the synergy between images and point clouds by
processing each modality with a dedicated network and
projecting learned 2D features onto 3D points. Merging
large-scale point clouds and images raises several
challenges, such as constructing a mapping between
points and pixels and aggregating features between
multiple views. Current methods rely on mesh
reconstruction or specialized sensors to recover
occlusions, and use heuristics to select and aggregate
images. In contrast, we propose an end-to-end trainable
multi-view aggregation model leveraging the viewing
conditions of 3D points to merge features from images
taken at arbitrary positions. Our method can combine
standard 2D and 3D networks and outperforms both 3D
models operating on colorized point clouds and hybrid
2D/3D networks without requiring colorization, meshing,
or true depth maps.
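
The core idea, a learned weighting of the images that see each 3D point, can be sketched as follows. This is only a minimal illustration with hypothetical tensor names, not the authors' implementation (which is in the linked repository): a small MLP scores each candidate view from its viewing conditions, and the resulting attention weights blend the projected 2D features into a single per-point feature.

```python
import torch
import torch.nn as nn


class ViewConditionedAggregation(nn.Module):
    """Minimal sketch of attention-based multi-view pooling.

    Hypothetical tensors, not the DeepViewAgg interface:
      img_feat  (N, V, F)  2D features sampled at the pixels each of the
                           N points projects to in V candidate images
      view_cond (N, V, C)  per-view viewing conditions (e.g. distance to
                           the camera, viewing angle, pixel position)
      mask      (N, V)     True where the view actually sees the point
    """

    def __init__(self, cond_dim: int, hidden: int = 32):
        super().__init__()
        # Small MLP scoring each view from its viewing conditions only.
        self.scorer = nn.Sequential(
            nn.Linear(cond_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, img_feat, view_cond, mask):
        scores = self.scorer(view_cond).squeeze(-1)            # (N, V)
        scores = scores.masked_fill(~mask, float("-inf"))      # drop unseen views
        weights = torch.softmax(scores, dim=1)                 # attention over views
        weights = torch.nan_to_num(weights)                    # points seen by no view
        return (weights.unsqueeze(-1) * img_feat).sum(dim=1)   # (N, F) fused feature
```

The fused per-point image feature can then be combined with the 3D network's own features before the segmentation head.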
Parameters:
epochs=60, sample_per_epoch=12000, r_max=20,
camera=1
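
For context, one plausible reading of these parameters (an assumption, not taken from the repository's configuration files): sample_per_epoch spatial samples are drawn per epoch, each limited to an r_max-meter radius, with a single camera stream. A toy sketch of such radius-bounded sampling:

```python
import numpy as np


def sample_radius_neighborhoods(points, n_samples=12_000, r_max=20.0, seed=0):
    """Toy illustration of radius-bounded sampling suggested by the listed
    parameters (hypothetical, not the repository's sampling code): draw
    `n_samples` random centers and keep the points within an `r_max`-meter
    horizontal radius of each center.
    """
    rng = np.random.default_rng(seed)
    centers = points[rng.integers(len(points), size=n_samples), :2]
    for center in centers:
        dist = np.linalg.norm(points[:, :2] - center, axis=1)
        yield points[dist <= r_max]
```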
Latex Bibtex:
@inproceedings{robert2022dva,
  title={Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation},
  author={Robert, Damien and Vallet, Bruno and Landrieu, Loic},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2022}
}

Test Set Average

Per-class IoU (%):
road           95.25
sidewalk       83.03
building       86.98
wall           50.71
fence          44.35
pole           57.33
traffic light  32.27
traffic sign   38.12
vegetation     90.64
terrain        84.86
person         13.81
car            96.35
truck          39.98
motorcycle     43.42
bicycle        16.69
mIoU (class)   58.25

Per-category IoU (%):
flat             95.86
construction     82.37
object           61.18
nature           92.64
human            13.81
vehicle          96.10
mIoU (category)  73.66