Andreas Geiger

Publications of Despoina Paschalidou

ATISS: Autoregressive Transformers for Indoor Scene Synthesis
D. Paschalidou, A. Kar, M. Shugrina, K. Kreis, A. Geiger and S. Fidler
Advances in Neural Information Processing Systems (NeurIPS), 2021
Abstract: The ability to synthesize realistic and diverse indoor furniture layouts automatically, or based on partial input, unlocks many applications, from better interactive 3D tools to data synthesis for training and simulation. In this paper, we present ATISS, a novel autoregressive transformer architecture for creating diverse and plausible synthetic indoor environments, given only the room type and its floor plan. In contrast to prior work, which poses scene synthesis as sequence generation, our model generates rooms as unordered sets of objects. We argue that this formulation is more natural, as it makes ATISS generally useful beyond fully automatic room layout synthesis. For example, the same trained model can be used in interactive applications for general scene completion, partial room re-arrangement with any objects specified by the user, as well as object suggestions for any partial room. To enable this, our model leverages the permutation equivariance of the transformer when conditioning on the partial scene, and is trained to be permutation-invariant across object orderings. Our model is trained end-to-end as an autoregressive generative model using only labeled 3D bounding boxes as supervision. Evaluations on four room types in the 3D-FRONT dataset demonstrate that our model consistently generates plausible room layouts that are more realistic than existing methods. In addition, it has fewer parameters, is simpler to implement and train, and runs up to 8x faster than existing methods.
LaTeX BibTeX Citation:
@INPROCEEDINGS{Paschalidou2021NEURIPS,
  author = {Despoina Paschalidou and Amlan Kar and Maria Shugrina and Karsten Kreis and Andreas Geiger and Sanja Fidler},
  title = {ATISS: Autoregressive Transformers for Indoor Scene Synthesis},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  year = {2021}
}
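The permutation equivariance that ATISS exploits is a generic property of self-attention without positional encodings. The following numpy sketch (toy dimensions and random weights, purely an illustration rather than the paper's architecture) checks that permuting the input objects permutes the outputs identically:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention without positional encodings.
    With no positional information, permuting the rows of X (the objects
    in a partial scene) permutes the output rows identically, i.e. the
    layer is permutation-equivariant."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
n, d = 5, 8                                  # 5 objects, 8-dim features
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
perm = rng.permutation(n)

out = self_attention(X, Wq, Wk, Wv)
out_perm = self_attention(X[perm], Wq, Wk, Wv)
assert np.allclose(out[perm], out_perm)      # equivariance holds
```

This is why conditioning on a partial scene works with any subset and any ordering of the remaining objects.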
Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks
D. Paschalidou, A. Katharopoulos, A. Geiger and S. Fidler
Conference on Computer Vision and Pattern Recognition (CVPR), 2021
Abstract: Impressive progress in 3D shape extraction has led to representations that can capture object geometries with high fidelity. In parallel, primitive-based methods seek to represent objects as semantically consistent part arrangements. However, due to the simplicity of existing primitive representations, these methods fail to accurately reconstruct 3D shapes using a small number of primitives/parts. We address the trade-off between reconstruction quality and number of parts with Neural Parts, a novel 3D primitive representation that defines primitives using an Invertible Neural Network (INN) which implements homeomorphic mappings between a sphere and the target object. The INN allows us to compute the inverse mapping of the homeomorphism, which, in turn, enables the efficient computation of both the implicit surface function of a primitive and its mesh, without any additional post-processing. Our model learns to parse 3D objects into semantically consistent part arrangements without any part-level supervision. Evaluations on ShapeNet, D-FAUST and FreiHAND demonstrate that our primitives can capture complex geometries and thus simultaneously achieve geometrically accurate as well as interpretable reconstructions using an order of magnitude fewer primitives than state-of-the-art shape abstraction methods.
LaTeX BibTeX Citation:
@INPROCEEDINGS{Paschalidou2021CVPR,
  author = {Despoina Paschalidou and Angelos Katharopoulos and Andreas Geiger and Sanja Fidler},
  title = {Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2021}
}
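The standard building block of invertible neural networks of this kind is a coupling layer with a closed-form inverse. A minimal additive (NICE-style) coupling layer, with a toy hand-picked conditioner rather than the paper's learned network, shows why both directions of the mapping come for free:

```python
import numpy as np

def coupling_forward(x, t):
    """Additive coupling layer: split coordinates, shift one half by a
    function t of the other half. Stacking such layers yields an
    invertible network with an exact closed-form inverse."""
    x1, x2 = x[..., :2], x[..., 2:]
    y2 = x2 + t(x1)                       # shift second half
    return np.concatenate([x1, y2], axis=-1)

def coupling_inverse(y, t):
    y1, y2 = y[..., :2], y[..., 2:]
    x2 = y2 - t(y1)                       # exact inverse: subtract the same shift
    return np.concatenate([y1, x2], axis=-1)

t = lambda a: np.tanh(a @ np.array([[0.5], [-1.0]]))   # toy conditioner
pts = np.random.default_rng(1).normal(size=(100, 3))   # stand-in for sphere samples
mapped = coupling_forward(pts, t)
assert np.allclose(coupling_inverse(mapped, t), pts)   # bijective by construction
```

The closed-form inverse is what makes both the primitive's mesh (forward direction) and its implicit surface function (inverse direction) cheap to evaluate.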
Learning Unsupervised Hierarchical Part Decomposition of 3D Objects from a Single RGB Image
D. Paschalidou, L. Van Gool and A. Geiger
Conference on Computer Vision and Pattern Recognition (CVPR), 2020
Abstract: Humans perceive the 3D world as a set of distinct objects that are characterized by various low-level (geometry, reflectance) and high-level (connectivity, adjacency, symmetry) properties. Recent methods based on convolutional neural networks (CNNs) demonstrated impressive progress in 3D reconstruction, even when using a single 2D image as input. However, the majority of these methods focus on recovering the local 3D geometry of an object without considering its part-based decomposition or relations between parts. We address this challenging problem by proposing a novel formulation that allows us to jointly recover the geometry of a 3D object as a set of primitives as well as their latent hierarchical structure without part-level supervision. Our model recovers the higher-level structural decomposition of various objects in the form of a binary tree of primitives, where simple parts are represented with fewer primitives and more complex parts are modeled with more components. Our experiments on the ShapeNet and D-FAUST datasets demonstrate that considering the organization of parts indeed facilitates reasoning about 3D geometry.
LaTeX BibTeX Citation:
@INPROCEEDINGS{Paschalidou2020CVPR,
  author = {Despoina Paschalidou and Luc Van Gool and Andreas Geiger},
  title = {Learning Unsupervised Hierarchical Part Decomposition of 3D Objects from a Single RGB Image},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2020}
}
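The idea of an adaptive binary tree of primitives, where simple regions stay shallow and complex regions are subdivided further, can be illustrated with a crude hand-crafted recursion (this is only an analogy; the paper's decomposition is learned, not rule-based):

```python
import numpy as np

def build_tree(points, max_depth=4, tol=0.1):
    """Toy hierarchical decomposition: recursively split a point set along
    its widest axis, stopping early where a single axis-aligned box already
    fits tightly. Simple regions end as shallow leaves (few primitives);
    complex regions are subdivided deeper (more primitives)."""
    extent = points.max(0) - points.min(0)
    if max_depth == 0 or extent.max() < tol:
        return {"leaf": (points.min(0), points.max(0))}    # box primitive
    axis = int(extent.argmax())
    cut = np.median(points[:, axis])
    left = points[points[:, axis] <= cut]
    right = points[points[:, axis] > cut]
    if len(left) == 0 or len(right) == 0:                  # degenerate split
        return {"leaf": (points.min(0), points.max(0))}
    return {"left": build_tree(left, max_depth - 1, tol),
            "right": build_tree(right, max_depth - 1, tol)}

def count_leaves(node):
    """Number of primitives at the leaves of the binary tree."""
    if "leaf" in node:
        return 1
    return count_leaves(node["left"]) + count_leaves(node["right"])
```

With `max_depth = 4` the tree holds at most 2^4 = 16 leaf primitives, but regions that are already tight terminate earlier, mirroring the paper's adaptive allocation of components.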
PointFlowNet: Learning Representations for Rigid Motion Estimation from Point Clouds
A. Behl, D. Paschalidou, S. Donné and A. Geiger
Conference on Computer Vision and Pattern Recognition (CVPR), 2019
Abstract: Despite significant progress in image-based 3D scene flow estimation, the performance of such approaches has not yet reached the fidelity required by many applications. Simultaneously, these applications are often not restricted to image-based estimation: laser scanners provide a popular alternative to traditional cameras, for example in the context of self-driving cars, as they directly yield a 3D point cloud. In this paper, we propose to estimate 3D motion from such unstructured point clouds using a deep neural network. In a single forward pass, our model jointly predicts 3D scene flow as well as the 3D bounding box and rigid body motion of objects in the scene. While the prospect of estimating 3D scene flow from unstructured point clouds is promising, it is also a challenging task. We show that the traditional global representation of rigid body motion prohibits inference by CNNs, and propose a translation equivariant representation to circumvent this problem. For training our deep network, a large dataset is required. Because of this, we augment real scans from KITTI with virtual objects, realistically modeling occlusions and simulating sensor noise. A thorough comparison with classic and learning-based techniques highlights the robustness of the proposed approach.
LaTeX BibTeX Citation:
@INPROCEEDINGS{Behl2019CVPR,
  author = {Aseem Behl and Despoina Paschalidou and Simon Donn{\'e} and Andreas Geiger},
  title = {PointFlowNet: Learning Representations for Rigid Motion Estimation from Point Clouds},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2019}
}
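The representation argument in the PointFlowNet abstract can be made concrete with a small numpy check (toy points and motion, an illustration of the general principle rather than the paper's network): a per-point flow field is unchanged when the whole scene is translated, whereas the global translation component of a rigid motion is not, which is what makes the global parametrization hostile to translation-equivariant convolutions.

```python
import numpy as np

def rigid_flow(p, R, center, delta):
    """Scene flow at points p for a rigid motion: rotate about `center`,
    then translate by `delta`; returns per-point displacement vectors."""
    return (p - center) @ R.T + center + delta - p

th = 0.3                                     # rotation about the z-axis
R = np.array([[np.cos(th), -np.sin(th), 0.0],
              [np.sin(th),  np.cos(th), 0.0],
              [0.0,         0.0,        1.0]])
pts = np.random.default_rng(2).normal(size=(10, 3))
c = np.array([1.0, 0.0, 0.0])                # rotation center
d = np.array([0.0, 0.5, 0.0])                # translation
s = np.array([3.0, -2.0, 1.0])               # shift of the whole scene

# the local (per-point flow) representation is translation-equivariant:
assert np.allclose(rigid_flow(pts, R, c, d),
                   rigid_flow(pts + s, R, c + s, d))

# the equivalent global translation t (motion written as p -> R p + t)
# depends on absolute position and changes under the same scene shift:
t_global = c + d - R @ c
t_shifted = (c + s) + d - R @ (c + s)
assert not np.allclose(t_global, t_shifted)
```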
Superquadrics Revisited: Learning 3D Shape Parsing beyond Cuboids
D. Paschalidou, A. Ulusoy and A. Geiger
Conference on Computer Vision and Pattern Recognition (CVPR), 2019
Abstract: Abstracting complex 3D shapes with parsimonious part-based representations has been a long-standing goal in computer vision. This paper presents a learning-based solution to this problem which goes beyond the traditional 3D cuboid representation by exploiting superquadrics as atomic elements. We demonstrate that superquadrics lead to more expressive 3D scene parses while being easier to learn than 3D cuboid representations. Moreover, we provide an analytical solution to the Chamfer loss which avoids the need for computationally expensive reinforcement learning or iterative prediction. Our model learns to parse 3D objects into consistent superquadric representations without supervision. Results on various ShapeNet categories as well as the SURREAL human body dataset demonstrate the flexibility of our model in capturing fine details and complex poses that could not have been modelled using cuboids.
LaTeX BibTeX Citation:
@INPROCEEDINGS{Paschalidou2019CVPR,
  author = {Despoina Paschalidou and Ali Osman Ulusoy and Andreas Geiger},
  title = {Superquadrics Revisited: Learning 3D Shape Parsing beyond Cuboids},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2019}
}
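Superquadrics admit the standard closed-form inside-outside function, which is what makes them convenient learnable primitives: evaluating whether a point lies inside a primitive is a single analytic expression. A minimal sketch (default parameters chosen here for illustration):

```python
import numpy as np

def superquadric_inside_outside(p, size=(1.0, 1.0, 1.0), eps=(0.5, 0.5)):
    """Inside-outside function of a superquadric: < 1 inside, == 1 on the
    surface, > 1 outside. size = (a1, a2, a3) are axis scales and
    eps = (eps1, eps2) the shape exponents; small eps gives cuboid-like
    shapes, eps = 1 gives ellipsoids."""
    a1, a2, a3 = size
    e1, e2 = eps
    x, y, z = p[..., 0] / a1, p[..., 1] / a2, p[..., 2] / a3
    return (np.abs(x) ** (2 / e2) + np.abs(y) ** (2 / e2)) ** (e2 / e1) \
           + np.abs(z) ** (2 / e1)

assert superquadric_inside_outside(np.array([0.0, 0.0, 0.0])) < 1.0   # center
assert superquadric_inside_outside(np.array([2.0, 2.0, 2.0])) > 1.0   # outside
assert np.isclose(superquadric_inside_outside(np.array([1.0, 0.0, 0.0])), 1.0)
```

Because the surface is analytic in the shape parameters, losses such as the Chamfer distance to a target point cloud can be differentiated through it directly.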
RayNet: Learning Volumetric 3D Reconstruction with Ray Potentials (spotlight)
D. Paschalidou, A. Ulusoy, C. Schmitt, L. Van Gool and A. Geiger
Conference on Computer Vision and Pattern Recognition (CVPR), 2018
Abstract: In this paper, we consider the problem of reconstructing a dense 3D model using images captured from different views. Recent methods based on convolutional neural networks (CNNs) allow learning the entire task from data. However, they do not incorporate the physics of image formation such as perspective geometry and occlusion. Instead, classical approaches based on Markov Random Fields (MRFs) with ray-potentials explicitly model these physical processes, but they cannot cope with large surface appearance variations across different viewpoints. To this end, we propose RayNet, which combines the strengths of both frameworks. RayNet integrates a CNN that learns view-invariant feature representations with an MRF that explicitly encodes the physics of perspective projection and occlusion. We train RayNet end-to-end using empirical risk minimization. We thoroughly evaluate our approach on challenging real-world datasets and demonstrate its benefits over a piece-wise trained baseline, hand-crafted models as well as other learning-based approaches.
LaTeX BibTeX Citation:
@INPROCEEDINGS{Paschalidou2018CVPR,
  author = {Despoina Paschalidou and Ali Osman Ulusoy and Carolin Schmitt and Luc Van Gool and Andreas Geiger},
  title = {RayNet: Learning Volumetric 3D Reconstruction with Ray Potentials},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2018}
}
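The quantity that ray potentials reason about is the depth distribution a ray induces over the voxels it traverses: the ray terminates at voxel i iff voxel i is occupied and every voxel before it is free. A minimal numpy sketch of this standard construction (toy occupancies, not RayNet's learned ones):

```python
import numpy as np

def ray_depth_distribution(occ):
    """Depth distribution along one ray given per-voxel occupancy
    probabilities (ordered front to back): probability that the ray
    terminates at each voxel. Any remaining mass means the ray escapes."""
    occ = np.asarray(occ, dtype=float)
    # probability that all voxels strictly before voxel i are free
    free_before = np.concatenate([[1.0], np.cumprod(1.0 - occ)[:-1]])
    return occ * free_before

depth = ray_depth_distribution([0.1, 0.8, 0.5, 0.9])
assert np.isclose(depth[0], 0.1)          # first voxel: just its occupancy
assert np.isclose(depth[1], 0.9 * 0.8)    # second: free before, occupied here
assert depth.sum() <= 1.0                 # leftover mass: ray escapes
```

In RayNet this per-ray reasoning forms the MRF side of the model, while the CNN supplies the view-invariant evidence feeding it.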

