Object-Oriented Learning (OOL): Perception, Representation, and Reasoning

International Conference on Machine Learning (ICML)

Friday July 17, 2020, Virtual Workshop

Access the virtual workshop page


Hierarchical Decomposition and Generation of Scenes with Compositional Objects

Compositional structures between parts and objects are inherent in natural scenes. Recent work on representation learning has succeeded in modeling scenes as composition of objects, but further decomposition of objects into parts and subparts has largely been overlooked. In this paper, we propose RICH, the first deep latent variable model for learning Representation of Interpretable Compositional Hierarchies. At the core of RICH is a latent scene graph representation that organizes the entities of a scene into a tree according to their compositional relationships. During inference, RICH takes a top-down approach, allowing higher-level representation to guide lower-level decomposition in case there is compositional ambiguity. In experiments on images containing multiple compositional objects, we demonstrate that RICH is able to learn the latent compositional hierarchy, generate imaginary scenes, and improve data efficiency in downstream tasks.