Object-Oriented Learning (OOL): Perception, Representation, and Reasoning
International Conference on Machine Learning (ICML)
Friday July 17, 2020, Virtual Workshop
The video will become available after 1st August 2020 in accordance with the ICML2020 Code of Conduct.
Compositional structures between parts and objects are inherent in natural scenes. Recent work on representation learning has succeeded in modeling scenes as composition of objects, but further decomposition of objects into parts and subparts has largely been overlooked. In this paper, we propose RICH, the first deep latent variable model for learning Representation of Interpretable Compositional Hierarchies. At the core of RICH is a latent scene graph representation that organizes the entities of a scene into a tree according to their compositional relationships. During inference, RICH takes a top-down approach, allowing higher-level representation to guide lower-level decomposition in case there is compositional ambiguity. In experiments on images containing multiple compositional objects, we demonstrate that RICH is able to learn the latent compositional hierarchy, generate imaginary scenes, and improve data efficiency in downstream tasks.