Object-Oriented Learning (OOL): Perception, Representation, and Reasoning
International Conference on Machine Learning (ICML)
Friday, July 17, 2020, Virtual Workshop
We present a generative model of images that incorporates a structured latent representation separating objects from each other and from the background. It explicitly models the depth ordering of objects, as well as their 2D positions, with a novel and efficient approach to placement that avoids computationally expensive spatial transformers. The model can be trained from images alone, without the need for object masks or depth supervision. It learns to generate coherent scenes and to decompose novel images into their constituent objects, predicting their depth ordering, their locations, and segmentations of their occluded parts.
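The abstract does not describe the placement mechanism, so the following is only a minimal sketch of the general idea of rendering a scene from objects with an explicit depth order. It assumes that each object is an RGB patch with an alpha mask, a scalar depth (smaller meaning nearer to the camera, an assumption), and an integer top-left offset. Pasting at an integer offset is used purely for illustration and is not the authors' placement method. The names `ObjectLayer` and `composite` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Pixel = Tuple[float, float, float]


@dataclass
class ObjectLayer:
    """Hypothetical per-object latent, already decoded to pixels."""
    rgb: List[List[Pixel]]     # h x w patch of (r, g, b) colors
    alpha: List[List[float]]   # h x w mask values in [0, 1]
    depth: float               # assumption: smaller depth = nearer to camera
    row: int                   # top-left row of the patch on the canvas
    col: int                   # top-left column of the patch on the canvas


def composite(background: List[List[Pixel]],
              objects: List[ObjectLayer]) -> List[List[Pixel]]:
    """Alpha-composite object patches onto a background, far to near."""
    height, width = len(background), len(background[0])
    # Copy the background so the caller's image is not mutated.
    canvas = [[tuple(px) for px in row] for row in background]

    # Painter's algorithm: draw the farthest object first so that
    # nearer objects are blended over it and can occlude it.
    for obj in sorted(objects, key=lambda o: o.depth, reverse=True):
        h, w = len(obj.alpha), len(obj.alpha[0])
        for i in range(h):
            for j in range(w):
                y, x = obj.row + i, obj.col + j
                # Clip parts of the patch that fall outside the canvas.
                if 0 <= y < height and 0 <= x < width:
                    a = obj.alpha[i][j]
                    old, new = canvas[y][x], obj.rgb[i][j]
                    # Standard "over" blend: a * foreground + (1 - a) * background.
                    canvas[y][x] = tuple(a * n + (1 - a) * o
                                         for n, o in zip(new, old))
    return canvas
```

Because nearer objects are painted last, they overwrite farther ones wherever their alpha is 1. The alpha blend is differentiable with respect to the colors and masks, though the integer offsets in this sketch are not.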