Object-Oriented Learning (OOL): Perception, Representation, and Reasoning

International Conference on Machine Learning (ICML)

Friday July 17, 2020, Virtual Workshop

Access the virtual workshop page


Learning 3D Object-Oriented World Models from Unlabeled Videos

The physical world can be decomposed into discrete 3D objects. Reasoning about the world in terms of these objects may provide a number of advantages to learning agents. For example, objects interact compositionally, and this can support a strong form of generalization. Knowing properties of individual objects and rules for how those properties interact, one can predict the effects that objects will have on one another even if one has never witnessed an interaction between the types of objects in question. The promise of object-level reasoning has fueled a recent surge of interest in systems capable of learning to extract object-oriented representations from perceptual input without supervision. However, the vast majority of such systems treat objects as 2D entities, effectively ignoring their 3D nature. In the current work, we propose a probabilistic, object-oriented model equipped with the inductive bias that the world is made up of 3D objects moving through a 3D world, and make a number of structural adaptations which take advantage of that bias. In a series of experiments we show that this system is capable not only of segmenting objects from the perceptual stream, but also of extracting 3D information about objects (e.g. depth) and of tracking them through 3D space.