
Learning Physical Graph Representations from Visual Scenes

Center for Mind, Brain, Computation and Technology
August 5, 2020 - 6:00pm

Graduate students and postdocs are invited to learn from Daniel Bear.

Human vision is object-centric: we partition scenes into discrete objects and attribute physical properties to them, including appearance, position, shape, material composition, and their relations to one another. This abstraction is important for higher cognition, because most of our behaviors and goals are specified in terms of entities and relations, not pixel-level detail. Despite this, most state-of-the-art computer vision algorithms are based on image-like internal representations; if "objects" are present at all, they are typically given only "semantic" properties, not the physical ones needed for useful interaction.

To begin addressing this gap, we've developed a way to augment computer vision algorithms with object-centric representations. We call these representations "Physical Scene Graphs" (PSGs) because they have both a hierarchical graph structure and an explicit encoding of the physical properties of a scene's visual elements, such as their position, shape, and texture. Algorithms that infer PSG representations from visual input, which we call "PSGNets", group visual features into discrete entities ("objects") in an unsupervised manner and predict their physical attributes as explicit components of the associated vectors. The algorithm is recursive, hierarchically constructing a graph of high-level objects from their low-level parts. I'll talk about the architecture of PSGNets, our results using them to segment visual scenes into objects without supervision, and, if there's time, how we're hoping to connect them back to neural circuits in biological visual systems.
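The abstract does not give implementation details, so the following is only a toy Python sketch of the general idea of a hierarchical graph whose nodes carry explicit physical attribute vectors. Everything in it is a hypothetical illustration, not the PSGNet architecture: the `Node` class and its attributes (`position`, `area`, `color`) are assumptions, a simple color-distance threshold with union-find stands in for the learned, unsupervised grouping, and parent attributes are computed by hand-written area-weighted pooling rather than predicted by a network.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """One entity in a toy scene graph (hypothetical): explicit attributes plus links to its parts."""
    position: tuple  # (x, y) centroid in image coordinates
    area: float      # size of the region, e.g. pixel count
    color: tuple     # mean (r, g, b) appearance
    children: List["Node"] = field(default_factory=list)

def group_by_similarity(nodes: List[Node], threshold: float) -> List[List[int]]:
    """Toy stand-in for learned grouping: link any two nodes whose mean colors are
    closer than `threshold` (Euclidean distance) and return the connected components."""
    parent = list(range(len(nodes)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            dist = sum((a - b) ** 2 for a, b in zip(nodes[i].color, nodes[j].color)) ** 0.5
            if dist < threshold:
                parent[find(i)] = find(j)

    components = {}
    for i in range(len(nodes)):
        components.setdefault(find(i), []).append(i)
    return list(components.values())

def merge(parts: List[Node]) -> Node:
    """Build a parent node from its children: area is summed,
    position and color are area-weighted means."""
    total = sum(p.area for p in parts)
    pos = tuple(sum(p.position[k] * p.area for p in parts) / total for k in range(2))
    col = tuple(sum(p.color[k] * p.area for p in parts) / total for k in range(3))
    return Node(position=pos, area=total, color=col, children=list(parts))

def build_level(nodes: List[Node], groups: List[List[int]]) -> List[Node]:
    """Produce the next, coarser level of the hierarchy from a grouping of the current level."""
    return [merge([nodes[i] for i in g]) for g in groups]

# Usage: two reddish parts are grouped into one object; a blue part stays separate.
leaves = [
    Node((0.0, 0.0), 1.0, (1.0, 0.0, 0.0)),
    Node((1.0, 0.0), 1.0, (0.9, 0.1, 0.0)),
    Node((5.0, 5.0), 2.0, (0.0, 0.0, 1.0)),
]
objects = build_level(leaves, group_by_similarity(leaves, threshold=0.3))
# Applying the same step again yields a single root node for the whole scene.
root = build_level(objects, [list(range(len(objects)))])
```

Applying the same grouping-and-pooling step level by level is what makes the toy graph recursive: each coarser node keeps references to the finer nodes it was built from, so the scene can be read off at any level of the hierarchy.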