Abstract: Unsupervised learning with generative models has the potential to discover rich representations of 3D scenes. Such Neural Scene Representations may subsequently support a wide variety of downstream tasks, ranging from robotics to computer graphics to medical imaging. However, existing methods ignore one of the most fundamental properties of scenes: their three-dimensional structure. In this talk, I will make the case for equipping Neural Scene Representations with an inductive bias for 3D structure, enabling self-supervised discovery of shape and appearance from only a few observations. By embedding an implicit scene representation in a neural rendering framework and learning a prior over these representations, I will show how we can enable 3D reconstruction from only a single posed 2D image. I will show how the features learned in this process are already useful for the downstream task of semantic segmentation. I will then show how gradient-based meta-learning can enable fast inference of implicit representations.
Speaker Biography: Vincent Sitzmann is a postdoc in Joshua Tenenbaum’s group at MIT CSAIL. He previously completed his PhD at Stanford University with a thesis on “Self-Supervised Scene Representation Learning”. His research interest lies in neural scene representations: the way neural networks learn to represent information about our world. His goal is to allow independent agents to reason about our world given visual observations, such as inferring a complete model of a scene, including its geometry, material, and lighting, from only a few observations. This task is simple for humans but currently impossible for AI.