A new model of vision

March 04, 2020

When we open our eyes, we immediately see our surroundings in great detail. How the brain is able to form these richly detailed representations of the world so quickly is one of the biggest unsolved puzzles in the study of vision.

Scientists who study the brain have tried to replicate this phenomenon using computer models of vision, but so far, leading models only perform much simpler tasks such as picking out an object or a face against a cluttered background. Now, a team led by MIT cognitive scientists has produced a computer model that captures the human visual system's ability to quickly generate a detailed scene description from an image, and offers some insight into how the brain achieves this.

"What we were trying to do in this work is to explain how perception can be so much richer than just attaching semantic labels on parts of an image, and to explore the question of how do we see all of the physical world," says Josh Tenenbaum, a professor of computational cognitive science and a member of MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) and the Center for Brains, Minds, and Machines (CBMM).

The new model posits that when the brain receives visual input, it quickly performs a series of computations that reverse the steps that a computer graphics program would use to generate a 2D representation of a face or other object. This type of model, known as efficient inverse graphics (EIG), also correlates well with electrical recordings from face-selective regions in the brains of nonhuman primates, suggesting that the primate visual system may be organized in much the same way as the computer model, the researchers say.

Ilker Yildirim, a former MIT postdoc who is now an assistant professor of psychology at Yale University, is the lead author of the paper, which appears today in Science Advances. Tenenbaum and Winrich Freiwald, a professor of neurosciences and behavior at Rockefeller University, are the senior authors of the study. Mario Belledonne, a graduate student at Yale, is also an author.

Inverse graphics

Decades of research on the brain's visual system has studied, in great detail, how light input onto the retina is transformed into cohesive scenes. This understanding has helped artificial intelligence researchers develop computer models that can replicate aspects of this system, such as recognizing faces or other objects.

"Vision is the functional aspect of the brain that we understand the best, in humans and other animals," Tenenbaum says. "And computer vision is one of the most successful areas of AI at this point. We take for granted that machines can now look at pictures and recognize faces very well, and detect other kinds of objects."

However, even these sophisticated artificial intelligence systems don't come close to what the human visual system can do, Yildirim says.

"Our brains don't just detect that there's an object over there, or recognize and put a label on something," he says. "We see all of the shapes, the geometry, the surfaces, the textures. We see a very rich world."

More than a century ago, the physician, physicist, and philosopher Hermann von Helmholtz theorized that the brain creates these rich representations by reversing the process of image formation. He hypothesized that the visual system includes an image generator that would be used, for example, to produce the faces that we see during dreams. Running this generator in reverse would allow the brain to work backward from the image and infer what kind of face or other object would produce that image, the researchers say.

However, the question remained: How could the brain perform this process, known as inverse graphics, so quickly? Computer scientists have tried to create algorithms that could perform this feat, but the best previous systems require many cycles of iterative processing, taking much longer than the 100 to 200 milliseconds the brain requires to create a detailed visual representation of what you're seeing. Neuroscientists believe perception in the brain can proceed so quickly because it is implemented in a mostly feedforward pass through several hierarchically organized layers of neural processing.

The MIT-led team set out to build a special kind of deep neural network model to show how a neural hierarchy can quickly infer the underlying features of a scene -- in this case, a specific face. In contrast to the standard deep neural networks used in computer vision, which are trained from labeled data indicating the class of an object in the image, the researchers' network is trained from a model that reflects the brain's internal representations of what scenes with faces can look like.

Their model thus learns to reverse the steps performed by a computer graphics program for generating faces. These graphics programs begin with a three-dimensional representation of an individual face and then convert it into a two-dimensional image, as seen from a particular viewpoint. These images can be placed on an arbitrary background image. The researchers theorize that the brain's visual system may do something similar when you dream or conjure a mental image of someone's face.

The researchers trained their deep neural network to perform these steps in reverse -- that is, it begins with the 2D image and then adds features such as texture, curvature, and lighting, to create what the researchers call a "2.5D" representation. These 2.5D images specify the shape and color of the face from a particular viewpoint. Those are then converted into 3D representations, which don't depend on the viewpoint.

"The model gives a systems-level account of the processing of faces in the brain, allowing it to see an image and ultimately arrive at a 3D object, which includes representations of shape and texture, through this important intermediate stage of a 2.5D image," Yildirim says.

Model performance

The researchers found that their model is consistent with data obtained by studying certain regions in the brains of macaque monkeys. In a study published in 2010, Freiwald and Doris Tsao of Caltech recorded the activity of neurons in those regions and analyzed how they responded to 25 different faces, seen from seven different viewpoints. That study revealed three stages of higher-level face processing, which the MIT team now hypothesizes correspond to three stages of their inverse graphics model: roughly, a 2.5D viewpoint-dependent stage; a stage that bridges from 2.5 to 3D; and a 3D, viewpoint-invariant stage of face representation.

"What we show is that both the quantitative and qualitative response properties of those three levels of the brain seem to fit remarkably well with the top three levels of the network that we've built," Tenenbaum says.

The researchers also compared the model's performance to that of humans in a task that involves recognizing faces from different viewpoints. This task becomes harder when researchers alter the faces by removing the face's texture while preserving its shape, or distorting the shape while preserving relative texture. The new model's performance was much more similar to that of humans than computer models used in state-of-the-art face-recognition software, additional evidence that this model may be closer to mimicking what happens in the human visual system.

The researchers now plan to continue testing the modeling approach on additional images, including objects that aren't faces, to investigate whether inverse graphics might also explain how the brain perceives other kinds of scenes. In addition, they believe that adapting this approach to computer vision could lead to better-performing AI systems.

"If we can show evidence that these models might correspond to how the brain works, this work could lead computer vision researchers to take more seriously and invest more engineering resources in this inverse graphics approach to perception," Tenenbaum says. "The brain is still the gold standard for any kind of machine that sees the world richly and quickly."
-end-
The research was funded by the Center for Brains, Minds, and Machines at MIT, the National Science Foundation, the National Eye Institute, the Office of Naval Research, the New York Stem Cell Foundation, the Toyota Research Institute, and Mitsubishi Electric.

Massachusetts Institute of Technology

Related Brain Articles from Brightsurf:

Glioblastoma nanomedicine crosses into brain in mice, eradicates recurring brain cancer
A new synthetic protein nanoparticle capable of slipping past the nearly impermeable blood-brain barrier in mice could deliver cancer-killing drugs directly to malignant brain tumors, new research from the University of Michigan shows.

Children with asymptomatic brain bleeds as newborns show normal brain development at age 2
A study by UNC researchers finds that neurodevelopmental scores and gray matter volumes at age two years did not differ between children who had MRI-confirmed asymptomatic subdural hemorrhages when they were neonates, compared to children with no history of subdural hemorrhage.

New model of human brain 'conversations' could inform research on brain disease, cognition
A team of Indiana University neuroscientists has built a new model of human brain networks that sheds light on how the brain functions.

Human brain size gene triggers bigger brain in monkeys
Dresden and Japanese researchers show that a human-specific gene causes a larger neocortex in the common marmoset, a non-human primate.

Unique insight into development of the human brain: Model of the early embryonic brain
Stem cell researchers from the University of Copenhagen have designed a model of an early embryonic brain.

An optical brain-to-brain interface supports information exchange for locomotion control
Chinese researchers established an optical BtBI that supports rapid information transmission for precise locomotion control, thus providing a proof-of-principle demonstration of fast BtBI for real-time behavioral control.

Transplanting human nerve cells into a mouse brain reveals how they wire into brain circuits
A team of researchers led by Pierre Vanderhaeghen and Vincent Bonin (VIB-KU Leuven, Université libre de Bruxelles and NERF) showed how human nerve cells can develop at their own pace, and form highly precise connections with the surrounding mouse brain cells.

Brain scans reveal how the human brain compensates when one hemisphere is removed
Researchers studying six adults who had one of their brain hemispheres removed during childhood to reduce epileptic seizures found that the remaining half of the brain formed unusually strong connections between different functional brain networks, which potentially help the body to function as if the brain were intact.

Alcohol byproduct contributes to brain chemistry changes in specific brain regions
Study of mouse models provides clear implications for new targets to treat alcohol use disorder and fetal alcohol syndrome.

Scientists predict the areas of the brain to stimulate transitions between different brain states
Using a computer model of the brain, Gustavo Deco, director of the Center for Brain and Cognition, and Josephine Cruzat, a member of his team, together with a group of international collaborators, have developed an innovative method published in Proceedings of the National Academy of Sciences on Sept.

Read More: Brain News and Brain Current Events
Brightsurf.com is a participant in the Amazon Services LLC Associates Program, an affiliate advertising program designed to provide a means for sites to earn advertising fees by advertising and linking to Amazon.com.