Tool transforms world landmark photos into 4D experiences

September 08, 2020

ITHACA, N.Y.- Using publicly available tourist photos of world landmarks such as the Trevi Fountain in Rome or Top of the Rock in New York City, Cornell University researchers have developed a method to create maneuverable 3D images that show changes in appearance over time.

The method, which employs deep learning to ingest and synthesize tens of thousands of mostly untagged and undated photos, solves a problem that has eluded experts in computer vision for six decades.

"It's a new way of modeling scenes that not only allows you to move your head and see, say, the fountain from different viewpoints, but also gives you controls for changing the time," said Noah Snavely, associate professor of computer science at Cornell Tech and senior author of "Crowdsampling the Plenoptic Function," presented at the European Conference on Computer Vision, held virtually Aug. 23-28.

"If you really went to the Trevi Fountain on your vacation, the way it would look would depend on what time you went - at night, it would be lit up by floodlights from the bottom. In the afternoon, it would be sunlit, unless you went on a cloudy day," Snavely said. "We learned the whole range of appearances, based on time of day and weather, from these unorganized photo collections, such that you can explore the whole range and simultaneously move around the scene."

Representing a place in a photorealistic way is challenging for traditional computer vision, partly because of the sheer number of textures to be reproduced. "The real world is so diverse in its appearance and has different kinds of materials - shiny things, water, thin structures," Snavely said.

Another problem is the inconsistency of the available data. Describing how something looks from every possible viewpoint in space and time - known as the plenoptic function - would be a manageable task with hundreds of webcams affixed around a scene, recording data day and night. But since this isn't practical, the researchers had to develop a way to compensate.

"There may not be a photo taken at 4 p.m. from this exact viewpoint in the data set. So we have to learn from a photo taken at 9 p.m. at one location, and a photo taken at 4:03 from another location," Snavely said. "And we don't know the granularity of when these photos were taken. But using deep learning allows us to infer what the scene would have looked like at any given time and place."

The researchers introduced a new scene representation called Deep Multiplane Images to interpolate appearance in four dimensions - 3D, plus changes over time. Their method is inspired in part on a classic animation technique developed by the Walt Disney Company in the 1930s, which uses layers of transparencies to create a 3D effect without redrawing every aspect of a scene.

"We use the same idea invented for creating 3D effects in 2D animation to create 3D effects in real-world scenes, to create this deep multilayer image by fitting it to all these disparate measurements from the tourists' photos," Snavely said. "It's interesting that it kind of stems from this very old, classic technique used in animation."

In the study, they showed that this model could be trained to create a scene using around 50,000 publicly available images found on sites such as Flickr and Instagram. The method has implications for computer vision research, as well as virtual tourism - particularly useful at a time when few can travel in person.

"You can get the sense of really being there," Snavely said. "It works surprisingly well for a range of scenes."
First author of the paper is Cornell Tech doctoral student Zhengqi Li. Abe Davis, assistant professor of computer science in the Faculty of Computing and Information Science, and Cornell Tech doctoral student Wenqi Xian also contributed.

The research was partly supported by philanthropist Eric Schmidt, former CEO of Google, and Wendy Schmidt, by recommendation of the Schmidt Futures Program.

Cornell University

Related Computer Vision Articles from Brightsurf:

Computer vision predicts congenital adrenal hyperplasia
Using computer vision, researchers have discovered strong correlations between facial morphology and congenital adrenal hyperplasia (CAH), a life-threatening genetic condition of the adrenal glands and one of the most common forms of adrenal insufficiency in children.

Computer vision app allows easier monitoring of diabetes
A computer vision technology developed by University of Cambridge engineers has now been developed into a free mobile phone app for regular monitoring of glucose levels in people with diabetes.

Computer vision helps find binding sites in drug targets
Scientists from the iMolecule group at Skoltech developed BiteNet, a machine learning (ML) algorithm that helps find drug binding sites, i.e. potential drug targets, in proteins.

Tool helps clear biases from computer vision
Researchers at Princeton University have developed a tool that flags potential biases in sets of images used to train artificial intelligence (AI) systems.

UCLA computer scientists set benchmarks to optimize quantum computer performance
Two UCLA computer scientists have shown that existing compilers, which tell quantum computers how to use their circuits to execute quantum programs, inhibit the computers' ability to achieve optimal performance.

School-based vision screening programs found 1 in 10 kids had vision problems
A school-based vision screening program in kindergarten, shown to be effective at identifying untreated vision problems in 1 in 10 students, could be useful to implement widely in diverse communities, according to new research in CMAJ (Canadian Medical Association Journal)

Researchers incorporate computer vision and uncertainty into AI for robotic prosthetics
Researchers have developed new software that can be integrated with existing hardware to enable people using robotic prosthetics or exoskeletons to walk in a safer, more natural manner on different types of terrain.

'Time is vision' after a stroke
University of Rochester researchers studied stroke patients who experienced vision loss and found that the patients retained some visual abilities immediately after the stroke but these abilities diminished gradually and eventually disappeared permanently after approximately six months.

Computer vision helps SLAC scientists study lithium ion batteries
New machine learning methods bring insights into how lithium ion batteries degrade, and show it's more complicated than many thought.

A new model of vision
MIT researchers have developed a computer model of face processing that could reveal how the brain produces richly detailed visual representations so quickly.

Read More: Computer Vision News and Computer Vision Current Events is a participant in the Amazon Services LLC Associates Program, an affiliate advertising program designed to provide a means for sites to earn advertising fees by advertising and linking to