The current wave of artificial intelligence dates back to 2012 and a university competition that measured how well algorithms could recognize objects in photographs.
That year, researchers found that feeding thousands of images into an algorithm loosely inspired by how neurons in a brain respond to input produced a huge leap in accuracy. This breakthrough has sparked an explosion of academic research and commercial activity that is transforming some businesses and industries.
Now a new trick, which involves training the same type of AI algorithm to turn 2D images into a rich 3D view of a scene, is causing excitement in the computer graphics and AI worlds. The technique has the potential to shake up video games, virtual reality, robotics and autonomous driving. Some experts think it could even help machines perceive and reason about the world in a smarter, or at least more humanlike, way.
“It’s ultra-hot, there’s a huge buzz,” says Ken Goldberg, a roboticist at the University of California, Berkeley, who uses the technology to improve the ability of AI-enhanced robots to grasp unfamiliar shapes. Goldberg says the technology has “hundreds of applications,” in areas ranging from entertainment to architecture.
The new approach is to use a neural network to capture and generate 3D images from a few 2D snapshots, a technique called “neural rendering.” It was born from the merging of ideas circulating in computer graphics and AI, but interest exploded in April 2020 when researchers from UC Berkeley and Google showed that a neural network could capture a scene in a photorealistic way in 3D simply by viewing several 2D images of it.
The algorithm exploits the way light travels through air, computing the density and color of points in 3D space. This converts 2D images into a photorealistic 3D representation that can be viewed from any angle. At its core is the same type of neural network as the 2012 image recognition algorithm, which analyzes the pixels of a 2D image. The new algorithms convert 2D pixels into their 3D equivalents, known as voxels. Videos of the trick, which the researchers called Neural Radiance Fields, or NeRF, have won over the research community.
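The rendering step described above can be sketched in a few lines. In NeRF the density and color values come from a neural network queried at points along each camera ray; in this simplified, hypothetical example they are passed in directly, and the function only shows the alpha-compositing math that turns per-point density and color into a single pixel color.

```python
import math

def composite_ray(densities, colors, deltas):
    """Alpha-composite samples along one camera ray (simplified sketch).

    densities: volume density at each sampled 3D point (higher = more opaque)
    colors:    (r, g, b) color at each sampled point
    deltas:    distance between adjacent samples
    Returns the rendered (r, g, b) value for the ray's pixel.
    """
    rgb = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed along the ray
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha          # this sample's contribution
        for c in range(3):
            rgb[c] += weight * color[c]
        transmittance *= 1.0 - alpha
    return rgb
```

An empty ray (all densities zero) renders black, while a fully opaque first sample hides everything behind it, which is the behavior that lets the method recover surfaces from photographs.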
“I’ve been doing computer vision for 20 years, but when I saw this video I was like, ‘Wow, this is just amazing,'” says Georgia Tech professor Frank Dellaert.
For anyone working in computer graphics, Dellaert says, the approach is a breakthrough. Creating a detailed and realistic 3D scene normally requires hours of painstaking manual work. The new method can generate these scenes from ordinary photographs in minutes. It also offers a new way to create and manipulate synthetic scenes. “It’s seminal and important, which is a crazy thing to say for a work that’s only two years old,” he says.
Dellaert says the speed and variety of ideas that have emerged since then have been breathtaking. Others have used the idea to create moving selfies (or “nerfies”), which let you pan around a person’s head based on a few still images; to create 3D avatars from a single headshot; and to develop a way to automatically relight scenes in different ways.
The work has been gaining momentum in industry with surprising speed. Ben Mildenhall, one of the researchers behind NeRF who is now at Google, describes the blossoming of research and development as “a slow tidal wave.”