
Reducing the carbon footprint of artificial intelligence

April 23, 2020

Artificial intelligence has become a focus of certain ethical concerns, but it also has some major sustainability issues.

Last June, researchers at the University of Massachusetts at Amherst released a startling report estimating that training and searching a certain neural network architecture requires enough energy to emit roughly 626,000 pounds of carbon dioxide. That's equivalent to nearly five times the lifetime emissions of the average U.S. car, including its manufacturing.

The issue gets even more severe in the deployment phase, when deep neural networks must run on diverse hardware platforms, each with different properties and computational resources.

MIT researchers have developed a new automated AI system for training and running certain neural networks. Results indicate that, by improving the computational efficiency of the system in some key ways, the approach can cut the carbon emissions involved -- in some cases, down to low triple digits of pounds.

The researchers' system, which they call a once-for-all network, trains one large neural network comprising many pretrained subnetworks of different sizes that can be tailored to diverse hardware platforms without retraining. This dramatically reduces the energy usually required to train each specialized neural network for new platforms -- which can include billions of internet of things (IoT) devices. Using the system to train a computer-vision model, they estimated that the process required roughly 1/1,300 the carbon emissions of today's state-of-the-art neural architecture search approaches, while reducing inference time by a factor of 1.5 to 2.6.

"The aim is smaller, greener neural networks," says Song Han, an assistant professor in the Department of Electrical Engineering and Computer Science. "Searching efficient neural network architectures has until now had a huge carbon footprint. But we reduced that footprint by orders of magnitude with these new methods."

The work was carried out on Satori, an efficient computing cluster donated to MIT by IBM that is capable of performing 2 quadrillion calculations per second. The paper is being presented next week at the International Conference on Learning Representations. Joining Han on the paper are four undergraduate and graduate students from EECS, MIT-IBM Watson AI Lab, and Shanghai Jiao Tong University.

Creating a "once-for-all" network

The researchers built the system on a recent AI advance called AutoML (for automatic machine learning), which eliminates manual network design. AutoML systems automatically search massive design spaces for network architectures tailored, for instance, to specific hardware platforms. But there's still a training efficiency issue: each model has to be selected and then trained from scratch for its platform architecture.

"How do we train all those networks efficiently for such a broad spectrum of devices -- from a $10 IoT device to a $600 smartphone? Given the diversity of IoT devices, the computation cost of neural architecture search will explode," Han says.

The researchers invented an AutoML system that trains only a single, large "once-for-all" (OFA) network that serves as a "mother" network, nesting an extremely high number of subnetworks that are sparsely activated from the mother network. OFA shares all its learned weights with all subnetworks -- meaning they come essentially pretrained. Thus, each subnetwork can operate independently at inference time without retraining.
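As a rough illustration of the weight-sharing idea (a minimal sketch, not the authors' code), the Python example below shows how a single oversized convolution layer can serve both a large and a small subnetwork by slicing the same weight tensor. The channel counts and kernel sizes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

class SharedConv(torch.nn.Module):
    """One convolution whose weights can serve several smaller configurations."""
    def __init__(self, max_in=64, max_out=128, max_kernel=7):
        super().__init__()
        # Full-size weight tensor; every subnetwork reads a slice of it.
        self.weight = torch.nn.Parameter(
            torch.randn(max_out, max_in, max_kernel, max_kernel) * 0.01)

    def forward(self, x, out_ch, kernel):
        # Use the first `out_ch` filters and, for a smaller kernel, the centered
        # sub-window of each filter (e.g. the 3x3 core of a 7x7 filter).
        start = (self.weight.shape[-1] - kernel) // 2
        w = self.weight[:out_ch, :x.shape[1], start:start + kernel, start:start + kernel]
        return F.conv2d(x, w, padding=kernel // 2)

layer = SharedConv()
x = torch.randn(1, 64, 32, 32)
large = layer(x, out_ch=128, kernel=7)  # "large subnetwork" view of the shared weights
small = layer(x, out_ch=32, kernel=3)   # "small subnetwork" view, no retraining needed
```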

The team trained an OFA convolutional neural network (CNN) -- commonly used for image-processing tasks -- with versatile architectural configurations, including different numbers of layers and "neurons," diverse filter sizes, and diverse input image resolutions. Given a specific platform, the system uses the OFA as the search space to find the best subnetwork based on the accuracy and latency tradeoffs that correlate to the platform's power and speed limits. For an IoT device, for instance, the system will find a smaller subnetwork. For smartphones, it will select larger subnetworks, but with different structures depending on individual battery lifetimes and computation resources. OFA decouples model training and architecture search, and spreads the one-time training cost across many inference hardware platforms and resource constraints.
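The specialization step can be pictured as a constrained search over the pretrained subnetworks. The following sketch uses stand-in accuracy and latency estimators rather than the authors' predictors, and hypothetical per-stage choices, to show how a device's latency budget drives the choice of a larger or smaller subnetwork.

```python
import random

def sample_config():
    # Hypothetical per-stage choices: depth, width expansion, kernel size, input resolution.
    return {
        "depth":  [random.choice([2, 3, 4]) for _ in range(5)],
        "width":  [random.choice([3, 4, 6]) for _ in range(5)],
        "kernel": [random.choice([3, 5, 7]) for _ in range(5)],
        "resolution": random.choice([160, 176, 192, 208, 224]),
    }

def estimated_latency_ms(cfg):
    # Stand-in: a real system would consult a per-device latency table or predictor.
    ops = sum(d * w * k for d, w, k in zip(cfg["depth"], cfg["width"], cfg["kernel"]))
    return 0.05 * ops * (cfg["resolution"] / 224) ** 2

def estimated_accuracy(cfg):
    # Stand-in: a real system would use a trained accuracy predictor.
    return 0.70 + 0.0001 * sum(d * w for d, w in zip(cfg["depth"], cfg["width"]))

def specialize(latency_budget_ms, n_samples=1000):
    # Random search over subnetwork configurations that fit the latency budget.
    best = None
    for _ in range(n_samples):
        cfg = sample_config()
        if estimated_latency_ms(cfg) > latency_budget_ms:
            continue
        if best is None or estimated_accuracy(cfg) > estimated_accuracy(best):
            best = cfg
    return best

phone_cfg = specialize(latency_budget_ms=30)  # roomy budget -> larger subnetwork
iot_cfg = specialize(latency_budget_ms=5)     # tight budget -> smaller subnetwork
```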

Training the OFA network itself relies on a "progressive shrinking" algorithm that efficiently trains it to support all of the subnetworks simultaneously. It starts by training the full network at its maximum size, then progressively shrinks the network to include smaller subnetworks. The smaller subnetworks are trained with the help of the larger ones, so they learn together. In the end, subnetworks of all sizes are supported, allowing fast specialization based on a platform's power and speed limits, with zero additional training cost when a new device is added.
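A simplified sketch of that schedule, with a placeholder standing in for the actual gradient updates and knowledge distillation, might look like this:

```python
import random

FULL = {"kernel": [7], "depth": [4], "width": [6]}  # the largest configuration

def train_step(active_choices):
    # Sample one subnetwork from the currently allowed choices and update its
    # shared weights; real code would run a forward/backward pass on a batch here,
    # with the full network acting as a teacher for the smaller subnetworks.
    subnet = {dim: random.choice(opts) for dim, opts in active_choices.items()}
    return subnet

def progressive_shrinking(steps_per_stage=3):
    stages = [
        {},                                                             # full network only
        {"kernel": [3, 5, 7]},                                          # add elastic kernel size
        {"kernel": [3, 5, 7], "depth": [2, 3, 4]},                      # then elastic depth
        {"kernel": [3, 5, 7], "depth": [2, 3, 4], "width": [3, 4, 6]},  # then elastic width
    ]
    for i, extra in enumerate(stages):
        active = {**FULL, **extra}  # smaller options become available stage by stage
        for _ in range(steps_per_stage):
            print(f"stage {i}: trained subnetwork {train_step(active)}")

progressive_shrinking()
```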

In total, one OFA, the researchers found, can comprise more than 10 quintillion -- that's a 1 followed by 19 zeroes -- architectural settings, covering probably all platforms ever needed. But training the OFA and searching it ends up being far more efficient than spending hours training each neural network per platform. Moreover, OFA does not compromise accuracy or inference efficiency. Instead, it provides state-of-the-art ImageNet accuracy on mobile devices. And, compared with industry-leading CNN models, the researchers say OFA provides a 1.5 to 2.6 times speedup, with superior accuracy.
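For a sense of scale, the back-of-the-envelope calculation below (using hypothetical per-layer choices, not necessarily the paper's exact design space) shows how a handful of options per layer multiplies into a figure on the order of 10 quintillion:

```python
# Hypothetical design space: 5 stages, each with 2-4 layers, and 3 kernel-size
# and 3 width options per layer.
kernel_choices = 3                                       # e.g. kernel sizes {3, 5, 7}
width_choices = 3                                        # e.g. width ratios {3, 4, 6}
depth_choices = [2, 3, 4]                                # possible layers per stage
per_layer = kernel_choices * width_choices               # 9 configurations per layer
per_stage = sum(per_layer ** d for d in depth_choices)   # 9^2 + 9^3 + 9^4 = 7,371
total = per_stage ** 5                                   # 5 independent stages
print(f"{total:.1e} architectural settings")             # about 2e19, i.e. > 10 quintillion
```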

"That's a breakthrough technology," Han says. "If we want to run powerful AI on consumer devices, we have to figure out how to shrink AI down to size."

"The model is really compact. I am very excited to see OFA can keep pushing the boundary of efficient deep learning on edge devices," says Chuang Gan, a researcher at the MIT-IBM Watson AI Lab and co-author of the paper.

"If rapid progress in AI is to continue, we need to reduce its environmental impact," says John Cohn, an IBM fellow and member of the MIT-IBM Watson AI Lab. "The upside of developing methods to make AI models smaller and more efficient is that the models may also perform better."
Related links

Paper: "Once-for-All: Train One Network and Specialize It for Efficient Deployment" https://arxiv.org/pdf/1908.09791.pdf

Video: "Once for All: Train One Network and Specialize it for Efficient Deployment, ICLR 2020." https://youtu.be/a_OeT8MXzWI

Massachusetts Institute of Technology
