Supercomputing speeds up deep learning training
November 13, 2017
A team of researchers from the University of California, Berkeley, the University of California, Davis and the Texas Advanced Computing Center (TACC) published the results of an effort to harness the power of supercomputers to train a deep neural network (DNN) for image recognition at rapid speed.
The researchers efficiently used 1024 Skylake processors on the Stampede2 supercomputer at TACC to complete a 100-epoch ImageNet training with AlexNet in 11 minutes - the fastest time recorded to date. Using 1600 Skylake processors they also bested Facebook's prior results by finishing a 90-epoch ImageNet training with ResNet-50 in 32 minutes and, for batch sizes above 20,000, their accuracy was much higher than Facebook's. (In recent years, the ImageNet benchmark -- a visual database designed for use in image recognition research -- has played a significant role in assessing different approaches to DNN training.)
Using 512 Intel Xeon Phi chips on Stampede2 they finished the 100-epoch AlexNet in 24 minutes and 90-epoch ResNet-50 in 60 minutes.
"These results show the potential of using advanced computing resources, like those at TACC, along with large mini-batch enabling algorithms, to train deep neural networks interactively and in a distributed way," said Zhao Zhang, a research scientist at TACC, a leading supercomputing center. "Given our large user base and huge capacity, this will have a major impact on science."
They published their results on arXiv in November 2017.
The DNN training system achieved state-of-the-art "top-1" test accuracy, meaning the percentage of cases in which the model's answer (the one with the highest probability) exactly matches the expected answer. Using ResNet-50 (a convolutional neural network developed by Microsoft that won the 2015 ImageNet Large Scale Visual Recognition Competition and surpasses human performance on the ImageNet dataset), they achieved an accuracy of more than 75 percent, on par with Facebook's and Amazon's batch training levels. Scaling the batch size to 32,000 in this work cost only 0.6 percent in top-1 accuracy.
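The top-1 metric described above can be illustrated with a minimal sketch. The function name `top1_accuracy` and the toy probabilities and labels are hypothetical, chosen only for illustration; they are not taken from the team's code or data.

```python
def top1_accuracy(probabilities, labels):
    """Fraction of examples whose highest-probability class equals the label."""
    if len(probabilities) != len(labels):
        raise ValueError("need exactly one label per example")
    correct = 0
    for probs, label in zip(probabilities, labels):
        # The model's "answer" is the class index with the largest probability.
        predicted = max(range(len(probs)), key=lambda i: probs[i])
        if predicted == label:
            correct += 1
    return correct / len(labels)


# Hypothetical toy data: 4 examples, 3 classes each.
probs = [
    [0.7, 0.2, 0.1],  # predicts class 0, label 0: correct
    [0.1, 0.6, 0.3],  # predicts class 1, label 1: correct
    [0.3, 0.3, 0.4],  # predicts class 2, label 2: correct
    [0.5, 0.4, 0.1],  # predicts class 0, label 1: wrong
]
labels = [0, 1, 2, 1]
accuracy = top1_accuracy(probs, labels)
print(f"top-1 accuracy: {accuracy:.2f}")
```

On this toy input three of the four predictions match, so the reported accuracy is 0.75, which is the same kind of figure as the "more than 75 percent" quoted for ResNet-50.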
Currently deep learning researchers need to use trial-and-error to design new models. This means they need to run the training process tens or even hundreds of times to build a model.
The relatively slow speed of training impacts the speed of science, and the kind of science that researchers are willing to explore. Researchers at Google have noted that if it takes one to four days to train a neural network, this is seen by researchers as tolerable. If it takes one to four weeks, the method will be utilized for only high value experiments. And if it requires more than one month, scientists won't even try. If researchers could finish the training process during a coffee break, it would significantly improve their productivity.
The group's breakthrough involved the development of the Layer-Wise Adaptive Rate Scaling (LARS) algorithm that is capable of distributing data efficiently to many processors to compute simultaneously using a larger-than-ever batch size (up to 32,000 items).
LARS incorporates many more training examples in one forward/backward pass and adaptively adjusts the learning rate for each layer of the neural network, depending on a metric gleaned from the previous iteration.
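The layer-wise idea can be sketched roughly as follows. This is a simplified illustration, not the team's Caffe implementation: it assumes the per-layer metric is the ratio of the weight norm to the gradient norm, as in the published LARS formulation, and it omits momentum. The function names and the `global_lr`, `trust` and `weight_decay` values are hypothetical.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of numbers."""
    return math.sqrt(sum(v * v for v in values))


def lars_step(layers, grads, global_lr=0.1, trust=0.001, weight_decay=0.0005):
    """One plain-SGD step with a separate, norm-based learning rate per layer.

    Assumption: local_lr = trust * ||w|| / (||g|| + weight_decay * ||w||).
    Momentum is left out to keep the sketch short.
    """
    new_layers, local_rates = [], []
    for w, g in zip(layers, grads):
        w_norm, g_norm = l2_norm(w), l2_norm(g)
        denom = g_norm + weight_decay * w_norm
        # Fall back to a neutral rate if a layer has zero weights or gradients.
        local_lr = trust * w_norm / denom if w_norm > 0 and denom > 0 else 1.0
        step = global_lr * local_lr
        new_layers.append(
            [wi - step * (gi + weight_decay * wi) for wi, gi in zip(w, g)]
        )
        local_rates.append(local_lr)
    return new_layers, local_rates


# Hypothetical two-layer example: large weights with small gradients,
# and small weights with large gradients.
layers = [[3.0, 4.0], [0.3, 0.4]]
grads = [[0.6, 0.8], [6.0, 8.0]]
new_layers, rates = lars_step(layers, grads)
for i, rate in enumerate(rates):
    print(f"layer {i}: local learning rate = {rate:.6f}")
```

Under these assumptions the layer with large gradients relative to its weights receives a much smaller local rate, so each layer's weights change by at most a fraction `global_lr * trust` of their own norm per step. That bound on the relative update is what keeps very large batches from destabilizing individual layers.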
As a consequence of these changes, they were able to take advantage of the large number of Skylake and Intel Xeon Phi processors available on Stampede2 while preserving accuracy, which was not the case with previous large-batch methods.
"For deep learning applications, larger datasets and bigger models lead to significant improvements in accuracy, but at the cost of longer training times," said James Demmel, a professor of mathematics and computer science at UC Berkeley. "Using the LARS algorithm, jointly developed by Y. You with B. Ginsburg and I. Gitman during an NVIDIA internship, enabled us to maintain accuracy even at a batch size of 32K. This large batch size enables us to use distributed systems efficiently and to finish the ImageNet training with AlexNet in 11 minutes on 1024 Skylake processors, a significant improvement over prior results."
The findings show an alternative to the trend of using specialized hardware -- GPUs, Tensor Processing Units (TPUs), FPGAs or other emerging architectures -- for deep learning. The team wrote the code based on Caffe and utilized Intel-Caffe, which supports multi-node training.
The training phase of a deep neural network is typically the most time-intensive part of deep learning. Until recently, the process accomplished by the UC Berkeley-led team would have taken hours or days. The advances in fast, distributed training will impact the speed of science, as well as the kind of science that researchers can explore with these new methods.
The experiment is part of a broader effort at TACC to test the applicability of CPU hardware for deep learning and machine learning applications and frameworks, including Caffe, MXNet and TensorFlow.
TACC's experts showed that when scaling Caffe to 1024 Skylake processors using ResNet-50, the framework ran with about 73 percent efficiency -- or almost 750 times faster than on a single Skylake processor.
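The link between the two figures is simple arithmetic: speedup equals the processor count multiplied by the parallel efficiency. A short sketch, with hypothetical function names, makes the relationship explicit.

```python
def speedup(processors, efficiency):
    """Speedup over one processor, given parallel efficiency in [0, 1]."""
    return processors * efficiency


def parallel_efficiency(single_time, parallel_time, processors):
    """Efficiency = (single-processor time / parallel time) / processors."""
    return single_time / (parallel_time * processors)


# Figures quoted in the article: 1024 Skylake processors at about 73 percent.
estimated_speedup = speedup(1024, 0.73)
print(f"estimated speedup: {estimated_speedup:.1f}x")
```

With the article's numbers this gives roughly 747.5, consistent with the "almost 750 times faster" figure. The second function shows the inverse calculation for measured run times; no such timings are given in the article, so it is not applied here.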
"Using commodity HPC servers to rapidly train deep learning algorithms on massive datasets is a powerful new tool for both measured and simulated research," said Niall Gaffney, TACC Director of Data Intensive Computing. "By not having to migrate large datasets between specialized hardware systems, the time to data driven discovery is reduced and overall efficiency can be significantly increased."
As researchers and scientific disciplines increasingly use machine and deep learning to extract insights from large-scale experimental and simulated datasets, having systems that can handle this workload is important.
Recent results suggest such systems are now available to the open-science community through national advanced computing resources like Stampede2.
University of Texas at Austin, Texas Advanced Computing Center