Mining for gold in a mountain of data

February 15, 2018

After shopping at your favorite grocery store week after week, you finally earned a free turkey.

The cashier scanned your loyalty card at every checkout, rewarding you with points toward a holiday turkey or ham - while at the same time sending an itemization of everything you bought to a database.

The grocery store analyzes the "big data" it collects from you and other shoppers to uncover hidden patterns, correlations and other insights. The result is smarter business moves, more efficient operations, higher profits and the promise of happier customers.

Researchers estimate that more than 2.5 exabytes (that's 2.5 billion gigabytes) of data are generated throughout the world every day. The use of loyalty cards, fitness trackers, web-based email, public services and social media - including every post, comment, like, photo and geotag - all contribute to this vast warehouse of information.

Big data involves not only collecting data, but storing, analyzing, searching, sharing, transferring, visualizing, querying and updating it. In fact, big data is so voluminous and complex that traditional ways of processing have proved inadequate. Hundreds or even thousands of computers running in parallel are needed for proper analyses.

To help address these computational bottlenecks, a team from the Industrial and Systems Engineering department at Lehigh University gathered with their colleagues at King Abdullah University of Science and Technology (KAUST) in Saudi Arabia Feb. 5-7, 2018.

The KAUST Research Workshop on Optimization and Big Data brought researchers from across academia and industry to discuss big data optimization algorithms, theory, applications and systems.

Tamás Terlaky, the George N. and Soteria Kledaras '87 Endowed Chair professor, was the keynote speaker at KAUST. Terlaky opened the workshop with his presentation, "60 Years of Interior Point Methods: From Periphery to Glory."

Terlaky's keynote focused on a technique pioneered in 1984 known as the Interior Point Method (IPM). This novel methodology ignited far-reaching, intensive research toward discovering effective ways to solve large-scale optimization problems such as those found in big data analytics.

"Increasingly, we are getting different kinds of solutions in optimization," Terlaky said. "Computation has become ubiquitous, and thanks also to the 'Interior Point Revolution' we have seen tremendous advances in computing."

The concepts of IPMs and "machine learning" - where computers acquire the ability to learn and make decisions - were first proposed in the '50s and were ahead of their time, Terlaky said. With computer technology still in its infancy, they failed to make any real impact. By the '80s, however, the stars aligned to make the IPM revolution possible.

Now that we are in the era of big data, Terlaky said, recent advances in computer and information technology both enable and require revolutionary advances in machine learning methodologies. "History always repeats itself," Terlaky said. "You should learn from it."

Terlaky concluded the day's session with "A Polynomial-time Rescaled von Neumann Algorithm for Linear Feasibility Problems."

Furthering the discussion on optimization was Katya Scheinberg, Harvey E. Wagner Endowed Chair professor, with her presentation "Direct and Efficient Optimization of Prediction Error and AUC of Linear Classifiers." Also presenting was Martin Takáč, assistant professor, with "SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient."

Xi He, a fourth-year Ph.D. candidate, gave the poster presentation "An Inexact Regularized Stochastic Newton Method for Nonconvex Optimization." Takáč is his advisor.

Abstracts of the four talks and the poster presentation are given below.

60 Years of Interior Point Methods: From Periphery to Glory - Tamás Terlaky

The basic concepts of Interior Point Methods (IPMs) were introduced by Frisch in the 1950s, and further developed in the 1960s, among others by Fiacco-McCormick (SUMT) and Dikin (affine scaling). By the early 1970s it was concluded that, mostly due to numerical instability, IPMs most probably would not be viable algorithms for solving large-scale optimization problems. Karmarkar's 1984 paper and the subsequent "Interior Point Revolution" fundamentally changed the landscape of optimization. IPMs became the method of choice for solving large-scale linear optimization problems, and new classes of conic and convex optimization problems became efficiently solvable. The new powerful algorithmic and software tools opened new areas of applications. In this talk we walk through the history of IPMs and highlight the scientific and computer technology advances that made the Interior Point Revolution possible.
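
For readers unfamiliar with the barrier methods mentioned above (such as SUMT), a toy one-dimensional sketch conveys the core idea: replace inequality constraints with a logarithmic barrier and follow the barrier minimizers as the barrier weight shrinks, so the iterates approach the solution from the interior of the feasible region. This is an illustrative sketch only, not any particular IPM from the talk; the function name and parameter choices are placeholders.

```python
import numpy as np

def barrier_minimize(c, mu0=1.0, shrink=0.5, tol=1e-8):
    """Toy interior-point sketch: minimize c*x over [0, 1] via a log barrier.

    Minimizes c*x - mu*(log(x) + log(1 - x)) by Newton's method for a
    decreasing sequence of mu; iterates stay strictly inside (0, 1).
    """
    x, mu = 0.5, mu0
    while mu > tol:
        for _ in range(50):  # Newton steps on the current barrier subproblem
            g = c - mu / x + mu / (1 - x)        # gradient of barrier function
            h = mu / x**2 + mu / (1 - x)**2      # Hessian (always positive)
            step = g / h
            # Damp the step so the iterate stays strictly in the interior.
            while not (0 < x - step < 1):
                step *= 0.5
            x -= step
            if abs(g) < 1e-12:
                break
        mu *= shrink  # shrink the barrier weight and re-solve (warm start)
    return x
```

As mu shrinks, the barrier minimizer traces the "central path": for c = 1 it converges to the constrained optimum x = 0, and for c = -1 to x = 1, without ever touching the boundary.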

A Polynomial-time Rescaled von Neumann Algorithm for Linear Feasibility Problems - Tamás Terlaky

The perceptron and von Neumann algorithms are known to be closely related, as duals of each other. A deterministic rescaled version of the perceptron algorithm was proved to be polynomial by Peña and Soheili. Recently, Chubanov proposed a method which solves homogeneous linear equality systems with positive variables in polynomial time. Chubanov's method can be considered as a column-wise rescaling procedure. We adapt Chubanov's method to the von Neumann problem, and so design a polynomial-time column-wise rescaling von Neumann algorithm. This algorithm is the first variant of the von Neumann algorithm with polynomial complexity. Joint work with Dan Li and Kees Roos.
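
Setting the rescaling refinements aside, the classical von Neumann algorithm itself is simple to sketch. Given a matrix with unit-norm columns, it tests whether the origin lies in the convex hull of those columns by repeatedly stepping toward the most "opposing" column. The sketch below is the plain (non-polynomial) classical iteration, not the rescaled variant from the talk; names and tolerances are illustrative.

```python
import numpy as np

def von_neumann(A, max_iters=10000, tol=1e-6):
    """Classical von Neumann algorithm (sketch, without rescaling).

    A has unit-norm columns; seeks x >= 0 with sum(x) = 1 and ||A x|| small,
    i.e. tests whether the origin lies in the convex hull of the columns.
    Returns x on (approximate) feasibility, or None with an infeasibility
    certificate direction found.
    """
    m, n = A.shape
    x = np.zeros(n)
    x[0] = 1.0
    u = A @ x  # current point in the convex hull
    for _ in range(max_iters):
        if np.linalg.norm(u) <= tol:
            return x  # approximately feasible
        j = int(np.argmin(A.T @ u))  # column making the smallest angle past u
        a = A[:, j]
        if a @ u > 0:
            return None  # u separates the origin from the hull: infeasible
        # Move to the point on the segment [u, a] closest to the origin.
        lam = (u @ (u - a)) / ((u - a) @ (u - a))
        lam = min(max(lam, 0.0), 1.0)
        x = (1 - lam) * x
        x[j] += lam
        u = (1 - lam) * u + lam * a
    return x
```

On a symmetric instance such as the columns (±1, 0) and (0, ±1), the iteration reaches the origin exactly; in general the classical method converges only sublinearly, which is what the rescaling work addresses.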

Direct and Efficient Optimization of Prediction Error and AUC of Linear Classifiers - Katya Scheinberg

The predictive quality of most machine learning models is measured by expected prediction error or so-called Area Under the Curve (AUC). However, these functions are not used in empirical loss minimization, because their empirical approximations are nonconvex and discontinuous, and more importantly have zero derivative almost everywhere. Instead, other loss functions are used, such as logistic loss. In this work, we show that in the case of linear predictors, and under the assumption that the data has normal distribution, the expected error and the expected AUC are not only smooth, but have well-defined derivatives, which depend on the first and second moments of the distribution. We show that these derivatives can be approximated and used in empirical risk minimization, thus proposing a gradient-based optimization method for direct optimization of prediction error and AUC. Moreover, the proposed algorithm has no dependence on the size of the dataset, unlike logistic regression and all other well-known empirical risk minimization techniques.
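
The zero-derivative issue the abstract mentions is easy to see in code: empirical AUC counts correctly ranked positive/negative pairs, so it is piecewise constant in the weights w (for instance, it is unchanged if w is rescaled). The paper's approach derives exact derivatives under a normality assumption on the data; the sigmoid surrogate below is a simpler, generic smoothing included only to illustrate the contrast, and is not the method from the talk.

```python
import numpy as np

def empirical_auc(w, X_pos, X_neg):
    """Fraction of (positive, negative) pairs ranked correctly by score w.x.

    A step function of w: zero derivative almost everywhere.
    """
    s_pos = X_pos @ w
    s_neg = X_neg @ w
    return float(np.mean(s_pos[:, None] > s_neg[None, :]))

def smoothed_auc(w, X_pos, X_neg, tau=1.0):
    """Sigmoid-smoothed surrogate: differentiable in w, usable for gradients."""
    d = (X_pos @ w)[:, None] - (X_neg @ w)[None, :]
    return float(np.mean(1.0 / (1.0 + np.exp(-d / tau))))
```

Scaling w changes nothing in the empirical AUC (every pairwise comparison keeps its sign), which is exactly why gradient-based training needs either a surrogate like the sigmoid above or, as in this work, derivatives of the expected AUC under a distributional assumption.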

SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient - Martin Takáč

In this paper, we propose a StochAstic Recursive grAdient algoritHm (SARAH), as well as its practical variant SARAH+, as a novel approach to finite-sum minimization problems. Unlike vanilla SGD and other modern stochastic methods such as SVRG, S2GD, SAG and SAGA, SARAH admits a simple recursive framework for updating stochastic gradient estimates; compared to SAG/SAGA, SARAH does not require storage of past gradients. The linear convergence rate of SARAH is proven under a strong convexity assumption. We also prove a linear convergence rate (in the strongly convex case) for an inner loop of SARAH, a property that SVRG does not possess. The convergence rates for the convex and non-convex cases are also discussed. Numerical experiments demonstrate the efficiency of our algorithm. See the paper published in the Proceedings of Machine Learning Research.
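
The recursive gradient estimate at the heart of SARAH fits in a few lines: each inner step updates the estimate as v = grad_i(w) - grad_i(w_prev) + v, reusing the running v rather than a stored table of per-sample gradients. The sketch below is a minimal illustration of that recursion, not the authors' reference implementation; the step size and loop lengths are arbitrary placeholder defaults.

```python
import numpy as np

def sarah(grad_i, w0, n, eta=0.01, outer_iters=100, inner_iters=100, seed=0):
    """Minimal SARAH sketch for minimizing (1/n) * sum_i f_i(w).

    grad_i(w, i) must return the gradient of the i-th component f_i at w.
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(outer_iters):
        # Each outer loop starts from one full gradient, as in SVRG-style methods.
        v = np.mean([grad_i(w, i) for i in range(n)], axis=0)
        w_prev, w = w, w - eta * v
        for _ in range(inner_iters):
            i = int(rng.integers(n))
            # Recursive estimate: reuses the running v, so no table of
            # past per-sample gradients is stored (unlike SAG/SAGA).
            v = grad_i(w, i) - grad_i(w_prev, i) + v
            w_prev, w = w, w - eta * v
    return w
```

On a small strongly convex least-squares problem, this recursion drives the iterates to the minimizer, consistent with the linear convergence rate stated in the abstract.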

An Inexact Regularized Stochastic Newton Method for Nonconvex Optimization - Xi He

Key Links:

Department of Industrial and Systems Engineering, Lehigh University
Faculty profile and research website: Tamás Terlaky
Faculty profile and research website: Katya Scheinberg
Faculty profile and research website: Martin Takáč
Student profile and research website: Xi He

Lehigh University
