Researchers devise better recommendation algorithm

December 06, 2017

The recommendation systems at websites such as Amazon and Netflix use a technique called "collaborative filtering." To determine what products a given customer might like, they look for other customers who have assigned similar ratings to a similar range of products, and extrapolate from there.

The success of this approach depends vitally on the notion of similarity. Most recommendation systems use a measure called cosine similarity, which seems to work well in practice. Last year, at the Conference on Neural Information Processing Systems, MIT researchers used a new theoretical framework to demonstrate why, indeed, cosine similarity yields such good results.

This week, at the same conference, they are reporting that they have used their framework to construct a new recommendation algorithm that should work better than those in use today, particularly when ratings data is "sparse" -- that is, when there is little overlap between the products reviewed and the ratings assigned by different customers.

The algorithm's basic strategy is simple: When trying to predict a customer's rating of a product, use not only the ratings from people with similar tastes but also the ratings from people who are similar to those people, and so on.

The idea is intuitive, but in practice, everything again hinges on the specific measure of similarity.

"If we're really generous, everybody will effectively look like each other," says Devavrat Shah, a professor of electrical engineering and computer science and senior author on the paper. "On the other hand, if we're really stringent, we're back to effectively just looking at nearest neighbors. Or putting it another way, when you move from a friend's preferences to a friend of a friend's, what is the noise introduced in the process, and is there a right way to quantify that noise so that we balance the signal we gain with the noise we introduce? Because of our model, we knew exactly what is the right thing to do."

All the angles

As it turns out, the right thing to do is to again use cosine similarity. Essentially, cosine similarity represents a customer's preferences as a line in a very high-dimensional space and quantifies similarity as the angle between two lines.

Suppose, for instance, that you have two points in a Cartesian plane, the two-dimensional coordinate system familiar from high school algebra. If you connect the points to the origin -- the point with coordinates (0, 0) -- you define an angle, and its cosine can be calculated from the point coordinates themselves.

If a movie-streaming service has, say, 5,000 titles in its database, then the ratings that any given user has assigned some subset of them defines a single point in a 5,000-dimensional space. Cosine similarity measures the angle between any two sets of ratings in that space.

When data is sparse, however, there may be so little overlap between users' ratings that cosine similarity is essentially meaningless. In that context, aggregating the data of many users becomes necessary.

The researchers' analysis is theoretical, but here's an example of how their algorithm might work in practice. For any given customer, it would select a small set -- say, five -- of those customers with the greatest cosine similarity and average their ratings. Then, for each of those customers, it would select five similar customers, average their ratings, and fold that average into the cumulative average. It would continue fanning out in this manner, building up an increasingly complete set of ratings, until it had enough data to make a reasonable estimate about the rating of the product of interest.

Filling in blanks

For Shah and his colleagues -- first author Christina Lee PhD '17, who is a postdoc at Microsoft Research, and two of her Microsoft colleagues, Christian Borgs and Jennifer Chayes -- devising such an algorithm wasn't the hard part. The challenge was proving that it would work well, and that's what the paper concentrates on.

Imagine a huge 2-D grid that maps all of a movie-streaming service's users against all its titles, with a number in each cell that corresponds to a movie that a given user has rated. Most users have rated only a handful of movies, so most of the grid is empty. The goal of a recommendation engine is to fill in the empty grid cells as accurately as possible.

Ordinarily, Shah says, a machine-learning system learns two things: the features of the data set that are useful for prediction, and the mathematical function that computes a prediction from those features. For purposes of predicting movie tastes, useful features might include a movie's genre, its box office performance, the number of Oscar nominations it received, the historical box-office success of its leads, its distributor, or any number of other things.

Each of a movie-streaming service's customers has his or her own value function: One might be inclined to rate a movie much more highly if it fits in the action genre and has a big budget; another might give a high rating to a movie that received numerous Oscar nominations and has a small, arty distributor.

Playing the odds

In the new analytic scheme, "You don't learn features; you don't learn functions," Shah says. But the researchers do assume that each user's value function stays the same: The relative weight that a user assigns to, say, genre and distributor doesn't change. The researchers also assume that each user's function is operating on the same set of movie features.

This, it turns out, provides enough consistency that it's possible to draw statistical inferences about the likelihood that one user's ratings will predict another's.

"When we sample a movie, we don't actually know what its feature is, so if we wanted to exactly predict the function, we wouldn't be able to," Lee says. "But if we just wanted to estimate the difference between users' functions, we can compute that difference."

Using their analytic framework, the researchers showed that, in cases of sparse data -- which describes the situation of most online retailers -- their "neighbor's-neighbor" algorithm should yield more accurate predictions than any known algorithm.

Translating between this type of theoretical algorithmic analysis and working computer systems, however, often requires some innovative engineering, so the researchers' next step is to try to apply their algorithm to real data.
Additional background

PAPER: Thy friend is my friend: Iterative collaborative filtering for sparse matrix estimation

ARCHIVE: What buyers want

ARCHIVE: Recommendation theory

ARCHIVE: Predicting what topics will trend on Twitter

ARCHIVE: Improving recommendation systems

Massachusetts Institute of Technology

Related Algorithm Articles from Brightsurf:

CCNY & partners in quantum algorithm breakthrough
Researchers led by City College of New York physicist Pouyan Ghaemi report the development of a quantum algorithm with the potential to study a class of many-electron quantums system using quantum computers.

Machine learning algorithm could provide Soldiers feedback
A new machine learning algorithm, developed with Army funding, can isolate patterns in brain signals that relate to a specific behavior and then decode it, potentially providing Soldiers with behavioral-based feedback.

New algorithm predicts likelihood of acute kidney injury
In a recent study, a new algorithm outperformed the standard method for predicting which hospitalized patients will develop acute kidney injury.

New algorithm could unleash the power of quantum computers
A new algorithm that fast forwards simulations could bring greater use ability to current and near-term quantum computers, opening the way for applications to run past strict time limits that hamper many quantum calculations.

QUT algorithm could quash Twitter abuse of women
Online abuse targeting women, including threats of harm or sexual violence, has proliferated across all social media platforms but QUT researchers have developed a sophisticated statistical model to identify misogynistic content and help drum it out of the Twittersphere.

New learning algorithm should significantly expand the possible applications of AI
The e-prop learning method developed at Graz University of Technology forms the basis for drastically more energy-efficient hardware implementations of Artificial Intelligence.

Algorithm predicts risk for PTSD after traumatic injury
With high precision, a new algorithm predicts which patients treated for traumatic injuries in the emergency department will later develop posttraumatic stress disorder.

New algorithm uses artificial intelligence to help manage type 1 diabetes
Researchers and physicians at Oregon Health & Science University have designed a method to help people with type 1 diabetes better manage their glucose levels.

A new algorithm predicts the difficulty in fighting fire
The tool completes previous studies with new variables and could improve the ability to respond to forest fires.

New algorithm predicts optimal materials among all possible compounds
Skoltech researchers have offered a solution to the problem of searching for materials with required properties among all possible combinations of chemical elements.

Read More: Algorithm News and Algorithm Current Events is a participant in the Amazon Services LLC Associates Program, an affiliate advertising program designed to provide a means for sites to earn advertising fees by advertising and linking to