In 2006, Netflix launched the Netflix Prize, a competition offering $1 million to the person or team that could improve its five-star-based Cinematch recommendation system by 10 percent.
Several relatively well-matched teams competed for the money and the geek-glory.
The prize was awarded to Team BellKor’s Pragmatic Chaos in 2009.
Amazingly, Netflix never put the winning algorithm into production: the five-star system was inaccurate and used by very few viewers. Written reviews also proved to be a mostly confusing non-starter for the purpose of recommending movies.
According to Netflix (Goodbye Stars, Hello Thumbs):
Netflix has had star ratings for much of our history, but we’ve learned through over a year of testing that while we’ve used stars to help you personalize your suggestions, many of our members are confused about what they do.
That’s because we’ve all gotten used to star ratings on e-commerce and review apps, where rating contributes to an overall average, and the star rating shown next to a restaurant or a pair of shoes is an average of all the reviewers. On those apps, being a reviewer can be fun and helpful to others, but the primary goal isn’t always to help you get better suggestions.
In contrast, when people see thumbs, they know that they are used to teach the system about their tastes with the goal of finding more great content. That’s why when we tested replacing stars with thumbs we saw an astounding 200% increase in ratings activity.
While the thumbs-up/thumbs-down system was more meaningful to the individual user, Netflix found that it needed additional data to create accurate individual recommendations. See: How Netflix's Recommendations System Works.
The decision to punt on the five-star system and reviews was further reinforced by a simple observation: Netflix doesn't care whether you think a film is good; it just wants to know whether you liked it.
While engineering costs have been cited as the reason Netflix never used the improved algorithm, top Netflix developers and executives have been clear that the real motivation was the limited, biased, and obsolete nature of consumer ratings and reviews.
According to a scholarly paper (free PDF) published in 2015 by two key developers of the Netflix recommendation system, Carlos Gomez-Uribe, VP of product innovation, and Neil Hunt, Chief Product Officer:
Historically, the Netflix recommendation problem has been thought of as equivalent to the problem of predicting the number of stars that a person would rate a video after watching it, on a scale from 1 to 5.
We indeed relied on such an algorithm heavily when our main business was shipping DVDs by mail, partly because in that context, a star rating was the main feedback that we received that a member had actually watched [emphasis added] the video. The days when stars and DVDs were the focus of recommendations at Netflix have long passed….
There are much better ways to help people find videos to watch than focusing only on those with a high predicted star rating.
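To make the star-prediction framing concrete, the sketch below shows the kind of matrix-factorization model the Netflix Prize made famous: each user and each movie gets a small latent vector, and their dot product approximates the star rating. This is a minimal illustration under assumed hyperparameters, not Cinematch or the winning BellKor code.

```python
import numpy as np

def train_mf(ratings, n_users, n_movies, k=20, lr=0.01, reg=0.05, epochs=20):
    """Fit latent factors so that p[u] . q[m] approximates the 1-5 star
    rating. A toy SGD version of Netflix Prize-style matrix factorization;
    all hyperparameters are illustrative assumptions."""
    rng = np.random.default_rng(0)
    p = rng.normal(scale=0.1, size=(n_users, k))   # user factors
    q = rng.normal(scale=0.1, size=(n_movies, k))  # movie factors
    for _ in range(epochs):
        for u, m, r in ratings:
            err = r - p[u] @ q[m]                  # prediction error in stars
            pu = p[u].copy()                       # keep pre-update copy
            p[u] += lr * (err * q[m] - reg * pu)   # gradient step with
            q[m] += lr * (err * pu - reg * q[m])   # L2 regularization
    return p, q

# Toy usage: 3 users, 4 movies, a handful of (user, movie, stars) triples.
data = [(0, 0, 5), (0, 1, 3), (1, 1, 4), (1, 2, 1), (2, 0, 4), (2, 3, 2)]
p, q = train_mf(data, n_users=3, n_movies=4)
print(round(float(p[0] @ q[3]), 2))  # predicted stars for an unseen pair
```

The Prize's 10 percent target was measured as an RMSE improvement on exactly this kind of star prediction, which is why the framing dominated the competition.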
Before getting into the “much better ways to help people find videos to watch,” it’s useful to understand additional reasons why ratings, reviews, and the old five-star approach fell short.
According to a published paper by the winners of the Netflix Prize, user ratings change over time, a phenomenon known as temporal effects (a simplified modeling sketch follows the list):
1. Movie biases – movies go in and out of popularity over time…. This effect is relatively easy to capture, because we have relatively many ratings per movie, [which] allows us to model these effects adequately.
2. User biases – users change their baseline ratings over time. For example, a user who tended to rate an average movie “4 stars” may now rate such a movie “3 stars.”… Such effects can stem from many reasons. For example, it is related to a natural drift in a user’s rating scale, [or] to the fact that ratings are given [relative] to other ratings that were given recently. [T]he effective user bias on a day can be significantly different than the user bias on the day earlier or the day after. [This] stems from the fact that users are usually associated with only a handful of ratings….
3. User preferences – users change their preferences over time. For example, a fan of the “psychological thrillers” genre may become a fan of “crime dramas” a year later. Similarly, humans change their perception [of] certain actors and directors.
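A rough sketch of how such drift can be modeled, loosely following the time-aware baseline predictor b_ui(t) = mu + b_u + alpha_u * dev_u(t) + b_i + b_i,Bin(t) from the prize-winning team's temporal-dynamics work. The parameter values and function names here are assumptions for illustration.

```python
def temporal_baseline(mu, b_user, b_movie, movie_bin_bias,
                      user_mean_date, rating_date,
                      alpha_u=0.01, beta=0.4):
    """Time-aware baseline star prediction.

    dev_u(t) = sign(t - t_u) * |t - t_u| ** beta bends the user's baseline
    as rating dates drift from the user's mean rating date (user bias,
    item 2), while movie_bin_bias captures a film moving in and out of
    popularity over coarse time bins (movie bias, item 1). The alpha_u
    and beta values are illustrative assumptions.
    """
    days = rating_date - user_mean_date
    dev = (1 if days >= 0 else -1) * abs(days) ** beta
    return mu + b_user + alpha_u * dev + b_movie + movie_bin_bias

# The same movie, rated by the same user 300 days apart: the changing
# time-bin bias and the drift term shift the predicted baseline.
early = temporal_baseline(3.6, 0.3, 0.2, 0.1,  user_mean_date=0, rating_date=-150)
late  = temporal_baseline(3.6, 0.3, 0.2, -0.1, user_mean_date=0, rating_date=150)
print(round(early, 2), round(late, 2))
```

Preference drift (item 3) is handled separately in that line of work by making the latent taste vectors themselves time-dependent.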
According to the paper (free PDF) by Gomez-Uribe and Hunt:
Now, we stream the content, and have vast amounts of data that describe what each Netflix member watches, how each member watches (e.g., the device, time of day, day of week, intensity of watching), the place in our product in which each video was discovered, and even the recommendations that were shown but not played in each session….
Now, our recommender system consists of a variety of algorithms that collectively define the Netflix experience, most of which come together on the Netflix homepage.
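The quote describes a shift from one prediction problem to many signals feeding many algorithms. The sketch below illustrates that shape: an implicit-feedback event record with the fields the quote lists, and a homepage assembled from several independent row-generating algorithms. The type and generator names are hypothetical, not Netflix's actual components.

```python
from dataclasses import dataclass

@dataclass
class ViewEvent:
    """One implicit-feedback record of the kind the quote describes."""
    member_id: int
    video_id: int
    device: str                   # e.g. "tv", "phone"
    hour_of_day: int
    day_of_week: int
    minutes_watched: float        # intensity of watching
    discovered_in: str            # place in the product it was found
    recommended_not_played: bool  # shown in a session but never started

def build_homepage(member_id, row_generators):
    """Assemble the homepage as a list of rows, each produced by a
    different algorithm -- the 'variety of algorithms' in the quote."""
    return [(name, gen(member_id)) for name, gen in row_generators]

# Hypothetical row generators standing in for the real ones.
rows = build_homepage(42, [
    ("Continue Watching",   lambda m: ["video:17", "video:9"]),
    ("Because You Watched", lambda m: ["video:3", "video:21"]),
    ("Trending Now",        lambda m: ["video:8", "video:5"]),
])
for name, videos in rows:
    print(name, videos)
```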
As a screenshot of a now-defunct 2010 Facebook app illustrated, the concept was first tested by a wine site and Facebook app called SavvyTaste.
SavvyTaste began with the concept of eliminating the psychological biases in numerical ratings by representing them in a different way.
That system eventually graduated to a binary thumbs-up/thumbs-down, as illustrated in the Facebook screenshots.
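One simple way to see how a different representation can strip the per-user bias out of numerical ratings before reducing them to thumbs: compare each rating to that user's own average rather than to a global scale. This is a hedged illustration of the general idea, not SavvyTaste's or Netflix's actual conversion.

```python
from statistics import mean

def to_thumbs(user_ratings):
    """Convert one user's star ratings to thumbs by thresholding at that
    user's own mean, so a habitually generous rater and a habitually harsh
    rater produce comparable signals. Illustrative sketch only."""
    avg = mean(user_ratings.values())
    return {movie: ("up" if stars >= avg else "down")
            for movie, stars in user_ratings.items()}

# A generous rater and a harsh rater yield the same relative signal.
print(to_thumbs({"A": 5, "B": 4, "C": 3}))  # {'A': 'up', 'B': 'up', 'C': 'down'}
print(to_thumbs({"A": 3, "B": 2, "C": 1}))  # same pattern at a lower baseline
```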
Additional Reading:
When 4.3 Stars Is Average: The Internet’s Grade-Inflation Problem
Understanding and overcoming issues of biases in online review systems
The Problem With Online Ratings