Reviews and 5-Star ratings are so useless for recommendations that Netflix tossed its prized $1-million algorithm. They’re even worse for wine

In 2006, Netflix launched a competition (the Netflix Prize) that would award a prize of $1 million to the person or team that could improve its Cinematch five-star-based recommendation system by 10 percent.
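
The contest scored entries by root-mean-square error (RMSE) between predicted and actual star ratings, so "10 percent" meant a 10 percent lower RMSE than Cinematch. A minimal sketch of that arithmetic follows. The function names and the toy ratings are illustrative assumptions, not Netflix's code or data.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual star ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two equal-length, non-empty sequences")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def improvement_pct(baseline_rmse, candidate_rmse):
    """Percentage by which the candidate's RMSE is lower than the baseline's."""
    return 100.0 * (baseline_rmse - candidate_rmse) / baseline_rmse


# Toy data (made up): what five viewers actually rated, and two sets of guesses.
actual_stars = [4, 3, 5, 2, 4]
baseline_guess = [3.0, 4.0, 4.0, 3.0, 3.0]   # every guess is off by a full star
candidate_guess = [3.1, 3.9, 4.1, 2.9, 3.1]  # every guess is off by 0.9 stars

baseline_score = rmse(baseline_guess, actual_stars)
candidate_score = rmse(candidate_guess, actual_stars)
gain = improvement_pct(baseline_score, candidate_score)
```

In this toy case the candidate's guesses are each a tenth of a star closer, which is roughly the size of the gain the prize demanded: a small shift in predicted stars, not a different way of recommending.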

 

Several relatively well-matched teams competed for the money and the geek-glory.

 

The prize was awarded to Team BellKor’s Pragmatic Chaos in 2009.

Stars Suck!

Amazingly, Netflix never did use the algorithm because the five-star system was inaccurate and was used by very few viewers. Reviews also proved to be a mostly confusing non-starter for the purpose of recommending movies.

According to Netflix (Goodbye Stars, Hello Thumbs):

Netflix has had star ratings for much of our history, but we’ve learned through over a year of testing that while we’ve used stars to help you personalize your suggestions, many of our members are confused about what they do.

That’s because we’ve all gotten used to star ratings on e-commerce and review apps, where rating contributes to an overall average, and the star rating shown next to a restaurant or a pair of shoes is an average of all the reviewers. On those apps, being a reviewer can be fun and helpful to others, but the primary goal isn’t always to help you get better suggestions.

In contrast, when people see thumbs, they know that they are used to teach the system about their tastes with the goal of finding more great content. That’s why when we tested replacing stars with thumbs we saw an astounding 200% increase in ratings activity.

The 100-point system sucks too!

More data needed

While the thumbs-up/thumbs-down system was more meaningful to the individual user, Netflix found that it needed additional data to create accurate individual recommendations. See: How Netflix’s Recommendations System Works.

 

The decision to drop the five-star system and reviews was reinforced by a simple principle: Netflix doesn’t care whether you think the film is good—it just wants to know if you liked it.

Netflix star ratings & reviews all gone in 2018

While engineering costs have been cited as the reason Netflix never used the improved algorithm, top Netflix developers and executives have been clear that the real motivation was the limited, biased, and obsolete nature of consumer ratings and reviews.

 

According to a scholarly paper (free PDF) published in 2015 by two key developers of the Netflix recommendation system, Carlos Gomez-Uribe, VP of product innovation, and Neil Hunt, Chief Product Officer:

Historically, the Netflix recommendation problem has been thought of as equivalent to the problem of predicting the number of stars that a person would rate a video after watching it, on a scale from 1 to 5.

We indeed relied on such an algorithm heavily when our main business was shipping DVDs by mail, partly because in that context, a star rating was the main feedback that we received that a member had actually watched [emphasis added] the video. The days when stars and DVDs were the focus of recommendations at Netflix have long passed….

There are much better ways to help people find videos to watch than focusing only on those with a high predicted star rating.

Why have the days of stars and reviews long passed?

Before getting into the “much better ways to help people find videos to watch,” it’s useful to understand additional reasons why ratings, reviews, and the old five-star approach fell short.

According to a published paper by the winners of the Netflix Prize, user ratings change over time, a phenomenon known as temporal effects.

1. Movie biases – movies go in and out of popularity over time…. This effect is relatively easy to capture, because…. we have relatively many ratings per movie, what allows us to model these effects adequately.

2. User biases – users change their baseline ratings over time. For example, a user who tended to rate an average movie “4 stars” may now rate such a movie “3 stars.”…. Such effects can stem from many reasons. For example, it is related to a natural drift in a user’s rating scale, to the fact that ratings are given in relevance to other ratings that were given recently. [T]he effective user bias on a day can be significantly different than the user bias on the day earlier or the day after. [This] stems from the fact that users are usually associated with only a handful of ratings….

3. User preferences – users change their preferences over time. For example, a fan of the “psychological thrillers” genre may become a fan of “crime dramas” a year later. Similarly, humans change their perception on certain actors and directors.
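
The first two effects can be pictured as a baseline prediction built from an overall average, a movie offset, and a user offset, where the offsets are allowed to differ from one time period to the next. What follows is a simplified averaging sketch under stated assumptions, not the prize winners' fitted model: the function names, the 30-day periods, and the toy ratings are all hypothetical.

```python
from collections import defaultdict


def _mean_by_key(pairs):
    """Average the values that share a key; pairs is an iterable of (key, value)."""
    totals = defaultdict(lambda: [0.0, 0])
    for key, value in pairs:
        totals[key][0] += value
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def fit_time_aware_baseline(ratings, bin_days=30):
    """ratings: list of (user, movie, day, stars). Returns (mean, movie_bias, user_bias).

    Assumption: both biases are plain averages within a time bin of bin_days.
    """
    mean = sum(stars for _, _, _, stars in ratings) / len(ratings)

    # Movie bias: how far a movie sits above or below the mean in each period.
    movie_bias = _mean_by_key(
        ((movie, day // bin_days), stars - mean) for _, movie, day, stars in ratings
    )

    # User bias: what is left over for each user in each period once the
    # movie effect is removed. A drifting personal scale shows up here.
    user_bias = _mean_by_key(
        ((user, day // bin_days), stars - mean - movie_bias[(movie, day // bin_days)])
        for user, movie, day, stars in ratings
    )
    return mean, movie_bias, user_bias


def predict(model, user, movie, day, bin_days=30):
    """Baseline guess; unseen (movie, period) or (user, period) pairs add nothing."""
    mean, movie_bias, user_bias = model
    period = day // bin_days
    return mean + movie_bias.get((movie, period), 0.0) + user_bias.get((user, period), 0.0)


# Toy data (made up): Ann's scale drifts from 4 stars to 3 stars about a year
# later, while Bob keeps rating the same two movies 4 stars.
toy_ratings = [
    ("ann", "m1", 0, 4), ("ann", "m2", 1, 4), ("ann", "m1", 400, 3), ("ann", "m2", 401, 3),
    ("bob", "m1", 0, 4), ("bob", "m2", 1, 4), ("bob", "m1", 400, 4), ("bob", "m2", 401, 4),
]
model = fit_time_aware_baseline(toy_ratings)
ann_early = model[2][("ann", 0)]
ann_late = model[2][("ann", 13)]
ann_guess = predict(model, "ann", "m1", 400)
```

With so few ratings per user and period, averages like these are noisy, which is the difficulty the quoted passage points to. The third effect, changing preferences, is not captured by a baseline like this at all.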

Why data?

According to the paper  (free PDF) by Gomez-Uribe and Hunt:

Now, we stream the content, and have vast amounts of data that describe what each Netflix member watches, how each member watches (e.g., the device, time of day, day of week, intensity of watching), the place in our product in which each video was discovered, and even the recommendations that were shown but not played in each session….

Now, our recommender system consists of a variety of algorithms that collectively define the Netflix experience, most of which come together on the Netflix homepage.

Thumbs-up/Thumbs-down first applied to wine by SavvyTaste years prior to Netflix

As this screenshot of a now-defunct 2010 Facebook app shows, the concept was first tested by a wine site/Facebook app called SavvyTaste.

 

[Screenshot: SavvyTaste Facebook app]

 

SavvyTaste began with the concept of eliminating the psychological biases in numerical ratings by representing them in a different way.

 

[Screenshot: SavvyTaste’s alternative representation of numerical ratings]

 

That system eventually graduated to the binary ThumbsUp/Down as illustrated in the Facebook images.


 

Additional Reading:

When 4.3 Stars Is Average: The Internet’s Grade-Inflation Problem

Rating The Rating Systems

Understanding and overcoming issues of biases in online review systems

The Problem With Online Ratings