When Yelp’s stock shot up 63 percent on March 2, its first full day of trading, commentators couldn’t resist pointing out that investors had given the user-generated ratings site “a five-star review.” The one-to-five scale is everywhere on the web, inviting surfers to become critics: Amazon, Netflix, and the iTunes app store also employ it. There’s just one problem: This democratization of reviewing tends to produce aggregate scores that reveal nothing much at all.
The earliest-known star rankings appeared in 1926, when Michelin published a guide to the Brittany region of France. Michelin began with a single-star system, then went up to three. In 1958, as Americans began to hit the new interstate highways, Mobil put out its own travel guide and kicked it up, American-style, by adding two more stars. But in the hands of amateurs, these professional tools go awry. Most online reviewers, the data show, are either cranks or starry-eyed fanatics, and in this supposedly snarky age there are a lot more of the latter than the former. In 2009, The Wall Street Journal found that the average rating in a five-star system, Internet-wide, was 4.3, suggesting a world of uniformly awesome products, services, and experiences.
User-feedback expert Randy Farmer, co-author of Building Web Reputation Systems, calls this pattern “the J-curve.” (Picture a chart with ratings along the x-axis and the number of users choosing that rating along the y-axis. A few ones, a dip in the two-to-four region, and a proliferation of fives gives you a J-shape.) YouTube used to be an egregious J-curve offender; a few years ago, product manager Shiva Rajaraman posted a graph on the company’s blog indicating that the average rating on the site was roughly 4.8. The company’s solution was to replace the star system with “like” and “dislike” options. Few users bother to give a thumbs down (a video of Farmer speaking has gotten 82 likes against three dislikes), which means that ratings just end up as a crude measure of popularity.
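To see how a J-shaped spread of votes drags the average toward the top of the scale, here is a minimal Python sketch. The vote counts are invented purely for illustration and are not data from YouTube or any other site; the function names are likewise hypothetical.

```python
# Hypothetical vote counts per star rating, invented for illustration only.
# A few ones, a dip from two to four, and a pile of fives: the J-curve.
counts = {1: 200, 2: 50, 3: 50, 4: 100, 5: 1600}


def mean_rating(counts):
    """Return the average star rating implied by a {stars: votes} mapping."""
    total_votes = sum(counts.values())
    return sum(stars * votes for stars, votes in counts.items()) / total_votes


def text_histogram(counts, width=40):
    """Draw one bar per star rating, scaled so the tallest bar is `width` wide."""
    peak = max(counts.values())
    lines = []
    for stars in sorted(counts):
        bar = "#" * max(1, round(counts[stars] / peak * width))
        lines.append(f"{stars} | {bar} {counts[stars]}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(text_histogram(counts))
    print(f"average: {mean_rating(counts):.2f}")
```

With these made-up numbers, the fives so outnumber everything else that the average lands well above four, even though one voter in ten gave the lowest possible score.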
Yelp’s users produce an average rating of about 3.8, somewhat more discriminating but still suggesting the consumer can seldom go wrong. Out of 470 options for “cheap dinner” in the East Village, only seven earn two stars or fewer. Netflix tried to introduce more subtlety to its five-star system by adding half-star increments. But fewer users left ratings, perhaps overwhelmed by the larger scale, and Netflix went back to round numbers.
How to compel more discernment? A poster on techie-dominated discussion site Quora suggested a seven-point scale, which seems like it might just make six the new four. MIT professor Devavrat Shah proposes a more complex fix: Rather than having users rate individual items, Shah believes that they should always be forced to choose between two or more. You can play with a sample version of Shah’s “Celect” system on an MIT website ranking ten movies. At the most recent tally, The Matrix leads Citizen Kane by three spots.
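The forced-choice idea can be sketched in a few lines of Python. This is not Shah's Celect algorithm, whose details the article does not describe; it is a deliberately simple stand-in that ranks items by the share of head-to-head matchups they win. The film labels and the choices are hypothetical.

```python
from collections import defaultdict


def rank_by_pairwise_wins(choices):
    """Rank items from (winner, loser) pairs by win rate, best first.

    A simple win-rate tally, assumed here for illustration only.
    Ties are broken alphabetically by item name.
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for winner, loser in choices:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    return sorted(
        appearances,
        key=lambda item: (-wins[item] / appearances[item], item),
    )


if __name__ == "__main__":
    # Hypothetical choices: each pair records which film a user preferred.
    sample_choices = [
        ("Film A", "Film B"),
        ("Film A", "Film C"),
        ("Film B", "Film C"),
        ("Film C", "Film B"),
        ("Film A", "Film B"),
    ]
    print(rank_by_pairwise_wins(sample_choices))
```

Because every vote names a loser as well as a winner, no item can collect praise without another paying for it, which is the discernment a plain star scale fails to compel.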