A study looking into the hidden formula that drives Metacritic made headlines this week, but Kotaku has discovered some critical errors that call it into question.
Earlier this week, Full Sail University professor Adams Greenwood-Ericksen held a GDC session in San Francisco in which he shared some of his research on the effects of Metacritic, the aggregation site that takes media reviews from hundreds of outlets and outputs them as a single number, or Metascore.
Metacritic has taken some heat over the past few years for refusing to reveal the formula they use to produce their scores. It's not a simple average: Metacritic admits they give more weight to some outlets when crunching the numbers. But they've never said how that weighting system works.
So when Greenwood-Ericksen said he had a model that replicated Metacritic's scores, people took notice. Gamasutra ran an article titled "Metacritic's weighting system revealed," and it got a whole lot of video game developers and reporters talking. The system categorized outlets in six different "tiers" and gave heavy weight to sites like IGN and Wired (and significantly less weight to other big sites like Giant Bomb).
Shortly afterwards, Metacritic came out firing. They took to Facebook to shoot down the formula, calling it "wildly, wholly inaccurate," and they accused Gamasutra of running a misleading headline. (When reached by Kotaku for comment, Gamasutra editor Kris Graft apologized: "Yeah, I feel that the main issue was a poor headline, and we apologize for the confusion over this. It's also unfortunate that a session with inaccurate information like this got into the show.")
Some, however, have remained skeptical of Metacritic's accusations, as the aggregator still won't share the formula that they use.
However, today Kotaku discovered a flaw in Greenwood-Ericksen's formula: at least two of the listed weights, for the outlets The Sixth Axis and Play UK, are incorrect.
Let's start from the beginning. Greenwood-Ericksen's model, devised from Metacritic data spanning from 2005 or 2006 until 2011, assigns a numerical weight, like 1.5 or 0.5, to each video game outlet. The formula: look at a video game's Metacritic page, take all of the review scores listed, multiply each one by the weight associated with its outlet, add them all together, and divide by the total number of scores. This model has successfully replicated something like 50 scores, Greenwood-Ericksen said.
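The arithmetic behind that model can be sketched in a few lines of Python. The outlet names, scores, and weights below are illustrative placeholders, not values from the study, and the division by the number of reviews (rather than by the sum of the weights, as a conventional weighted mean would use) follows the formula exactly as described above.

```python
# Sketch of the weighted-average model described above. Outlet names,
# scores, and weights here are hypothetical, not values from the study.

def model_metascore(reviews, weights):
    """reviews: outlet -> critic score (0-100); weights: outlet -> weight."""
    weighted_sum = sum(score * weights.get(outlet, 1.0)
                       for outlet, score in reviews.items())
    # Per the formula as described: divide by the number of scores,
    # not by the sum of the weights as a standard weighted mean would.
    return round(weighted_sum / len(reviews))

# Hypothetical review scores and weights for one game's Metacritic page:
reviews = {"IGN": 85, "Wired": 90, "Giant Bomb": 80}
weights = {"IGN": 1.5, "Wired": 1.0, "Giant Bomb": 0.5}
print(model_metascore(reviews, weights))  # 86
```

Note that if every weight were 1.0, this would reduce to a plain average; the weights are what let two outlets with the same score pull the final number in different directions.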
Except, while plugging in the numbers and testing out the formula today, I discovered that the math just didn't work for the PS3 game Swords & Soldiers. When I tried to get the Metascore, I found that my results were 7-8 points off. (The math did work for some of the other games I experimented with, like Venetica.)
So I reached out to Greenwood-Ericksen, who I've been chatting with throughout the day.
"Looks like [Swords & Soldiers] was the development case for The Sixth Axis, and also for Play UK," he told me via Gchat. "So those two weights were actually set using that erroneous data."
I asked exactly what that means.
"It means you caught us making a mistake," he said. "It also means that at least one of those two weights of the 189 are probably off… So those particular weights are unreliable. The good news is that it suggests the process still works, one of us just made a mistake somewhere in applying it… It's embarrassing, certainly. On the one hand, I'm glad somebody spotted the issue. On the other hand, I wish we'd done it before we were so far into the public spotlight."
I asked what makes him think there are no other mistakes like this in the study.
"I don't think we'd have made a mistake like that one twice, but it's always possible," he said. "Certainly I'm going to have to check our work over again to make sure."
But the Full Sail professor doesn't believe that these flaws invalidate the study: the point, he says, was not to determine the values of each weight, but to show that it's possible to figure out the weight behind each outlet.
Greenwood-Ericksen and I had a long conversation on the phone this morning, before I started digging into this formula. He wanted to make it clear that these weights are just one part of a larger study, one that makes a number of other conclusions about Metacritic, like its strong connections to sales data, and he told me that the goal was never to show off an accurate model of how Metacritic weights scores.
"One of the things that virtually everybody missed was that this was a model," he said. "We didn't go down under the basement with a flashlight and find out what the results were. A lot of words like 'revealed' and 'discovered' were all kinds of inaccurate."
The professor said he was pleased by Metacritic's Facebook response, even though the aggregator called his work inaccurate. He's pleased because it offered new information: Metacritic said they use fewer than six tiers, for example, and that publication weights are much closer together than they were in Greenwood-Ericksen's model.
It seems like Greenwood-Ericksen is on the right track, even if the model didn't quite fit in this case. As he continues crunching numbers and trying to figure out exactly how each Metascore works, the truth behind this formula could eventually come out.
Greenwood-Ericksen said he wishes Metacritic would be more transparent about the formula that they use. It'd certainly preempt situations like this.
"I think the community, and Metacritic as well, would be better served by transparency on this," he said. "Part of what makes them so unpopular and what creates so much resentment is that people have the sense that there's this sort of arbitrary magical process that produces this score. I don't think that's the case. I think Metacritic is actually trying very hard to get a reasonable score to represent the quality of the product.
"I just don't think that's what comes across because they're opaque about this particular issue."
Photo: Gualtiero Boffi/Shutterstock