Can Wine Be Rated Like Chess Players?

George Nordahl on his Substack has an interesting proposal for rethinking wine scores.

Of course, the problems with the current system are well known and have been endlessly discussed. Wine scores pretend to be precise, objective measurements when they’re really blunt compressions of a wildly multidimensional experience. A single number has to stand in for aroma, texture, acidity, tannin, balance, evolution in the glass, context of drinking, and even the drinker’s mood—variables that can’t easily be quantified and certainly don’t line up on a linear scale.

And crowd-sourced reviews import their own distortions. Bottles opened too young or stored improperly are only the beginning. Such reviews are a popularity contest, not an evaluation of quality; lots of evaluations by people who don’t know what they’re talking about don’t add up to wisdom when collated.

So Nordahl asks us to imagine rating wine the way the World Chess Federation rates chess players: not by asking how good a wine is on an arbitrary scale, but by tracking what happens when it meets another wine in a head-to-head contest. The result is called an Elo score, named after its inventor, Arpad Elo.

I’m no expert on chess or its ranking system, but you can check out Nordahl’s article for more details.

But applied to wine it apparently works like this: each wine starts with a provisional Elo number (say, 1500). When you taste two wines side by side, you record a result: Wine A “wins” if you prefer it, Wine B wins if you prefer that one, and it’s a draw if you can’t choose between them. Before that comparison, the system calculates an expected outcome from the two wines’ current ratings, which summarize all their previous comparisons. The expectation rests on a simple principle: if A is rated higher than B, A is expected to win more often, and B is the underdog. The same formula that quantifies the expectation also determines how many points a win or loss in the current comparison is worth.
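Nordahl’s article has the full details, but for reference, the standard chess formula for that expected outcome is

$$E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}, \qquad E_B = 1 - E_A$$

where $R_A$ and $R_B$ are the two current ratings; the 400 is chess’s conventional scaling constant, and a wine version would presumably either borrow it or tune its own.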

Then the score is a matter of bookkeeping: apply the formula to come up with a new Elo number for each wine. If the favorite wins, it gains only a few points and the underdog loses only a few. If the underdog upsets the favorite, the underdog gains a lot and the favorite drops a lot. And there is a scaling factor (called the K value) that controls how fast ratings move: high K for new wines with little data (so they “find their level” quickly), lower K for wines with many tastings (so one odd comparison doesn’t affect the ratings inordinately).
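To make the bookkeeping concrete, here is a minimal Python sketch of a single update using the standard Elo formula. The wine scenario, the starting ratings, and the K value are my own illustrative choices, not anything specified by Nordahl:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score for A against B under the standard Elo curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b).

    score_a is 1.0 if A wins, 0.0 if B wins, 0.5 for a draw.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta

# An established favorite (1650) is upset by a newcomer (1500):
favorite, newcomer = update(1650.0, 1500.0, score_a=0.0)
print(round(favorite), round(newcomer))  # 1627 1523: a big swing for an upset
```

Note that the update is zero-sum: whatever the favorite loses, the underdog gains, which is what keeps the overall scale from drifting.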

Over time, thousands of these tiny duels create a dynamic hierarchy. Importantly, the system doesn’t need to reward any style in advance. A taut, mineral white can climb by beating wines it actually outperforms in direct comparisons; a plush, oaky red can do the same. “Greatness” becomes the pattern of victories against strong opponents, not a single critic’s number.

You could run this at two levels: a global Elo for commercial rankings, and a personal Elo that tracks your palate, because your “wins” are your preferences. The result is a map of wine quality as a relation, updated every time you pour two glasses.
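That two-ledger idea is easy to picture in code. A minimal sketch, again with every name and number invented for illustration:

```python
from collections import defaultdict

def elo_update(ratings: dict, a: str, b: str, score_a: float, k: float = 32.0) -> None:
    """Apply one pairwise result to a single rating ledger, in place."""
    e_a = 1.0 / (1.0 + 10 ** ((ratings[b] - ratings[a]) / 400))
    ratings[a] += k * (score_a - e_a)
    ratings[b] -= k * (score_a - e_a)

global_elo = defaultdict(lambda: 1500.0)  # everyone's comparisons
my_elo = defaultdict(lambda: 1500.0)      # only my preferences

# Tonight I preferred the Chenin, so both ledgers record the same "match":
for ledger in (global_elo, my_elo):
    elo_update(ledger, "Loire Chenin", "Napa Chardonnay", score_a=1.0)
```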

Now, as Nordahl suggests, this has a variety of benefits over our current method:

It sidesteps the impossible expectation that consumers (or indeed critics) can quantify their experience on an arbitrary scale and instead asks a far simpler, and much less cognitively taxing question, namely “Which of these two wines do you prefer right now?” The mathematics of the Elo formula then transforms this intuitive choice into a continuously updating system reflecting both the strength of past favourites and the impact of new discoveries… By pitting wines up against each other directly, asking “is wine A better than wine B?”, only the immediate result matters, removing any emotional friction which might be triggered when assigning too high or too low a rating. As a result, it should also avoid the issue of rating inflation altogether. It would also put a stopper to the concept that any wine can be perfect, with the maximum achievable Elo score for any wine being relative to all other wines it is compared to… The real power of the Elo rating of course lies in the fact that the quality assessment is relative to other wines, rather than being relative to an abstract number, which is much more in line with how we think of, and process the concept of quality as humans.

I think Nordahl is right about the benefits of such a system, but I also have reservations:

The basic problem is that skill at chess doesn’t vary much with context. The Elo score is an indirect measure of a stable disposition that doesn’t change much from match to match. Players improve, of course, but only gradually, and it is reasonable to think that gradual improvement is captured in victories over opponents.

But wine preferences are not so stable; they vary with context. Elo assumes that if A tends to beat B, and B tends to beat C, then A should tend to beat C. But wine preference is often non-transitive because different dimensions dominate in different matchups. For example, a lean Loire Chenin might beat a flabby Chardonnay because it feels alive and precise. But that same Chenin might then lose to an oxidative Jura white because the drinker, in that moment, wants intensity and weirdness. And then the Jura loses to the Chardonnay because, with those scallops in that butter sauce, the Chardonnay made more sense.
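You can watch transitivity fail in a toy simulation: give three wines a perfectly cyclic preference pattern, let Elo score thousands of random matchups, and no stable hierarchy ever emerges; the ratings just mill around the starting value. A sketch, with the wines and the cycle invented for illustration:

```python
import random

def expected(ra: float, rb: float) -> float:
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400))

ratings = {"Chenin": 1500.0, "Jura": 1500.0, "Chardonnay": 1500.0}
cycle = {("Chenin", "Chardonnay"), ("Jura", "Chenin"), ("Chardonnay", "Jura")}
K = 16

random.seed(0)
for _ in range(10_000):
    a, b = random.sample(list(ratings), 2)
    score_a = 1.0 if (a, b) in cycle else 0.0  # the cycle decides every match
    delta = K * (score_a - expected(ratings[a], ratings[b]))
    ratings[a] += delta
    ratings[b] -= delta

print({w: round(r) for w, r in ratings.items()})  # all three hover near 1500
```

Elo has no way to say “it depends”; it can only average the cycle away.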

So it looks like context is a kind of hidden referee. Match conditions matter enormously in wine: food, the occasion, mood, fatigue, the order in which you sample, and social influence can all shape preferences. Unless you control for context, Elo will rate the situation as much as the wine. So we’ve traded 93 points of faux objectivity for a mathematical summary of uncontrolled variables.

Furthermore, won’t popularity and sampling bias skew the results, at least if we tried to scale Elo up using crowd-sourced pairwise comparisons? In chess, top players are forced into meaningful competition. In wine, what gets tasted gets rated. A wine that’s widely distributed, poured at parties, and repeatedly paired against supermarket peers will rack up wins and stabilize at a high score without ever facing “tough opponents.” Genuinely great but rare wines, on the other hand, may have too few matches to climb the ladder. It seems to me Elo produces a ranking of what is consumed, not necessarily of quality.
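A toy simulation makes this worry concrete too: a wine that is only ever paired against a large pool of weak peers, and always wins, drifts far above its starting rating without meeting a single strong opponent. Again, all numbers are purely illustrative:

```python
import random

def expected(ra: float, rb: float) -> float:
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400))

K = 16
star = 1500.0           # the widely poured supermarket wine
pool = [1500.0] * 200   # the weak peers it is always paired with, and always beats

random.seed(1)
for _ in range(5_000):
    i = random.randrange(len(pool))
    gain = K * (1.0 - expected(star, pool[i]))
    star += gain
    pool[i] -= gain

print(round(star))  # far above 1500, without ever facing a "tough opponent"
```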

And doesn’t this kind of system invite gaming? If Elo is an open rating system, it attracts coordinated voting, brands manufacturing easy wins by curating opponents, and producer-driven tasting events engineered for favorable pairings. Chess, I imagine, has a system for preventing cheating. Wine surely does not.

Perhaps none of this is fatal to the idea. Perhaps there are ways of creating match conditions that avoid some of these contextual variables.

It occurs to me that to make this work we have to be careful about choosing comparison classes.

A Napa Cabernet “beating” a New Zealand Sauvignon Blanc is like declaring a bulldozer superior to a violin because it wins in a head-to-head contest of “impact.” Impact on what? The palate? The meal? The mood? That a Cabernet beats a Sauvignon Blanc, even repeatedly, gives me no information because I have no idea what is being assessed in the comparison. But comparing two Cabernet Sauvignons would be useful, because at least I know what is being compared is “Cabernet Sauvignon-ness.”

An Elo-for-wine that actually respects wine would need comparison classes that are deliberately constructed, not passively inherited from the market. You could compare “aperitif whites,” “steak reds,” “meditation wines,” or “summer porch wines.” Or organize comparisons by style family: acid-driven whites, aromatic whites, or oxidative whites; or pit classic Napa Cabernets against modern styles, and so on. With the right comparisons we have some idea of what a win means.
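In code, such deliberately constructed classes are almost trivial to enforce: tag each wine with its comparison class and refuse cross-class matches. A sketch, where the league labels and the API are mine, not Nordahl’s:

```python
from dataclasses import dataclass

@dataclass
class Wine:
    name: str
    league: str            # e.g. "acid-driven white", "steak red"
    rating: float = 1500.0

def record_match(a: Wine, b: Wine, score_a: float, k: float = 32.0) -> None:
    """Apply an Elo update, but only for wines in the same comparison class."""
    if a.league != b.league:
        raise ValueError(f"cross-league match refused: {a.league!r} vs {b.league!r}")
    e_a = 1.0 / (1.0 + 10 ** ((b.rating - a.rating) / 400))
    a.rating += k * (score_a - e_a)
    b.rating -= k * (score_a - e_a)

sancerre = Wine("Sancerre", "acid-driven white")
chablis = Wine("Chablis", "acid-driven white")
record_match(sancerre, chablis, score_a=1.0)  # legal: same league
# record_match(sancerre, Wine("Barolo", "steak red"), 1.0)  # would raise
```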

Comparison classes prevent Elo from becoming a popularity test. If the “league” is defined by stylistic intent and function, then the question stops being “what do most people like?” and becomes “what wins among wines trying to do a similar thing?”

Which is basically what good criticism has always tried to do. The Elo system is worth a shot because the present system is broken.

And Nordahl has built an app for the wine Elo system if you have an iPhone. (I don’t, so I was unable to test it out.)
