Stats wizard David Morrison has a compelling argument that scores from different wine critics cannot be compared except in the rare case in which two critics use the same objective scoring scheme. I actually think the situation is worse than David suggests, but more on that in a moment.
His reasoning is as follows.
I have finally concluded that there are two fundamentally different sorts of wine-quality scores in use: (1) what we might call an objective score, based on explicitly assigning points to a series of pre-defined wine characteristics, and then summing them to get the wine score; and (2) subjective (but expert) scores, where the overall score comes from whatever characteristics the scorer wants to express.
David is not referring to the way the scores are expressed, via a 100-point scale versus a 20-point scale. Rather, he is referring to the underlying method by which the scores are created. With objective scoring there is only one scoring scheme, so a disagreement among critics would be a genuine disagreement about wine quality. But when critics each choose their own idiosyncratic scoring system, a disagreement about scores may reflect a different scoring method, or a different interpretation of the method, rather than a difference in wine quality.
The same scores could mean different qualities (because the scoring schemes are different), and different scores could mean the same quality (because the scoring schemes are different). How on earth are we to know? We can’t!
I think David is exactly right about subjective scores. But I think what he calls objective scores are also subject to varying interpretations. For example, suppose we develop an objective scoring scheme based on the following pre-defined characteristics, with a numerical scale for each characteristic.
We will assess wines for intensity, complexity, balance, length, typicity, and mouthfeel. We will score every wine on each characteristic using a numerical scale from 0 to 9 and sum the results to produce a final score. These characteristics don’t begin to capture wine quality, but a more complicated scheme would only compound the problem I want to identify.
The problem is that, in this scheme, all the criteria are assigned equal weight: intensity counts the same as balance, and so on. But why assume each criterion is of equal value, especially across all wines and all varietals? Pinot Noir will not have the length of most Cabernets, and a shorter finish may not damage a Pinot Noir in the way it would disappoint in a Cab. A wine even slightly out of balance will suffer in quality, even if its intensity and complexity are high enough to make up the lost points. Typicity is important for some purposes, but giving it equal weight in every case would disadvantage wines designed to be atypical. A young, old-school Barolo will have mouth-ripping tannins that undermine mouthfeel, so we would be forced to discount mouthfeel or guess how it will develop.
For a meaningful objective scoring scheme, we would have to find some objective way of weighting the various criteria, but that weighting would have to be specific to varietal, region, and style, since these all call for different values.
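To make the weighting problem concrete, here is a small sketch in Python. Everything in it is invented for illustration: the two wines, their scores on the six criteria, and both sets of weights are my assumptions, not anyone's real scoring scheme. It simply shows that two wines can tie under equal weights and then swap places depending on which weights we happen to pick.

```python
# Hypothetical illustration: the wines, scores, and weights are all invented.
CRITERIA = ["intensity", "complexity", "balance", "length", "typicity", "mouthfeel"]


def total_score(scores, weights=None):
    """Sum the per-criterion scores (each 0-9), optionally weighted.

    With no weights given, every criterion counts equally (weight 1.0).
    Any criterion missing from `weights` also defaults to 1.0.
    """
    weights = weights or {}
    return sum(weights.get(c, 1.0) * scores[c] for c in CRITERIA)


# Invented tasting scores for two imaginary wines.
pinot = {"intensity": 6, "complexity": 8, "balance": 8,
         "length": 5, "typicity": 7, "mouthfeel": 8}
cabernet = {"intensity": 8, "complexity": 7, "balance": 6,
            "length": 8, "typicity": 7, "mouthfeel": 6}

# Two assumed weightings: one prizes balance, the other power and length.
favor_balance = {"balance": 2.0, "length": 0.5}
favor_power = {"intensity": 2.0, "length": 1.5}

for label, w in [("equal", None), ("balance", favor_balance), ("power", favor_power)]:
    print(label, total_score(pinot, w), total_score(cabernet, w))
```

With equal weights both wines total 42. Under the balance-heavy weights the Pinot comes out ahead, and under the power-heavy weights the Cabernet does. Nothing about the wines changed, only the choice of weights, and that choice is exactly what an "objective" scheme would have to justify.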
Even if we were to manage to develop such a scheme, the most important factor in wine quality, at least according to some critics, is the degree to which a wine expresses the distinctive features of the vineyard, and that factor is not and could not be in the picture. There is no objective measure of such a quality.
The whole idea of an objective scoring system is hopeless.
So why do I use scores in my reviews, you might ask? Because they are useful in assessing how much a critic enjoyed the wine. But they mean nothing more than that. They are a subjective measure of the degree of quality I found in a wine when compared to other wines I’ve tasted. Nothing more, but nothing less. This information is meaningful to the degree you find a particular critic’s palate trustworthy.
A wine score is an invitation to try the wine, not a data point in a competition.