Discussion about this post

Daniel Greco:

This reminds me a bit of the old joke, usually targeting economists, about people who ask: "it works in practice, but does it work in theory?" I think it's hard to look at Tetlock's forecasting tournaments, where some people are clearly and reliably outperforming others, and so clearly have skills/habits that it's worth trying to understand/evaluate/reproduce, and to think there's no there there. Likewise with the difference between Nate Silver-style poll aggregation and its associated probabilistic forecasts vs. pundit-style one-off predictions, or even just plain old weather forecasting. Would you prefer your local meteorologist to just say "rain tomorrow" or "no rain tomorrow" instead of the probabilistic forecasts they actually provide? My strong guess is that a hedge fund that took on board "no probabilistic forecasting" as an internal rule would go broke fast.

Big picture, I think it makes more sense to accept that there's clearly a robust, well-justified practice of probabilistic forecasting, and to then ask how we can evaluate probabilistic forecasts in light of the sorts of puzzles you raise above, rather than to treat those puzzles as genuinely threatening the coherence/justifiability of the practice.

On the dependency point, one natural move is to point out that nobody should think that calibration is the only desideratum in a set of probabilistic forecasts; accuracy matters too, and it's often easy to achieve calibration at the price of accuracy. E.g., suppose I'm trying to tell you how a series of coin tosses resulted, and it's not prediction; rather, I get to see how each coin landed, albeit from far away, so it's a demanding test of visual acuity. It's easy to achieve excellent calibration by just assigning each toss 50% probability of landing heads, 50% tails. But in that case I'm throwing out all the info I get from my eyes. If I try to use that info, I may end up straying from perfect calibration. There's a good chance that my best strategy, if accuracy is what I care about (rather than calibration in its own right), is to aim to be as accurate as I can, and then, if I get feedback, see where I'm departing from perfect calibration and use that to improve my accuracy even more.
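A quick numerical sketch of that point (the forecasters and their error rates here are invented purely for illustration, not taken from the comment): a "blind" forecaster who always says 50% is perfectly calibrated but uninformative, while a "squinting" forecaster who uses noisy visual information is overconfident, and so miscalibrated, yet scores better on accuracy as measured by the Brier score.

```python
import random

random.seed(0)
N = 10_000
outcomes = [random.random() < 0.5 for _ in range(N)]  # True = heads

# Blind forecaster: always 50% -> perfectly calibrated, but ignores the evidence.
blind = [0.5] * N

# Squinting forecaster (assumed numbers): reads the toss correctly 80% of the
# time, but overconfidently reports 0.9/0.1 -- so it is *not* well calibrated.
def squint(heads: bool) -> float:
    saw_heads = heads if random.random() < 0.8 else not heads
    return 0.9 if saw_heads else 0.1

squinting = [squint(h) for h in outcomes]

def brier(forecasts, outcomes):
    """Mean squared gap between stated probability and realized outcome (lower is better)."""
    return sum((f - float(o)) ** 2 for f, o in zip(forecasts, outcomes)) / len(outcomes)

print("Blind (calibrated) Brier:       ", brier(blind, outcomes))      # ~0.25
print("Squinting (miscalibrated) Brier:", brier(squinting, outcomes))  # ~0.17
```

The exact numbers are arbitrary; the point is only that perfect calibration by itself guarantees nothing about how much information the forecasts carry.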

I think that's pretty similar to what's going on in your Nimoy case; there's a way to get perfect calibration in that case, but it involves giving up on accuracy. In general, I think you should think of calibration as a means to accuracy, rather than as an end in its own right; if you know that a set of forecasts is uncalibrated, and how, then you can produce a more accurate set of forecasts. So aiming for calibration is best understood as aiming for a set of forecasts that isn't obviously less accurate than some identifiable alternative set of forecasts.
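A small sketch of that last claim (again with made-up numbers, not anything from the post): a forecaster who systematically says 0.9 for events that happen only 70% of the time can be mechanically recalibrated by mapping each stated probability to the frequency actually observed at that level, and the remapped forecasts score better.

```python
import random
from collections import defaultdict

random.seed(1)
N = 10_000
# Overconfident forecaster (assumed): always says 0.9 for events that occur 70% of the time.
stated = [0.9] * N
outcomes = [random.random() < 0.7 for _ in range(N)]

def brier(forecasts, outcomes):
    return sum((f - float(o)) ** 2 for f, o in zip(forecasts, outcomes)) / len(outcomes)

# Recalibration: replace each stated probability with the empirical frequency
# of the outcome among all forecasts made at that same level.
by_level = defaultdict(list)
for f, o in zip(stated, outcomes):
    by_level[f].append(float(o))
recalibrated = [sum(by_level[f]) / len(by_level[f]) for f in stated]

print("Original Brier:    ", brier(stated, outcomes))        # ~0.25
print("Recalibrated Brier:", brier(recalibrated, outcomes))  # ~0.21
```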

Philippe Bélanger:

When we assess the accuracy of probabilistic predictions, we usually use a scoring rule like the Brier score. If you apply such a rule in your Nimoy example, your predictions score worse.
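For reference, the Brier score for $N$ binary forecasts is the mean squared distance between each stated probability $p_i$ and the realized outcome $o_i \in \{0, 1\}$, with lower scores being better:

$$\mathrm{BS} = \frac{1}{N}\sum_{i=1}^{N}\left(p_i - o_i\right)^2.$$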
