Wednesday, March 24, 2004

It's Heteroskedastic!

PECOTA fans at Baseball Prospectus have been pleased to call their projection system "deadly accurate" in the blurb on the back cover. But after reading their caveats about the system, I'm increasingly inclined to think that the authors are waffling far more than they let on. Here are a couple of key passages:
The Five-Year Performance forecast measures a hitter's EQA at various percentiles over the course of the next five seasons. Unlike the Value forecast, the Performance forecast has no convenient way to adjust for dropped comparables, and so it simply ignores them. For this reason, the Performance forecast may be unreliable for players whose comparables have a high attrition rate. [emphasis mine]
And here's a really damning one, on the subject of how comparables are generated:
In most cases, the database is large enough to provide a meaningfully large set of comparables. When it isn't, the program is designed to 'cheat' by expanding its tolerance for dissimilar players until a reasonable sample size is reached. In the case of very old or very young hitters, there may not be a significant number of hitters who played at that age, and so the results of their forecast may be less reliable.
In other words, if our system isn't working, we expand the error bands and claim that it does, in fact, work. What bugs me about this is that it introduces a problem into the comparables statistics because the data is (or at least, is very likely to be) heteroskedastic. That is, the size of the errors isn't constant; it tends to change depending on who you use as your "comparables", and admitting others into the mix who aren't really close matches just amplifies that problem. This gets really bad for guys like Albert Pujols, for whom there have been only a handful of similar players (18 selected by PECOTA), all of whom are at least All-Stars or Hall of Famers, and Adrian Beltre, with 36 comparables. (Subscriptions required for all these PECOTA card links.) Beltre's early years predicted legitimate stardom, but his career seems to have been derailed by his appendectomy. Unfortunately, the bad news for Eckstein is that he has 47 comparables, which doesn't leave me feeling too good about his ability to stay healthy.
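To make the heteroskedasticity worry concrete, here is a toy simulation. It is not PECOTA's actual method, and none of the numbers come from Baseball Prospectus. It assumes each candidate comparable sits at some "distance" from the target player (a hypothetical similarity score between 0 and 1), and that the forecast error a comparable contributes has a spread that grows with that distance. Under those assumptions, loosening the tolerance buys a bigger sample but also a noisier one:

```python
import random
import statistics


def simulate_comparables(n=20000, seed=0):
    """Build a made-up pool of (distance, error) pairs.

    distance: hypothetical dissimilarity to the target player, in [0, 1).
    error:    forecast error in EQA points; its standard deviation is
              ASSUMED to grow linearly with distance (heteroskedastic).
    """
    rng = random.Random(seed)
    pool = []
    for _ in range(n):
        distance = rng.random()
        sd = 0.010 + 0.040 * distance  # assumed, not measured
        pool.append((distance, rng.gauss(0.0, sd)))
    return pool


def error_spread(pool, tolerance):
    """Return (sample size, error standard deviation) for all
    comparables whose distance is within the given tolerance."""
    errors = [err for dist, err in pool if dist <= tolerance]
    return len(errors), statistics.pstdev(errors)


pool = simulate_comparables()
results = {tol: error_spread(pool, tol) for tol in (0.1, 0.3, 1.0)}

for tol, (count, spread) in results.items():
    print(f"tolerance={tol:.1f}  comparables={count:5d}  error spread={spread:.4f}")
```

In this sketch the error spread climbs steadily as the tolerance is widened, so the larger sample is not simply a more reliable one. Whether real player data behaves this way is exactly the open question; the simulation only shows what the problem would look like if it does.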


