Monday, December 03, 2007

Hitting Overachievers

OK, this is the clearest explanation I can come up with for why Scott McLeod's analysis of gross state product statistics is just meaningless (see also the less clear explanation and subsequent discussion).

Gross State Product (GSP) measures the economic output of a state. It is an important number, but not very useful for comparisons between states, one obvious reason being that some states are vastly larger than others. A statistic that mitigates that problem is per-capita GSP, the GSP divided by the population of the state.

Now, let's look at baseball. One important number is hits. The problem with using this number to compare players is that not all players have the same number of chances to get hits. So in 1869 Henry Chadwick started publishing a table of hits divided by at bats -- the batting average -- which allows you to compare the effectiveness of hitters who have a different number of at bats.

In Scott's off-the-cuff analysis, he's finding the difference between the rank of each state by the raw (not good for comparison) number and the rank of the state by the average (better for comparison) number. In baseball terms, this would be like comparing a hitter's rank in hits to their rank in batting average. For example, for 2007 (hits, avg):

  • I Suzuki: -1 (#1 hits, #2 avg)
  • M Holliday: -2 (#2 hits, #4 avg)
  • M Ordonez: +2 (#3 hits, #1 avg)
  • H Ramirez: -3 (#4 hits, #7 avg)
  • A Rodriguez: -3 (#24 hits, #27 avg)
  • D Pedroia: +33 (#53 hits, #20 avg)
  • C Beltran: -18 (#74 hits, #92 avg)

This information (that is, the difference value) is useless. Yes, a high number is good, because it means your batting average rank is better than your hit rank. But this additional calculation implies that a good batting average with few at bats (thus a lower hit rank) is better than the same average with more at bats. The entire point of calculating the batting average is to remove the number of at bats from the analysis. Adding it back in does not clarify the information; it creates noise, period.
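To make that concrete, here is a rough sketch in Python. The four hitters and their numbers are made up for illustration (they are not real 2007 stats), and the `rank` helper is my own and ignores ties. The difference is computed the same way as in the list above: hit rank minus batting average rank.

```python
# Hypothetical hitters (invented numbers): name -> (hits, at_bats)
hitters = {
    "A": (200, 650),  # everyday player, about .308
    "B": (190, 640),  # about .297
    "C": (170, 600),  # about .283
    "D": (62, 200),   # part-timer, .310, nearly the same average as A
}


def rank(values):
    """Map each name to its 1-based rank, largest value first (ties not handled)."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {name: i for i, name in enumerate(ordered, start=1)}


hits = {name: h for name, (h, ab) in hitters.items()}
averages = {name: h / ab for name, (h, ab) in hitters.items()}

hits_rank = rank(hits)
avg_rank = rank(averages)

# Difference as used in the list above: hit rank minus average rank.
diff = {name: hits_rank[name] - avg_rank[name] for name in hitters}

for name in hitters:
    print(name, f"{diff[name]:+d}", f"(#{hits_rank[name]} hits, #{avg_rank[name]} avg)")
```

In this toy data, A and D hit almost identically, yet D comes out at +3 and A at -1. The only thing separating them is that D had far fewer at bats.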

The same point holds for comparing the difference between GSP rank and per-capita GSP rank. A state with a small population will get a high ranking differential compared to a large state with the same per-capita GSP. This is telling us nothing.
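The state version can be sketched the same way. Everything here is hypothetical: the state names P, Q, R, S and X, the populations (in millions) and the per-capita GSP figures (in thousands of dollars) are invented, and `differential` is just a helper name I made up. The idea is to hold state X's per-capita GSP fixed and change only its population.

```python
def rank(values):
    """Map each name to its 1-based rank, largest value first (ties not handled)."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {name: i for i, name in enumerate(ordered, start=1)}


def differential(states, name):
    """GSP rank minus per-capita GSP rank for one state.

    states maps a state name to (population, per_capita_gsp).
    """
    gsp = {s: pop * pc for s, (pop, pc) in states.items()}
    per_capita = {s: pc for s, (pop, pc) in states.items()}
    return rank(gsp)[name] - rank(per_capita)[name]


# Hypothetical comparison states: name -> (population, per-capita GSP)
others = {"P": (30, 40), "Q": (10, 50), "R": (5, 35), "S": (1, 45)}

# State X has the same per-capita GSP (42) in both runs; only its size changes.
big_x = differential({**others, "X": (20, 42)}, "X")
small_x = differential({**others, "X": (2, 42)}, "X")

print("large X:", big_x)
print("small X:", small_x)
```

With these made-up numbers, X's differential goes from -1 to +1 just by shrinking its population. Its per-capita GSP, the number that is actually comparable across states, never moved.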
