I recall once, for example, a Reading test asking fourth-graders about a passage they'd read about the human tongue and taste buds. One question asked the kids four distinct things (their favorite food, its flavor, where on the tongue that flavor was found, and how the taste buds work), with the original scoring rubric (established by classroom teachers) instructing the scorers to dole out one point for each of the four elements listed above. The teachers writing the rubric imagined straightforward answers like "my favorite food is popcorn, which is salty" (two points!) and "I like apples, a sweet taste found on the front of the tongue" (three points!), a scoring system that worked fine at least until theory turned into practice. Once it did -- once those intransigent schoolchildren started swamping us with all their unusual and unexpected answers -- then the scoring philosophy of those schoolteachers had to be laid to rest and the genius of the testing industry could be brought to bear.
The kids, you see, weren't just saying they liked to eat "apples" or that apples were "sweet." The kids were saying their favorite foods were "grass" and "water" and "Styrofoam," too, and even when they were identifying normal foods like "pizza" as a favorite they were then saying it was "salty," "sour," "bitter," and "sweet" (a.k.a. the entire spectrum of four flavors the human tongue can recognize). Furthermore, the students would often list a favorite food with what seemed an incorrect flavor ("my favorite food is ice cream, which is salty"), and then they would say they tasted that flavor on the tip of their tongue, which is not where one would taste "salty" (the side of the tongue) but is where one would taste ice cream, assuming it was sweet. The first couple hours of this scoring project, in other words, were pretty much total bedlam, massive disagreements within the group of employees I was training about whether "toothpaste" or "ice cubes" could be counted as favorite foods ("no" to the former and "yes" to the latter), or whether "bitter" could be counted as the flavor of pizza (originally "no," at least until we considered toppings such as anchovies and artichokes, so then "yes").
Amid all that arguing ("I refuse to accept ice cubes as a favorite food!"), amid all that bickering ("no, I would not call pizza sweet even if there is pineapple on it!"), I realized I would have to do the same thing I always did. The only way I could ensure those 60,000 fourth-grade student responses were scored by my fifty temps in a standardized way was to establish scoring rules so firm, so rigid, so absolutely unyielding, that we would eliminate from the process any element of humanity.
It wasn't so hard. I did so first by making an exhaustive list of anything that could be counted as an acceptable favorite food (pizza, popcorn, Kool-Aid, water, salt, grass, Gummi worms, etc.) and anything that could not (dirt, plastic straws, real worms, beer, wine, etc.). Then I established that any flavor a student identified would be accepted in conjunction with any favorite food. Ergo, a student identifying "pizza" as a favorite food would be credited for saying it was "salty" (of course), but also for saying it was "sweet" (the pineapple?), "sour" (anchovies, onions, etc.), or "bitter" (anchovies, onions, etc.). Enough kids said that ice cream tasted salty (pistachios?) or sour (lemon sherbet?), and enough kids said that potatoes were salty (uh-huh) or sweet (sweet potatoes?) or sour and bitter (sour cream?), that ultimately I decided we just had to accept 'em all, adult logic be damned. I was also pretty lenient about how the group should award credit regarding the location of the four basic flavors on the tongue, ultimately deciding to accept answers both when the kids identified the correct placement of the flavors (sweet in the front, salty on the side, etc.) and when they did not (sweet on the side, if talking about popcorn, which really would have been tasted on the side, it being salty and all...).
To reiterate, teachers and ex-teachers made bad standardized test scorers because they actually gave a damn about the students, while my scoring projects were usually better served by people who cared a little less. Ironically, that means if test scores do end up being used to evaluate the jobs being done by American teachers, those people who "cared a little less" will end up assessing the jobs being done by those classroom teachers who really are invested in American education.