Last week, Barbara Chow, director of the education program at the Hewlett Foundation, explained to a meeting of grantees why the foundation is investing in research on Automated Essay Score Predictors as part of its strategy for expanding opportunities for Deeper Learning in schools. (Disclosure: I run a Hewlett-funded research project, and Hewlett has indirectly paid my salary for four years, though Harvard is my direct employer. That said, when I had a chance to speak for 15 minutes at the grantee meeting, I devoted the entire time to explaining how the foundation's Open Educational Resources grantmaking program could potentially expand educational inequalities. So there is some evidence that I try to call it as I see it.) Again, it's the kind of argument that raises eyebrows: "If we replace human essay raters with machines, students will have a richer learning experience." Oh, really?
First point: two consortia (PARCC and SBAC) are developing new tests for the Common Core Standards. In 2014 or 2015, states all across the country are going to have brand-new tests. We have an opportunity to make them better. Here's how Barbara makes the case that Automated Essay Score Predictors can do that:
Here is an example of a test question from the AP US History test (2006 Released Exam):
Which of the following colonies required each community of 50 or more families to provide a teacher of reading and writing?
E. Rhode Island
Now, this is the kind of question that makes most educators go berserk. A student can have a deep, rich understanding of early American history and not know that factoid. So what if we could replace questions like that with questions like this one (thanks to the College Board for sharing):
By the early twentieth century, the United States had emerged as a world power. Historians have proposed various dates for the beginning of this process, including the three listed below. Choose one of the three dates below or choose one of your own, and write a paragraph explaining why this date best marks the beginning of the United States' emergence as a world power. Write a second paragraph explaining why you did not choose the other dates. Support your argument with appropriate evidence.
- 1898 (Spanish-American War)
- 1917 (Entry into the First World War)
- 1941 (Entry into the Second World War)
I have some quibbles, but this is a much, much better question. It calls upon several skills broadly identified with deeper learning: solving an ill-structured problem (one without a single correct answer, and one that requires tacit knowledge) and communicating that answer in a persuasive, evidence-based argument.
The problem is that a computer can score this for structure (for having the form of an academic argument based on evidence), but it does not know the whole scope of the problem domain. It can't know all the evidence and facts relating to the issue, so a savvy student can just make stuff up. For example:
Among naval historians, the standard definition of a "world power" is one that can maintain two high seas fleets in two separate oceans indefinitely, while retaining a significant reserve and shipbuilding capacity. In Michael Doyle's authoritative history of the US Navy, Quahogs and Other Submersibles, a Rumination, until 1923 and the launch of the US 5th Fleet in the Pacific the US could only maintain one high-seas fleet. Therefore, in 1923, the US became a naval "world power," a status it maintains to this day.
I chose this date because I am a naval historian, and I am applying the standard definition of "world power" as used by naval historians.
If you're my teacher in an actual class, and in particular if you know what a clever smartass I can be, you'll see through this in a second. If you're a computer, what are you going to do, check my facts? All the computer can say is that this more or less has the form of an academic response that includes evidence.
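To make that point concrete, here is a deliberately naive, hypothetical scorer. The name `surface_score`, the cue-word list, and the point weights are all assumptions invented for illustration; this is not how any real essay-scoring engine works, and real engines use far richer features. What the toy shares with the argument above is the blind spot: it rewards structure, reasoning vocabulary, dates, and length, and nothing in it checks whether Michael Doyle or his book exists.

```python
import re

# Hypothetical cue words a naive scorer might read as signs of an
# evidence-based argument. The list is an illustrative assumption.
EVIDENCE_CUES = ("because", "therefore", "according to", "historian", "evidence", "definition")


def surface_score(essay: str) -> int:
    """Score an essay from 0 to 6 using surface features only.

    Nothing here checks whether any claim, source, or date is true.
    """
    text = essay.lower()
    paragraphs = [p for p in essay.split("\n\n") if p.strip()]
    score = min(len(paragraphs), 2)          # up to 2 points: the prompt asks for two paragraphs
    cues = sum(cue in text for cue in EVIDENCE_CUES)
    score += min(cues, 2)                    # up to 2 points: reasoning/evidence vocabulary
    if re.search(r"\b1[89]\d\d\b", essay):   # 1 point: cites a specific 1800s/1900s year
        score += 1
    if len(essay.split()) >= 80:             # 1 point: reaches a minimum length
        score += 1
    return score


# The fabricated "naval historian" answer from above, verbatim.
FABRICATED = (
    "Among naval historians, the standard definition of a \"world power\" is one that can "
    "maintain two high seas fleets in two separate oceans indefinitely, while retaining a "
    "significant reserve and shipbuilding capacity. In Michael Doyle's authoritative history "
    "of the US Navy, Quahogs and Other Submersibles, a Rumination, until 1923 and the launch "
    "of the US 5th Fleet in the Pacific the US could only maintain one high-seas fleet. "
    "Therefore, in 1923, the US became a naval \"world power,\" a status it maintains to this day."
    "\n\n"
    "I chose this date because I am a naval historian, and I am applying the standard "
    "definition of \"world power\" as used by naval historians."
)

# A brief but accurate answer, for contrast.
ACCURATE = "1898. The war with Spain left the US holding the Philippines and Puerto Rico."

print(surface_score(FABRICATED))  # prints 6
print(surface_score(ACCURATE))    # prints 2
```

The made-up answer earns the maximum because every feature the toy measures is present; the true but terse answer scores low because it lacks them.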
One could also note that the temps scoring stacks of these essays in data centers can't really score this kind of answer accurately either, but that's an argument against high-stakes standardized tests in general, not an argument in favor of computer scoring.
This is why the Common Core standards are so strict about limiting everything to textual analysis and evidence drawn from texts. It is partly a revival of New Criticism, but mostly it is to make sure that the valid answers to a question are constrained to bits of text that a computer can identify.