Tuesday, August 03, 2010

Shorter Pallas/Hess

Aaron Pallas:

The way DCPS tells teachers it assesses their "value-added" contribution is idiotic!

Rick Hess:

If Aaron Pallas doesn't know that the actual assessment is completely different than what is described to teachers, he's an idiot!

Or, to go back to an earlier post by Pallas on the subject:

What is troubling to me is that, to date, districts using these complex value-added systems to evaluate teacher performance haven’t made the methodologies known to the general public.

Hess's "rebuttal" only confirms Pallas's actual point. Pallas set a rhetorical trap, and Hess fell right in.

19 comments:

NYC Educator said...

It's kind of Hess's job, though, to argue with facts on the ground so as to make it easier for Bill Gates and Wal-Mart to turn this into the kind of country that fits their grand vision. You have to twist logic a little when you're pushing a system that benefits less than 1% of the population.

Jason said...

Actually, what Pallas wrote was, "The way DCPS calculates value-added is idiotic," or at least, that's how I'd characterize it.

What Hess said was, "Pallas didn't do the minimum due diligence to contact them and ask them how they did it."

This vertical scaling issue exists in many, many states, and there are materials all over the web explaining the NCIEA growth model that many states have adopted or slightly modified (Massachusetts and RI included).

This is a common problem, and one solution has become fairly standard. Rather than assuming a minimum awareness of what's going on in the field (which Pallas may not have), Pallas should have checked his facts before publishing in WaPo.

Tom Hoffman said...

Well, no, Jason, literally Pallas said:

"the procedures described in the DCPS IMPACT Guidebook for producing a value-added score are idiotic."

The wording of Pallas's article is quite precise. If it is too subtle for you and Hess, that's not his fault.

Jason said...

So you think it's ok, as an expert, to write a sensationalist article that details all the problems with "the procedures described in the ... Guidebook", even if these are not the procedures carried out to make the decisions?

Pallas may have offered some slight disclaimers along the lines of "well, maybe they did a better job," but his tone clearly accused DCPS of being idiots and totally botching a process that was itself a bad idea.

In fact, pages 10 and 11 of the IMPACT Guidebook for these teachers specifically explain that the error he mentions is NOT what they do, albeit not in the clearest way possible. Pallas' argument is based on the idea that an individual student has their two scaled scores subtracted. What actually happens in DC is that they predict what the new scaled score would be, taking into account demographics and previous testing scores, for the students in a particular class. Their value-added is the difference between the predicted class average and the actual class average.
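If it helps, here's a rough sketch of that class-level comparison in Python. The numbers are invented, and in the real model the predictions come out of a regression rather than a list someone typed in:

# Class-level value-added, per the Guidebook's description (made-up numbers).
predicted = [512, 538, 547, 561, 529]  # model's predicted year-1 scaled scores
actual = [520, 531, 555, 570, 524]     # observed year-1 scaled scores

predicted_avg = sum(predicted) / len(predicted)
actual_avg = sum(actual) / len(actual)

# The teacher's value-added: actual class average minus predicted class average.
value_added = actual_avg - predicted_avg
print(round(value_added, 1))  # 2.6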

What's confusing in the DCPS report is that they use the term "growth" several times as a synonym for "value-added" when it is not. They are correctly calculating value-added scores, however, and this is obvious to anyone with a bit of knowledge about these things reading the Guidebook.

Honestly, before writing this comment I hadn't looked at the Guidebook. Halfway through, I looked it up, and now I'm even more furious with Pallas' article, because the error DCPS actually made is clear and does not undermine the technical use of these scores at all.

Tom Hoffman said...

Jason,

You are so completely and shamelessly full of shit. Both the actual and predicted value-add scores are described exactly as Pallas says, by subtracting one score from the other. It is RIGHT IN FRONT OF YOU.

Jason said...

If your predicted score was 546 and you score a 536, you'd be correct in saying that your value-added was -10. If instead you scored a 556, you'd be correct in saying your value-added was 10.

Though they unnecessarily choose to show the whole gap from the initial score, the use of a predicted score means it makes no difference.

Let me break it down in 6th grade math for you:

You have an initial score (i), a predicted score (i+p), and an actual score (i+a).

The value-added is the actual - predicted, which is:
(i+a)-(i+p)=a-p

So in the end, you're just comparing the actual score to the predicted score. Pallas argued that you were wrongly comparing the ACTUAL TO THE INITIAL. "If a fifth-grader received a scaled score of 535 in math and a score of 448 on the fourth-grade test the previous year, his actual gain would be calculated as 87 points."

This is wrong. They instead have a score they predict for each teacher's class (based on the initial scores for those students and their demographics), which is in the 500 range. Then they compare the 535 to that predicted number on the same scale.
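Using Pallas's own numbers and a made-up predicted score, you can watch the initial score drop out:

# Pallas's example student, plus a hypothetical prediction on the fifth-grade scale.
initial = 448     # fourth-grade scaled score
actual = 535      # fifth-grade scaled score
predicted = 527   # invented prediction, for illustration only

actual_gain = actual - initial        # 87, the number Pallas objects to
predicted_gain = predicted - initial  # 79
print(actual_gain - predicted_gain)   # 8
print(actual - predicted)             # 8 -- same thing; the 448 cancels out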

Of course, Pallas is also completely wrong in saying that "The scaling approach taken by the DC CAS is, to my mind, pretty unconventional, because the scaled scores do not overlap across grades." This is just not true and hasn't been since just before NCLB. Many tests are "vertically-moderated," and there are a ton of issues with doing vertical scaling accurately. I recommend reading this publication by the National Center for the Improvement of Educational Assessment (http://www.nciea.org/publications/MeasuringGrowthMCASTests_CD06.pdf)

Tom, you are guilty of something we all are on this one-- typing faster than you're thinking.

Tom Hoffman said...
This comment has been removed by the author.
Tom Hoffman said...

None of that removes subtracting last year's score from this year's score as the determination of actual growth. Pallas's point is that if this is done as described, there is no reason to think that that number is valid.

The fact that this data is then interpreted in light of the projected growth does not necessarily make the calculation of actual growth more valid, if a ten-point increase in actual scores does not have a consistent meaning across the range of scores.

Jason said...

You're still not getting it, because "None of that removes subtracting last year's score from this year's score as the determination of actual growth" is not what they're doing, even though they show a picture that could make you think that. By including their estimate/projection, they are essentially subtracting the score you're projected to get in year 1 from your actual score in year 1.

Within each year, if the CAPT operates like the MCAS, another vertically-moderated test that's in the link I shared, 10 points means the same thing across all ranges. So if you're comparing a 746 to a projected score of 736, that's the same as a 766 when you projected a 756.

The error is simply not there. I admit that the graph is confusing, because it unnecessarily includes the gap between the initial and the actual in both the projected and actual scores (which is why the initial score cancels out). This was their heuristic to explain to teachers that they account for student scores in year 0 when devising a projection. I can see why it might be confusing, but if you think about it for just a minute, the mistake Pallas claims is there simply isn't. He wrote his piece based on a knee-jerk reaction and he missed the mark, plain and simple.

The fact that you think I'm shamelessly full of shit is simply hilarious. You read this article by Pallas and had your mind made up before you knew a thing about the content-- if they're using value-added it must be bad. But the facts are that Pallas made a mistake, the error he claims exists does not exist, and DCPS is only at fault for trying to simplify their model and instead creating a point of potential confusion.

So long as you're comparing scores from within the same year, the Pallas Error does not exist. DCPS is either comparing actual to predicted (where the prediction was built off the previous year), or, if they are using the system as shown in the picture, they are including the initial score in both calculations so that it mathematically cancels out. Either way Pallas is wrong.

Jason said...

FWIW, there is a big difference between value-added and growth, and I think that their use of both terms is misleading and your continued use of growth shows that you're falling into the trap they shouldn't have set.

They have a value-added metric here, so they're estimating what your score should be in year 1 based on what your score was in year 0, and taking the difference between the actual and estimated year 1 scores. This shows how much of an effect a teacher had that is atypical. The question is whether we can isolate the teacher's effect that produced a difference from our projection.

How well they project is not the debate Pallas is having. He's claiming they calculated the VAM wrong, but they didn't. They took the difference between estimate and actual. Even if they took the difference between the actual and the initial and subtracted the difference between the estimate and the initial, simple 7th grade math will reveal that's the exact same thing as subtracting the estimate from the actual, because the initial score is the same in both.

Tom Hoffman said...

"Within each year, if the CAPT operates like the MCAS, another vertically-moderated test that's in the line I shared, 10 points means the same thing across all ranges. So if you're comparing 746 to a projected score of 736 that's the same as a 766 when you projected a 756."

Is it?

If a student has a 302 in third grade and a 400 the next year, they might have learned nothing whatsoever but only lost two points relative to the year before.

A student scoring 395 in third grade and 499 in fourth might have learned calculus and only gained four.

Jason said...

You're still making the same exact mistake.

Here's how it works:

If a student has a 302 in third grade, that score, along with demographic information and various other adjusters, is used to create an estimated score of 410 for the next year. This is the expected value for next year's score when we use all sorts of factors to predict where you end up.

The next year you score a 415.

Your value-added score is 5 -- the difference between your actual score and your expected score. It's 5 because 415 - 410 = 5. The difference between actual and expected.

If you look at DCPS, the math they do is 410 - 302 = 108 and 415 - 302 = 113, and then they subtract those two and get 5.

It's the same thing because the 302, which is the actual initial performance, is not different for the estimate or actual.

IF they were solely saying "Your value-added is 113 because 415-302=113, so your score is 113," they would be talking about nonsense. The actual information is the gap between the estimate and the actual. Since they are subtracting the "estimated gap" from the "actual gap," they're not comparing, in your example, a 98 for someone who scored 400 after a 302 to a 104 for someone who scored a 499 after a 395.

They're comparing 400 to the estimate of what the score would have been in the 400 scale based on where they were last year and their demographics. They're comparing the 499 to the estimate of what the score would have been in the 400 scale based on where they were last year and their demographics.

So it would be saying what's 400 relative to an estimate of 410 (-10) and a 499 relative to an estimate of 485 (+14).
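Run your two students through that same-scale comparison (the estimates are still made up) and the code is just:

# Tom's two students, each compared only to an estimate on the fourth-grade scale.
students = {
    "started at 302": {"estimate": 410, "actual": 400},
    "started at 395": {"estimate": 485, "actual": 499},
}
for label, s in students.items():
    print(label, "value-added:", s["actual"] - s["estimate"])
# started at 302 value-added: -10
# started at 395 value-added: 14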

Tom Hoffman said...

Let's say you have twins, who in third grade score 302. One learns exactly nothing in fourth grade math; the second gains exactly one year's worth of achievement as measured by the DC CAS.

The first might get, say, 401 or 400. The second should get 402, right? Isn't a full grade level's difference crammed into one or two points on the scale?

Whether their predicted score is 402, 410 or 420 doesn't change the fact that you have one or two points representing a disproportionate amount of learning.

Jason said...

"The first might get, say, 401 or 400. The second should get 402, right? Isn't a full grade level's difference crammed into one or two points on the scale?"

This is completely wrong and a total misunderstanding of vertically-moderated scales. Please read the NCIEA report I posted previously (http://www.nciea.org/publications/MeasuringGrowthMCASTests_CD06.pdf).

Each scale is set independently, and learning "one grade level" of material does not translate to a 100-point gain at all. A score of 400, 401, or 402 is essentially representative of not knowing the third-grade material at all (since a leading 4 means the test taken in grade 4, which covers what was taught in grade 3).

You are trying to treat a vertically-moderated scale like a vertical scale, which is precisely what DCPS doesn't do and precisely what would be wrong to assume or do.

The DCPS technique uses regressions with lots of controls to predict what your score will be on next year's test. One of those controls is the previous year's score. Once they have an estimate of where you should be, the gap between your actual score and that estimate represents unexpected growth that can be attributed to the teacher (that's how the regression is designed). The scale doesn't matter for producing the regression, because the statistics are looking at what kind of scores are typical in year 1 for students who got score X in year 0 (accounting for all the other controls).
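A toy version of that kind of regression looks something like this. The data is invented, there are only two controls, and it's nothing like the scale or sophistication of what DCPS actually runs, but it shows where the estimate comes from:

# Predict year-1 scaled scores from year-0 scores plus one demographic flag.
import numpy as np

# columns: year-0 score, demographic indicator, year-1 score (all invented)
data = np.array([
    [310, 1, 402],
    [355, 0, 447],
    [340, 1, 428],
    [390, 0, 481],
    [325, 1, 415],
    [370, 0, 463],
])
X = np.column_stack([np.ones(len(data)), data[:, 0], data[:, 1]])  # intercept + controls
y = data[:, 2].astype(float)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares fit

# Estimated year-1 score for a student who scored 302 and is flagged 1,
# and that student's contribution to the value-added calculation.
estimate = coef @ [1, 302, 1]
actual = 415
print(round(float(actual - estimate), 1))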

They are only ever comparing scores within the same scale; scores within the 400-499 range are normalized independently of the meaning of scores in the 300-399 range, but are totally consistent within the 400-499 range.

The compression you're talking about doesn't exist.

I'm not sure how else to explain it other than to recommend reading about vertically-moderated scaling.

Jason said...

FWIW, if a student learned all of the third grade material they should score a 499 (or something at the high end of the 400s), and someone who knows none of the third grade material will score down in the 400 area. So in your example one twin would have scored 485 (remarkable, to learn all of the third grade material without knowing any second grade material) and the other twin would earn a 402 (essentially not knowing any third grade material).

The estimate would be something like 425 for each student. So the student who learned nothing would have a -23 and the student who learned a remarkable amount would be +60.

Tom Hoffman said...

So, I have twins. At the end of the year, both have a score of 302. The model predicts that one year of value-added growth would get them to 407 the next year.

One actually learns one year worth of stuff. The other learns nothing. What should their scores be?

Tom Hoffman said...

No, that's not what I'm asking. You have two students who start the year more or less a year behind. One finishes still a year behind, having progressed one year; the other finishes two years behind, having progressed not at all. What are their scores?

Jason said...

That's a separate question from the technical issue that Pallas raises.

If we can move on from Pallas being wrong about the "subtraction" issue, I'll gladly talk about this new one.

For what it's worth, the issue you're now talking about (how far out of grade level these tests can distinguish students) is complex and varies with test construction. Using a vertically-moderated scale does not mean that you can't differentiate students who are well below grade level, but the extent to which this is possible is different for each test. As you can imagine, the further you are from grade level, the less precision the exam has. However, since the estimate for someone who's in the pits is still going to be that they're gonna be in the pits, it won't actually harm the teacher. It just won't help them when they're bringing students from 3rd grade reading to 5th grade reading if those students are in high school.

That being said, I know for a fact that in practice there are rarely scaled scores at either extreme -- you never see a 400 or 402, or a 497 or 499. This suggests to me that the NECAP, at least, is reasonably sensitive to out-of-grade-level testing. The other issue here is that by law, states are no longer allowed to give students state exams that are not on grade level. By federal law, our hands are tied, but most LEAs, and certainly RIDE, are aware that the NECAP has limits when looking at students performing very far below grade level, which is why they all tend to use alternative information in addition to the NECAP to get a finer look at what's happening.

Tom Hoffman said...

It seems to me that it is exactly the same issue. If the tests get "less precise" when you're too far above or below grade level and things become "complex," that's exactly Pallas's point. That's what happens with these kinds of scales, which is why you can't just subtract, which is why it would be idiotic if DC did what they said.