Our results have implications for policymakers currently debating ESEA reauthorization. The first is that the types of schools identified as being in the bottom 5% vary dramatically depending on whether status, growth, or a combination is used. Status models identify middle schools serving more poor and minority students; while low-achieving, these schools are near the average in proficiency growth. In contrast, growth models identify smaller schools that are demographically typical, suggesting they are mainly picking up random year-to-year fluctuations. A combined model (here, the average of the standardized proficiency rate and API growth scores) identifies schools that are low-performing on both achievement status and growth. These are the schools we most want to identify for improvement.
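To make the combined model concrete, here is a minimal sketch of the averaging step. The school names and scores are invented for illustration; only the recipe (z-score each measure, then average) comes from the description above.

```python
from statistics import mean, stdev

# Hypothetical per-school data: (proficiency rate %, API growth score).
# Values are illustrative, not from the study.
schools = {
    "A": (22.0, -4.0),
    "B": (35.0, 1.0),
    "C": (60.0, 3.0),
    "D": (48.0, -1.0),
    "E": (75.0, 5.0),
}

def standardize(values):
    """Convert raw values to z-scores."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

prof_z = standardize([p for p, _ in schools.values()])
growth_z = standardize([g for _, g in schools.values()])

# Combined score: simple average of the two standardized measures.
combined = {name: (zp + zg) / 2
            for name, zp, zg in zip(schools, prof_z, growth_z)}

# Rank ascending: the lowest combined scores would be flagged for improvement.
ranked = sorted(combined, key=combined.get)
print(ranked[0])  # lowest on both status and growth -> "A"
```

A school must be weak on both dimensions to land at the bottom of this ranking, which is why the combined model avoids flagging schools that are merely noisy on one measure.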
The second finding is that the stability of classifications under a growth or combined model is near zero if only one year of data is used. This suggests that, as with evaluating teachers’ contributions to student learning, year-to-year comparisons are noisy (Kane & Staiger, 2002). However, simple three-year averages of the combined proficiency-level and growth measures dramatically reduce this noise while still focusing the PLAS designation on low-achieving, low-growth schools.
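The noise-reduction claim follows from basic statistics: averaging three independent yearly measurements shrinks the standard deviation of the noise by a factor of √3 ≈ 1.73. A toy simulation (my own assumption of a stable true score plus independent yearly noise, not the paper's model) makes this visible:

```python
import random
from statistics import stdev

random.seed(0)

# Toy model: each school has a stable true growth score, but any single
# year's measurement adds independent noise on top of it.
TRUE_SCORE = 0.0
NOISE_SD = 1.0

def one_year():
    return TRUE_SCORE + random.gauss(0, NOISE_SD)

def three_year_average():
    return sum(one_year() for _ in range(3)) / 3

single = [one_year() for _ in range(10_000)]
averaged = [three_year_average() for _ in range(10_000)]

# The three-year SD should come out roughly 1/sqrt(3) of the single-year SD,
# so rankings based on the average bounce around far less year to year.
print(f"single-year SD: {stdev(single):.3f}")
print(f"three-year SD:  {stdev(averaged):.3f}")
```

With less measurement noise, fewer schools cross the bottom-5% threshold by chance alone, which is exactly the stability gain the finding describes.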
Third, the AGS criterion expands accountability to a different set of schools: those with moderate and improving achievement but consistently large achievement gaps. These AGS are stable across time, nearly as stable as PLAS defined by status.
Fourth, the subgroup criterion for identifying the bottom 5% is mainly a measure of the performance of students with disabilities in schools that enroll a significant number of such students. This is likely not what lawmakers have in mind for this measure, and it may highlight a tension between inclusion and universal accountability (Thurlow, 2004).
Last, elementary schools are favored over middle schools under all criteria, as they were under AYP. Unless we truly believe elementary schools are dramatically better than middle schools, this finding points to a flaw in the proposed methods of identifying schools.
What is the argument against RIDE using three years of data to generate their classifications instead of just one? They've got the data. We paid a lot of money for it. Processor cycles are cheap.
Why don't they release these rankings retrospectively for the past decade? Again, they've got the data. Is it because, once we saw them, we'd know the system was junk? For that matter, I hope they already made these retrospective calculations during the design process, to check that the results worked the way they expected.