Tuesday, March 26, 2013

If 11th Grade NECAP Math was Aligned with Other Assessments, It Would Be... A Lot Easier

This is from the Explain To Me Why My Interpretation Is Incorrect file.

So Tom Sgouros finally dragged me into the weeds of the NECAP technical documentation, and I found an interesting passage about how the cut-points were set for the NECAP 11th grade math Achievement Level Descriptions. This is from the Collection and Analysis of Existing Performance Data section on page 311 of the 2007-08 NECAP Technical Report. (The "ordered item booklet" contains the test items arranged in order from easiest to most difficult.)

Existing Test Data. Two categories of existing test data were examined: 1) fall 2007 scores in grades 6 through 8 and 2) historical performance on other high school-level tests (for example, NAEP).

For reading, starting cut-points were calculated from the existing test data as follows: the pattern of performance on the fall 2007 NECAP reading tests in grades 6, 7, and 8, was determined (specifically, the percentage of students in each achievement level category). Predicted grade 11 scores were then calculated by extrapolation. The resulting cuts were found to be in line with other high school-level testing data and to represent reasonable starting points. Therefore, they were adopted as starting cuts for standard setting. The starting cuts were presented to panelists as placements in the ordered item booklet (see below for complete details), and panelists were asked to either validate the placements or recommend modifications.

For mathematics, potential starting cuts were calculated in the same way as for reading, but were not used for standard setting. The purposes of using starting cuts are to streamline and simplify the standard-setting process and to make use of any other relevant sources of available information. However, the grade 11 mathematics test was quite difficult for the students, and the extrapolated starting placements for the lower two cuts appeared very early in the ordered item booklet (specifically, between ordered items 1 and 2 and between ordered items 6 and 7). This anomaly suggested that differences between the grade 11 mathematics test and the previously existing data rendered the use of those data, and the resulting cuts, inappropriate. In addition, it was feared that the use of such low starting cuts would complicate the process for the panelists and possibly impact the validity of the results negatively. For these reasons, a standard-setting, rather than a standards-validation, approach was adopted for mathematics.
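To make the "extrapolation" step a little more concrete, here is a minimal sketch of projecting each achievement level's percentage from grades 6, 7, and 8 out to grade 11. The report doesn't say what kind of extrapolation was used, so this assumes a simple straight-line trend per level; the function names and the percentages are hypothetical placeholders, not NECAP results. It only covers projecting the percentages, not turning them into placements in the ordered item booklet.

```python
# A minimal sketch of extrapolating achievement-level percentages to grade 11.
# ASSUMPTION: a least-squares straight-line trend per level; the NECAP report
# does not specify the method. All names and numbers here are hypothetical.

def linear_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def extrapolate_levels(grades, pct_by_level, target_grade):
    """Project each level's percentage to target_grade, clamp at 0, renormalize to 100."""
    raw = {}
    for level, pcts in pct_by_level.items():
        slope, intercept = linear_fit(grades, pcts)
        raw[level] = max(0.0, slope * target_grade + intercept)
    total = sum(raw.values())
    return {level: 100.0 * v / total for level, v in raw.items()}

# Hypothetical percentages of students scoring "1" through "4" in grades 6, 7, 8.
grades = [6, 7, 8]
pct_by_level = {
    "1": [10.0, 12.0, 14.0],
    "2": [20.0, 20.0, 20.0],
    "3": [50.0, 48.0, 46.0],
    "4": [20.0, 20.0, 20.0],
}
print(extrapolate_levels(grades, pct_by_level, 11))
# → {'1': 20.0, '2': 20.0, '3': 40.0, '4': 20.0}
```

A straight line is the crudest possible choice; the point is only that the grade 11 starting cuts come from the trend in the lower grades, not from anything about the grade 11 items themselves.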

Let me emphasize that:

the extrapolated starting placements for the lower two cuts appeared very early in the ordered item booklet (specifically, between ordered items 1 and 2 and between ordered items 6 and 7)

So based on extrapolation of performance on other comparable tests, the cut-point between scoring a "1" and "2" on the 11th grade NECAP math (a.k.a. not graduating vs. graduating) would have come after answering two questions correctly. And scoring "3" -- proficient -- would have required seven correct answers out of 40 or more questions.

Later... OK, here's why I struck that paragraph and changed the title (quoting the instructions for the cut-point setting process):

  • What you need to know is that the ordered item cut point for a given cut does not equal the raw score a student must obtain to be categorized into the higher achievement level
  • For example, if the Substantially Below Proficient/Partially Proficient cut is set between ordered items 3 and 4, this does not mean that a student only needs to get 4 points on the test in order to be classified into the Partially Proficient level
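Why doesn't the bookmark position equal the raw score? One common way the bookmark method works (assumed here; the excerpt above doesn't spell out NECAP's exact procedure) is that the bookmark marks an ability level, and the raw-score cut is the total score a student at that ability level would be expected to earn across the whole test. The sketch below assumes a Rasch model, items scored 0/1, a 0.67 response-probability criterion, and that a bookmark between items k and k+1 sets the cut at item k+1's location. The item difficulties are invented for illustration.

```python
import math

# ASSUMPTIONS (not from the NECAP report): Rasch model, all items scored 0/1,
# response-probability criterion RP = 0.67, and a bookmark between ordered
# items k and k+1 sets the cut at the RP location of item k+1.
# The item difficulties used below are hypothetical.

RP = 0.67

def p_correct(theta, b):
    """Rasch probability that a student at ability theta answers an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def rp_location(b, rp=RP):
    """Ability at which the chance of answering an item of difficulty b equals rp."""
    return b + math.log(rp / (1.0 - rp))

def bookmark_to_raw_cut(difficulties, k, rp=RP):
    """Map a bookmark between ordered items k and k+1 (1-based) to an expected raw score.

    Under the Rasch model, ordering items by RP location is the same as ordering
    by difficulty. The cut ability is the RP location of item k+1, and the raw-score
    cut is the expected total score (test characteristic curve) at that ability.
    """
    ordered = sorted(difficulties)
    theta_cut = rp_location(ordered[k], rp)  # ordered[k] is item k+1 in 1-based terms
    return sum(p_correct(theta_cut, b) for b in difficulties)

# Hypothetical 40-item test with difficulties spread evenly from -2 to 3 logits.
items = [-2.0 + 5.0 * i / 39 for i in range(40)]
print(round(bookmark_to_raw_cut(items, 1), 1))
```

Under these assumptions, a bookmark between ordered items 1 and 2 maps to an expected raw score well above 2, because a student at that ability level still has a fair chance on many of the harder items. That is consistent with the instructions quoted above, though it doesn't tell us exactly how NECAP did the conversion.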

There is a reason I don't plunge this deep into the weeds if I can avoid it.

Nonetheless, the fact remains that the cut-points extrapolated from NAEP and other NECAP scores were so low compared to the difficulty of the items in the ordered item booklet that NECAP chose not to show them to the panelists at all.

Pointing that out to panelists certainly would have "complicated the process" of setting much higher cut points. Not just high in comparison to other slacker, racing-to-the-bottom states. Higher in comparison to other NECAP math tests and NAEP.

2 comments:

jason said...

"the ordered item cut point for a given cut does not equal the raw score a student must obtain to be categorized into the higher achievement level"

Then how is the raw score for categorization calculated?

Tom Hoffman said...

Oh... I don't know. It isn't clear to me that the ordered items correspond exactly to the items that would be on a single test.