Daniel Koretz teaches educational measurement at Harvard University's Graduate School of Education, which means he knows a thing or two about standardized tests. He was also a member of a technical committee that led New York State to decide its old exams were too easy, and that the state needed to raise its standards. In an interview, excerpted here, he offered suggestions for how to interpret the results.
Education historian Diane Ravitch wrote a column in which she said it wasn't fair to make these new exams similar to the national tests (National Assessment of Educational Progress), because if you are proficient on NAEP, that's really like getting an A. Their proficiency standard is so high. Did you need to get the equivalent of an A to be proficient on this test instead of a B or C?
That brings up the first thing that I think is confused in some of the press, and this is something I actually specifically alerted [State Education Commissioner] John King to be worried about - which is that the performance levels we see now are different than in the past for two entirely different reasons.
One is that the test is different. Kids are asked to do things they were not asked to do before. And it's quite common, when a test is changed, when a testing program is changed, that performance is low the first year or two.
The second is completely different, which is that the standards have been set again. And one of these things - this is not a New York problem, this is a national problem - is that everybody now reports performance in terms of what we technically call percent above cut. The proportion proficient, percent proficient or above. It is, in my opinion, an absolutely dreadful way to report performance. For a lot of reasons. One is that it confounds actual changes in performance with how a group of judges sitting in a room sets standards. So the standards were set from scratch this time. And the definitions of performance that the judges were given were necessarily different from what they were in the past because they're supposed to be thinking about college and career readiness - a different definition of proficiency.
So even if the tests were identical, the percent proficient could be quite different. And we don't have enough data to separate the impact of the change in standard-setting from the change in the test itself.
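As a rough illustration of why percent above cut can mislead, here is a minimal Python sketch. The scores and both cut scores are invented for the example and do not come from New York data; the function name `percent_above_cut` is likewise hypothetical. The point is only that the same set of scores produces very different "percent proficient" figures when the cut is moved.

```python
def percent_above_cut(scores, cut):
    """Return the percentage of scores at or above the cut score."""
    if not scores:
        raise ValueError("scores must not be empty")
    return 100.0 * sum(s >= cut for s in scores) / len(scores)


# Hypothetical scale scores for ten students (invented for illustration).
scores = [280, 290, 295, 300, 305, 310, 315, 320, 330, 340]

# Hypothetical cut scores: one under an old standard, one after
# judges reset the standard. Neither value comes from a real test.
old_cut = 300
new_cut = 320

old_pct = percent_above_cut(scores, old_cut)
new_pct = percent_above_cut(scores, new_cut)
mean_score = sum(scores) / len(scores)

print(f"Mean score (same in both cases): {mean_score}")
print(f"Percent proficient, old cut {old_cut}: {old_pct}")
print(f"Percent proficient, new cut {new_cut}: {new_pct}")
```

In this sketch the students' performance is identical in both cases; only the judges' cut changes. A report that gave only the two percentages could not show whether the drop came from the students or from the standard.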
You are distinguishing two different things: it's not that the tests were that much harder, it's that they were much harder to pass by what students need to be proficient?
It's even more complicated than that. Because let's say you had two versions of this new test. One written by company A and one written by company B, and they differed only in that the mix of item difficulties was different. The items, if you looked at them, looked like the same content but my test happened to be easier than your test.
If the standard-setting method which was used in New York - which is the most commonly used (nationally) - works properly, and you and I are both marginally proficient kids, we should both be selected as marginally proficient regardless of which test, which of those two tests, was given.
The critical point is the difficulty of the test. If you mean by that what proportion of kids get items right (which is how most people think of difficulty) it doesn't tell you where the standards are going to be set. The same kid should be identified as proficient regardless of the mix of item difficulties. This is really hard for people who are not technically immersed in the stuff to understand. But the bottom line is that when I say the tests may have been harder, I don't mean the typical item was answered correctly by fewer people. I mean more of the content was novel for kids.
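One way to see how the same student can be classified the same way on an easier and a harder test is a small Python sketch. It assumes a Rasch (one-parameter logistic) model, which is an illustrative choice made here and is not described in the interview; the item difficulties, the ability value and the cut are all invented. Under that assumption, the cut is placed on the underlying ability scale, so the expected percent correct needed to reach it shifts with the mix of item difficulties while the classification stays the same.

```python
import math


def p_correct(theta, b):
    """Rasch model: chance that a student of ability theta answers
    an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def expected_percent_correct(theta, difficulties):
    """Expected percent of items answered correctly on a test."""
    total = sum(p_correct(theta, b) for b in difficulties)
    return 100.0 * total / len(difficulties)


# Two hypothetical tests covering the same content; only the mix of
# item difficulties differs (values invented for illustration).
easier_test = [-1.5, -1.0, -0.5, 0.0, 0.5]
harder_test = [-0.5, 0.0, 0.5, 1.0, 1.5]

# Hypothetical proficiency cut on the ability scale, and a
# marginally proficient student sitting exactly at that cut.
theta_cut = 0.0
student_theta = 0.0

pct_easy = expected_percent_correct(student_theta, easier_test)
pct_hard = expected_percent_correct(student_theta, harder_test)

# The classification depends on ability relative to the cut,
# not on which of the two tests was taken.
proficient = student_theta >= theta_cut

print(f"Expected percent correct, easier test: {pct_easy:.1f}")
print(f"Expected percent correct, harder test: {pct_hard:.1f}")
print(f"Classified proficient on either test: {proficient}")
```

The student gets a smaller share of items right on the harder test, yet lands on the same side of the cut in both cases. That matches the distinction drawn above: how many kids answer a typical item correctly does not by itself tell you where the standard sits.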
There are three different things going into the performance level of kids. This is a nightmare for laypeople who are trying to make sense of the change and it's not a New York problem, it's the way it happens everywhere. It's exactly what happened in Kentucky for example.
So back to Diane Ravitch's point, was this comparable to saying that in order to be proficient you needed to get an A instead of a B or a C?
What's an A?
What do we really mean by college and career readiness? Is the level of preparation in mathematics that will allow someone to take a job in a high-tech machining plant in Saratoga Springs at all similar to the math a kid needs to be able to apply to MIT or RPI? That's a discussion that I would like to see more of, or one of them.
I think there's a real argument about which skills are important for college and career readiness and there's also a question of keeping kids' options open. So you may say, this kid is going to go to a job in such and such a field, he doesn't need the following mathematics, but do you want that door closed when he's in grade 8? Or do you want to keep the door open? So New York is like every other state in the country, it is in a sense stuck with a phrase [college and career readiness] that has very ambiguous meaning.
So how do you think we should interpret these results, when we see only 31 percent of students in New York State were proficient in math and English Language Arts, and even fewer in New York City?
What I suggest is as we get more results, more than percent proficient, as we begin to learn more about how kids did on specific content in the test, and how different kinds of kids did on specific content, then I think people ought to start debating that, and what we want our kids to do and what does this suggest about what we need to change? Because some of the changes in the Common Core are clearly for the better. Some of the changes that New York wants to make are clearly for the better. So to the extent that the changes are for the better, what do we need to make sure the level of performance is higher in the future than it is now? What do we need to change about teaching and other aspects of the educational system?
You're saying we still have a big debate ahead of us. So I'm wondering, was it too soon for New York and Kentucky to have the state exams? Are we still in the process of a conversation that needs to be settled a little bit more before we start testing our kids on this?
Well, researchers always ask for more time!
If you go back even more than half a century, in the measurement literature you find people bemoaning the fact that we always really are stuck with assessing curriculum that people devise. And we rarely have the ability or leisure to track kids for 10 or 15 years and see what they're actually doing with stuff later on. So, in general, I think the push is in the right direction. Further details I think we should have more debate about, mostly at the high school level.