Test-score inflation can boost graduation rates but comes with consequences, Stanford study finds

New study reveals why teachers inflated student test scores on New York's high school exit exams.

April 13, 2016

By Edmund L. Andrews

Five years ago, a team of educational researchers shocked New York state with clear statistical evidence of widespread manipulation of test scores on the high school exit exams, or Regents Examinations.

The analysis, which formed the basis for an investigative report in the Wall Street Journal and sparked major reforms by New York state, showed that test graders were artificially lifting the scores for 40 percent of the students who had fallen just short of passing.

Students who took the English exam, for example, were five times as likely to receive the minimum passing score of 65 as a score just one point lower. Students who took the history exam were 14 times as likely to get a 65 as a 64.

The original research team included Thomas S. Dee, now of Stanford Graduate School of Education; Brian A. Jacob at the University of Michigan; and Jonah Rockoff of Columbia University.

Now Dee and his colleagues, who include Will Dobbie of Princeton University, have updated the data and completed a deeper study that sheds important new light on the motivations for and consequences of test-score manipulation. It also confirms the dramatic success of reforms that the group proposed back in 2011.

Among their findings, published April 11 by the National Bureau of Economic Research:

*The urge to nudge scores upward had nothing to do with incentives and penalties, such as those under the No Child Left Behind law, that increase the pressure on schools to deliver better results. The patterns before and after No Child Left Behind were essentially the same.

*The primary motivation seems to have been “altruistic,” in Dee’s words: many test graders wanted to spare students they knew from the consequences of failing to graduate, particularly those with a prior record of high achievement and good behavior.

*The manipulation of test scores was more prevalent in schools with largely African-American and Latino student populations. Indeed, it artificially narrowed the black-white gap in graduation rates. Had there been no manipulation, the researchers estimated, the gap would have been 5 percent wider.

*Two reforms after 2011 – prohibiting teachers from grading students in their own schools, and prohibiting graders from re-scoring tests of students who came in just below the thresholds – eliminated virtually all of the manipulation.

“We think this manipulation was mainly to help students avoid the risk of dropping out, rather than a response to the school or teacher-specific incentives created by accountability systems or incentive pay,” said Dee, faculty director at Stanford Center for Education Policy Analysis and a senior fellow at the Stanford Institute for Economic Policy Research.

However, Dee noted, “We also find that raising a student’s score above the threshold required for high-school graduation had unintended negative consequences for some students.”

Specifically, the researchers found that this manipulation reduced the probability that a student would meet the requirements for an advanced Regents diploma. “This is consistent with the hypothesis that students who just pass the exam don’t reinforce their knowledge of foundational material that supports more advanced coursework,” Dee said.

The manipulation in New York contrasts sharply with the testing scandals in Atlanta, but it offers its own cautionary tale for many school systems around the nation.

In Atlanta, teachers were under intense pressure from the school system and from principals to boost their school’s results. As a result, teachers in a number of struggling schools systematically changed students’ answers on multiple-choice exams.

In New York, by contrast, the shading occurred in more open-ended questions that require written answers.

The high-minded purpose of such questions is to better evaluate students’ ability to reason, to write and to understand complex material. The problem is that grading such answers is more subjective, and thus easier to shade in one direction or another.

The new study had unusual origins. In the wake of testing scandals in Atlanta, reporters for the Wall Street Journal used New York’s Freedom of Information Law to obtain Regents Examination scores covering nearly a decade. They then asked the university scholars to analyze the results and see if they showed the kind of statistical anomalies that would indicate manipulation.

The researchers found strikingly high spikes in the frequency of test scores just above crucial cut-off levels. They found that the amount of manipulation varied widely among schools, and that it was higher at schools with large black and Hispanic populations. They also found that students with higher baseline scores and better behavioral records were more likely to have their scores nudged over the passing level.
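To make the idea of a "spike" at a cut-off concrete, here is a minimal Python sketch of two simple checks: the ratio of scores at the passing mark to scores one point below it, and the excess count at the passing mark compared with nearby scores. This article does not describe the researchers' actual estimator, so the function names, the three-point comparison window and the sample scores are all assumptions made purely for illustration.

```python
from collections import Counter


def cutoff_ratio(scores, cutoff=65):
    """Return how many times more often `cutoff` appears than `cutoff - 1`."""
    counts = Counter(scores)
    below = counts[cutoff - 1]
    if below == 0:
        raise ValueError("no scores one point below the cutoff")
    return counts[cutoff] / below


def excess_at_cutoff(scores, cutoff=65, window=3):
    """Observed count at the cutoff minus the mean count of the `window`
    scores just above it (a crude stand-in for a smooth expected count)."""
    counts = Counter(scores)
    neighbors = [counts[cutoff + k] for k in range(1, window + 1)]
    expected = sum(neighbors) / window
    return counts[cutoff] - expected


# Made-up scores for illustration only; this is NOT the Regents data.
example_scores = (
    [62] * 10 + [63] * 8 + [64] * 2 + [65] * 28
    + [66] * 11 + [67] * 12 + [68] * 10
)

ratio = cutoff_ratio(example_scores)       # 28 scores of 65 vs. 2 scores of 64
excess = excess_at_cutoff(example_scores)  # 28 observed vs. 11 expected
```

In a distribution without manipulation, the ratio would be expected to sit near 1 and the excess near 0. A ratio far above 1, like the made-up one here, is the kind of anomaly the reporters asked the scholars to look for.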

The manipulation had a measurable impact in the real world: it increased a student’s probability of graduating from high school by 21.9 percentage points. As a result, the researchers estimated that the manipulation of test scores narrowed the black-white gap in graduation rates from 16.3 percentage points to 15.6 percentage points.
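The two gap figures can be related to the "5 percent wider" estimate cited earlier with a little arithmetic. The Python sketch below uses only the numbers reported in this article. The variable names are illustrative, and reading "5 percent" as a relative change in the gap is an assumption.

```python
# Figures reported in the article, in percentage points.
gap_without_manipulation = 16.3
gap_with_manipulation = 15.6

# Absolute narrowing of the black-white graduation gap.
narrowing_points = gap_without_manipulation - gap_with_manipulation

# How much wider the gap would have been, relative to the observed gap.
relative_widening = narrowing_points / gap_with_manipulation * 100

print(f"Narrowing: {narrowing_points:.1f} percentage points")
print(f"Relative widening without manipulation: {relative_widening:.1f}%")
```

On these figures the gap narrowed by 0.7 percentage points, which is roughly 4.5 percent of the observed gap and broadly in line with the rounded 5 percent estimate.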

The most encouraging finding from the latest analysis, however, was the spectacular success of two fairly simple reforms that the researchers proposed back in 2011.

The first reform was to prohibit teachers from re-scoring tests of students who were just below the proficiency cut-off level. Before 2011, schools were actually required to re-score any test that looked like a close call. The second reform was to prohibit teachers from grading students in their own schools – students whom a teacher might know personally and want to help.

When the researchers went back and analyzed the Regents score data after 2011, they found that almost all of the signs of manipulation had disappeared.

“It’s probably the most concrete impact any of my research has ever had,” Dee remarked.

The broader message of the new study, Dee said, is that school systems must address the challenges of objectively grading examinations that require written answers to complex questions. Under the Common Core State Standards, school systems are beginning to make heavier use of such exams, and for good reason, but school districts need to make sure that their grading procedures don’t inadvertently encourage cheating.