
Report urges educators to avoid using international tests to make policy

Report says U.S. policymakers should focus less on international tests and more on how states compare to each other when trying to improve schools.

Policymakers should concentrate less on Finland and Korea and more on Massachusetts and Minnesota when drawing lessons about how best to improve school systems throughout the United States.

That’s the message of a new report by Stanford education professor Martin Carnoy and two colleagues that calls on U.S. educators to stop paying so much attention to the many nations that outrank the United States on international tests and instead delve deeply into results from the National Assessment of Educational Progress (NAEP), often referred to as the nation’s report card.

“We should question the relevance of comparing so-called U.S. national student performance with average scores in other countries, when U.S. students attend schools in 51 separate education systems run by states and the District of Columbia, not the federal government,” said Carnoy. “Nobody has really looked deeply at NAEP data with the idea of seeing what individual states can learn from each other.”

The report was published today by the Economic Policy Institute, a nonprofit think tank in Washington, D.C. The other authors are Emma Garcia, an EPI economist, and Tatiana Khavenson, a researcher at the Institute of Education at the National Research University Higher School of Economics in Moscow.

One of the report’s central tenets is that the United States need not look beyond its borders for models of what works in education; according to its analysis, certain states compare quite favorably with other nations on the two major international tests: the Program for International Student Assessment (PISA) and the Trends in International Mathematics and Science Study (TIMSS). On the 2012 PISA, for instance, Massachusetts did as well as high-scoring Finland and Canada, even though the United States as a whole lagged well behind them. Massachusetts’ performance looks even stronger when the report’s authors adjust the scores for differences in family academic resources. (2012 was the first time that PISA generated scores for individual U.S. states, with Massachusetts and two others participating.)

This report builds on a previous one from EPI (co-authored by Carnoy and another colleague before the 2012 PISA results were released) that called the PISA rankings skewed and misleading. That report documented how the United States ranked lower on PISA because a disproportionately large share of U.S. students comes from disadvantaged social class groups, whose performance is relatively low in every country. When Carnoy adjusted the scores so that every nation had the same social class composition, the United States rose to 13th from 25th in math and to sixth from 14th in reading.

In this new report, Carnoy and his colleagues say that the international test scores do have some value: The results show that the United States has room for improvement, particularly in mathematics. They note that South Korea’s average PISA scores remain much higher than those of the United States (and of most other nations and states) even when demographics are taken into account.

But Carnoy added that the new report also uses South Korea to illustrate another reason international comparisons may not be very helpful in evaluating different education systems. “Are the Korean students doing better because of their schools or because of tutoring and cram courses outside of school?” he said, referring to the widespread practice in South Korea of families investing significantly in extracurricular instruction for their children. “It’s not worthwhile to compare schools in countries where the conditions are so different.”

The report questions the U.S. Department of Education’s decision, in 2011 and 2013, to seek policy recommendations from the Organization for Economic Cooperation and Development, which administers PISA. It urges U.S. policymakers to reconsider looking to East Asia and certain European countries for lessons on how to improve education in the United States.

The report comes out two days after the latest NAEP results were released, showing the first nationwide decline in math scores in 25 years. While the EPI report does not include these data (it stops at the 2013 NAEP test), Carnoy says the new scores don’t affect the report’s larger point: It’s more fruitful to explore results from NAEP, which has been administered in every state since 2003 and in a large majority of states starting about a decade earlier, than from the comparable international tests.

With NAEP, researchers can track and compare states’ performance over time. For example, from 1992 to 2013, the annual increase in 8th-grade mathematics scores, adjusted for demographic differences, was 1.6 points per year for the 10 states with the largest gains. That’s double the 0.8-point annual increase (also adjusted) for the 10 states with the smallest gains. The extra 0.8 points a year amounts to roughly a 16-point gap on the NAEP scale over two decades. “Over 20 years, that advantage adds up to about one-half a standard deviation of the individual student variation, which is a huge difference in performance gain by typical educational improvement standards,” said Carnoy.

Because all 50 states and Washington, D.C., take the NAEP, this large number of observations allows researchers to see whether state scores, adjusted for student and school demographics, correlate with such state-level variables as spending per student, state-level poverty, teacher union strength and school accountability, among others. The report found the following in its analysis of scores on the 8th-grade math test:

  • Students in states with more poverty tend to have lower achievement on the test, whether the students themselves are poor or well off.
  • States that have implemented stronger accountability measures are likely to have achieved higher adjusted scores.
  • There does not appear to be a relationship between a state’s adjusted average 8th-grade math score and its average expenditures per student in primary and secondary schooling.
  • No association emerged between average student performance and the degree of teacher union presence in a state.

In addition to this parsing of all 50 states’ and the District of Columbia’s scores, the report shows how NAEP results could lend themselves to comparisons between two very similar states to isolate which policies may have made the difference. Connecticut and Massachusetts had nearly identical scores in 8th-grade math in 2003, but by 2013 Massachusetts’ score was much higher. New Jersey widened the gap with New York over the same period on the same test, as did Minnesota over Iowa.

“We’d like in the future to be able to give definitive explanations why one state excelled while another made little progress,” said Carnoy. “There’s much to be learned from the NAEP data if policymakers would spend more effort on assessing our own states’ successes and less on trying to draw lessons from countries with very different social and educational contexts.”

