
Question format may impact how boys and girls score on standardized tests, Stanford study finds

March 29, 2018
Stanford study finds question format may impact how boys and girls score on standardized tests (Photo: FatCamera/iStock)
Researchers say the evidence has implications for test developers and policymakers.

A new study by Stanford education scholars finds that girls perform better on standardized tests that have more open-ended questions, while boys score higher when the tests include more multiple-choice questions.

The study, published online on March 27 in Educational Researcher, a peer-reviewed journal of the American Educational Research Association (AERA), said the test format explained about 25 percent of the variation in state- and district-level gender achievement gaps in the U.S.

The association appeared stronger in English Language Arts than in math, the researchers said, but the difference between the two subjects was not statistically significant.

A study snapshot published by AERA summarizes the findings.

The researchers said test developers and educators will need to attend more carefully to the mix of item types and the multidimensional sets of skills measured by tests. Policymakers, too, will need to be aware of how states’ use of different test formats or emphases on different skills may influence cross-state comparisons of gender gaps and funding decisions based on those results.

“The evidence that how male and female students are tested changes the perception of their relative ability in both math and ELA suggests that we must be concerned with questions of test fairness and validity,” said Sean Reardon, the professor of Poverty and Inequality in Education at Stanford Graduate School of Education and a senior fellow at Stanford Institute for Economic Policy Research (SIEPR). “Does the assessment measure the intended skills? Does it produce consistent scores for different student subgroups? Is the assessment appropriate for its intended use?”

The researchers used the scores of roughly 8 million students tested in fourth and eighth grades in math and reading/ELA across 47 states during the 2008–09 school year to estimate state- and district-level, subject-specific gender achievement gaps on each state's accountability tests.

With Reardon, the study’s co-authors are Demetra Kalogrides, Erin M. Fahle and Rosalía C. Zárate of Stanford and Anne Podolsky of the Learning Policy Institute.

The study, “The Relationship Between Test Item Format and Gender Achievement Gaps on Math and ELA Tests in Fourth and Eighth Grades,” is here.