JOB TALK: Educational Measurement and Assessment
Item Response Theory and the Selection of an Educational Test Score Scale
Item Response Theory (IRT) is a psychometric modeling framework that supports almost all large-scale educational assessments, from state accountability tests to national and international assessments, including NAEP, TIMSS, and PISA. It is generally accepted that IRT methods do not guarantee "equal-interval" scales. Successive equal intervals on IRT score scales are not necessarily equal intervals on a desired scale for interpretation. This motivates two responses. First, an analyst can transform scores from the IRT scale to the desired scale for interpretation. I demonstrate that ordinal, nonparametric, and semiparametric methods are often equivalent to this response, and I recommend that a rarely used approach, normalizing score distributions, should be more widely embraced. Second, an analyst can address whether practical differences exist between interpretations from the IRT score scale, the desired score scale, and other plausible scales for interpretation. I present tools for this second response, and I argue that these tools can help to describe the "scale dependence" of educational research results, from the impact of interventions to the "value added" scores of teachers and schools.
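As an illustration of the normalization response, one common device is a rank-based inverse normal transform, which maps observed scores to the quantiles of a normal distribution. The sketch below is a minimal Python illustration of that general idea, not the specific procedure from the talk; the function name and the Blom-style offsets are assumptions chosen for the example.

```python
from statistics import NormalDist

def normalize_scores(scores, mean=0.0, sd=1.0):
    """Rank-based inverse normal transform (illustrative sketch).

    Replaces each score with the normal quantile of its rank, using
    Blom-style offsets (r - 3/8) / (n + 1/4) so transformed quantiles
    stay strictly inside (0, 1). Ties receive the average rank.
    """
    n = len(scores)
    # Sort indices by score, then assign 1-based average ranks to tied blocks.
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist(mean, sd)
    return [nd.inv_cdf((r - 0.375) / (n + 0.25)) for r in ranks]
```

Because the transform depends only on score ranks, it discards any interval interpretation the IRT scale carried and imposes a normal shape instead; whether that is the "desired scale for interpretation" is exactly the kind of question the talk's second response addresses.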
