
Journal of Applied Measurement
A publication of the Department of Educational Psychology and Counseling
National Taiwan Normal University
Journal of Applied Measurement Abstracts
Volume 22, Issue 1/2 (2021)
Effects of Item Misfit on Proficiency Estimates Under the Rasch Model
Chunyan Liu
Peter Baldwin
Raja Subhiyah
National Board of Medical Examiners, Philadelphia
When IRT parameter estimates are used to make inferences about examinee performance, assessment of model-data fit is an important consideration. Although many studies have examined the effects of violating IRT model assumptions, relatively few have focused on the effects of violating the equal discrimination assumption on examinee proficiency estimation conditional on true proficiency under the Rasch model. The findings of this simulation study suggest that systematic item misfit due to violating this assumption can have noticeable effects on proficiency estimation, especially for candidates with relatively high or low proficiency. We also consider the consequences of misfit for examinee classification and show that while the effects on overall classification (e.g., pass/fail) rates are generally very small, false-negative and false-positive rates can still be affected in important ways.
Keywords: Item misfit, equal discrimination, proficiency estimate, classification error
Citation:
Liu, C., Baldwin, P., & Subhiyah, R. (2021). Effects of item misfit on proficiency estimates under the Rasch
model. Journal of Applied Measurement, 22(1/2), 1–12.
Examining Rating Designs with Cross-Classification Multilevel Rasch Models
Jue Wang
The University of Miami
Zhenqiu Lu
George Engelhard Jr.
Allan S. Cohen
The University of Georgia
The scoring of rater-mediated assessments relies largely on human raters, whose ratings empirically reflect student proficiency in a specific skill. Incomplete rating designs are common in operational scoring procedures because raters do not typically score all student performances. Cross-classification mixed-effects models can be used to examine data with a complex structure. By incorporating Rasch measurement models into multilevel models, the cross-classification multilevel Rasch model (CCM-RM) can examine both students and raters on a single latent continuum, and can also examine random effects for higher-level units. In addition, the CCM-RM provides flexibility for modeling characteristics of raters and features of student performances. This study investigates the effect of different rating designs on the estimation accuracy of the CCM-RM, with consideration of sample sizes and rater variances, through a simulation study. We also illustrate the use of the CCM-RM for evaluating rater accuracy under different rating designs with data from a statewide writing assessment.
Keywords: Rater-mediated assessments, rating designs, cross-classification, multilevel Rasch models, rating accuracy
Citation:
Wang, J., Lu, Z., Engelhard, G., Jr., & Cohen, A. S. (2021). Examining rating designs with
cross-classification multilevel Rasch models. Journal of Applied Measurement, 22(1/2), 13–34.
Measuring the Complexity of Equity-Centered Teaching Practice: Development and Validation of a Rasch/Guttman Scenario Scale
Wen-Chia C. Chang
International Coalition for Multilingual Education and Equity, University of Nebraska, Lincoln
The Teaching Equity Enactment Scenario (TEES) Scale was developed to measure the complexity of teaching practice for equity by integrating Rasch measurement and Guttman facet theory. This paper extends the existing work to develop and validate an efficient, short-form TEES Scale that can be used for research and evaluation purposes. The Rasch rating scale model is used to analyze the responses of 354 teachers across the United States. Validity evidence, which addresses the data/theory alignment, item and person fit, rating scale functioning, dimensionality, generalizability, and relations to external variables, is examined to support the adequacy and appropriateness of the proposed score interpretations and uses. The short-form TEES Scale functions well to measure teaching practice for equity and provides evidence for research or evaluation studies on whether and to what extent teachers or candidates learn to enact equity-centered practice. Limitations and future directions of the scale are discussed.
Keywords: Rasch measurement, validation, scale reduction, teaching, equity
Citation:
Chang, W.-C. C. (2021). Measuring the complexity of equity-centered teaching practice: Development and
validation of a Rasch/Guttman scenario scale. Journal of Applied Measurement, 22(1/2), 35–59.
Tracing Morals: Reconstructing the Moral Foundations Questionnaire in New Zealand and Sweden Using Mokken Scale Analysis and Optimal Scaling Procedure
Erik Forsberg
Department of Psychology, Division of Personality, Social and Developmental Psychology, Stockholm University
Anders Sjöberg
Department of Psychology, Division of Work and Organizational Psychology, Stockholm University
The Moral Foundations Questionnaire, consisting of the Relevance subscale and the Judgment subscale, was constructed within the framework of classical test theory for the purpose of measuring five moral foundations. However, so far, no study has investigated the latent properties of the questionnaire. Two independent samples, one from the New Zealand Attitudes and Values Study (N = 3989) and one nationally representative sample from Sweden (N = 1004), were analyzed using Mokken scale analysis and an optimal scaling procedure. The results indicate strong shared effects across both samples. Foremost, the Moral Foundations Questionnaire holds two latent trait dimensions, corresponding to the theoretical partitioning between Individualizing and Binding foundations. However, while the Relevance subscale was, on the whole, reliable in ordering respondents by trait level, the Judgment subscale was not. Moreover, the dimensionality analysis showed that the Relevance subscale contains three items (items for loyalty and disorder concerns) that are homogeneity outliers in both samples. Lastly, while the test for local independence indicated adequate fit for the Individualizing trait dimension, results for the Binding dimension were theoretically ambiguous. Suggestions for improvements and future directions are discussed.
Keywords: moral psychology, item response theory, Mokken scale analysis, optimal scaling procedure, psychometrics
Supplementary materials: https://reurl.cc/KbvvQe
Citation:
Forsberg, E., & Sjöberg, A. (2021). Tracing morals: Reconstructing the moral foundations questionnaire in
New Zealand and Sweden using Mokken scale analysis and optimal scaling procedure. Journal of Applied
Measurement, 22(1/2), 60–82.
Career Advancement Inventory: Assessing Decent Work among Individuals with Psychiatric Disabilities
Uma Chandrika Millner
Sarah A. Satgunam
James Green
Tracy Woods
Lesley University
Richard Love
Adelphi University
Amanda Nutton
Health Care Resources Center
Larry Ludlow
Boston College
Comprehensive assessments of the outcomes of vocational programs and interventions are necessary to ameliorate the significant employment disparities among individuals with psychiatric disabilities. In particular, measuring the attainment of decent work is critical for assessing their vocational outcomes. In the absence of existing vocational instruments that assess progress towards decent work among individuals with psychiatric disabilities, we developed the Career Advancement Inventory (CAI). The CAI was theoretically grounded in the Career Pathways Framework (CPF), a review of focus group data, and the existing literature, and was constructed using an iterative scale development approach and a combination of classical test theory and item response theory principles, specifically Rasch modeling. The CAI includes five subscales: Self-Efficacy, Environmental Awareness, Work Motivation, Vocational Identity, and Career Adaptabilities. Rasch analyses indicated mixed results: some items in the subscales mapped onto the hierarchical, stage-like progression proposed by the CPF, while others did not. The results support the construct validity of the subscales, with the exception of Work Motivation, and contribute to the expansion of the theoretical propositions of the CPF. The CAI has the potential to be an effective career assessment for individuals with psychiatric disabilities and has implications for vocational psychology and vocational rehabilitation.
Keywords: career development, psychiatric disabilities, serious mental illness, self-efficacy, vocational identity, Rasch models
Citation:
Millner, U. C., Satgunam, S. A., Green, J. B., Woods, T., Love, R., Nutton, A., & Ludlow, L. (2021). Career
advancement inventory: Assessing decent work among individuals with psychiatric disabilities. Journal of
Applied Measurement, 22(1/2), 83–113.
Using an Exploratory Quantitative Text Analysis (EQTA) to Synthesize Research Articles
Cheng Hua
Catanya Stager
Stefanie A. Wind
The University of Alabama
An Exploratory Quantitative Text Analysis (EQTA) method was proposed to synthesize large sets of scholarly publications and to examine thematic characteristics of the Journal of Applied Measurement (JAM). After synthesizing 578 articles published in JAM from 2000 to 2020, the authors classified each article into five categories and compared differences across three phases: (1) word frequency analysis from the EQTA; (2) descriptive analysis of trends in research articles and classification counts; and (3) thematic analysis of word frequency across article classifications. We found that (1) the most frequently used words are Item, Rasch model, and Measure; (2) most authors are from North America (380/578; 65.74%), followed by Europe (68/578; 11.76%) and other countries (130/578; 22.5%); (3) articles most often focus on model comparisons (77/578; 13%), followed by methodological developments (69/578; 12%) and reviews/other (43/578; 7%); and (4) differences in classifications between application and methodology are displayed using pyramid plots. The EQTA revealed insight into the nature of JAM publications, including common topics and areas of emphasis. We recommend the use of EQTA in future studies of other journals.
Keywords: Exploratory Quantitative Text Analysis, text mining, Meta-evaluation, Journal evaluation
Citation:
Hua, C., Stager, C., & Wind, S. A. (2021). Using an exploratory quantitative text analysis (EQTA) to
synthesize research articles. Journal of Applied Measurement, 22(1/2), 114–132.
Extended Rater Representations in the Many-Facet Rasch Model
Mark Elliott
Paula J. Buttery
University of Cambridge
Many-Facet Rasch Models (Eckes, 2009, 2015; Engelhard & Wind, 2017; Linacre, 1989) provide a framework for measuring rater effects for examiner-scored assessments, even under sparse data designs. However, the representation of a rater as a global scalar measure involves an assumption of uniformity of severity across the range of rating scales and criteria within each scale. We introduce extended rater representations of vectors or matrices of local measures relating to individual rating scales and criteria. We contrast these extended representations with previous work on local rater effects (Myford & Wolfe, 2003) and discuss issues related to their application, for raters and other facets. We conduct a simulation study to evaluate the models, using an extension of the CPAT algorithm (Elliott & Buttery, 2021). We conclude that extended representations more naturally and completely reflect the role of the rater within the assessment process and provide greater inferential power than the traditional global measure of severity. Extended representations also have applicability to other facets which may have non-uniform effects across items and thresholds.
Keywords: Many-facet Rasch model, rater effects, estimation
Citation:
Elliott, M., & Buttery, P. J. (2021). Extended rater representations in the many-facet Rasch model.
Journal of Applied Measurement, 22(1/2), 133–160.