Goodness of Fit and Precision of the Graded Response Model Estimates of a Psychometric Scale at Varying Number of Categories
Abstract
This study evaluates the absolute goodness of fit of the Graded Response Model (GRM), proposed by Samejima in 1969, as the number of categories of a polytomous scale changes, and analyzes the precision of the GRM estimates of the latent trait. Polytomous scales are frequently used in psychometric measurement of skill levels, such as quantitative or reading skills. A simulation study is used to address both objectives. Polytomous scales were simulated from normal variables generated under a one-factor model. For each scale of 3, 4, 5, and 6 categories, 100 replications of 1000 cases and 14 items were generated, and the GRM was estimated for each replication. Absolute goodness of fit is evaluated with the Likelihood Ratio Test (LRT) that contrasts the model with the saturated model: the G2 test. Precision is analyzed by computing the Mean Bias and the Root-Mean-Square Error (RMSE) of the GRM estimates of the theta parameter. Based on the G2 test, the GRM fits best with a 5-category scale, and the fit worsens with a 6-category scale. None of the four scales predicted the skill level well: all of them had an RMSE of 0.97 and a Mean Bias of 0.87. The results suggest that it is better to operationalize skill levels with a 5-category scale.
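The simulation design and the two precision measures described above can be illustrated with a minimal Python sketch. It is not the author's code. The factor loading (0.7), the equally spaced thresholds, and the function names are assumptions made for illustration only, since the abstract does not report them. The sketch generates ordinal responses by cutting normal variables from a one-factor model, and defines Mean Bias and RMSE for any vector of theta estimates.

```python
import numpy as np


def simulate_polytomous(n_cases=1000, n_items=14, n_categories=5,
                        loading=0.7, seed=0):
    """Simulate ordinal item responses from a one-factor normal model.

    Assumptions (not stated in the abstract): a common loading of 0.7 for
    all items and equally spaced thresholds between -1.5 and 1.5.
    """
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(n_cases)             # true latent trait
    noise = rng.standard_normal((n_cases, n_items))  # unique factors
    # Continuous item scores with unit variance under the one-factor model.
    latent = loading * theta[:, None] + np.sqrt(1.0 - loading**2) * noise
    # n_categories - 1 cut points turn each score into a category 0..K-1.
    thresholds = np.linspace(-1.5, 1.5, n_categories - 1)
    responses = np.digitize(latent, thresholds)
    return theta, responses


def mean_bias(theta_hat, theta):
    """Average signed difference between estimated and true theta."""
    return float(np.mean(np.asarray(theta_hat) - np.asarray(theta)))


def rmse(theta_hat, theta):
    """Root-mean-square error between estimated and true theta."""
    diff = np.asarray(theta_hat) - np.asarray(theta)
    return float(np.sqrt(np.mean(diff**2)))


if __name__ == "__main__":
    for k in (3, 4, 5, 6):
        theta, responses = simulate_polytomous(n_categories=k, seed=k)
        # Placeholder estimator: standardized sum score, NOT a GRM estimate.
        total = responses.sum(axis=1)
        theta_hat = (total - total.mean()) / total.std()
        print(k, mean_bias(theta_hat, theta), rmse(theta_hat, theta))
```

The standardized sum score in the usage block is only a stand-in so that the script runs end to end. Reproducing the study would require fitting the GRM to each replication with an IRT package and passing its theta estimates to `mean_bias` and `rmse`.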
Full Text: PDF DOI: 10.15640/arms.v3n2a4