Year: 2020 Volume: 20 Issue: 87 Pages: 101-118 Language: English DOI: 10.14689/ejer.2020.87.5 Index Date: 26-11-2020

Examining of Internal Consistency Coefficients in Mixed-Format Tests in Different Simulation Conditions

Abstract:
Purpose: The present study aims to evaluate how the reliabilities computed using α, Stratified α, Angoff-Feldt, and Feldt-Raju estimators may differ when sample size (500, 1000, and 2000) and the ratio of dichotomous to polytomous items included in the scale (2:1, 1:1, 1:2) are varied.

Research Methods: In this study, Cronbach's α, Stratified α, Angoff-Feldt, and Feldt-Raju reliability coefficients were estimated on simulated datasets across sample sizes (500, 1000, 2000) and dichotomous-to-polytomous item ratios (2:1, 1:1, 1:2).

Findings: Under the simulation conditions of this research, in all sample-size conditions, the estimated Angoff-Feldt and Feldt-Raju reliability coefficients were higher when dichotomous items outnumbered polytomous items in the item-type ratio. The same was true of the estimated α and Stratified α reliability coefficients when the item-type ratio was reversed. While all of the reliability estimators gave similar results in the large samples (n ≥ 1000), reliability estimates differed somewhat depending on the item-type ratio in the small samples (n = 500).

Implications for Research and Practice: In light of the findings and conclusions obtained in this study, it may be advisable to use α and Stratified α for mixed-format scales when the number of polytomously scored items in the scale is higher than that of the dichotomously scored items. Conversely, the Angoff-Feldt and Feldt-Raju coefficients are recommended when the number of dichotomously scored items is higher.
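The abstract compares four internal-consistency estimators. As a minimal sketch of two of them, the snippet below computes Cronbach's α and Stratified α (using the standard textbook formulas, not the authors' own code) on simulated mixed-format data resembling one of the study's conditions (n = 500, a 2:1 dichotomous-to-polytomous item ratio). The data-generating model here is an illustrative assumption, not the one used in the article.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total-score variance)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def stratified_alpha(strata, total):
    """Stratified alpha (Cronbach, Schonemann & McKie, 1965):
    1 - sum over strata of [stratum-score variance * (1 - stratum alpha)] / total variance.
    `strata` is a list of item-response matrices, one per item format."""
    total_var = np.asarray(total, dtype=float).var(ddof=1)
    penalty = sum(s.sum(axis=1).var(ddof=1) * (1 - cronbach_alpha(s)) for s in strata)
    return 1 - penalty / total_var

rng = np.random.default_rng(42)
n = 500                               # one of the study's sample-size conditions
theta = rng.normal(size=n)            # common latent trait (illustrative assumption)
# 10 dichotomous (0/1) and 5 polytomous (0-4) items: the 2:1 ratio condition
dich = (theta[:, None] + rng.normal(size=(n, 10)) > 0).astype(float)
poly = np.clip(np.round(2 + theta[:, None] + rng.normal(size=(n, 5))), 0, 4)

total = np.hstack([dich, poly]).sum(axis=1)
print(round(cronbach_alpha(np.hstack([dich, poly])), 3))
print(round(stratified_alpha([dich, poly], total), 3))
```

Stratified α treats each item format as its own stratum, which is why the abstract recommends it (alongside α) when polytomous items dominate: it does not force the two formats onto a single homogeneous scale the way the raw α formula does.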
Keywords:

Document Type: Article Article Type: Research Article Access Type: Open Access
APA GÜRDİL, H., & Demir, E. (2020). Examining of Internal Consistency Coefficients in Mixed-Format Tests in Different Simulation Conditions. Eurasian Journal of Educational Research, 20(87), 101-118. https://doi.org/10.14689/ejer.2020.87.5
Chicago GÜRDİL, Hatice, and Ergul Demir. "Examining of Internal Consistency Coefficients in Mixed-Format Tests in Different Simulation Conditions." Eurasian Journal of Educational Research 20, no. 87 (2020): 101-118. https://doi.org/10.14689/ejer.2020.87.5
MLA GÜRDİL, Hatice, and Ergul Demir. "Examining of Internal Consistency Coefficients in Mixed-Format Tests in Different Simulation Conditions." Eurasian Journal of Educational Research, vol. 20, no. 87, 2020, pp. 101-118. https://doi.org/10.14689/ejer.2020.87.5
AMA GÜRDİL H, Demir E. Examining of Internal Consistency Coefficients in Mixed-Format Tests in Different Simulation Conditions. Eurasian Journal of Educational Research. 2020;20(87):101-118. doi:10.14689/ejer.2020.87.5
Vancouver GÜRDİL H, Demir E. Examining of Internal Consistency Coefficients in Mixed-Format Tests in Different Simulation Conditions. Eurasian Journal of Educational Research. 2020;20(87):101-118. doi:10.14689/ejer.2020.87.5
IEEE H. GÜRDİL and E. Demir, "Examining of Internal Consistency Coefficients in Mixed-Format Tests in Different Simulation Conditions," Eurasian Journal of Educational Research, vol. 20, no. 87, pp. 101-118, 2020. doi:10.14689/ejer.2020.87.5
ISNAD GÜRDİL, Hatice - Demir, Ergul. "Examining of Internal Consistency Coefficients in Mixed-Format Tests in Different Simulation Conditions". Eurasian Journal of Educational Research 20/87 (2020), 101-118. https://doi.org/10.14689/ejer.2020.87.5