PISA 2012 MATEMATİK OKURYAZARLIĞI TESTİNDE FARKLI ÖLÇEK DÖNÜŞTÜRME YÖNTEMLERİNİN KARŞILAŞTIRILMASI

ÖZTÜRK GÜBEŞ, Neşe; AKSEKİOĞLU, Burcu; UYAR, Şeyma

Şeyma UYAR (Mehmet Akif Ersoy University, Faculty of Education, Department of Educational Measurement and Evaluation, Burdur, Turkey)

Burcu AKSEKİOĞLU (Mehmet Akif Ersoy University, Faculty of Education, Department of Educational Measurement and Evaluation, Burdur, Turkey)

Neşe ÖZTÜRK GÜBEŞ (Mehmet Akif Ersoy University, Faculty of Education, Burdur, Turkey)

Mehmet Akif Ersoy Üniversitesi Eğitim Fakültesi Dergisi

Year: 2018 Volume: 1 Issue: 46 Pages: 121 - 148 Language: Turkish Index Date: 27-08-2019

Abstract:

This study aimed to compare different scale linking methods on the PISA 2012 mathematics literacy data. To this end, scores from two selected booklets were equated using item response theory (IRT) based scale linking methods (mean-mean, mean-sigma, Stocking-Lord, Haebara) and test equating methods (IRT true-score equating, IRT observed-score equating), and the results obtained from the different methods were examined. The study was conducted using the responses given to the mathematics tests in booklets 4 and 11. The study group therefore consisted of 716 students from the Turkey sample: 348 who answered booklet 4 and 368 who answered booklet 11. The common-item nonequivalent groups design was used for test equating. In the first stage of data analysis, the unidimensionality assumption of item response theory was tested. Item and ability parameters were then estimated with PARSCALE 4.1, using the two-parameter logistic model and the generalized partial credit model. Next, scale linking was carried out with the STUIRT program using four different methods. In the final stage, the test scores obtained from the two forms were equated with the POLYEQUATE program. The amount of error produced by each method was computed with the weighted mean squared error (WMSE). The results showed that the method with the smallest error was Stocking-Lord in true-score equating and Haebara in observed-score equating. The mean-sigma method yielded the largest equating error.

Subjects: Education, Educational Research

COMPARISON OF DIFFERENT SCALE LINKING METHODS IN PISA 2012 MATHEMATICS LITERACY TEST

Abstract:

The objective of this study was to compare different scale linking methods on the PISA 2012 mathematics literacy data. For this purpose, scores obtained from two selected booklets were equated using scale linking methods (mean-mean, mean-sigma, Stocking-Lord, Haebara) and test equating methods (IRT true-score equating, IRT observed-score equating) based on item response theory, and the results obtained from the different methods were analyzed. The study was conducted using the answers given to the mathematics tests in booklet 4 and booklet 11. The sample therefore consists of 716 students in Turkey: 348 took booklet 4 and 368 took booklet 11. The common-item nonequivalent groups design was used to equate the test forms. In the first stage of data analysis, the unidimensionality assumption of item response theory was tested. PARSCALE 4.1 was then used to estimate item and ability parameters with the two-parameter logistic model and the generalized partial credit model. Afterwards, the STUIRT program was used to perform scale linking with the four methods. In the last step, test scores obtained from the two forms were equated using the POLYEQUATE program. The equating error of each method was calculated with the weighted mean squared error (WMSE) index. The results showed that the Stocking-Lord method had the smallest equating error in true-score equating and the Haebara method had the smallest equating error in observed-score equating. The mean-sigma method produced the largest equating error.
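As a rough illustration of the two moment-based linking methods named above, the sketch below computes the slope A and intercept B of the linear scale transformation from common-item parameter estimates. It is not the STUIRT procedure used in the study; the function names and all item parameter values are hypothetical, chosen so that the "old" scale is an exact linear transform of the "new" one. The Stocking-Lord and Haebara methods need numerical minimization of a characteristic-curve criterion and are not shown.

```python
from statistics import mean, pstdev

def mean_sigma(b_new, b_old):
    """Mean-sigma linking: A from the ratio of difficulty SDs, B from the means."""
    A = pstdev(b_old) / pstdev(b_new)
    B = mean(b_old) - A * mean(b_new)
    return A, B

def mean_mean(a_new, b_new, a_old, b_old):
    """Mean-mean linking: A from the ratio of mean discriminations, B from the means."""
    A = mean(a_new) / mean(a_old)
    B = mean(b_old) - A * mean(b_new)
    return A, B

def rescale(a, b, A, B):
    """Place new-form parameters on the old-form scale: a* = a / A, b* = A * b + B."""
    return [ai / A for ai in a], [A * bi + B for bi in b]

# Hypothetical common-item estimates (illustration only, not values from the study).
a_new = [1.0, 1.2, 0.8]
b_new = [-1.0, 0.0, 1.0]
# Old-scale values built as an exact transform with A = 1.25 and B = 0.5.
a_old = [0.8, 0.96, 0.64]
b_old = [-0.75, 0.5, 1.75]

A_ms, B_ms = mean_sigma(b_new, b_old)
A_mm, B_mm = mean_mean(a_new, b_new, a_old, b_old)
a_star, b_star = rescale(a_new, b_new, A_ms, B_ms)

print(f"mean-sigma: A={A_ms:.3f}, B={B_ms:.3f}")
print(f"mean-mean:  A={A_mm:.3f}, B={B_mm:.3f}")
```

With error-free parameters like these, both methods recover the same coefficients. With real estimates they generally differ, which is one reason studies such as this one compare the resulting equating error across methods.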

Document Type: Article Article Type: Research Article Access Type: Open Access


APA	Uyar Ş, AKSEKİOĞLU B, ÖZTÜRK GÜBEŞ N (2018). PISA 2012 MATEMATİK OKURYAZARLIĞI TESTİNDE FARKLI ÖLÇEK DÖNÜŞTÜRME YÖNTEMLERİNİN KARŞILAŞTIRILMASI. Mehmet Akif Ersoy Üniversitesi Eğitim Fakültesi Dergisi, 1(46), 121 - 148.
Chicago	Uyar Şeyma, AKSEKİOĞLU Burcu, ÖZTÜRK GÜBEŞ Neşe. PISA 2012 MATEMATİK OKURYAZARLIĞI TESTİNDE FARKLI ÖLÇEK DÖNÜŞTÜRME YÖNTEMLERİNİN KARŞILAŞTIRILMASI. Mehmet Akif Ersoy Üniversitesi Eğitim Fakültesi Dergisi 1, no. 46 (2018): 121 - 148.
MLA	Uyar Şeyma, AKSEKİOĞLU Burcu, ÖZTÜRK GÜBEŞ Neşe. PISA 2012 MATEMATİK OKURYAZARLIĞI TESTİNDE FARKLI ÖLÇEK DÖNÜŞTÜRME YÖNTEMLERİNİN KARŞILAŞTIRILMASI. Mehmet Akif Ersoy Üniversitesi Eğitim Fakültesi Dergisi, vol. 1, no. 46, 2018, pp. 121 - 148.
AMA	Uyar Ş, AKSEKİOĞLU B, ÖZTÜRK GÜBEŞ N. PISA 2012 MATEMATİK OKURYAZARLIĞI TESTİNDE FARKLI ÖLÇEK DÖNÜŞTÜRME YÖNTEMLERİNİN KARŞILAŞTIRILMASI. Mehmet Akif Ersoy Üniversitesi Eğitim Fakültesi Dergisi. 2018; 1(46): 121 - 148.
Vancouver	Uyar Ş, AKSEKİOĞLU B, ÖZTÜRK GÜBEŞ N. PISA 2012 MATEMATİK OKURYAZARLIĞI TESTİNDE FARKLI ÖLÇEK DÖNÜŞTÜRME YÖNTEMLERİNİN KARŞILAŞTIRILMASI. Mehmet Akif Ersoy Üniversitesi Eğitim Fakültesi Dergisi. 2018; 1(46): 121 - 148.
IEEE	Uyar Ş, AKSEKİOĞLU B, ÖZTÜRK GÜBEŞ N, "PISA 2012 MATEMATİK OKURYAZARLIĞI TESTİNDE FARKLI ÖLÇEK DÖNÜŞTÜRME YÖNTEMLERİNİN KARŞILAŞTIRILMASI," Mehmet Akif Ersoy Üniversitesi Eğitim Fakültesi Dergisi, 1, pp. 121 - 148, 2018.
ISNAD	Uyar, Şeyma et al. "PISA 2012 MATEMATİK OKURYAZARLIĞI TESTİNDE FARKLI ÖLÇEK DÖNÜŞTÜRME YÖNTEMLERİNİN KARŞILAŞTIRILMASI". Mehmet Akif Ersoy Üniversitesi Eğitim Fakültesi Dergisi 1/46 (2018), 121-148.