Year: 2011 Volume: 40 Issue: 1 Page Range: 86 - 123 Text Language: Turkish Index Date: 29-07-2022

Structural equation modeling (SEM) of categorical and mixed-data using the novel Gifi transformations and information complexity (ICOMP) criterion

Abstract:
This study proposes the Gifi method as an original alternative approach for solving existing problems in the analysis of categorical, dichotomous, or mixed data sets in structural equation models (SEMs). The Gifi method uses optimal scaling to quantify categorical variables. During quantification, the information in the observed variable is preserved in the transformed variable. That is, the Gifi method converts categorical data to continuous data without distorting the scale properties of the categorical variables, and no information is lost in the transformation. The scale properties are retained in the transformed nonlinear continuous Gifi data space, so the transformation is invertible. This is a distinctive feature of the Gifi system, which avoids the arbitrarily specified threshold values still used in the literature. After the Gifi transformation, the transformed data set is analyzed with SEM based on the multivariate normal distribution assumption. In SEM, such an approach satisfies the multivariate normality assumption that is otherwise disregarded for categorical data. Information-based model selection criteria such as Akaike's [1] Akaike Information Criterion (AIC), Bozdogan's [2] Consistent Akaike Information Criterion (CAIC), and Bozdogan's [3-7] Information Complexity Criterion (ICOMP) are applied as measures of fit in SEM. The model with the minimum criterion value is selected as the best-fitting model among the competing models. A real categorical data set measuring quality of life is used in this study. Applying the Gifi transformation to this data set demonstrates the versatility and flexibility of the proposed approach. In addition, model selection criterion values are obtained for different SEMs on the transformed data set, and the best model, the one with the minimum criterion value, is identified.
Keywords:

Use of the Gifi approach in structural equation modeling (SEM) of categorical and mixed data sets and the information complexity criterion (ICOMP)

Abstract:
This paper introduces and develops a novel and computationally feasible alternative approach to the analysis of categorical, dichotomous, and mixed data sets in structural equation models (SEMs) to overcome currently existing problems. Our approach is based on the Gifi system. The Gifi system uses the optimal scaling methodology to quantify the observed categorical variables. In the quantification process, information in the observed variable is retained in the quantified variable. That is, the Gifi system transforms categorical data to continuous data without destroying the scale properties of the categorical variables. The scaling is thus preserved in the transformed nonlinear continuous Gifi data space. Hence the transformation is invertible. This is one of the unique characteristics of the Gifi system, which avoids the arbitrary threshold specification currently practiced in the literature. After the Gifi transformation, we analyze the transformed data set using SEM based on the multinormal distributional assumption. Such an approach legitimizes the distributional assumption of multivariate normality in SEM. Information-theoretic model selection criteria such as Akaike’s [1] AIC, Bozdogan’s [2] Consistent AIC, called CAIC, and the information-theoretic measure of complexity ICOMP criterion of Bozdogan [3-7] are introduced and developed as measures of fit in SEMs. The model with the minimum values of the criteria is selected as the best fitting model among a portfolio of candidate models. We provide a real benchmark numerical example using SEM on a categorical data set that measures quality of life (QOL). Applying the Gifi transformations to this data set illustrates the versatility and flexibility of our approach, and we fit five alternative SEMs and score them with the model selection criteria.
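The selection rule described in the abstract (score each candidate model, keep the one with the minimum criterion value) can be sketched as follows. This is a minimal illustration, not the authors' implementation. AIC and CAIC use their standard definitions, -2 log L + 2k and -2 log L + k(ln n + 1). ICOMP is shown in its ICOMP(IFIM) form, -2 log L + 2 C1(F^-1), where C1 is the maximal entropic complexity of the estimated inverse Fisher information matrix; that the paper uses exactly this variant is an assumption. The function names, the choice to pass eigenvalues of the inverse Fisher information directly, and all numbers in the usage lines are hypothetical.

```python
import math


def aic(loglik, k):
    """Akaike's criterion: -2 log L + 2k, with k free parameters."""
    return -2.0 * loglik + 2.0 * k


def caic(loglik, k, n):
    """Consistent AIC: -2 log L + k (ln n + 1), with n observations."""
    return -2.0 * loglik + k * (math.log(n) + 1.0)


def c1_complexity(eigenvalues):
    """C1 complexity of a positive definite matrix given its eigenvalues.

    C1 = (s/2) * ln(tr/s) - (1/2) * ln(det), which equals
    (s/2) * ln(arithmetic mean / geometric mean) of the eigenvalues.
    """
    s = len(eigenvalues)
    arithmetic_mean = sum(eigenvalues) / s
    mean_log = sum(math.log(v) for v in eigenvalues) / s
    return 0.5 * s * (math.log(arithmetic_mean) - mean_log)


def icomp(loglik, inv_fisher_eigenvalues):
    """ICOMP(IFIM) sketch: -2 log L + 2 * C1(inverse Fisher information)."""
    return -2.0 * loglik + 2.0 * c1_complexity(inv_fisher_eigenvalues)


def best_model(scores):
    """Return the name of the model with the minimum criterion value."""
    return min(scores, key=scores.get)


# Hypothetical log-likelihoods and parameter counts for two candidate SEMs.
candidates = {"model_A": (-1250.0, 12), "model_B": (-1243.0, 18)}
n_obs = 300  # hypothetical sample size
caic_scores = {name: caic(ll, k, n_obs) for name, (ll, k) in candidates.items()}
print(best_model(caic_scores))
```

Because C1 is zero when all eigenvalues are equal and grows as they spread apart, the ICOMP penalty reflects the covariance structure of the parameter estimates rather than only the parameter count.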
Keywords:

Document Type: Article Article Type: Research Article Access Type: Open Access
  • [1] H. Akaike, Information theory and an extension of the maximum likelihood principle. In B.N. Petrov and F. Csáki (Eds.), Second International Symposium on Information Theory, Akadémiai Kiadó, Budapest, 267-281 (1973).
  • [2] H. Bozdogan, Model selection and Akaike’s information criterion (AIC): The general theory and its analytical extensions, Psychometrika, 52, 3, 345-370 (1987).
  • [3] H. Bozdogan, Mixture-model cluster analysis using a new informational complexity and model selection criteria. In Multivariate Statistical Modeling, H. Bozdogan (Ed.), Vol. 2, Proceedings of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach, Kluwer Academic Publishers, the Netherlands, Dordrecht, 69-113 (1994).
  • [4] H. Bozdogan, Akaike's information criterion and recent developments in information complexity. Journal of Mathematical Psychology, 44, 62-91 (2000).
  • [5] H. Bozdogan, Intelligent Statistical Data Mining with Information Complexity and Genetic Algorithms. In H. Bozdogan (Ed.), Statistical Data Mining & Knowledge Discovery, Chapman & Hall/CRC, 15–56 (2004).
  • [6] H. Bozdogan, A new class of information complexity (ICOMP) criteria with an application to customer profiling and segmentation, Istanbul University Journal of the School of Business Administration, 39, 2, 370-398 (2010).
  • [7] H. Bozdogan, Information Complexity and Multivariate Learning in High Dimensions in Data Mining. A forthcoming book to appear (2011).
  • [8] K. Bollen, Structural Equations with Latent Variables, Wiley, 1989.
  • [9] S-Y Lee, Structural Equation Modeling: A Bayesian Approach, Wiley, 2007.
  • [10] A. Skrondal, and S. Rabe-Hesketh, Generalized Latent Variable Modeling, Chapman & Hall/CRC, 2004.
  • [11] A. Skrondal, and S. Rabe-Hesketh, Structural equation modeling: categorical variables, Entry for the Encyclopedia of Statistics in Behavioral Science, Wiley, 2005.
  • [12] A. Christoffersson, Factor analysis of dichotomized variables, Psychometrika, 40, 5-32 (1975).
  • [13] B. Muthén, Contributions to factor analysis of dichotomous variables, Psychometrika, 43, 551-560 (1978).
  • [14] B. Muthén, A structural probit model with latent variables. Journal of American Statistical Association, 74, 807-811 (1979).
  • [15] B. Muthén, Factor analysis of dichotomous variables: American attitudes towards abortion. In D.J. Jackson and E.F. Borgatta (Eds.), Factor Analysis and Measurement in Sociological Research. Beverly Hills, Sage, 1980.
  • [16] B. Muthén, Some categorical response models with continuous latent variables. In K.G. Jöreskog and H. Wold (Eds.), Systems under Indirect Observation: Causality, Structure, and Prediction. North Holland Publishing Co., Amsterdam, 1981.
  • [17] B. Muthén, Latent variable structural equation modeling with categorical data, Journal of Econometrics, 22, 43-65 (1983).
  • [18] B. Muthén, A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators, Psychometrika, 49, 115-132 (1984).
  • [19] B. Muthén, and A. Christoffersson, Simultaneous factor analysis of dichotomous variables in several groups, Psychometrika, 46, 407-419 (1981).
  • [20] B. Muthén, and D. Kaplan, A comparison for some methodologies for the factor analysis of non-normal Likert variables, British Journal of Mathematical and Statistical Psychology, 38, 171-189 (1985).
  • [21] D.J. Bartholomew, Factor analysis for categorical data (with discussion). Journal of the Royal Statistical Society, Series B, 42, 293-321 (1980).
  • [22] D.J. Bartholomew, Latent Variable Models and Factor Analysis, Charles Griffin & Company, London, 1987.
  • [23] R. Mislevy, Recent developments in the factor analysis of categorical variables, Journal of Educational Statistics, 11, 3-31 (1986).
  • [24] S. Geman, and D. Geman, Stochastic relaxation, Gibbs distribution and the Bayesian restoration of images, IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721-741 (1984).
  • [25] A. Gifi, Nonlinear Multivariate Analysis, Wiley, 1990.
  • [26] H. Bozdogan, ICOMP: A new model-selection criterion. In H. H. Bock (Ed.), Classification and Related Methods of Data Analysis, Elsevier Science Publishers, Amsterdam; 599-608 (1988).
  • [27] H. Bozdogan, On the information-based measure of covariance complexity and its application to the evaluation of multivariate linear models. Communications in Statistics, Theory and Methods, 19, 221-278 (1990).
  • [28] H. Bozdogan, and D.M.A. Haughton, Informational complexity criteria for regression models. Computational Statistics and Data Analysis, 28, 51-76 (1998).
  • [29] H. Bozdogan, and P.M. Bearse, Subset selection in vector autoregressive models using the genetic algorithm with informational complexity as the fitness function. Systems Analysis, Modeling, Simulation (SAMS) (1998).
  • [30] H. Bozdogan, and M. Ueno, A unified approach to information-theoretic and Bayesian model selection criteria. Invited paper presented in the Technical Session Track C on: Information Theoretic Methods and Bayesian Modeling at the 6th World Meeting of the International Society for Bayesian Analysis (ISBA), May 28-June 1, 2000, Hersonissos-Heraklion, Crete, (2000).
  • [31] P.M. Bentler, Covariance structure analysis: statistical practice, theory, and directions, Annu. Rev. Psychol. 47, 563-592 (1996).
  • [32] K.G. Jöreskog, and D. Sörbom, LISREL 7, A guide to the program and applications, 2nd Edition, SPSS Inc., (1989).
  • [33] R. Fletcher, and M.J.D. Powell, A rapidly convergent descent method for minimization, Computer Journal, 6, 163-168 (1963).
  • [34] R. Fletcher, and C.M. Reeves, Function minimization by conjugate gradients, Computer Journal, 7, 149-154 (1964).
  • [35] P.M. Bentler, EQS structural equations program manual, Encino, CA: Multivariate software, 1995.
  • [36] S.Y. Lee, and R.I. Jennrich, A study of algorithms for covariance structure analysis with specific comparisons using factor analysis, Psychometrika, 44, 1, 99-113 (1979).
  • [37] M. Jamshidian, and P.M. Bentler, ML estimation of mean and covariance structures with missing data using complete data routines, Journal of Educational and Behavioral Statistics, 24, 1, 21-41 (1999).
  • [38] H. Bozdogan, A new information theoretic measure of complexity index for model evaluation in general structural equation models with latent variables, invited paper presented at the Symposium on Model Selection in Covariance Structures at the Joint Meeting of Psychometric Society & the Classification Society, June 13-16, Rutgers University, Newark, NJ. (1991).
  • [39] J.L. Williams, H. Bozdogan, and L. Aiman-Smith, Inference problems with equivalent models. In G.A. Marcoulides and R.E. Schumacker (Eds.), Advanced Structural Equation Modeling: Issues and Techniques, Lawrence Erlbaum Associates, New Jersey, pp. 279-314, 1995.
  • [40] J.R. Magnus, Linear Structures, Charles Griffin & Company, London and Oxford University Press, New York, 1988.
  • [41] J.R. Magnus, and H. Neudecker, Matrix Differential Calculus with Applications in Statistics and Econometrics, John Wiley & Sons, Chichester/New York, 1999.
  • [42] S.B. Cho, Decomposing individual and group differences of categorical variables with genetic factor model, Unpublished Master's Thesis, the University of Missouri-Columbia, 2007.
  • [43] W.J. Heiser, and J.J. Meulman, Homogeneity Analysis: Exploring the distribution of variables and their nonlinear relationships. In M. Greenacre & J. Blasius (Eds.), Correspondence analysis in the social sciences: Recent developments and applications, pp. 179–209 (1994).
  • [44] J.J. Meulman, Homogeneity analysis of incomplete data. Leiden, The Netherlands: DSWO Press, 1982.
  • [45] J.J. Meulman, The integration of multidimensional scaling and multivariate analysis with optimal transformations of the variables. Psychometrika, 57:539–565 (1992).
  • [46] J.J. Meulman, Nonlinear principal coordinates analysis: minimizing the sum of squares of the smallest eigenvalues. British Journal of Mathematical and Statistical Psychology, 46:287–300 (1993).
  • [47] J.J. Meulman, Fitting a distance model to homogeneous subsets of variables: Points of view analysis of categorical data. Journal of Classification, 13:249–266 (1996).
  • [48] J.J. Meulman, Optimal scaling methods for multivariate categorical data analysis. SPSS White Paper, 1998.
  • [49] J.J. Meulman, Prediction and Classification in NonLinear Data Analysis: Something Old, Something New, Something Borrowed, Something Blue. Psychometrika, 68(4):493–517 (2003).
  • [50] J.J. Meulman, and A.J. van der Kooij, Transformations towards independence through optimal scaling. Paper presented at the International Conference on Measurement and Multivariate Analysis (ICMMA), Banff, Canada (2000).
  • [51] J.J. Meulman, A.J. van der Kooij, and W.J. Heiser, Principal components analysis with nonlinear optimal scaling transformations for ordinal and nominal data. In D. Kaplan (Ed.), Handbook of quantitative methodology for the social sciences, pp. 49-70, 2004.
  • [52] G. Michailidis, and J. de Leeuw, The Gifi System of Descriptive Multivariate Analysis. Statistical Science, 13(4):307–336 (1998).
  • [53] S. Katragadda, Multivariate Mixed-Data Mining with Gifi System using Genetic Algorithm and Information Complexity. Ph.D. Thesis under the Supervision of Prof. Bozdogan, Department of Statistics, The University of Tennessee, Knoxville, TN, 2008.
  • [54] S. Katragadda, and H. Bozdogan, Multivariate Mixed-Data Mining in High-Dimensions with Gifi System using Genetic Algorithm and Information Complexity. A forthcoming book to appear, 2011.
  • [55] U. Paquet, Bayesian Inference for Latent Variable Models, Ph.D. thesis, Wolfson College, University of Cambridge, Cambridge, U.K., March, 2007.
  • [56] M.H. van Emden, An Analysis of Complexity. Mathematical Centre Tracts, Amsterdam, 35, (1971).
  • [57] H. Cramér, Mathematical Methods of Statistics. Princeton University Press, Princeton, NJ, 1946.
  • [58] C.R. Rao, Information and accuracy attainable in the estimation of statistical parameters. Bull. Calcutta Math Soc., 37, 81 (1945).
  • [59] C.R. Rao, Minimum variance and the estimation of several parameters. Proc. Cam. Phil. Soc., 43, 280 (1947).
  • [60] C.R. Rao, Sufficient statistics and minimum variance estimates. Proc. Cam. Phil. Soc., 45, 213 (1948).
  • [61] J. Rissanen, Modeling by shortest data description. Automatica, 14, 465-471 (1978).
  • [62] G. Schwarz, Estimating the dimension of a model. Ann. Statist., 6, 461-464 (1978).
  • [63] D.S. Poskitt, Precision, Complexity and Bayesian model determination. J. Roy. Statist. Soc. 49, 199-208 (1987).
  • [64] K. Takeuchi, Distribution of information statistics and a criterion of model fitting. Suri-Kagaku (Mathematical Sciences), 153, 12-18 (1976).
  • [65] J.R.M. Hosking, Lagrange-multiplier tests of time-series models. Journal of the Royal Statistical Society, Series B, 42, 170-181 (1980).
  • [66] R. Shibata, Statistical aspects of model selection. In J.C. Willems (Ed.), From the data to modeling, Berlin: Springer-Verlag, pp. 216-240, 1989.
  • [67] M. Power, M. Bullinger, and A. Harper, The World Health Organization WHOQOL-100: Tests of universality of quality of life in 15 different cultural groups worldwide. Health Psychology, 18, 495-505 (1999).
  • [68] K.V. Mardia, Measures of multivariate skewness and kurtosis with applications. Biometrika, 57, 519-530 (1970).
  • [69] K.V. Mardia, Applications of some measures of multivariate skewness and kurtosis in testing normality and robustness studies. Sankhya, Ser. B, 36(2), 115-128 (1974).
  • [70] K.V. Mardia, Tests of univariate and multivariate normality. In: P.R. Krishnaiah (ed.), Handbook of Statistics, Vol 1, 279-320, North Holland, 1980.
  • [71] E. Deniz, J.A. Howe, and H. Bozdogan, Robustifying Structural Equation Modeling Against Nonnormality with Adaptive Kernels, Paper under review in Multivariate Behavioral Research, 2010.
APA HOWE E, BOZDOGAN H, KATRAGADDA S (2011). Structural equation modeling (SEM) of categorical and mixed-data using the novel Gifi transformations and information complexity (ICOMP) criterion. İstanbul Üniversitesi İşletme Fakültesi Dergisi, 40(1), 86 - 123.
Chicago HOWE Eylem Deniz,BOZDOGAN Hamparsum,KATRAGADDA Suman Structural equation modeling (SEM) of categorical and mixed-data using the novel Gifi transformations and information complexity (ICOMP) criterion. İstanbul Üniversitesi İşletme Fakültesi Dergisi 40, no.1 (2011): 86 - 123.
MLA HOWE Eylem Deniz, BOZDOGAN Hamparsum, KATRAGADDA Suman Structural equation modeling (SEM) of categorical and mixed-data using the novel Gifi transformations and information complexity (ICOMP) criterion. İstanbul Üniversitesi İşletme Fakültesi Dergisi, vol.40, no.1, 2011, pp.86 - 123.
AMA HOWE E,BOZDOGAN H,KATRAGADDA S Structural equation modeling (SEM) of categorical and mixed-data using the novel Gifi transformations and information complexity (ICOMP) criterion. İstanbul Üniversitesi İşletme Fakültesi Dergisi. 2011; 40(1): 86 - 123.
Vancouver HOWE E,BOZDOGAN H,KATRAGADDA S Structural equation modeling (SEM) of categorical and mixed-data using the novel Gifi transformations and information complexity (ICOMP) criterion. İstanbul Üniversitesi İşletme Fakültesi Dergisi. 2011; 40(1): 86 - 123.
IEEE HOWE E, BOZDOGAN H, KATRAGADDA S "Structural equation modeling (SEM) of categorical and mixed-data using the novel Gifi transformations and information complexity (ICOMP) criterion." İstanbul Üniversitesi İşletme Fakültesi Dergisi, 40, pp.86 - 123, 2011.
ISNAD HOWE, Eylem Deniz et al. "Structural equation modeling (SEM) of categorical and mixed-data using the novel Gifi transformations and information complexity (ICOMP) criterion". İstanbul Üniversitesi İşletme Fakültesi Dergisi 40/1 (2011), 86-123.