AN EXPERIMENTAL COMPARISON OF TRADITIONAL AND MACHINE LEARNING METHODS PREDICTION PERFORMANCES: A STUDY ON HEALTH OUTCOMES

Cinaroglu, Songul

Songül ÇINAROĞLU (Hacettepe University, Faculty of Economics and Administrative Sciences, Department of Health Management, Ankara, Türkiye)

Hacettepe Sağlık İdaresi Dergisi

Year: 2020; Volume: 23; Issue: 1; Pages: 23-40; Language: English; Index Date: 20-10-2020

Abstract:

Machine learning techniques can identify non-linear patterns in a dataset and uncover hidden relationships. Random forest is a modern machine learning technique that offers an alternative to traditional classification methods such as logistic regression. This study aims to compare the prediction performance of logistic regression with that of random forest and to identify the factors that predict public health outcomes at the provincial level. Data for the 81 provinces of Turkey for the year 2013 were obtained from the Turkish Statistical Institute. Life expectancy at birth and mortality were chosen as the public health outcomes. Three random forest models were constructed, with 50, 100, and 150 trees. The prediction results of each method were recorded while the “k” parameter of k-fold cross-validation was varied from 3 to 20. The area under the ROC curve (AUC), sensitivity, and specificity were used as performance measures. The results show that the differences in performance between the prediction models for health outcomes are statistically significant (p<0.001) and that logistic regression outperformed the random forest models. The decision tree graphs show that the most important predictor of mortality is the total number of beds, and the most important predictor of life expectancy at birth is the percentage of higher education graduates. In light of these findings, health professionals are strongly encouraged to become more aware of the growing potential of modern prediction methods in health services research.
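The three performance measures named above (AUC, sensitivity, specificity) can be computed directly from a model's predicted probabilities. The following is a minimal, standard-library Python sketch, assuming 0/1 labels with 1 marking the outcome class of interest; the 0.5 cut-off and the example numbers are hypothetical and are not taken from the study.

```python
# Minimal sketch of sensitivity, specificity and AUC for a binary outcome.
# Assumption: labels are 0/1, with 1 marking the outcome class of interest.


def confusion_counts(y_true, y_pred):
    """Return (tp, fn, tn, fp) for 0/1 true labels and 0/1 predicted labels."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1:
            fn += 1
        elif p == 0:
            tn += 1
        else:
            fp += 1
    return tp, fn, tn, fp


def sensitivity(y_true, y_pred):
    """True-positive rate: share of actual positives predicted as positive."""
    tp, fn, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def specificity(y_true, y_pred):
    """True-negative rate: share of actual negatives predicted as negative."""
    _, _, tn, fp = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive gets a higher score
    than a randomly chosen negative, ties counted as 1/2 (equal to the
    Mann-Whitney U statistic divided by n_pos * n_neg)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels and predicted probabilities, for illustration only.
    y_true = [1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
    # Assumed cut-off of 0.5 to turn probabilities into class predictions.
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(sensitivity(y_true, y_pred))  # 2 of the 3 positives are caught
    print(specificity(y_true, y_pred))  # 2 of the 3 negatives are caught
    print(auc(y_true, scores))          # 7 of 9 positive/negative pairs ordered correctly
```

Sensitivity and specificity depend on the chosen cut-off, while AUC uses only the ranking of the scores, which is why AUC is often reported alongside the other two.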
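The abstract describes repeating the evaluation while varying k in k-fold cross-validation from 3 to 20. The sketch below illustrates that loop for a logistic regression only and is not the author's procedure: the gradient-descent fit, the random unstratified folds, the pooling of out-of-fold predictions into one AUC per k (averaging per-fold AUCs is an equally common convention), and the synthetic 81-row dataset are all assumptions made for illustration.

```python
# Sketch: AUC of a logistic regression under k-fold cross-validation, k = 3..20.
# Assumptions (not from the study): batch gradient-descent fit, random
# unstratified folds, pooled out-of-fold predictions, synthetic data.
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.5, epochs=100):
    """Minimise the mean log-loss by batch gradient descent; return (bias, weights)."""
    n, d = len(X), len(X[0])
    b, w = 0.0, [0.0] * d
    for _ in range(epochs):
        gb, gw = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += err
            for j in range(d):
                gw[j] += err * xi[j]
        b -= lr * gb / n
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
    return b, w


def predict_proba(model, X):
    """Predicted probability of class 1 for each row of X."""
    b, w = model
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 once and deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_auc(X, y, k, seed=0):
    """Fit on k-1 folds, predict the held-out fold, then score the pooled predictions."""
    oof = [0.0] * len(y)
    for test_idx in kfold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit_logistic([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i, p in zip(test_idx, predict_proba(model, [X[i] for i in test_idx])):
            oof[i] = p
    return auc(y, oof)


def make_toy_data(n=81, seed=1):
    """Hypothetical data: one standardised predictor and a noisy binary outcome."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        X.append([x])
        y.append(1 if x + rng.gauss(0.0, 1.0) > 0 else 0)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    for k in range(3, 21):
        print(f"k={k:2d}  AUC={cv_auc(X, y, k):.3f}")
```

Pooling is used here because with k = 20 and 81 cases each fold holds only four or five cases, so a single held-out fold may contain only one outcome class, and its own AUC would then be undefined.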
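The random forest models in the abstract differ only in their number of trees (50, 100, and 150). To show what that parameter controls, here is a simplified, standard-library sketch of a random forest for a binary outcome: every tree is grown on a bootstrap sample and considers a random subset of predictors at each split, and the forest averages the trees' outputs. The Gini criterion, max_features = floor(sqrt(p)), the depth limit of 5, probability averaging instead of majority voting, and the synthetic data are assumptions; a real analysis would rely on an established implementation such as the R randomForest package or scikit-learn's RandomForestClassifier.

```python
# Simplified random forest for a 0/1 outcome (illustrative sketch only).
# Assumed settings: Gini splits, max_features = floor(sqrt(p)), depth <= 5,
# and averaging of the trees' leaf probabilities.
import math
import random


def gini(pos, n):
    """Gini impurity of a node holding n cases, pos of them positive."""
    p = pos / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split(X, y, idx, features):
    """Return (weighted impurity, feature, threshold) for rows idx, or None."""
    n, total_pos = len(idx), sum(y[i] for i in idx)
    best = None
    for f in features:
        order = sorted(idx, key=lambda i: X[i][f])
        left_n = left_pos = 0
        for a, b in zip(order, order[1:]):
            left_n += 1
            left_pos += y[a]
            if X[a][f] == X[b][f]:
                continue  # no threshold can separate equal values
            right_n, right_pos = n - left_n, total_pos - left_pos
            score = left_n * gini(left_pos, left_n) + right_n * gini(right_pos, right_n)
            if best is None or score < best[0]:
                best = (score, f, (X[a][f] + X[b][f]) / 2)
    return best


def build_tree(X, y, idx, rng, m, depth=0, max_depth=5):
    """Grow one tree on rows idx; each leaf stores its share of positive cases."""
    pos = sum(y[i] for i in idx)
    if depth >= max_depth or pos in (0, len(idx)):
        return ("leaf", pos / len(idx))
    split = best_split(X, y, idx, rng.sample(range(len(X[0])), m))
    if split is None:  # every sampled predictor is constant in this node
        return ("leaf", pos / len(idx))
    _, f, t = split
    left = [i for i in idx if X[i][f] <= t]
    right = [i for i in idx if X[i][f] > t]
    return ("node", f, t,
            build_tree(X, y, left, rng, m, depth + 1, max_depth),
            build_tree(X, y, right, rng, m, depth + 1, max_depth))


def predict_tree(node, x):
    """Follow the splits down to a leaf and return its positive share."""
    while node[0] == "node":
        _, f, t, left, right = node
        node = left if x[f] <= t else right
    return node[1]


def fit_forest(X, y, n_trees, seed=0):
    """Grow n_trees trees, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n, m = len(y), max(1, int(math.sqrt(len(X[0]))))
    trees = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]
        trees.append(build_tree(X, y, boot, rng, m))
    return trees


def predict_forest(trees, x):
    """Average the trees' leaf probabilities for case x."""
    return sum(predict_tree(t, x) for t in trees) / len(trees)


def make_toy_data(n=81, p=4, seed=1):
    """Hypothetical data: p standardised predictors; only the first drives the outcome."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [1 if row[0] + rng.gauss(0.0, 1.0) > 0 else 0 for row in X]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    for n_trees in (50, 100, 150):
        trees = fit_forest(X, y, n_trees)
        hits = sum((predict_forest(trees, row) >= 0.5) == (yi == 1) for row, yi in zip(X, y))
        # In-sample accuracy is optimistic; it only shows that the forest runs.
        print(f"{n_trees} trees: in-sample accuracy {hits / len(y):.3f}")
```

Adding trees mainly reduces the variance of the averaged prediction rather than changing what each tree can learn, which is one reason forests of 50, 100, and 150 trees can end up performing similarly.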

Document Type: Article; Article Type: Research Article; Access Type: Open Access

APA	Cinaroglu, S. (2020). An experimental comparison of traditional and machine learning methods prediction performances: A study on health outcomes. Hacettepe Sağlık İdaresi Dergisi, 23(1), 23-40.
Chicago	Cinaroglu, Songul. "An Experimental Comparison of Traditional and Machine Learning Methods Prediction Performances: A Study on Health Outcomes." Hacettepe Sağlık İdaresi Dergisi 23, no. 1 (2020): 23-40.
MLA	Cinaroglu, Songul. "An Experimental Comparison of Traditional and Machine Learning Methods Prediction Performances: A Study on Health Outcomes." Hacettepe Sağlık İdaresi Dergisi, vol. 23, no. 1, 2020, pp. 23-40.
AMA	Cinaroglu S. An experimental comparison of traditional and machine learning methods prediction performances: a study on health outcomes. Hacettepe Sağlık İdaresi Dergisi. 2020;23(1):23-40.
Vancouver	Cinaroglu S. An experimental comparison of traditional and machine learning methods prediction performances: a study on health outcomes. Hacettepe Sağlık İdaresi Dergisi. 2020;23(1):23-40.
IEEE	S. Cinaroglu, "An experimental comparison of traditional and machine learning methods prediction performances: A study on health outcomes," Hacettepe Sağlık İdaresi Dergisi, vol. 23, no. 1, pp. 23-40, 2020.
ISNAD	Cinaroglu, Songul. "An Experimental Comparison of Traditional and Machine Learning Methods Prediction Performances: A Study on Health Outcomes". Hacettepe Sağlık İdaresi Dergisi 23/1 (2020), 23-40.