Year: 2020 Volume: 8 Issue: 2 Page Range: 336 - 343 Text Language: English DOI: 10.20290/estubtdb.747821 Index Date: 08-01-2021

DEEP LEARNING-BASED APPROACH FOR MISSING DATA IMPUTATION

Abstract:
Missing values in datasets are a problem that degrades machine learning performance. New methods are proposed regularly to overcome this problem, including statistical, machine learning, evolutionary, and deep learning methods. Although deep learning is one of today's popular subjects, studies applying it to missing data imputation are limited. Several deep learning techniques have been used to handle missing data; one of them is the autoencoder, together with its denoising and stacked variants. In this study, the missing values in three different real-world datasets were estimated using the denoising autoencoder (DAE), k-nearest neighbor (kNN), and multivariate imputation by chained equations (MICE) methods. The estimation success of the methods was compared according to the root mean square error (RMSE) criterion. It was observed that the DAE method was more successful than the other methods in estimating the missing values for large datasets.
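The RMSE comparison described in the abstract can be illustrated with a minimal sketch. This is not the study's implementation: the function names `rmse_on_missing` and `knn_impute`, the toy data, and the simplified pure-Python kNN rule (mean of the k nearest rows, with distance computed on shared observed columns) are all assumptions made for illustration. The idea shown is only the evaluation protocol: hide known values, impute them, and score the error on the hidden cells alone.

```python
import math

def rmse_on_missing(true_rows, imputed_rows, missing_mask):
    """RMSE computed only over the cells flagged as missing in the mask."""
    sq_errors = [
        (t - p) ** 2
        for t_row, p_row, m_row in zip(true_rows, imputed_rows, missing_mask)
        for t, p, m in zip(t_row, p_row, m_row)
        if m
    ]
    if not sq_errors:
        raise ValueError("mask selects no cells")
    return math.sqrt(sum(sq_errors) / len(sq_errors))

def knn_impute(rows, missing_mask, k=2):
    """Simplified kNN imputation (illustrative assumption, not a library API).

    Each missing cell is filled with the mean of that column over the k
    nearest rows that observe it. Distance is the root mean squared
    difference over columns observed in both rows.
    """
    imputed = [list(r) for r in rows]
    n_cols = len(rows[0])
    for i, (row, mask) in enumerate(zip(rows, missing_mask)):
        for j in range(n_cols):
            if not mask[j]:
                continue
            candidates = []
            for i2, (other, omask) in enumerate(zip(rows, missing_mask)):
                if i2 == i or omask[j]:
                    continue  # skip self and rows that lack column j too
                shared = [c for c in range(n_cols) if not mask[c] and not omask[c]]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((row[c] - other[c]) ** 2 for c in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = candidates[:k]
            if nearest:  # leave the cell unfilled if no neighbor is usable
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed

# Toy data (hypothetical): the second column is twice the first.
true_rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
missing_mask = [[False, False], [False, True], [False, False], [False, False]]

# Hide the masked cells, impute them, then score only those cells.
observed = [
    [None if m else v for v, m in zip(row, mrow)]
    for row, mrow in zip(true_rows, missing_mask)
]
imputed_rows = knn_impute(observed, missing_mask, k=2)
score = rmse_on_missing(true_rows, imputed_rows, missing_mask)
print(score)
```

Scoring only the masked cells keeps the observed values, which every method reproduces exactly, from diluting the error. The same `rmse_on_missing` function could score the output of any other imputer, which is what makes a side-by-side comparison of methods possible.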
Keywords:

Document Type: Article Article Type: Research Article Access Type: Open Access
APA Cihan P (2020). DEEP LEARNING-BASED APPROACH FOR MISSING DATA IMPUTATION. Eskişehir Teknik Üniversitesi Bilim ve Teknoloji Dergisi b- Teorik Bilimler, 8(2), 336 - 343. 10.20290/estubtdb.747821
Chicago Cihan Pınar DEEP LEARNING-BASED APPROACH FOR MISSING DATA IMPUTATION. Eskişehir Teknik Üniversitesi Bilim ve Teknoloji Dergisi b- Teorik Bilimler 8, no.2 (2020): 336 - 343. 10.20290/estubtdb.747821
MLA Cihan Pınar DEEP LEARNING-BASED APPROACH FOR MISSING DATA IMPUTATION. Eskişehir Teknik Üniversitesi Bilim ve Teknoloji Dergisi b- Teorik Bilimler, vol.8, no.2, 2020, pp.336 - 343. 10.20290/estubtdb.747821
AMA Cihan P DEEP LEARNING-BASED APPROACH FOR MISSING DATA IMPUTATION. Eskişehir Teknik Üniversitesi Bilim ve Teknoloji Dergisi b- Teorik Bilimler. 2020; 8(2): 336 - 343. 10.20290/estubtdb.747821
Vancouver Cihan P DEEP LEARNING-BASED APPROACH FOR MISSING DATA IMPUTATION. Eskişehir Teknik Üniversitesi Bilim ve Teknoloji Dergisi b- Teorik Bilimler. 2020; 8(2): 336 - 343. 10.20290/estubtdb.747821
IEEE Cihan P "DEEP LEARNING-BASED APPROACH FOR MISSING DATA IMPUTATION." Eskişehir Teknik Üniversitesi Bilim ve Teknoloji Dergisi b- Teorik Bilimler, 8, pp.336 - 343, 2020. 10.20290/estubtdb.747821
ISNAD Cihan, Pınar. "DEEP LEARNING-BASED APPROACH FOR MISSING DATA IMPUTATION". Eskişehir Teknik Üniversitesi Bilim ve Teknoloji Dergisi b- Teorik Bilimler 8/2 (2020), 336-343. https://doi.org/10.20290/estubtdb.747821