Year: 2019 | Volume: 0 | Issue: 0 | Pages: 292-301 | Language: Turkish | DOI: 10.31590/ejosat.638096 | Index Date: 23-09-2020

ImpSlidingWindow: Kayan Pencere Tabanlı Akan Veri Özetleme Yönteminin Performansını Arttırmaya Yönelik Yeni Bir Model

ImpSlidingWindow: A New Model to Improve the Performance of the Sliding Window Based Streaming Data Summarization Method

Abstract:
Sliding-window-based data summarization is a count-based summarization approach commonly used in data stream clustering applications where the most recent data matter most. In this approach, each time a new data point arrives, the w most recent data points (w being a predefined parameter) are taken as the summary, and the window slides forward one point at a time. As a result, the model reprocesses every data point in the window on each new arrival, which degrades performance. Approaches that address this problem are therefore needed. In this study, we propose a new sliding window model named ImpSlidingWindow (ISW) to solve it. Instead of running the clustering model on every data arrival, ISW runs it only after a certain number of data points have accumulated. Specifically, the sliding window width is divided into four equal parts, and the clustering model runs at the end of each part. Consequently, the clustering model runs four times per window width rather than once for every data point in the window, which yields a significant performance gain. When the proposed model was applied to KD-AR Stream, a previously proposed data stream clustering algorithm, run time improved by up to 80%.
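The difference between the two update policies described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the class names `ClassicSlidingWindow` and `ISWSketch`, the hypothetical `cluster_fn` callback (standing in for a real clusterer such as KD-AR Stream), the requirement that w be an exact multiple of four, and the choice to cluster during warm-up before the window is full are all assumptions made for the example.

```python
from collections import deque
from typing import Callable, Deque, Generic, List, TypeVar, Union

T = TypeVar("T")


class ClassicSlidingWindow(Generic[T]):
    """Count-based sliding window that re-clusters the whole window on every arrival."""

    def __init__(self, w: int, cluster_fn: Callable[[List[T]], None]) -> None:
        self.window: Deque[T] = deque(maxlen=w)  # appending when full evicts the oldest point
        self.cluster_fn = cluster_fn  # hypothetical hook for a real clusterer
        self.runs = 0

    def add(self, point: T) -> None:
        self.window.append(point)
        self.cluster_fn(list(self.window))  # one clustering run per data point
        self.runs += 1


class ISWSketch(Generic[T]):
    """ISW-style window (sketch): keeps the w most recent points, but runs the
    clusterer only when w // parts new points have accumulated."""

    def __init__(self, w: int, cluster_fn: Callable[[List[T]], None], parts: int = 4) -> None:
        # Assumption: w is an exact multiple of `parts`, so all parts are equal.
        if w <= 0 or parts <= 0 or w % parts != 0:
            raise ValueError("w must be a positive multiple of parts")
        self.window: Deque[T] = deque(maxlen=w)
        self.step = w // parts  # size of one window part
        self.cluster_fn = cluster_fn
        self.pending = 0  # points received since the last clustering run
        self.runs = 0

    def add(self, point: T) -> None:
        self.window.append(point)
        self.pending += 1
        if self.pending == self.step:  # end of a window part reached
            self.cluster_fn(list(self.window))
            self.runs += 1
            self.pending = 0


def count_runs(window: Union[ClassicSlidingWindow[float], ISWSketch[float]], n_points: int) -> int:
    """Feed n_points synthetic values and return how often the clusterer ran."""
    for i in range(n_points):
        window.add(float(i))
    return window.runs


def _noop(points: List[float]) -> None:
    """Placeholder clusterer that does nothing (assumption for the demo)."""


if __name__ == "__main__":
    w = 100
    print(count_runs(ClassicSlidingWindow(w, _noop), 1000))  # 1000
    print(count_runs(ISWSketch(w, _noop), 1000))  # 40
```

With w = 100 and 1,000 arrivals, the classic window invokes the clusterer 1,000 times and the ISW-style window 40 times, i.e. w/4 times fewer. Fewer invocations need not translate one-to-one into run time, because each run still processes the full window; the "up to 80%" figure in the abstract is the measured result reported for KD-AR Stream.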

Document Type: Article | Article Type: Research Article | Access Type: Open Access
APA: Şenol, A., & Karacan, H. (2019). ImpSlidingWindow: Kayan Pencere Tabanlı Akan Veri Özetleme Yönteminin Performansını Arttırmaya Yönelik Yeni Bir Model. Avrupa Bilim ve Teknoloji Dergisi, 0(0), 292-301. https://doi.org/10.31590/ejosat.638096
Chicago: Şenol, Ali, and Hacer Karacan. "ImpSlidingWindow: Kayan Pencere Tabanlı Akan Veri Özetleme Yönteminin Performansını Arttırmaya Yönelik Yeni Bir Model." Avrupa Bilim ve Teknoloji Dergisi 0, no. 0 (2019): 292-301. https://doi.org/10.31590/ejosat.638096
MLA: Şenol, Ali, and Hacer Karacan. "ImpSlidingWindow: Kayan Pencere Tabanlı Akan Veri Özetleme Yönteminin Performansını Arttırmaya Yönelik Yeni Bir Model." Avrupa Bilim ve Teknoloji Dergisi, vol. 0, no. 0, 2019, pp. 292-301, https://doi.org/10.31590/ejosat.638096.
AMA: Şenol A, Karacan H. ImpSlidingWindow: Kayan Pencere Tabanlı Akan Veri Özetleme Yönteminin Performansını Arttırmaya Yönelik Yeni Bir Model. Avrupa Bilim ve Teknoloji Dergisi. 2019;0(0):292-301. doi:10.31590/ejosat.638096
Vancouver: Şenol A, Karacan H. ImpSlidingWindow: Kayan Pencere Tabanlı Akan Veri Özetleme Yönteminin Performansını Arttırmaya Yönelik Yeni Bir Model. Avrupa Bilim ve Teknoloji Dergisi. 2019;0(0):292-301. doi:10.31590/ejosat.638096
IEEE: A. Şenol and H. Karacan, "ImpSlidingWindow: Kayan Pencere Tabanlı Akan Veri Özetleme Yönteminin Performansını Arttırmaya Yönelik Yeni Bir Model," Avrupa Bilim ve Teknoloji Dergisi, vol. 0, pp. 292-301, 2019, doi: 10.31590/ejosat.638096.
ISNAD: Şenol, Ali - Karacan, Hacer. "ImpSlidingWindow: Kayan Pencere Tabanlı Akan Veri Özetleme Yönteminin Performansını Arttırmaya Yönelik Yeni Bir Model". Avrupa Bilim ve Teknoloji Dergisi 0 (2019), 292-301. https://doi.org/10.31590/ejosat.638096