Year: 2020 Volume: 12 Issue: 1 Page Range: 159 - 184 Text Language: English DOI: 10.9756/INT-JECSE/V12I1.201003 Index Date: 31-10-2020

Twitter Data Clustering on issues of Children with Special Needs using Hybrid Topic Models with Multi-viewpoints Similarity Metric

Abstract:
Social networks are an excellent source for users to share or exchange information on topics. Twitter is the most prioritized social network for users discussing topics related to the issues of children with special needs. Extracting good-quality topics from a Twitter corpus depends on the quality of text pre-processing and on finding the optimal cluster tendency. With traditional topic models, identifying cluster tendency is difficult because they use less frequent words in tweets. In traditional topic models, the k value (number of clusters) is decided manually, and most methods use the Euclidean distance metric while some use cosine distance metrics. Proper visualization of cluster tendency is also essential, as the corpus consists of a large number of documents and billions of words. In this paper, hybrid topic models with a multi-viewpoints based similarity metric are proposed to visualize topic clouds and to find the cluster tendency of various topics in Twitter datasets related to issues of children with special needs. The proposed hybrid models are evaluated experimentally and compared against other distance metrics. Empirical analysis is performed on convergence speed and computational complexity. Cluster validity of the proposed models is assessed with external validity indices to quantify cluster quality and with internal validity indices to evaluate clustering structure. Visual Non-negative Matrix Factorization (VIS NMF) under the multi-viewpoints similarity metric performed better than the other models, with a more informative assessment.
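The abstract contrasts Euclidean and cosine measures with a multi-viewpoints similarity metric but does not define the latter. As a rough illustration only, the sketch below assumes a commonly used formulation in which the similarity of two documents is the average dot product of their difference vectors taken from several reference documents (viewpoints) outside their cluster, whereas cosine similarity looks at them from the origin alone. The function names, the toy vectors, and the choice of viewpoints are all hypothetical and are not taken from the paper.

```python
import numpy as np


def cosine_similarity(d_i, d_j):
    """Cosine of the angle between two document vectors, measured from the origin."""
    return float(d_i @ d_j / (np.linalg.norm(d_i) * np.linalg.norm(d_j)))


def multi_viewpoint_similarity(d_i, d_j, viewpoints):
    """Mean dot product of (d_i - d_h) and (d_j - d_h) over all viewpoints d_h.

    Assumed formulation for illustration; the abstract does not give one.
    `viewpoints` holds one reference document vector per row.
    """
    viewpoints = np.atleast_2d(viewpoints)
    diff_i = d_i - viewpoints  # d_i as seen from each viewpoint
    diff_j = d_j - viewpoints  # d_j as seen from each viewpoint
    return float(np.mean(np.sum(diff_i * diff_j, axis=1)))


# Toy unit-length document vectors (hypothetical data).
d_i = np.array([0.6, 0.8])
d_j = np.array([1.0, 0.0])
outside_docs = np.array([[0.0, 1.0], [0.8, 0.6]])

print("cosine:", cosine_similarity(d_i, d_j))
print("multi-viewpoint:", multi_viewpoint_similarity(d_i, d_j, outside_docs))
```

With the origin as the only viewpoint and unit-length vectors, this measure reduces to cosine similarity, which is one way to see it as a generalization rather than a replacement.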
Keywords:

Document Type: Article Article Type: Research Article Access Type: Open Access
APA NOORULLAH R, MOHAMMED M (2020). Twitter Data Clustering on issues of Children with Special Needs using Hybrid Topic Models with Multi-viewpoints Similarity Metric. INTERNATIONAL JOURNAL OF EARLY CHILDHOOD SPECIAL EDUCATION, 12(1), 159 - 184. 10.9756/INT-JECSE/V12I1.201003
Chicago NOORULLAH R.M.,MOHAMMED Moulana Twitter Data Clustering on issues of Children with Special Needs using Hybrid Topic Models with Multi-viewpoints Similarity Metric. INTERNATIONAL JOURNAL OF EARLY CHILDHOOD SPECIAL EDUCATION 12, no.1 (2020): 159 - 184. 10.9756/INT-JECSE/V12I1.201003
MLA NOORULLAH R.M.,MOHAMMED Moulana Twitter Data Clustering on issues of Children with Special Needs using Hybrid Topic Models with Multi-viewpoints Similarity Metric. INTERNATIONAL JOURNAL OF EARLY CHILDHOOD SPECIAL EDUCATION, vol.12, no.1, 2020, pp.159 - 184. 10.9756/INT-JECSE/V12I1.201003
AMA NOORULLAH R,MOHAMMED M Twitter Data Clustering on issues of Children with Special Needs using Hybrid Topic Models with Multi-viewpoints Similarity Metric. INTERNATIONAL JOURNAL OF EARLY CHILDHOOD SPECIAL EDUCATION. 2020; 12(1): 159 - 184. 10.9756/INT-JECSE/V12I1.201003
Vancouver NOORULLAH R,MOHAMMED M Twitter Data Clustering on issues of Children with Special Needs using Hybrid Topic Models with Multi-viewpoints Similarity Metric. INTERNATIONAL JOURNAL OF EARLY CHILDHOOD SPECIAL EDUCATION. 2020; 12(1): 159 - 184. 10.9756/INT-JECSE/V12I1.201003
IEEE NOORULLAH R,MOHAMMED M "Twitter Data Clustering on issues of Children with Special Needs using Hybrid Topic Models with Multi-viewpoints Similarity Metric." INTERNATIONAL JOURNAL OF EARLY CHILDHOOD SPECIAL EDUCATION, 12, pp.159 - 184, 2020. 10.9756/INT-JECSE/V12I1.201003
ISNAD NOORULLAH, R.M. - MOHAMMED, Moulana. "Twitter Data Clustering on issues of Children with Special Needs using Hybrid Topic Models with Multi-viewpoints Similarity Metric". INTERNATIONAL JOURNAL OF EARLY CHILDHOOD SPECIAL EDUCATION 12/1 (2020), 159-184. https://doi.org/10.9756/INT-JECSE/V12I1.201003