Sentiment Analysis for Software Engineering Domain in Turkish

Tocoğlu, Mansur Alp

doi:10.35377/saucis.03.03.769969

Mansur Alp TOCOĞLU (Manisa Celal Bayar University, Department of Software Engineering, Manisa, Türkiye)

Sakarya University Journal of Computer and Information Sciences (Online)

Year: 2020; Volume: 3; Issue: 3; Pages: 296-308; Language: English; DOI: 10.35377/saucis.03.03.769969; Index date: 16-05-2021

Abstract:

The focus of this study is to provide a model for identifying the sentiment of comments about education and professional life in software engineering posted on social media and microblogging sites. Such a pre-trained model can be useful for evaluating feedback about software engineering from students and software engineers. The problem is treated as a supervised text classification problem, which therefore requires a dataset for the training process. To build this dataset, a survey was conducted among students of a software engineering department. In the classification phase, we represent the corpus using conventional and word-embedding text representation schemes and report accuracy, recall, and precision for conventional supervised machine learning classifiers and well-known deep learning architectures. In the experimental analysis, we first obtain classification results using three conventional text representation schemes and three N-gram models in conjunction with five classifiers (i.e., naïve Bayes, the k-nearest neighbor algorithm, support vector machines, random forest, and logistic regression). In addition, we evaluate the performance of three ensemble learners and three deep learning architectures (i.e., convolutional neural network, recurrent neural network, and long short-term memory). The empirical results indicate that the deep learning architectures outperform both the conventional supervised machine learning classifiers and the ensemble learners.
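
The abstract does not name the three conventional representation schemes or the three N-gram models. A common choice, assumed here, is term presence, raw term frequency, and TF-IDF computed over word unigrams, bigrams, or trigrams. The minimal standard-library Python sketch below illustrates these schemes; the function names `ngrams` and `vectorize` are hypothetical, and this is not the author's implementation. Tokenization is a naive whitespace split after `str.lower()`, which is not Turkish-aware (it maps "I" to "i" rather than "ı"), so a real pipeline would need Turkish-specific normalization.

```python
import math
from collections import Counter

# Hypothetical helpers sketching three assumed representation schemes
# (term presence, term frequency, TF-IDF); not the study's implementation.


def ngrams(tokens, n):
    """Return the contiguous word n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def vectorize(docs, n=1, scheme="tfidf"):
    """Map each document to a sparse {n-gram: weight} dict.

    scheme: "tp"    -> term presence (1.0 if the n-gram occurs),
            "tf"    -> raw term frequency,
            "tfidf" -> term frequency * ln(N / document frequency).
    """
    # Naive tokenization: lowercase + whitespace split (not Turkish-locale aware).
    counts = [Counter(ngrams(doc.lower().split(), n)) for doc in docs]
    # Document frequency: number of documents that contain each n-gram.
    df = Counter(term for c in counts for term in c)
    n_docs = len(docs)
    vectors = []
    for c in counts:
        if scheme == "tp":
            vec = {t: 1.0 for t in c}
        elif scheme == "tf":
            vec = {t: float(f) for t, f in c.items()}
        elif scheme == "tfidf":
            vec = {t: f * math.log(n_docs / df[t]) for t, f in c.items()}
        else:
            raise ValueError(f"unknown scheme: {scheme!r}")
        vectors.append(vec)
    return vectors
```

For example, with the two documents "yazılım çok güzel" and "yazılım çok zor", the TF-IDF scheme assigns the shared unigrams "yazılım" and "çok" a weight of 0 (they occur in every document), while "güzel" and "zor" each receive ln 2 ≈ 0.693. Passing `n=2` or `n=3` switches the features to bigrams or trigrams.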

Keywords:

Yazılım Mühendisliği Alanında Türkçe Duygu Analizi (Turkish title: Turkish Sentiment Analysis in the Software Engineering Domain)

Abstract:

The aim of this study is to provide a model for classifying comments on social media and microblogging sites about education and professional life in software engineering. Such a pre-trained model can be useful for evaluating the feedback of students and software engineers about software engineering. The problem is treated as a text classification problem, which requires a dataset for the training process. To build the dataset, a survey was conducted among students of a software engineering department. In the classification phase, accuracy results were obtained using conventional and word-embedding text representation schemes together with conventional supervised machine learning classifiers and well-known deep learning architectures. In the experimental analysis, accuracy results were first obtained using three conventional text representation schemes and three N-gram models in conjunction with five classifiers (naïve Bayes, the k-nearest neighbor algorithm, support vector machines, random forest, and logistic regression). In addition, the performance of two ensemble algorithms and three deep learning architectures (convolutional neural network, recurrent neural network, and long short-term memory) was evaluated. The empirical results show that the deep learning architectures outperform the conventional supervised machine learning classifiers and the ensemble algorithms.
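
The abstract lists naïve Bayes among the base classifiers and reports results for ensemble algorithms without stating how their predictions are combined. As a hedged illustration only, the sketch below implements a multinomial naïve Bayes classifier with Laplace (add-alpha) smoothing and a plurality-voting ensemble in standard-library Python. The names `SimpleMultinomialNB` and `majority_vote` are hypothetical, and plurality voting is an assumed combination rule, not necessarily one used in the study.

```python
import math
from collections import Counter

# Hypothetical sketch of a naive Bayes base learner and a voting ensemble;
# not the study's implementation.


class SimpleMultinomialNB:
    """Multinomial naive Bayes on whitespace tokens with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        # Per-class token counts.
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc.lower().split())
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        # Log prior log P(c), estimated from label frequencies.
        label_freq = Counter(labels)
        self.log_priors = {
            c: math.log(label_freq[c] / len(labels)) for c in self.classes
        }
        return self

    def log_posterior(self, doc, c):
        # Unnormalized log P(c | doc); tokens unseen in training are ignored.
        v = len(self.vocab)
        score = self.log_priors[c]
        for t in doc.lower().split():
            if t in self.vocab:
                score += math.log(
                    (self.counts[c][t] + self.alpha) / (self.totals[c] + self.alpha * v)
                )
        return score

    def predict(self, doc):
        return max(self.classes, key=lambda c: self.log_posterior(doc, c))


def majority_vote(classifiers, doc):
    """Plurality vote over base classifiers; a tie goes to the tied label voted first."""
    votes = Counter(clf.predict(doc) for clf in classifiers)
    return votes.most_common(1)[0][0]
```

Any object with a `predict(doc)` method can join the vote, so other base learners (for example k-NN or logistic regression) could be added to the `classifiers` list; weighted voting or stacking would be alternative combination rules.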

Keywords:

Document Type: Article; Article Type: Research Article; Access Type: Open Access

APA	Tocoğlu, M. A. (2020). Sentiment analysis for software engineering domain in Turkish. Sakarya University Journal of Computer and Information Sciences (Online), 3(3), 296-308. https://doi.org/10.35377/saucis.03.03.769969
Chicago	Tocoğlu, Mansur Alp. "Sentiment Analysis for Software Engineering Domain in Turkish." Sakarya University Journal of Computer and Information Sciences (Online) 3, no. 3 (2020): 296-308. https://doi.org/10.35377/saucis.03.03.769969.
MLA	Tocoğlu, Mansur Alp. "Sentiment Analysis for Software Engineering Domain in Turkish." Sakarya University Journal of Computer and Information Sciences (Online), vol. 3, no. 3, 2020, pp. 296-308, https://doi.org/10.35377/saucis.03.03.769969.
AMA	Tocoğlu MA. Sentiment analysis for software engineering domain in Turkish. Sakarya University Journal of Computer and Information Sciences (Online). 2020;3(3):296-308. doi:10.35377/saucis.03.03.769969
Vancouver	Tocoğlu MA. Sentiment Analysis for Software Engineering Domain in Turkish. Sakarya University Journal of Computer and Information Sciences (Online). 2020;3(3):296-308. doi:10.35377/saucis.03.03.769969
IEEE	M. A. Tocoğlu, "Sentiment Analysis for Software Engineering Domain in Turkish," Sakarya University Journal of Computer and Information Sciences (Online), vol. 3, no. 3, pp. 296-308, 2020, doi: 10.35377/saucis.03.03.769969.
ISNAD	Tocoğlu, Mansur Alp. "Sentiment Analysis for Software Engineering Domain in Turkish". Sakarya University Journal of Computer and Information Sciences (Online) 3/3 (2020), 296-308. https://doi.org/10.35377/saucis.03.03.769969