Year: 2018 Volume: 6 Issue: 2 Page Range: 69 - 77 Text Language: English DOI: 10.17694/bajece.419538 Index Date: 08-02-2019

Sentiment Analysis on Twitter Based on Ensemble of Psychological and Linguistic Feature Sets

Abstract:
With advances in information and communication technologies, social media and microblogging platforms have become an important source of information. On microblogging platforms, people share their opinions, complaints, sentiments and attitudes towards topics, current issues and products. Sentiment analysis is an important research direction in natural language processing that aims to identify the sentiment orientation of source materials. Twitter is a popular microblogging platform where people all over the world interact through user-generated text messages. Information obtained from Twitter can serve as an essential source for several applications, including event detection, news recommendation and crisis management. In sentiment classification, identifying an appropriate feature subset plays an important role. LIWC (Linguistic Inquiry and Word Count) is exploratory text analysis software that extracts psycholinguistic features from text documents. In this paper, we present a psycholinguistic approach to sentiment analysis on Twitter. In this scheme, we use five main LIWC categories (namely, linguistic processes, psychological processes, personal concerns, spoken categories and punctuation) as feature sets. The experimental analysis considers the five LIWC categories and their ensemble combinations. To explore the predictive performance of the different feature engineering schemes, four supervised learning algorithms (namely, Naïve Bayes, support vector machines, the k-nearest neighbor algorithm and logistic regression) and three ensemble learning methods (namely, AdaBoost, Bagging and Random Subspace) are used. The experimental results indicate that ensemble feature sets yield higher predictive performance than the individual feature sets.
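To make the idea of "ensemble combinations" of feature sets concrete, the following is a minimal sketch in Python. It assumes that an ensemble feature set is formed by concatenating the feature values of two or more LIWC categories; the abstract does not state which combinations were actually evaluated, so the sketch simply enumerates every possible one. The function names, category identifiers and numeric scores are hypothetical and are not taken from the paper or from the LIWC software.

```python
from itertools import combinations

# The five LIWC categories named in the abstract (identifiers are illustrative).
LIWC_CATEGORIES = [
    "linguistic_processes",
    "psychological_processes",
    "personal_concerns",
    "spoken_categories",
    "punctuation",
]


def feature_set_combinations(categories):
    """Yield every non-empty subset of the categories, smallest subsets first."""
    for size in range(1, len(categories) + 1):
        yield from combinations(categories, size)


def build_feature_vector(features_by_category, selected):
    """Concatenate the per-category feature values of one tweet.

    Assumption: an ensemble feature set is a plain concatenation of the
    selected categories' features, in the order given by `selected`.
    """
    vector = []
    for category in selected:
        vector.extend(features_by_category[category])
    return vector


all_sets = list(feature_set_combinations(LIWC_CATEGORIES))
individual_sets = [s for s in all_sets if len(s) == 1]
ensemble_sets = [s for s in all_sets if len(s) > 1]

# Made-up scores for a single tweet, only to show the concatenation step.
example_tweet = {
    "linguistic_processes": [0.12, 0.30],
    "psychological_processes": [0.08],
    "personal_concerns": [0.02],
    "spoken_categories": [0.00],
    "punctuation": [0.05],
}
example_vector = build_feature_vector(
    example_tweet, ("linguistic_processes", "punctuation")
)
```

With five categories there are 5 individual feature sets and 26 possible combinations of two or more categories. Each resulting vector could then be passed to any of the classifiers listed in the abstract.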
Keywords:

Subjects: Engineering, Biomedical; Engineering, Electrical and Electronic; Computer Science, Software Engineering; Green and Sustainable Science and Technology; Telecommunications; Computer Science, Cybernetics; Computer Science, Information Systems; Computer Science, Hardware and Architecture; Computer Science, Theory and Methods; Computer Science, Artificial Intelligence
Document Type: Article Article Type: Research Article Access Type: Open Access
APA ONAN A (2018). Sentiment Analysis on Twitter Based on Ensemble of Psychological and Linguistic Feature Sets. Balkan Journal of Electrical and Computer Engineering, 6(2), 69 - 77. 10.17694/bajece.419538
Chicago ONAN A. Sentiment Analysis on Twitter Based on Ensemble of Psychological and Linguistic Feature Sets. Balkan Journal of Electrical and Computer Engineering 6, no.2 (2018): 69 - 77. 10.17694/bajece.419538
MLA ONAN A. Sentiment Analysis on Twitter Based on Ensemble of Psychological and Linguistic Feature Sets. Balkan Journal of Electrical and Computer Engineering, vol.6, no.2, 2018, pp.69 - 77. 10.17694/bajece.419538
AMA ONAN A Sentiment Analysis on Twitter Based on Ensemble of Psychological and Linguistic Feature Sets. Balkan Journal of Electrical and Computer Engineering. 2018; 6(2): 69 - 77. 10.17694/bajece.419538
Vancouver ONAN A Sentiment Analysis on Twitter Based on Ensemble of Psychological and Linguistic Feature Sets. Balkan Journal of Electrical and Computer Engineering. 2018; 6(2): 69 - 77. 10.17694/bajece.419538
IEEE ONAN A "Sentiment Analysis on Twitter Based on Ensemble of Psychological and Linguistic Feature Sets." Balkan Journal of Electrical and Computer Engineering, 6, pp.69 - 77, 2018. 10.17694/bajece.419538
ISNAD ONAN, A.. "Sentiment Analysis on Twitter Based on Ensemble of Psychological and Linguistic Feature Sets". Balkan Journal of Electrical and Computer Engineering 6/2 (2018), 69-77. https://doi.org/10.17694/bajece.419538