Deep Combination of Stylometry Features in Forensic Authorship Analysis

Sezer, Ebru Akcapinar; Canbay, Pelin; Sever, Hayri

Pelin CANBAY (Sütçü İmam University, Department of Computer Engineering, Kahramanmaraş, Turkey)

Ebru SEZER (Hacettepe University, Department of Computer Engineering, Ankara, Turkey)

Hayri SEVER (Çankaya University, Department of Computer Engineering, Ankara, Turkey)

INTERNATIONAL JOURNAL OF INFORMATION SECURITY SCIENCE

Year: 2020 Volume: 9 Issue: 3 Pages: 154 - 163 Text Language: English Index Date: 22-11-2020

Abstract:

Authorship Analysis (AA) in forensics is a process that aims to extract information about an author from his or her writings. Forensic AA is needed to detect the characteristics of anonymous authors in order to improve the security of digital media users who are exposed to disturbing entries such as threats or harassment emails. To analyze whether two anonymous short texts were written by the same author, we propose combining stylometry features from different categories in separate learning processes. In the majority of previous AA studies, the stylometric features from different categories are concatenated in a preprocessing step. In these studies, no category-specific operations are performed during the learning process; all categories used are evaluated equally. In contrast, the proposed approach has a separate learning process for each feature category, owing to their qualitative and quantitative characteristics, and combines these processes at the decision phase by using a Combination of Deep Neural Networks (C-DNN). To evaluate the Authorship Verification (AV) performance of the proposed approach, we designed and implemented a problem-specific Deep Neural Network (DNN) for each stylometry category we used. Experiments were conducted on two public English datasets. The results show that the proposed approach significantly improves the generalization ability and robustness of the solutions, and also has better accuracy than the single DNNs.
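
The decision-phase combination described in the abstract can be illustrated with a minimal sketch. This is not the authors' C-DNN: each category-specific network is replaced here by a single hand-set logistic unit, the category names ("lexical", "syntactic"), the feature values, the weights, and the equal-weight averaging rule are all hypothetical assumptions chosen only to show the idea of scoring each feature category separately and fusing the scores at the end rather than concatenating the features up front.

```python
import math
from typing import Dict, Sequence, Tuple

Vector = Sequence[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def category_score(text_a: Vector, text_b: Vector,
                   weights: Vector, bias: float) -> float:
    """Stand-in for one category-specific network (assumption).

    Maps the absolute feature differences of a text pair to a
    same-author probability. A real model would be a trained DNN.
    """
    diffs = [abs(a - b) for a, b in zip(text_a, text_b)]
    return sigmoid(sum(w * d for w, d in zip(weights, diffs)) + bias)


def combine(scores: Dict[str, float],
            fusion_weights: Dict[str, float]) -> float:
    """Decision-level fusion: weighted average of per-category scores."""
    total = sum(fusion_weights[c] for c in scores)
    return sum(fusion_weights[c] * s for c, s in scores.items()) / total


def same_author(pair: Dict[str, Tuple[Vector, Vector]],
                models: Dict[str, Tuple[Vector, float]],
                fusion_weights: Dict[str, float],
                threshold: float = 0.5) -> bool:
    """Score each category on its own, then fuse and threshold."""
    scores = {c: category_score(a, b, *models[c])
              for c, (a, b) in pair.items()}
    return combine(scores, fusion_weights) >= threshold


# Hypothetical per-category models: (weights, bias). Negative weights
# mean larger feature differences lower the same-author probability.
MODELS = {"lexical": ([-2.0, -2.0], 1.0), "syntactic": ([-3.0], 1.0)}
FUSION = {"lexical": 1.0, "syntactic": 1.0}

# Hypothetical feature vectors for two text pairs, keyed by category.
identical = {"lexical": ([0.5, 0.2], [0.5, 0.2]), "syntactic": ([0.3], [0.3])}
different = {"lexical": ([0.9, 0.1], [0.1, 0.9]), "syntactic": ([0.9], [0.1])}
```

Calling `same_author(identical, MODELS, FUSION)` and `same_author(different, MODELS, FUSION)` shows how the per-category scores are kept apart until the final decision; in a learned system the fusion weights would themselves be trained rather than fixed.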

Document Type: Article Article Type: Research Article Access Type: Open Access

APA	Canbay P, Sezer E, Sever H (2020). Deep Combination of Stylometry Features in Forensic Authorship Analysis. INTERNATIONAL JOURNAL OF INFORMATION SECURITY SCIENCE, 9(3), 154 - 163.
Chicago	Canbay Pelin,Sezer Ebru Akcapinar,Sever Hayri Deep Combination of Stylometry Features in Forensic Authorship Analysis. INTERNATIONAL JOURNAL OF INFORMATION SECURITY SCIENCE 9, no.3 (2020): 154 - 163.
MLA	Canbay Pelin,Sezer Ebru Akcapinar,Sever Hayri Deep Combination of Stylometry Features in Forensic Authorship Analysis. INTERNATIONAL JOURNAL OF INFORMATION SECURITY SCIENCE, vol.9, no.3, 2020, pp.154 - 163.
AMA	Canbay P,Sezer E,Sever H Deep Combination of Stylometry Features in Forensic Authorship Analysis. INTERNATIONAL JOURNAL OF INFORMATION SECURITY SCIENCE. 2020; 9(3): 154 - 163.
Vancouver	Canbay P,Sezer E,Sever H Deep Combination of Stylometry Features in Forensic Authorship Analysis. INTERNATIONAL JOURNAL OF INFORMATION SECURITY SCIENCE. 2020; 9(3): 154 - 163.
IEEE	Canbay P,Sezer E,Sever H "Deep Combination of Stylometry Features in Forensic Authorship Analysis." INTERNATIONAL JOURNAL OF INFORMATION SECURITY SCIENCE, 9, pp.154 - 163, 2020.
ISNAD	Canbay, Pelin et al. "Deep Combination of Stylometry Features in Forensic Authorship Analysis". INTERNATIONAL JOURNAL OF INFORMATION SECURITY SCIENCE 9/3 (2020), 154-163.