Year: 2016 Volume: 4 Issue: 3 Page Range: 49-56 Text Language: English Index Date: 29-07-2022

Using Word Embeddings for Ontology Enrichment

Abstract:
Word embeddings, distributed word representations in a reduced linear space, show a lot of promise for accomplishing Natural Language Processing (NLP) tasks in an unsupervised manner. In this study, we investigate whether the success of word2vec, a neural network based word embeddings algorithm, can be replicated in an agglutinative language like Turkish. Turkish is more challenging than languages like English for complex NLP tasks because of its rich morphology. We picked ontology enrichment, again a relatively hard NLP task, as our test application. First, we show how ontological relations can be extracted automatically from Turkish Wikipedia to construct a gold standard. Then, through experiments, we show that the word vector representations produced by word2vec are useful for detecting the ontological relations encoded in Wikipedia. We propose a simple yet effective weakly supervised ontology enrichment algorithm in which, for a given word, a few known ontologically related concepts coupled with similarity scores computed via word2vec models can lead to the discovery of other related concepts. We discuss how our algorithm can be improved and augmented to make it a viable component of an ontology learning and population framework.
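To make the enrichment step concrete, the sketch below shows one plausible reading of the weakly supervised procedure described in the abstract, using the gensim library cited in the references (Rehurek & Sojka, 2010): candidate vocabulary words are ranked by their average word2vec cosine similarity to a handful of seed concepts already known to be related to the target. The model file name, the seed words, and the helper function are illustrative assumptions, not the paper's exact setup.

```python
# Hedged sketch of the weakly supervised enrichment idea: given a few seed
# concepts known to be ontologically related to a target, rank other vocabulary
# words by their mean word2vec similarity to those seeds. File names, seeds,
# and the function below are illustrative assumptions, not the paper's code.
from gensim.models import Word2Vec

def suggest_related_concepts(model, seed_concepts, top_n=10):
    """Return up to top_n (word, score) pairs ranked by mean cosine similarity to the seeds."""
    seeds = [w for w in seed_concepts if w in model.wv]
    if not seeds:
        return []
    scores = {}
    for candidate in model.wv.index_to_key:
        if candidate in seeds:
            continue
        sims = [model.wv.similarity(candidate, s) for s in seeds]
        scores[candidate] = sum(sims) / len(sims)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

if __name__ == "__main__":
    # Assumes a word2vec model previously trained on a Turkish Wikipedia dump.
    model = Word2Vec.load("trwiki_word2vec.model")
    # Seed concepts assumed (e.g. from Wikipedia categories) to be related to "kedi" (cat).
    seeds = ["köpek", "memeli", "hayvan"]
    for word, score in suggest_related_concepts(model, seeds):
        print(f"{word}\t{score:.3f}")
```

gensim's model.wv.most_similar(positive=seeds) would give an equivalent one-call ranking (it averages the seed vectors before scoring); the explicit loop above only makes the per-seed similarity scoring visible.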
Keywords:

Subjects: Computer Science, Artificial Intelligence
Document Type: Article Article Type: Research Article Access Type: Open Access
  • Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, Kuksa P. 2011. Natural language processing (almost) from scratch. The Journal of Machine Learning Research. 12:2493-2537.
  • Mikolov T, Chen K, Corrado G, Dean J. 2013. Efficient Estimation of Word Representations in Vector Space. In Proceedings of Workshop at ICLR.
  • Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems (pp. 3111-3119).
  • Petasis G, Karkaletsis V, Paliouras G, Krithara A, Zavitsanos E. 2011. Ontology Population and Enrichment: State of the Art. In Knowledge-Driven Multimedia Information Extraction and Ontology Evolution (pp. 134-166). Springer-Verlag.
  • Zouaq A, Gasevic D, Hatala M. 2011. Towards Open Ontology Learning and Filtering. Information Systems. 36(7):1064-81.
  • Tanev H, Magnini B. 2008. Weakly supervised approaches for ontology population. In Proceedings of the 2008 Conference on Ontology Learning and Population: Bridging the Gap between Text and Knowledge (pp. 129-143).
  • Rong X. 2014. word2vec parameter learning explained. arXiv preprint arXiv:1411.2738.
  • Pennington J, Socher R, Manning CD. 2014. GloVe: Global Vectors for Word Representation. In EMNLP 2014 (Vol. 14, pp. 1532-1543).
  • Ji S, Yun H, Yanardag P, Matsushima S, Vishwanathan SV. 2015. WordRank: Learning Word Embeddings via Robust Ranking. arXiv preprint arXiv:1506.02761.
  • Le QV, Mikolov T. 2014. Distributed representations of sentences and documents. arXiv preprint arXiv:1405.4053.
  • Barkan O, Koenigstein N. 2016. Item2Vec: Neural Item Embedding for Collaborative Filtering. arXiv preprint arXiv:1603.04259.
  • Perozzi B, Al-Rfou R, Skiena S. 2014. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 701-710). ACM.
  • Vilnis L, McCallum A. 2015. Word representations via gaussian embedding. In Proceedings of International Conference on Learning Representations 2015.
  • Arora S, Li Y, Liang Y, Ma T, Risteski A. 2015. Random walks on context spaces: Towards an explanation of the mysteries of semantic word embeddings. arXiv preprint arXiv:1502.03520.
  • Levy O, Goldberg Y. 2014. Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems 2014 (pp. 2177-2185).
  • Tamagawa S, Sakurai S, Tejima T, Morita T, Izumi N, Yamaguchi T. 2010. Learning a large scale ontology from Japanese Wikipedia. In 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT) (Vol. 1, pp. 279-286). IEEE.
  • Wu F, Weld DS. 2008. Automatically refining the Wikipedia infobox ontology. In Proceedings of the 17th International Conference on World Wide Web (pp. 635-644). ACM.
  • Janik M, Kochut KJ. 2008. Wikipedia in action: Ontological knowledge in text categorization. In Semantic Computing, 2008 IEEE International Conference (pp. 268-275). IEEE.
  • Kim HJ, Hong KJ. 2015. Building Semantic Concept Networks by Wikipedia-Based Formal Concept Analysis. Advanced Science Letters. 21(3):435-8.
  • Lehmann J, Isele R, Jakob M, Jentzsch A, Kontokostas D, Mendes PN, Hellmann S, Morsey M, van Kleef P, Auer S, Bizer C. 2015. DBpedia-a large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web. 6(2):167-95.
  • Hoffart J, Suchanek FM, Berberich K, Weikum G. 2013. YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia. Artificial Intelligence. 194:28-61.
  • Hearst MA. 1992. Automatic acquisition of hyponyms from large text corpora. In Proceedings of the 14th Conference on Computational Linguistics - Volume 2 (pp. 539-545). Association for Computational Linguistics.
  • Maynard D, Funk A, Peters W. 2008. Using lexico-syntactic ontology design patterns for ontology creation and population. In Proc. of the Workshop on Ontology Patterns.
  • Yeh E, Ramage D, Manning CD, Agirre E, Soroa A. 2009. WikiWalk: random walks on Wikipedia for semantic relatedness. In Proceedings of the 2009 Workshop on Graph-based Methods for Natural Language Processing (pp. 41-49). Association for Computational Linguistics.
  • Zesch T, Gurevych I. 2007. Analysis of the Wikipedia category graph for NLP applications. In Proceedings of the TextGraphs-2 Workshop (NAACL-HLT 2007) (pp. 1-8).
  • Van der Maaten L, Hinton G. 2008. Visualizing High-Dimensional Data Using t-SNE. Journal of Machine Learning Research. 9:2579-2605.
  • Rehurek R., Sojka P. 2010. Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks.
APA PEMBECİ İ (2016). Using Word Embeddings for Ontology Enrichment. International Journal of Intelligent Systems and Applications in Engineering, 4(3), 49 - 56.
Chicago PEMBECİ İzzet Using Word Embeddings for Ontology Enrichment. International Journal of Intelligent Systems and Applications in Engineering 4, no.3 (2016): 49 - 56.
MLA PEMBECİ İzzet Using Word Embeddings for Ontology Enrichment. International Journal of Intelligent Systems and Applications in Engineering, vol.4, no.3, 2016, pp.49 - 56.
AMA PEMBECİ İ Using Word Embeddings for Ontology Enrichment. International Journal of Intelligent Systems and Applications in Engineering. 2016; 4(3): 49 - 56.
Vancouver PEMBECİ İ Using Word Embeddings for Ontology Enrichment. International Journal of Intelligent Systems and Applications in Engineering. 2016; 4(3): 49 - 56.
IEEE PEMBECİ İ "Using Word Embeddings for Ontology Enrichment." International Journal of Intelligent Systems and Applications in Engineering, 4, ss.49 - 56, 2016.
ISNAD PEMBECİ, İzzet. "Using Word Embeddings for Ontology Enrichment". International Journal of Intelligent Systems and Applications in Engineering 4/3 (2016), 49-56.