Learning Syntactic Tagging of Macedonian Language

Martin Bonchanoski¹ and Katerina Zdravkova²

  1. University Ss Cyril and Methodius, Faculty of Computer Science and Engineering
    1000 Skopje, Macedonia, Dublin, Ireland
    martinboncanoski@gmail.com
  2. University Ss Cyril and Methodius, Faculty of Computer Science and Engineering
    1000 Skopje, Macedonia
    katerina.zdravkova@finki.ukim.mk

Abstract

This paper presents the creation of machine-learning-based systems for part-of-speech (PoS) tagging of the Macedonian language. Four well-known PoS taggers implemented for English and Slavic languages were trained: TnT, the cyclic dependency network, the guided learning framework for bidirectional sequence classification, and dynamic feature induction. Orwell's novel "1984" was manually tagged by the authors and split into training and test sets. After training, the models were compared with each other. The resulting PoS tagger reaches an accuracy of 97.5%, which makes it well suited for the future grammatical tagging of the National Corpus of the Macedonian language, currently in its initial stage. The tagger is published online and is free to use.
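The evaluation setup described in the abstract (a manually tagged corpus split into training and test sets, with taggers compared by accuracy) can be illustrated with a minimal sketch. It assumes a corpus stored as lists of (token, tag) pairs and measures token-level accuracy on the held-out part. The most-frequent-tag baseline used here is a hypothetical stand-in for illustration only; it is not one of the four taggers compared in the paper. The function names, the tag labels, the toy sentences and the split ratio are all assumptions, not the authors' tagset, data or split.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical corpus format: each sentence is a list of (token, tag) pairs.
Sentence = List[Tuple[str, str]]

# Toy data for illustration only; tags are assumed, not the paper's tagset.
TOY_CORPUS: List[Sentence] = [
    [("It", "PRON"), ("was", "VERB"), ("a", "DET"), ("bright", "ADJ"), ("cold", "ADJ"), ("day", "NOUN")],
    [("It", "PRON"), ("was", "VERB"), ("cold", "ADJ")],
    [("The", "DET"), ("day", "NOUN"), ("was", "VERB"), ("bright", "ADJ")],
    [("It", "PRON"), ("was", "VERB"), ("a", "DET"), ("cold", "ADJ"), ("night", "NOUN")],
]


def split_corpus(sentences: List[Sentence], train_fraction: float = 0.9) -> Tuple[List[Sentence], List[Sentence]]:
    """Split sentences in order: the first part for training, the rest for testing."""
    cut = int(len(sentences) * train_fraction)
    return sentences[:cut], sentences[cut:]


def train_most_frequent_tag(train: List[Sentence]) -> Tuple[Dict[str, str], str]:
    """Baseline: map each known word to its most frequent training tag.

    Unknown words fall back to the most frequent tag overall.
    """
    word_tags: Dict[str, Counter] = defaultdict(Counter)
    tag_totals: Counter = Counter()
    for sentence in train:
        for word, tag in sentence:
            word_tags[word.lower()][tag] += 1
            tag_totals[tag] += 1
    lexicon = {word: tags.most_common(1)[0][0] for word, tags in word_tags.items()}
    fallback = tag_totals.most_common(1)[0][0]
    return lexicon, fallback


def tag_words(words: List[str], lexicon: Dict[str, str], fallback: str) -> List[str]:
    """Tag each word independently using the learned lexicon."""
    return [lexicon.get(word.lower(), fallback) for word in words]


def token_accuracy(test: List[Sentence], lexicon: Dict[str, str], fallback: str) -> float:
    """Fraction of test tokens whose predicted tag equals the gold tag."""
    correct = total = 0
    for sentence in test:
        words = [word for word, _ in sentence]
        gold = [tag for _, tag in sentence]
        predicted = tag_words(words, lexicon, fallback)
        correct += sum(p == g for p, g in zip(predicted, gold))
        total += len(gold)
    return correct / total if total else 0.0


if __name__ == "__main__":
    train, test = split_corpus(TOY_CORPUS, train_fraction=0.75)
    lexicon, fallback = train_most_frequent_tag(train)
    print(token_accuracy(test, lexicon, fallback))  # 4 of 5 test tokens correct
```

On the toy data, the unseen word "night" receives the fallback tag (ADJ) instead of NOUN, so 4 of 5 test tokens are tagged correctly. Context-aware taggers such as those compared in the paper exist precisely to handle cases like this, where a per-word lookup is not enough.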

Key words

Part-of-speech tagging, TnT tagger, Cyclic dependency network, Guided learning for bidirectional sequence classification, Dynamic feature induction

Digital Object Identifier (DOI)

https://doi.org/10.2298/CSIS180310027B

Publication information

Volume 15, Issue 3 (October 2018)
Year of Publication: 2018
ISSN: 2406-1018 (Online)
Publisher: ComSIS Consortium


How to cite

Bonchanoski, M., Zdravkova, K.: Learning Syntactic Tagging of Macedonian Language. Computer Science and Information Systems, Vol. 15, No. 3, 799–820. (2018), https://doi.org/10.2298/CSIS180310027B