A Pilot Study of Multi-Method Evaluation of Machine Translation in Macedonian
Faculty of Computer Science and Engineering
jana.kuzmanova@finki.ukim.mk
Faculty of Computer Science and Engineering
katerina.zdravkova@finki.ukim.mk (corresponding author)
Abstract
This pilot study offers a linguistic evaluation of six machine translation systems (GPT-4o, GPT-5, Gemini 2.5 Flash, Google Translate, Microsoft Translator, and NLLB-600M) applied to the translation of a short excerpt from Orwell’s “1984” into Macedonian. The analysis consisted of three interconnected experiments: manual annotation of translation errors and comparison with human output, evaluation using eight popular MT metrics, and sentence-level similarity analysis via cosine similarity, Jaccard similarity, and Levenshtein distance. Manual annotation revealed that stylistic errors (48.47%) and linguistic errors (34.54%) were the most common. The LLMs, particularly GPT-5, outperformed the other systems, while NLLB-600M performed poorly, often producing incomprehensible sentences or non-existent words. Metrics-based evaluation showed that lexical metrics sometimes penalized fluent and accurate translations that deviated from the reference. Sentence similarity analysis confirmed that accurate translations were more consistent with one another, while wrong–wrong sentence pairs were more divergent, especially in Levenshtein scores. The findings underscore the importance of combining manual and metric-based evaluation to fully understand MT quality, particularly in low-resource settings.
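As a rough illustration of the three sentence-level measures named in the abstract, the following minimal sketch computes cosine similarity, Jaccard similarity, and Levenshtein distance for a pair of sentences. It is not the authors' implementation: the whitespace tokenization, lowercasing, bag-of-words vectors for the cosine measure, and character-level edit distance are all assumptions made here for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter


def tokens(sentence):
    # Assumption: lowercase whitespace tokenization; the study may tokenize differently.
    return sentence.lower().split()


def jaccard(a, b):
    # Ratio of shared word types to all word types in the two sentences.
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def cosine(a, b):
    # Cosine of the angle between bag-of-words count vectors (an assumed representation).
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def levenshtein(a, b):
    # Character-level edit distance using a row-by-row dynamic programme.
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, 1):
        curr = [i]
        for j, ch_b in enumerate(b, 1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    s1 = "the clocks were striking thirteen"
    s2 = "the clocks struck thirteen"
    print(jaccard(s1, s2), cosine(s1, s2), levenshtein(s1, s2))
```

Higher cosine and Jaccard values indicate more similar sentences, whereas a higher Levenshtein distance indicates greater divergence, so the three scores have to be read in opposite directions when compared.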
Key words
Machine Translation, Manual Evaluation and Annotation, Linguistic Similarity, Low Resource Language
Digital Object Identifier (DOI)
https://doi.org/10.2298/CSIS250910201K
Publication information
Volume 23, Issue 2 (April 2026)
Year of Publication: 2026
ISSN: 2406-1018 (Online)
Publisher: ComSIS Consortium
Full text
Available in PDF
How to cite
Kuzmanova, J., Zdravkova, K.: A Pilot Study of Multi-Method Evaluation of Machine Translation in Macedonian. Computer Science and Information Systems, 23(2), 827–860 (2026). https://doi.org/10.2298/CSIS250910201K