Using Part-of-Speech Tags as Deep-Syntax Indicators in Determining Short-Text Semantic Similarity
V. Batanović
School of Electrical Engineering
Bulevar kralja Aleksandra 73, 11120 Belgrade, Serbia
bv115045p@student.etf.bg.ac.rs

D. Bojić
School of Electrical Engineering
Bulevar kralja Aleksandra 73, 11120 Belgrade, Serbia
bojic@etf.rs
Abstract
This paper presents POST STSS, a method for determining short-text semantic similarity in which part-of-speech tags serve as indicators of the deeper syntactic information usually extracted by more advanced tools such as parsers and semantic role labelers. Our model employs a part-of-speech weighting scheme and is based on a statistical bag-of-words approach. It requires neither hand-crafted knowledge bases nor advanced syntactic tools, which makes it easy to apply to languages with limited natural language processing resources. Using a paraphrase recognition test, we demonstrate that our system achieves higher accuracy than all existing statistical similarity algorithms and solutions of a more structural kind.
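The abstract names the ingredients (part-of-speech weighting on top of a bag-of-words model, evaluated by paraphrase recognition) but not the weighting scheme itself. As a rough illustration of the general idea only, the sketch below assumes hypothetical POS weights (`POS_WEIGHTS`, `DEFAULT_WEIGHT`), Universal-style tags supplied by some external tagger, exact word matching, cosine similarity, and a hypothetical decision threshold. None of these values or choices are taken from the paper.

```python
from collections import Counter
import math

# Hypothetical POS weights, for illustration only (NOT the paper's values).
POS_WEIGHTS = {"NOUN": 1.0, "VERB": 0.9, "ADJ": 0.7, "ADV": 0.5}
DEFAULT_WEIGHT = 0.1  # assumed weight for all other tags (determiners, prepositions, ...)


def weighted_bag(tagged_tokens):
    """Bag of words in which each occurrence contributes the weight of its POS tag."""
    bag = Counter()
    for word, tag in tagged_tokens:
        bag[word.lower()] += POS_WEIGHTS.get(tag, DEFAULT_WEIGHT)
    return bag


def cosine(a, b):
    """Cosine similarity between two weighted bags (exact word matches only)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity(tagged_a, tagged_b):
    """POS-weighted bag-of-words similarity of two tagged short texts."""
    return cosine(weighted_bag(tagged_a), weighted_bag(tagged_b))


def paraphrase_accuracy(pairs, threshold=0.5):
    """Label a pair as a paraphrase when similarity >= threshold; return the accuracy."""
    correct = sum((similarity(a, b) >= threshold) == gold for a, b, gold in pairs)
    return correct / len(pairs)
```

For example, with `s1 = [("The", "DET"), ("cat", "NOUN"), ("sat", "VERB"), ("on", "ADP"), ("the", "DET"), ("mat", "NOUN")]`, calling `similarity(s1, s1)` gives 1.0, while a reworded sentence sharing the nouns scores lower. Because this sketch matches words exactly, "sat" and "sitting" contribute nothing; the keywords suggest the actual system relies on corpus-based word similarity measures instead, which would replace the exact-match dot product here.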
Key words
short-text semantic similarity, statistical similarity, corpus-based measures, part-of-speech tags, POS weighting, syntactic information, bag-of-words model, natural language processing
Digital Object Identifier (DOI)
https://doi.org/10.2298/CSIS131127082B
Publication information
Volume 12, Issue 1 (January 2015)
Year of Publication: 2015
ISSN: 2406-1018 (Online)
Publisher: ComSIS Consortium
How to cite
Batanović, V., Bojić, D.: Using Part-of-Speech Tags as Deep-Syntax Indicators in Determining Short-Text Semantic Similarity. Computer Science and Information Systems, Vol. 12, No. 1, 1–31. (2015), https://doi.org/10.2298/CSIS131127082B