Topic-Sensitive Multi-document Summarization Algorithm
- School of Information Science & Engineering, Dalian Polytechnic University
liuna@dlpu.edu.cn
svesna@agrif.bg.ac.rs, paja@agrif.bg.ac.rs - School of Computer and Information Technology, Liaoning Normal University
116029 Dalian, China
Abstract
Latent Dirichlet Allocation (LDA) has recently been used to generate topics from text corpora. However, not all estimated topics are equally important or correspond to genuine themes of the domain. Some topics may be collections of irrelevant words or represent insignificant themes. This paper proposes a topic-sensitive algorithm for multi-document summarization. The algorithm uses an LDA model and a weighted linear combination strategy to identify significant topics, which are then used in calculating sentence weights. Each topic is measured by three different LDA criteria, and topic significance is evaluated by combining these criteria through a weighted linear combination. In addition to topic features, the proposed approach also considers statistical features such as term frequency, sentence position, and sentence length. It thus combines the advantages of statistical features with those of the topic model. Experiments show that the proposed algorithm achieves better performance than other state-of-the-art algorithms on the DUC2002 corpus.
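The scoring idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not name the three LDA criteria, the weights, or the way topic and statistical features are merged, so the criterion scores, the weight values, the mixing parameter alpha, and all function names below are hypothetical and serve only to show how a weighted linear combination could feed into a sentence weight.

```python
from typing import List, Sequence


def weighted_linear_combination(scores: Sequence[float],
                                weights: Sequence[float]) -> float:
    """Combine several scores into one value with a normalized weighted sum."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total


def topic_significance(criteria_by_topic: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> List[float]:
    """Score each topic from its per-criterion values.

    Each inner sequence holds one topic's scores under the (here unnamed,
    hypothetical) criteria, assumed to be scaled to the range [0, 1].
    """
    return [weighted_linear_combination(c, weights) for c in criteria_by_topic]


def sentence_weight(topic_mix: Sequence[float],
                    significance: Sequence[float],
                    stat_features: Sequence[float],
                    stat_weights: Sequence[float],
                    alpha: float = 0.5) -> float:
    """Blend a topic-based score with a statistical-feature score.

    topic_mix is the sentence's distribution over topics; stat_features
    could hold term frequency, position, and length scores. The blend
    controlled by alpha is an assumption, not taken from the paper.
    """
    if len(topic_mix) != len(significance):
        raise ValueError("topic_mix and significance must have the same length")
    topic_score = sum(p * s for p, s in zip(topic_mix, significance))
    stat_score = weighted_linear_combination(stat_features, stat_weights)
    return alpha * topic_score + (1 - alpha) * stat_score


if __name__ == "__main__":
    # Two topics, three hypothetical criterion scores each, equal weights.
    sig = topic_significance([[0.9, 0.8, 0.7], [0.2, 0.1, 0.3]], [1, 1, 1])
    # One sentence: mostly about topic 0, with three statistical features.
    print(sentence_weight([0.8, 0.2], sig, [0.6, 1.0, 0.5], [2, 1, 1]))
```

Sentences would then be ranked by this weight and the top ones selected for the summary. Redundancy removal and length limits are left out of the sketch.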
Key words
multi-document summarization, LDA, topic model, weighted linear combination
Digital Object Identifier (DOI)
https://doi.org/10.2298/CSIS140815060N
Publication information
Volume 12, Issue 4 (November 2015)
Special Issue on Recent Advances in Information Processing, Parallel and Distributed Computing
Year of Publication: 2015
ISSN: 2406-1018 (Online)
Publisher: ComSIS Consortium
Full text
Available in PDF
How to cite
Na, L., Tang, D., Ying, L., Xiao-jun, T., Hai-wen, W.: Topic-Sensitive Multi-document Summarization Algorithm. Computer Science and Information Systems, Vol. 12, No. 4, 1375–1389. (2015), https://doi.org/10.2298/CSIS140815060N