Topic-Sensitive Multi-document Summarization Algorithm

Liu Na¹, Di Tang², Lu Ying¹, Tang Xiao-jun¹ and Wang Hai-wen¹

  1. School of Information Science & Engineering, Dalian Polytechnic University
    liuna@dlpu.edu.cn
    svesna@agrif.bg.ac.rs, paja@agrif.bg.ac.rs
  2. School of Computer and Information Technology, Liaoning Normal University
    116029 Dalian, China

Abstract

Latent Dirichlet Allocation (LDA) has recently been used to generate topics from text corpora. However, not all of the estimated topics are equally important or correspond to genuine themes of the domain: some topics may be collections of irrelevant words or may represent insignificant themes. This paper proposes a topic-sensitive algorithm for multi-document summarization. The algorithm uses the LDA model and a weighted linear combination strategy to identify significant topics, which are then used in calculating sentence weights. Each topic is measured by three different LDA-based criteria, and topic significance is evaluated by combining these criteria through a weighted linear combination. In addition to topic features, the proposed approach also considers statistical features such as term frequency, sentence position, and sentence length. It thus retains the advantages of statistical features while complementing them with the topic model. Experiments on the DUC2002 corpus show that the proposed algorithm outperforms other state-of-the-art algorithms.
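The two scoring steps described in the abstract (combining several per-topic criteria into one significance score, then mixing a topic feature with statistical features into a sentence weight) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the three LDA criteria or give any weights, so the criterion scores are taken as given inputs, the min-max normalisation is an assumption, and all function names (`min_max_normalise`, `topic_significance`, `sentence_score`) and weight values are hypothetical.

```python
"""Hypothetical sketch: weighted linear combination for topic significance
and a combined topic/statistical sentence weight.

Assumptions (not from the paper): per-topic criterion scores are supplied
as inputs, each criterion is min-max normalised across topics, and all
weights are illustrative placeholders.
"""
from typing import Dict, List, Sequence


def min_max_normalise(values: Sequence[float]) -> List[float]:
    """Rescale values to [0, 1]; a constant row maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def topic_significance(criteria: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> List[float]:
    """Combine criterion scores per topic with a weighted linear sum.

    criteria[c][k] is the score of topic k under criterion c.
    """
    if len(criteria) != len(weights):
        raise ValueError("need exactly one weight per criterion")
    normalised = [min_max_normalise(row) for row in criteria]
    num_topics = len(criteria[0])
    return [sum(w * row[k] for w, row in zip(weights, normalised))
            for k in range(num_topics)]


def sentence_score(topic_dist: Sequence[float],
                   significance: Sequence[float],
                   tf: float, position: float, length: float,
                   feature_weights: Dict[str, float]) -> float:
    """Hypothetical sentence weight: topic feature plus statistical features.

    topic_dist is the sentence's topic distribution; tf, position and
    length are assumed to be pre-normalised to [0, 1].
    """
    # Topic feature: significance of each topic, weighted by how strongly
    # the sentence expresses that topic.
    topic_feature = sum(p * s for p, s in zip(topic_dist, significance))
    return (feature_weights["topic"] * topic_feature
            + feature_weights["tf"] * tf
            + feature_weights["position"] * position
            + feature_weights["length"] * length)


if __name__ == "__main__":
    # Three hypothetical criteria scored over three topics.
    criteria = [[0.2, 0.8, 0.5], [0.1, 0.4, 0.9], [0.3, 0.3, 0.6]]
    significance = topic_significance(criteria, [0.5, 0.3, 0.2])
    weights = {"topic": 0.5, "tf": 0.2, "position": 0.2, "length": 0.1}
    score = sentence_score([0.1, 0.3, 0.6], significance,
                           tf=0.5, position=1.0, length=0.4,
                           feature_weights=weights)
    print(significance, score)
```

Under these assumptions, a topic that scores low on every criterion receives near-zero significance and contributes little to any sentence's weight, which mirrors the abstract's aim of discounting topics that are collections of irrelevant words. Sentences would then be ranked by `sentence_score` to build the summary.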

Key words

multi-document summarization, LDA, topic model, weighted linear combination

Digital Object Identifier (DOI)

https://doi.org/10.2298/CSIS140815060N

Publication information

Volume 12, Issue 4 (November 2015)
Special Issue on Recent Advances in Information Processing, Parallel and Distributed Computing
Year of Publication: 2015
ISSN: 2406-1018 (Online)
Publisher: ComSIS Consortium

How to cite

Na, L., Tang, D., Ying, L., Xiao-jun, T., Hai-wen, W.: Topic-Sensitive Multi-document Summarization Algorithm. Computer Science and Information Systems, Vol. 12, No. 4, 1375–1389. (2015), https://doi.org/10.2298/CSIS140815060N