Research on Key Technology of Network Information Extraction Oriented to Web Topic Detection for Big Data

Mo Chen

  1. Business College of Beijing Union University
    A3, Yanjingdongli, Chaoyang District, Beijing, 100025, P.R. China
    mo.chen@buu.edu.cn

Abstract

In today's era of big data and numerical intelligence, this study explores incremental network information extraction for Web topic detection. It takes semi-structured and unstructured Web big data as its main research object, with the aim of advancing network information detection applications, and proposes an incremental network information extraction approach for Web topic detection. In this approach, a theme similarity measurement algorithm for incremental network information extraction extracts Web instances related to a theme and calculates their importance. An incremental instance extraction algorithm for Web topic detection then analyzes the Pattern and BasePattern of each extracted Web instance URL, segments the title and text content of each Web instance, and extracts keywords that describe the Web topic. Experimental results show that the proposed framework, method, and algorithms significantly outperform traditional network information extraction methods. In particular, the accuracy of extracting theme-similar Web instances reaches 0.833, and their F-Measure under different threshold settings is close to 0.83. When the number of extracted Web news instances, the threshold, and the parameter value are fixed, the accuracy of topic detection is close to 0.82. The study concludes that the proposed incremental network information extraction approach is feasible, verifiable, and superior, and that it can play an important role in reconfiguring numerical intelligence warehouses for Web topic detection and in inferring the propagation paths of hierarchical Web big data.
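The abstract describes theme similarity measurement and threshold-dependent F-Measure only at a high level. The following is a minimal sketch, not the author's algorithm. It assumes that cosine similarity over bag-of-words term frequencies serves as the theme similarity measure and that a single fixed threshold decides whether a Web instance is theme-related. All helper names (`term_vector`, `cosine_similarity`, `extract_theme_instances`, `f_measure`) and the toy instances are hypothetical. Whitespace tokenization stands in for the word segmentation the study applies to titles and text. The `f_measure` function is the standard F1 score, shown to illustrate how extraction quality can be evaluated at a given threshold.

```python
from collections import Counter
from math import sqrt


def term_vector(text: str) -> Counter:
    """Term frequencies; whitespace tokenization is an assumed stand-in for word segmentation."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors (assumed similarity measure)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def extract_theme_instances(theme: str, instances: dict[str, str], threshold: float) -> set[str]:
    """Hypothetical helper: keep instance IDs whose similarity to the theme meets the threshold."""
    theme_vec = term_vector(theme)
    return {
        doc_id
        for doc_id, text in instances.items()
        if cosine_similarity(theme_vec, term_vector(text)) >= threshold
    }


def f_measure(predicted: set[str], relevant: set[str]) -> float:
    """Standard F1: harmonic mean of precision and recall."""
    tp = len(predicted & relevant)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): "a" and "b" are about the theme, "c" is not.
instances = {
    "a": "big data web topic detection",
    "b": "web topic detection for news",
    "c": "recipe for chocolate cake",
}
relevant = {"a", "b"}

for threshold in (0.5, 0.8):
    predicted = extract_theme_instances("web topic detection", instances, threshold)
    print(threshold, sorted(predicted), f_measure(predicted, relevant))
# → 0.5 ['a', 'b'] 1.0
# → 0.8 [] 0.0
```

In this toy data, instances "a" and "b" both score about 0.775 against the theme. A threshold of 0.5 therefore keeps both, while a threshold of 0.8 keeps neither and the F-measure drops to 0.0. This illustrates why F-Measure is typically examined across several threshold settings.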

Key words

Incremental Network Information Extraction, Big Data, Web Topic Detection

Digital Object Identifier (DOI)

https://doi.org/10.2298/CSIS251030023C

Publication information

Volume 23, Issue 2 (April 2026)
Year of Publication: 2026
ISSN: 2406-1018 (Online)
Publisher: ComSIS Consortium


How to cite

Chen, M.: Research on Key Technology of Network Information Extraction Oriented to Web Topic Detection for Big Data. Computer Science and Information Systems, 23(2), 775–800 (2026). https://doi.org/10.2298/CSIS251030023C