Distributed and Collaborative Web Change Detection System
Communication and Information Technologies Department
University of A Coruña, Campus de Elviña s/n - 15071 (A Coruña)
{victor.prieto, manuel.alvarez, victor.carneiro, fidel.cacheda}@udc.es
Abstract
Search engines use crawlers to traverse the Web in order to download web pages and build their indexes. Keeping these indexes up to date is essential to ensure the quality of search results. However, changes in web pages are unpredictable. Identifying the moment when a web page changes as soon as possible and with minimal computational cost is a major challenge. In this article we present the Web Change Detection system that, in a best case scenario, is capable of detecting, almost in real time, when a web page changes. In a worst case scenario, it will require, on average, 12 minutes to detect a change on a low PageRank web site and about one minute on a web site with high PageRank. Meanwhile, current search engines require more than a day, on average, to detect a modification in a web page (in both cases).
Key words
Content refresh, Incremental crawling, Crawling systems and Search engines
Digital Object Identifier (DOI)
https://doi.org/10.2298/CSIS131120081P
Publication information
Volume 12, Issue 1 (January 2015)
Year of Publication: 2015
ISSN: 2406-1018 (Online)
Publisher: ComSIS Consortium
Full text
Available in PDF
How to cite
Prieto, V. M., Álvarez, M., Carneiro, V., Cacheda, F.: Distributed and Collaborative Web Change Detection System. Computer Science and Information Systems, Vol. 12, No. 1, 91-114. (2015), https://doi.org/10.2298/CSIS131120081P