DOI: 10.2298/CSIS100322028W
Research on Discovering Deep Web Entries
- College of Computer Science and Technology, Jilin University
  130012 Changchun, China
- Key Laboratory of Computation and Knowledge Engineering,
  Ministry of Education, China
  {wangying2010, zuowl, hefl}@jlu.edu.cn, Chenke0616@163.com
- College of Mathematics, Jilin University
  130012 Changchun, China
  lihuilai@jlu.edu.cn
- College of Software, Changchun Institute of Technology,
  130012 Changchun, China
  wangxccs@126.com
Abstract
Ontology plays an important role in locating Domain-Specific Deep Web content. This paper therefore presents WFF, a novel framework for efficiently locating Domain-Specific Deep Web databases based on focused crawling and ontology. WFF constructs a Web Page Classifier (WPC), a Form Structure Classifier (FSC) and a Form Content Classifier (FCC) in a hierarchical fashion. First, WPC discovers potentially interesting pages using an ontology-assisted focused crawler. Then, FSC analyzes the interesting pages and determines, based on structural characteristics, whether they contain searchable forms. Lastly, FCC identifies the searchable forms that belong to a given domain at the semantic level and stores the URLs of these Domain-Specific searchable forms in a database. A detailed experimental evaluation shows that the WFF framework not only simplifies the discovery process but also effectively identifies Domain-Specific databases.
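The hierarchical WPC, FSC, FCC filtering described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The plain keyword-overlap tests stand in for the ontology-based classifiers, and all names (`FormExtractor`, `wpc`, `fsc`, `fcc`, `find_domain_forms`), the thresholds, the term set and the sample page are hypothetical assumptions introduced only for illustration.

```python
from html.parser import HTMLParser


class FormExtractor(HTMLParser):
    """Collect each <form> with its field types and its label/field-name words."""

    def __init__(self):
        super().__init__()
        self.forms = []        # forms that have been fully parsed
        self._current = None   # form currently being parsed, or None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"action": attrs.get("action") or "",
                             "input_types": [], "words": []}
        elif self._current is not None and tag in ("input", "select", "textarea"):
            # An <input> without a type attribute defaults to a text field.
            kind = (attrs.get("type") or "text") if tag == "input" else tag
            self._current["input_types"].append(kind.lower())
            if attrs.get("name"):
                self._current["words"].append(attrs["name"].lower())

    def handle_data(self, data):
        if self._current is not None:
            self._current["words"].extend(data.lower().split())

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def wpc(html, terms, min_hits=2):
    """Stage 1 (assumed stand-in): is the page topically interesting?"""
    text = html.lower()
    return sum(term in text for term in terms) >= min_hits


def fsc(form):
    """Stage 2 (assumed heuristic): does the form look searchable?"""
    types = form["input_types"]
    has_text = any(t in ("text", "search") for t in types)
    return has_text and "password" not in types  # reject login forms


def fcc(form, terms, min_hits=1):
    """Stage 3 (assumed stand-in): does the form content match the domain?"""
    return len(set(form["words"]) & set(terms)) >= min_hits


def find_domain_forms(url, html, terms):
    """Apply the three stages in order; return (page URL, form action) pairs."""
    if not wpc(html, terms):
        return []
    parser = FormExtractor()
    parser.feed(html)
    return [(url, f["action"]) for f in parser.forms if fsc(f) and fcc(f, terms)]


BOOK_TERMS = {"book", "author", "title", "isbn"}  # toy stand-in for an ontology
PAGE = """
<html><body><h1>Find a book by author or title</h1>
<form action="/login"><input type="text" name="user"><input type="password" name="pw"></form>
<form action="/search">Title <input type="text" name="title"> ISBN <input name="isbn">
<input type="submit" value="Go"></form>
</body></html>
"""

print(find_domain_forms("http://example.org/", PAGE, BOOK_TERMS))
# prints [('http://example.org/', '/search')]
```

Running the cheap page-level test first means that only pages passing it are parsed for forms, and only structurally searchable forms reach the content check, which mirrors the staged design the abstract describes.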
Key words
Deep Web, ontology, WPC, FSC, FCC
Digital Object Identifier (DOI)
https://doi.org/10.2298/CSIS100322028W
Publication information
Volume 8, Issue 3 (June 2011)
Year of Publication: 2011
ISSN: 2406-1018 (Online)
Publisher: ComSIS Consortium
Full text
Available in PDF
How to cite
Wang, Y., Li, H., Zuo, W., He, F., Wang, X., Chen, K.: Research on Discovering Deep Web Entries. Computer Science and Information Systems, Vol. 8, No. 3, 779-799. (2011), https://doi.org/10.2298/CSIS100322028W