Modeling a generic web classification system using design patterns
Journal article
Publication Details
Author list: Sukakanya U., Porkaew K.
Publication year: 2011
Volume number: 6
Issue number: 10
Start page: 2212
End page: 2220
Number of pages: 9
ISSN: 1796-203X
eISSN: 1796-203X
Languages: English-Great Britain (EN-GB)
Abstract
To save time when extracting specific information from the high volume of data in web documents, this paper proposes an architectural model of a generic web document classification system built with design patterns. Based on the proposed model, this work implements two techniques for classifying Thai web documents, namely centroid classification and neural network classification, and compares their classification effectiveness empirically. The training data set in this experiment consists of 500 web documents in the following five categories (100 documents per category): mobile phone sales, book sales, travel sales, education information and company profile. A further 250 web documents were then used to test the two classifiers. The experimental results showed that the centroid classifier outperforms the neural network classifier in terms of both efficiency and effectiveness. © 2011 ACADEMY PUBLISHER.
Keywords
Centroid, Document analyzer, Text classification, Web classification modeling