Distance based Clustering of Class Association Rules to Build a Compact, Accurate and Descriptive Classifier
- University of Primorska
  Glagoljaška 8, 6000 Koper, Slovenia
  jamolbek.mattiev@famnit.upr.si, branko.kavsek@upr.si
- Jožef Stefan Institute
  Jamova cesta 39, 1000 Ljubljana, Slovenia
  Branko.kavsek@ijs.si
- Urgench State University
  Khamid Alimjan str 14, 220100 Urgench, Uzbekistan
  jamolbek_1992@mail.ru
Abstract
Huge amounts of data are being collected and analyzed nowadays. With popular rule-learning algorithms, the number of rules discovered on such “big” datasets can easily run into the thousands. To produce compact, understandable and accurate classifiers, such rules have to be grouped and pruned, so that only a reasonable number of them are presented to the end user for inspection and further analysis. In this paper, we propose new methods that reduce the number of class association rules produced by “classical” class association rule classifiers, while maintaining a classification model whose accuracy is comparable to that of models generated by state-of-the-art classification algorithms. More precisely, we propose new associative classifiers, called DC, DDC and CDC, that use distance-based agglomerative hierarchical clustering as a post-processing step to reduce the number of rules. In the rule-selection step, each algorithm uses a different strategy (based on database coverage and cluster center). Experiments performed on selected datasets from the UCI ML repository show that the proposed methods learn classifiers containing significantly fewer rules than state-of-the-art rule-learning algorithms on datasets with a larger number of examples. At the same time, the classification accuracy of the proposed classifiers is not significantly different from that of state-of-the-art rule learners on most of the datasets.
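The post-processing idea described above can be illustrated with a minimal sketch. The abstract does not specify the distance measure, the linkage criterion, or the stopping rule, so everything concrete here is an assumption made for illustration only: rules are represented by their antecedent item sets, the distance is Jaccard distance, clusters are merged by complete linkage until a hypothetical `threshold` is exceeded, and each cluster is represented by its medoid as a stand-in for a "cluster center" selection. It is not the DC, DDC or CDC algorithm itself.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """Assumed distance between two rule antecedents (frozensets of items)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def agglomerative(rules, threshold):
    """Bottom-up clustering with complete linkage.

    Starts with one cluster per rule and repeatedly merges the two closest
    clusters, stopping once the closest pair is farther apart than threshold.
    Returns a list of clusters, each a list of rule indices.
    """
    clusters = [[i] for i in range(len(rules))]

    def linkage(c1, c2):
        # Complete linkage: distance between the two most dissimilar members.
        return max(jaccard_distance(rules[i], rules[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[x], clusters[y]) > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters

def medoid(cluster, rules):
    """Index of the rule with the smallest total distance to its cluster mates."""
    return min(
        cluster,
        key=lambda i: sum(jaccard_distance(rules[i], rules[j]) for j in cluster),
    )

# Hypothetical antecedents of class association rules that predict the same class.
rules = [
    frozenset({"outlook=sunny", "humidity=high"}),
    frozenset({"outlook=sunny", "humidity=high", "wind=weak"}),
    frozenset({"outlook=sunny", "wind=weak"}),
    frozenset({"temp=cool", "wind=strong"}),
    frozenset({"temp=cool", "wind=strong", "outlook=rain"}),
]

clusters = agglomerative(rules, threshold=0.7)
representatives = [medoid(c, rules) for c in clusters]
print(clusters, representatives)
```

In this toy run five rules are reduced to one representative per cluster. A coverage-based selection strategy would instead choose rules according to how many training examples they cover, which requires the dataset and is not shown here.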
Key words
Frequent Itemset, Class Association Rules (CAR), Associative Classification, Agglomerative Clustering
Digital Object Identifier (DOI)
https://doi.org/10.2298/CSIS200430037M
Publication information
Volume 18, Issue 3 (June 2021)
Year of Publication: 2021
ISSN: 2406-1018 (Online)
Publisher: ComSIS Consortium
Full text
Available in PDF
How to cite
Mattiev, J., Kavšek, B.: Distance based Clustering of Class Association Rules to Build a Compact, Accurate and Descriptive Classifier. Computer Science and Information Systems, Vol. 18, No. 3, 791–811. (2021), https://doi.org/10.2298/CSIS200430037M