A New Approximate Method For Mining Frequent Itemsets From Big Data

Timur Valiullin¹, Joshua Zhexue Huang¹, Chenghao Wei¹, Jianfei Yin¹, Dingming Wu¹ and Iuliia Egorova¹

Big Data Institute, College of Computer Science and Software Engineering
Shenzhen University, 518000 Shenzhen, China
{timur,zx.huang}@szu.edu.cn

Abstract

Mining frequent itemsets in transaction databases is an important task in many applications. It becomes more challenging when dealing with a large transaction database because traditional algorithms do not scale due to memory limits. In this paper, we propose a new approach for the approximate mining of frequent itemsets in a big transaction database. Our approach is suitable for mining big transaction databases since it produces approximate frequent itemsets from a subset of the entire database and can be implemented in a distributed environment. Our algorithm efficiently produces highly accurate results; however, it misses some true frequent itemsets. To address this problem and reduce the number of false negatives, we introduce an additional parameter to the algorithm so that it discovers most of the frequent itemsets contained in the entire data set. In this article, we present an empirical evaluation of the proposed approach.
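
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of mining a random subset of the transactions, not the authors' method. It assumes a brute-force counter limited to small itemsets, and it assumes that the "additional parameter" can be illustrated by a hypothetical relaxation factor `relax` that lowers the support threshold on the sample so that fewer true frequent itemsets are missed. All function and parameter names here are illustrative.

```python
import random
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force count of all itemsets up to max_size items.

    Returns {frozenset(itemset): relative support} for itemsets whose
    relative support is at least min_support.
    """
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))  # ignore duplicate items inside a transaction
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    n = len(transactions)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def approximate_frequent_itemsets(transactions, min_support, sample_size,
                                  relax=0.9, seed=0, max_size=2):
    """Mine a random sample instead of the full database.

    `relax` is a hypothetical parameter (an assumption of this sketch):
    the threshold used on the sample is min_support * relax, trading a few
    extra false positives for fewer false negatives.
    """
    rng = random.Random(seed)
    sample = rng.sample(transactions, sample_size)  # without replacement
    return frequent_itemsets(sample, min_support * relax, max_size)


if __name__ == "__main__":
    data = [["a", "b"]] * 60 + [["a", "c"]] * 40
    approx = approximate_frequent_itemsets(data, 0.5, sample_size=30)
    for itemset, support in sorted(approx.items(), key=lambda kv: -kv[1]):
        print(sorted(itemset), round(support, 2))
```

In a distributed setting, each worker could run `approximate_frequent_itemsets` on its own block of transactions, but how the partial results would be combined is not described in the abstract and is left out of this sketch.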

Keywords

Approximate Method, Frequent Itemsets Mining, Random Sample Partition, Big Transaction Database

Digital Object Identifier (DOI)

https://doi.org/10.2298/CSIS200124015V

Publication information

Volume 18, Issue 3 (June 2021)
Year of Publication: 2021
ISSN: 2406-1018 (Online)
Publisher: ComSIS Consortium

How to cite

Valiullin, T., Huang, J. Z., Wei, C., Yin, J., Wu, D., Egorova, I.: A New Approximate Method For Mining Frequent Itemsets From Big Data. Computer Science and Information Systems, Vol. 18, No. 3, 641–656. (2021), https://doi.org/10.2298/CSIS200124015V