Signature-based Tree for Finding Frequent Itemsets
Abstract
The efficiency of a data mining process depends on the data structure used to find frequent itemsets. Two approaches are possible: use the original transaction dataset, or transform it into another, more compact structure. Many algorithms use trees as the compact structure, such as the FP-Tree and its associated algorithm, FP-Growth. Although this structure reduces the number of dataset scans to only two, its efficiency depends on two criteria: (i) the support value (small or large) and (ii) the type of transaction dataset (sparse or dense). Depending on these two criteria, the generated trees can be very large. In this paper, we propose a new tree-based structure that focuses on transactions rather than on itemsets. Hence, we avoid the negative impact that support values have on the generated tree.
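The abstract does not spell out how a transaction signature is encoded. As a minimal sketch only, assuming a signature is a bitmap with one bit per distinct item (an illustrative assumption, not the authors' tree structure), the following shows why a transaction-oriented encoding does not depend on a support threshold: every transaction is encoded once, and the support of any candidate itemset is counted afterwards. The helper names `build_item_index`, `signature`, and `support` are hypothetical.

```python
from typing import Dict, Iterable, List


def build_item_index(transactions: Iterable[Iterable[str]]) -> Dict[str, int]:
    """Assign each distinct item a bit position (assumed encoding)."""
    items = sorted({item for t in transactions for item in t})
    return {item: pos for pos, item in enumerate(items)}


def signature(itemset: Iterable[str], index: Dict[str, int]) -> int:
    """Encode an itemset as an integer bitmap, one bit per item."""
    sig = 0
    for item in itemset:
        sig |= 1 << index[item]
    return sig


def support(candidate: Iterable[str], signatures: List[int],
            index: Dict[str, int]) -> int:
    """Count the transactions whose signature contains every candidate bit."""
    cand = signature(candidate, index)
    return sum(1 for s in signatures if (s & cand) == cand)


# Toy dataset, invented purely for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "butter"},
]

index = build_item_index(transactions)
signatures = [signature(t, index) for t in transactions]

# Support is computed on demand; no threshold was needed to build the encoding.
print(support({"bread", "milk"}, signatures, index))
```

In this sketch the encoding is built in a single pass over the transactions and stays the same whatever minimum support is chosen later, whereas an FP-Tree is built from the items that are frequent at a given threshold.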
Keywords
Data Mining, Data compression, Data storage, Tree structure, Signature
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
M. El Hadi Benelhadj, M. Mahmoud Deye and Y. Slimani, "Signature-based Tree for Finding Frequent Itemsets," in Journal of Communications Software and Systems, vol. 19, no. 1, pp. 70-80, March 2023, doi: https://doi.org/10.24138/jcomss-2022-0065
@article{el-hadi-benelhadj2023signaturebased,
author = {Mohamed El Hadi Benelhadj and Mohamed Mahmoud Deye and Yahya Slimani},
title = {Signature-based Tree for Finding Frequent Itemsets},
journal = {Journal of Communications Software and Systems},
month = {3},
year = {2023},
volume = {19},
number = {1},
pages = {70--80},
doi = {https://doi.org/10.24138/jcomss-2022-0065},
url = {https://doi.org/10.24138/jcomss-2022-0065}
}