Mining Maximal Frequent Item Sets with Improved Algorithm of FPMAX (cited by: 9)

Abstract: Mining maximal frequent itemsets from transaction databases is an important research direction in data mining. The FP-tree-based FPMAX algorithm is currently one of the more efficient and stable algorithms for this task. When mining dense databases, however, FPMAX performs a large number of redundant recursive calls, each incurring the extra cost of building a conditional FP-tree. Moreover, at low support thresholds the global MFI-tree that FPMAX uses for superset checking (subset testing) grows large, which degrades the performance of that check. This paper therefore proposes FPMAX-reduce, an improved version of FPMAX that applies a look-ahead pruning strategy based on the common suffixes of transactions to cut redundant recursion during mining. When a newly constructed conditional FP-tree is small, FPMAX-reduce also builds a corresponding conditional MFI-tree, with redundant information removed, to reduce the traversal cost of superset checks in subsequent recursive calls. Experiments show that, thanks to effective look-ahead pruning, FPMAX-reduce can reduce the number of recursive calls to less than half that of the original algorithm in the best case on dense transaction databases and at low support thresholds, effectively improving the efficiency of FPMAX; on dense datasets it also outperforms many existing algorithms.
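The abstract relies on two ideas common to depth-first maximal-itemset miners: skipping a branch whose itemsets are already covered by a known maximal itemset (superset checking), and looking ahead to test whether the whole remaining branch is frequent so that further recursion can be avoided. The Python sketch below illustrates both ideas on a toy database. It is not FPMAX or FPMAX-reduce: it assumes a vertical tid-set layout instead of an FP-tree, a flat list in place of the MFI-tree, and a generic "head ∪ tail" look-ahead rather than the paper's common-suffix pruning. The function name `mine_maximal` and its parameters are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def mine_maximal(transactions: List[Set[str]], min_sup: int) -> List[FrozenSet[str]]:
    """Depth-first search for maximal frequent itemsets (illustrative sketch)."""
    # Vertical layout: item -> ids of the transactions that contain it.
    tidsets: Dict[str, Set[int]] = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)

    # Frequent single items, explored in ascending order of support.
    order = sorted((i for i, tids in tidsets.items() if len(tids) >= min_sup),
                   key=lambda i: (len(tidsets[i]), i))

    # Assumption: a flat list stands in for FPMAX's MFI-tree in this sketch.
    mfis: List[FrozenSet[str]] = []

    def covered(itemset: FrozenSet[str]) -> bool:
        # Superset check: is the itemset contained in an already recorded itemset?
        return any(itemset <= m for m in mfis)

    def record(itemset: FrozenSet[str]) -> None:
        if covered(itemset):
            return
        # Drop previously recorded itemsets that the new one strictly contains.
        mfis[:] = [m for m in mfis if not m < itemset]
        mfis.append(itemset)

    def expand(head: FrozenSet[str], head_tids: Set[int],
               tail: List[Tuple[str, Set[int]]]) -> None:
        # Each tail entry pairs an item with the tid-set of head ∪ {item}.
        full = head | {item for item, _ in tail}
        if covered(full):
            return  # every itemset in this branch is already covered
        if not tail:
            if head:
                record(head)
            return
        # Look-ahead: if head ∪ tail is frequent, it is the only candidate
        # maximal itemset in this branch, so no recursion is needed.
        common = set(head_tids)
        for _, tids in tail:
            common &= tids
        if len(common) >= min_sup:
            record(full)
            return
        for k, (item, tids) in enumerate(tail):
            # Keep later tail items that stay frequent together with head ∪ {item}.
            new_tail = []
            for other, other_tids in tail[k + 1:]:
                joint = tids & other_tids
                if len(joint) >= min_sup:
                    new_tail.append((other, joint))
            expand(head | {item}, tids, new_tail)

    all_tids = set(range(len(transactions)))
    expand(frozenset(), all_tids, [(i, tidsets[i]) for i in order])
    return mfis


if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b", "d"}, {"a", "d"}, {"c", "d"}]
    print(sorted(sorted(m) for m in mine_maximal(db, 2)))  # [['a', 'b', 'c'], ['a', 'd']]
```

On the toy database, the look-ahead fires at the branches headed by `b` and `d`, and the superset check discards the branches headed by `c` and `a` without further recursion. In FPMAX-reduce as the abstract describes it, these roles would instead be filled by conditional FP-trees, global or conditional MFI-trees, and common-suffix look-ahead pruning.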

Authors: Niu Xinzheng, She Kun

Affiliation: School of Computer Science and Engineering, University of Electronic Science and Technology of China

Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University core journal list, 2013, No. 12, pp. 223-228 (6 pages)

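Funding: National Natural Science Foundation of China (61300192); Science and Technology Support Program of the Sichuan Provincial Department of Science and Technology (2012GZ0061); Fundamental Research Funds for the Central Universities, University of Electronic Science and Technology of China project (ZYGX2010J075)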

Keywords: frequent itemset; maximal frequent itemset; FP-tree; FPMAX; FP-growth

CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]

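About the authors: Niu Xinzheng (1978-), male, Ph.D., associate professor; research interests: data mining, network computing. E-mail: xinzhengniu@uestc.edu.cn. She Kun (1968-), male, Ph.D., professor and doctoral supervisor; research interests: network computing, artificial intelligence.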
