
Mining Maximal Frequent Item Sets with Improved Algorithm of FPMAX

Abstract: Mining the maximal frequent itemsets (MFIs) of a transaction database is an important research direction in data mining. FPMAX, which is based on the FP-tree structure, is currently one of the more efficient and stable algorithms for mining maximal frequent itemsets. On dense databases, however, FPMAX performs a large number of redundant recursive calls, each incurring the extra cost of constructing a conditional FP-tree. Moreover, when the minimum support is low (so that the number of frequent itemsets is large), the global MFI-tree that FPMAX uses for superset checking, that is, for testing whether a candidate is a subset of an already-found MFI, grows large, and the performance of that check degrades. This paper therefore proposes FPMAX-reduce, an improved version of FPMAX that applies a look-ahead pruning strategy based on the common suffixes of transactions to eliminate redundant recursive calls and the redundant conditional FP-trees they would build. In addition, when a newly constructed conditional FP-tree is small, FPMAX-reduce builds a corresponding conditional MFI-tree that discards redundant information, reducing the traversal cost of superset checking in the subsequent recursive calls. Performance experiments show that, through effective look-ahead pruning, FPMAX-reduce cuts the number of recursive calls on dense transaction databases and at low support thresholds to, in the best case, less than half of that of the original algorithm, effectively improving the efficiency of FPMAX; on dense datasets it also outperforms many existing algorithms.
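The abstract names the two costs FPMAX-reduce targets (redundant recursive calls and superset checks against already-found MFIs) but does not spell out its common-suffix pruning rule. As a rough illustration of those two costs only, the following is a minimal Python sketch of a simpler, well-known depth-first MFI search with a "head | tail" look-ahead check, in the style of MaxMiner/MAFIA rather than FPMAX-reduce itself. It assumes transactions are given as frozensets, counts support by rescanning the transactions instead of using an FP-tree, and keeps found MFIs in a flat list rather than an MFI-tree; the names `support`, `mine_mfi`, and `brute_force_mfi` are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def brute_force_mfi(transactions, minsup):
    """Reference answer: all frequent itemsets that have no frequent proper superset."""
    items = sorted({i for t in transactions for i in t})
    frequent = [frozenset(c)
                for k in range(1, len(items) + 1)
                for c in combinations(items, k)
                if support(frozenset(c), transactions) >= minsup]
    return {f for f in frequent if not any(f < g for g in frequent)}


def mine_mfi(transactions, minsup, lookahead=True):
    """Depth-first MFI search (not FPMAX-reduce); returns (set of MFIs, recursive calls)."""
    singles = {i for t in transactions for i in t}
    # Frequent single items, ordered by ascending support (a common heuristic).
    order = sorted((i for i in singles
                    if support(frozenset([i]), transactions) >= minsup),
                   key=lambda i: (support(frozenset([i]), transactions), i))
    mfis = []  # flat list standing in for FPMAX's MFI-tree (assumption)
    calls = 0

    def has_superset(itemset):
        # Superset check: is `itemset` contained in an MFI found so far?
        return any(itemset <= m for m in mfis)

    def recurse(head, tail):
        nonlocal calls
        calls += 1
        full = head | frozenset(tail)
        # Look-ahead: if head | tail is frequent, it covers this whole subtree.
        if lookahead and support(full, transactions) >= minsup:
            if not has_superset(full):
                mfis.append(full)
            return
        if not tail:
            # Leaf: head has no frequent extension left in this branch.
            if not has_superset(head):
                mfis.append(head)
            return
        for idx, item in enumerate(tail):
            new_head = head | {item}
            # Keep only later items that stay frequent together with new_head.
            new_tail = [j for j in tail[idx + 1:]
                        if support(new_head | {j}, transactions) >= minsup]
            # Skip the branch if head | tail is already covered by a known MFI.
            if has_superset(new_head | frozenset(new_tail)):
                continue
            recurse(new_head, new_tail)

    if order:
        recurse(frozenset(), order)
    return set(mfis), calls


if __name__ == "__main__":
    # Small dense example: four copies of {a, b, c} plus one {a, b}.
    dense = [frozenset("abc")] * 4 + [frozenset("ab")]
    mfis, calls = mine_mfi(dense, minsup=3)
    _, calls_plain = mine_mfi(dense, minsup=3, lookahead=False)
    print(sorted("".join(sorted(m)) for m in mfis))  # ['abc']
    print(calls, calls_plain)                          # 1 4
```

On this dense toy input the look-ahead collapses the whole search into a single call, while the plain search walks a path of four calls before reaching the same MFI, which is the kind of redundant recursion the abstract describes. A real FP-tree-based miner would obtain these support counts from the tree instead of rescanning every transaction.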
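Authors: Niu Xinzheng, She Kun
Source: Computer Science (计算机科学; indexed in CSCD and the Peking University core journal list), 2013, No. 12, pp. 223-228 (6 pages)
Funding: National Natural Science Foundation of China (61300192); Science and Technology Support Program of the Sichuan Provincial Department of Science and Technology (2012GZ0061); Fundamental Research Funds for the Central Universities, University of Electronic Science and Technology of China (ZYGX2010J075)
Keywords: frequent itemset; maximal frequent itemset; FP-tree; FPMAX; FP-growth
Author information: Niu Xinzheng (born 1978), male, Ph.D., associate professor; research interests: data mining and network computing; e-mail: xinzhengniu@uestc.edu.cn. She Kun (born 1968), male, Ph.D., professor and doctoral supervisor; research interests: network computing and artificial intelligence.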