Abstract
High utility itemset mining has received wide attention in data mining, but it does not account for the effect of itemset length on utility, which motivated high average-utility itemset mining. Existing high average-utility itemset mining algorithms, however, take a long time to find the high average-utility itemsets. To address this problem, an improved algorithm named FHAUI (Fast High Average Utility Itemset) is proposed. FHAUI stores utility information in utility-lists and mines all high average-utility itemsets by joining and comparing those lists. It also uses a two-dimensional matrix to reduce the number of join comparisons on two-item utility values. FHAUI was tested on several classical datasets; the experimental results show that it greatly reduces the number of utility-list join operations and substantially improves running time.
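The ideas named in the abstract (utility-lists, joins, and a two-dimensional structure that avoids unnecessary joins) can be illustrated with a minimal sketch. This is not the FHAUI algorithm itself: the abstract does not say what the matrix stores, so the sketch assumes it holds a per-pair upper bound (the sum of the maximum item utility over transactions containing both items). The toy database, the function names, and the restriction to two-item itemsets are all hypothetical choices made for illustration. Average utility is taken as the itemset's total utility divided by its length.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> utility in that transaction.
transactions = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6},
    {"b": 4, "c": 3, "d": 8},
]

def utility_list(item):
    """Map transaction id -> utility of `item` in that transaction."""
    return {tid: t[item] for tid, t in enumerate(transactions) if item in t}

def join(list_x, list_y):
    """Join two single-item utility-lists on their shared transaction ids."""
    return {tid: list_x[tid] + list_y[tid] for tid in list_x.keys() & list_y.keys()}

def average_utility(ulist, length):
    """Total utility of an itemset divided by the number of items in it."""
    return sum(ulist.values()) / length

def pair_bound_matrix():
    """Assumed matrix content: for each co-occurring pair, the sum of the
    maximum item utility of every transaction containing both items.
    No itemset containing the pair can have a higher average utility."""
    bound = {}
    for t in transactions:
        t_max = max(t.values())
        for x, y in combinations(sorted(t), 2):
            bound[(x, y)] = bound.get((x, y), 0) + t_max
    return bound

def mine_pairs(min_avg_util):
    """Return the high average-utility 2-itemsets and the number of joins performed."""
    bound = pair_bound_matrix()
    items = sorted({i for t in transactions for i in t})
    lists = {i: utility_list(i) for i in items}
    result, joins = {}, 0
    for x, y in combinations(items, 2):
        if bound.get((x, y), 0) < min_avg_util:
            continue  # pruned by the matrix lookup, no join needed
        joins += 1
        au = average_utility(join(lists[x], lists[y]), 2)
        if au >= min_avg_util:
            result[(x, y)] = au
    return result, joins

found, joins = mine_pairs(9)
print(found, joins)
```

With a threshold of 9 on this toy data, only two of the six possible pairs pass the matrix check and are joined, which is the kind of saving the abstract attributes to the matrix.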
Source
Journal of Computer Applications (《计算机应用》)
CSCD
PKU Core Journal (北大核心)
2016, Issue 11, pp. 3062-3066 (5 pages)
Funding
National Natural Science Foundation of China (61370108)
Keywords
average utility
high utility
pattern mining
data mining
frequent pattern
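Author biographies
WANG Jinghua (王敬华), born 1965, male, from Hong'an, Hubei; associate professor, M.S.; research interests: data mining, modern information systems.
Corresponding author email: wwwlxzwww@163.com
LUO Xiangzhou (罗相洲), born 1991, male, from Wuhan, Hubei; master's student; research interests: databases, data mining.
WU Qian (吴倩), born 1990, female, from Hanchuan, Hubei; master's student; research interests: data mining, complex networks.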