摘要
高平均效用项集挖掘是当前研究的热点之一。针对高平均效用项集挖掘算法产生大量无意义的候选项集,而导致高内存消耗和运行时间长的问题,提出了dMHAUI算法。首先定义了集成矩阵Q,并提出了4种基于垂直数据库表示的紧凑平均效用上界及3种有效的修剪策略;将高平均效用项集挖掘所需的信息存储于IDUL结构树,利用改进的diffset技术快速计算项集的平均效用和上界;最后通过递归调用搜索函数得到高平均效用项集。与EHAUPM算法和MHAI算法进行仿真比较,结果表明,dMHAUI算法在运行时间、连接比较次数和可扩展性等方面都有较优的性能。
Mining high average-utility itemsets is one of the hotspots in the current research.Aiming at the problem that the high average-utility itemsets mining algorithm generates a large number of meaningless candidate itemsets,which results in high memory consumption and long running time,the dMHAUI algorithm is proposed.Firstly,the algorithm defines the integration matrix Q,and proposes four compact average-utility upper bounds based on vertical database representation and three effective pru-ning strategies.Secondly,the information needed for high average-utility itemsets mining is stored in the IDUL structure tree,and the improved diffset technique is used to quickly calculate the average-utility and upper bound of itemsets.Finally,the high average-utility itemsets are obtained by recursively calling the search algorithm.Simulation results show that the dMHAUI function has better performance than the EHAUPM algorithm and the MHAI algorithm in terms of running time,join operation number and scalability.
作者
浦蓉
邵剑飞
胡常礼
曲坤
PU Rong;SHAO Jian-fei;HU Chang-li;QU Kun(Faculty of Information Engineering and Automation,Kunming University of Science and Technology,Kunming 650500,China)
出处
《计算机工程与科学》
CSCD
北大核心
2020年第5期931-937,共7页
Computer Engineering & Science
作者简介
浦蓉(1994-),女,云南宣威人,硕士生,研究方向为数据挖掘,E-mail:1661210225@qq.com;邵剑飞(1970-),男,云南昆明人,硕士,副教授,研究方向为数据库和数据挖掘,E-mail:siee@163.com。