摘要
为了提高数据挖掘效率,提出了一种基于图的频繁闭项集挖掘算法GFCG(graph basedfrequentcloseditemsetgeneration).该算法采用位矢量技术构造有向图,表示项与项之间的频繁关系,并在有向图的基础上递归产生频繁闭项集,从而只需扫描数据库2次,不产生候选集;引入扩展频繁项集的概念,大大减小了检查频繁项集是否闭的搜索空间.用1个真实数据库和2个合成数据库对GFCG进行了测试,并与A close和CLOSET算法的结果进行了比较,结果表明,该算法具有良好的速度和可伸缩性性能.
To raise the efficiency of data mining, a graph-based algorithm for mining frequent closed itemsets, called GFCG (graph-based frequent closed itemset generation) was proposed. In this algorithm, the bit vector technique is used to construct a directed graph to represent the frequent relationship between items, and frequent closed itemsets are generated recursively from this graph. As a result, the GFCG scans the database for only two times, and does not generate candidate sets. Furthermore, the concept of an expanded frequent itemset is introduced to greatly decrease the searching range for adjusting whether a frequent itemset is closed or not. In addition, this algorithm was tested by using one actual and two synthetical databases. The testing result shows that compared with the A-close and CLOSET algorithms, the proposed algorithm has good speed and scale-up properties.
出处
《西南交通大学学报》
EI
CSCD
北大核心
2004年第3期385-389,共5页
Journal of Southwest Jiaotong University
基金
国家中西药管理局项目(2000 J P 54)
关键词
数据库
图
数据挖掘
频繁闭集
位向量
database
graph
data mining
frequent closed itemset
bit vector