Abstract
Association rule mining is an important research direction in data mining, and Apriori is its best-known algorithm. To address the shortcomings of Apriori, this paper proposes an improved algorithm, SAVM. SAVM counts itemset support with vector operations, which greatly reduces the number of database scans; it generates frequent 2-itemsets directly with a hash function; and it partitions the frequent itemsets L_{k-1} by prefix, so that candidate k-itemsets are generated by joins inside relatively small, independent subspaces. This reduces the number of pattern-matching operations during the join and speeds it up. Experiments show that the improved algorithm is substantially more efficient than the original Apriori algorithm.
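The three ideas named in the abstract can be sketched as follows. This is a minimal illustration and not the authors' SAVM implementation, since the abstract gives no details. It rests on several assumptions: each item's vector is a Python integer used as a bitmask over transactions, the "hash function" step is approximated by a hash-table (`Counter`) count of item pairs in a single pass, and the function names (`item_vectors`, `support`, `frequent_pairs`, `join_by_prefix`, `mine`) are hypothetical. The Apriori subset-pruning step is omitted for brevity.

```python
from collections import Counter, defaultdict
from itertools import combinations


def item_vectors(transactions):
    """Map each item to an int whose bit t is set if transaction t contains it."""
    vectors = defaultdict(int)
    for t, items in enumerate(transactions):
        for item in items:
            vectors[item] |= 1 << t
    return dict(vectors)


def support(itemset, vectors):
    """Support count = number of set bits in the AND of the item vectors."""
    items = iter(itemset)
    acc = vectors[next(items)]
    for item in items:
        acc &= vectors[item]
    return bin(acc).count("1")


def frequent_pairs(transactions, frequent_items, min_count):
    """Count 2-itemsets in a hash table during one pass (assumed stand-in
    for the paper's hash-function step)."""
    counts = Counter()
    for items in transactions:
        kept = sorted(set(items) & frequent_items)
        counts.update(combinations(kept, 2))
    return {pair for pair, n in counts.items() if n >= min_count}


def join_by_prefix(frequent_prev):
    """Join only itemsets that share the same prefix (all but the last item)."""
    groups = defaultdict(list)
    for itemset in frequent_prev:
        groups[itemset[:-1]].append(itemset[-1])
    candidates = set()
    for prefix, tails in groups.items():
        for a, b in combinations(sorted(tails), 2):
            candidates.add(prefix + (a, b))
    return candidates


def mine(transactions, min_count):
    """Return {itemset (sorted tuple): support count} for all frequent itemsets."""
    vectors = item_vectors(transactions)
    level = {(i,) for i, v in vectors.items() if bin(v).count("1") >= min_count}
    result = {s: support(s, vectors) for s in level}
    frequent_items = {s[0] for s in level}
    level = frequent_pairs(transactions, frequent_items, min_count)
    while level:
        result.update({s: support(s, vectors) for s in level})
        candidates = join_by_prefix(level)
        # Supports come from vector ANDs, so no further database scan is needed.
        level = {c for c in candidates if support(c, vectors) >= min_count}
    return result
```

Calling `mine(transactions, min_count)` on a list of item sets returns a dictionary from each frequent itemset (as a sorted tuple) to its support count. In this sketch the transaction data is read twice, once to build the vectors and once to count pairs; every later level is computed from the vectors alone.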
Source
《微计算机信息》
2010, Issue 18, pp. 154-156 (3 pages)
Control & Automation
Keywords
Association rules
SAVM algorithm
Frequent itemsets
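About the author
Yao Liang (姚亮), male, Han ethnicity, from Hefei, Anhui; master's degree; main research interest: data mining. Mailing address: Anhui Provincial Local Taxation Bureau, 109 Mengcheng Road, Hefei, Anhui 230061.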