Abstract
Compressing the set of frequent sequential patterns is one way to address the explosive number of sequential patterns that mining can output. To obtain high-quality compression, the method first clusters the frequent sequential patterns and then selects and outputs only one representative sequential pattern from each cluster, so that the number of representative sequential patterns is as small as possible. Two efficient algorithms for compressing the set of frequent sequential patterns are proposed: a greedy algorithm and a fast candidate-based algorithm. The set of representative sequential patterns is a subset of the frequent sequential patterns, and experimental results show that it achieves a very good compression effect.
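The abstract does not state the clustering criterion or the steps of the greedy algorithm. The following is a minimal Python sketch of a greedy, set-cover-style selection of representatives under assumptions that are not taken from the source: a pattern is a sequence of itemsets, the distance between two patterns is the Jaccard distance between the sets of database sequences that support them, and a pattern R may represent a pattern P when P is a subsequence of R and their distance is at most a threshold `delta`. All names (`greedy_representatives`, `delta`, `seq`, `EXAMPLE_DB`, and so on) are hypothetical. This is not the paper's algorithm, and the candidate-based algorithm is not sketched.

```python
from typing import FrozenSet, List, Sequence, Set, Tuple

# Assumption: a sequence is an ordered tuple of itemsets.
Itemset = FrozenSet[str]
Seq = Tuple[Itemset, ...]


def is_subsequence(pattern: Seq, seq: Seq) -> bool:
    """True if each itemset of `pattern` is contained, in order,
    in a distinct itemset of `seq` (greedy leftmost matching)."""
    i = 0
    for element in seq:
        if i < len(pattern) and pattern[i] <= element:
            i += 1
    return i == len(pattern)


def support_set(pattern: Seq, db: Sequence[Seq]) -> Set[int]:
    """IDs of the database sequences that contain `pattern`."""
    return {sid for sid, s in enumerate(db) if is_subsequence(pattern, s)}


def distance(t_p: Set[int], t_q: Set[int]) -> float:
    """Jaccard distance between two support sets (assumed measure)."""
    union = t_p | t_q
    return 0.0 if not union else 1.0 - len(t_p & t_q) / len(union)


def greedy_representatives(patterns: List[Seq], db: Sequence[Seq],
                           delta: float) -> List[Seq]:
    """Greedy set cover: repeatedly pick the pattern that covers the most
    still-uncovered patterns. Hypothetical cover condition: R covers P
    if P is a subsequence of R and distance(T(P), T(R)) <= delta."""
    tids = [support_set(p, db) for p in patterns]
    n = len(patterns)
    # covers[r] = indices of the patterns that pattern r can represent;
    # every pattern covers itself, so the loop below always terminates.
    covers = [
        {p for p in range(n)
         if is_subsequence(patterns[p], patterns[r])
         and distance(tids[p], tids[r]) <= delta}
        for r in range(n)
    ]
    uncovered = set(range(n))
    chosen: List[Seq] = []
    while uncovered:
        best = max(range(n), key=lambda r: len(covers[r] & uncovered))
        chosen.append(patterns[best])
        uncovered -= covers[best]
    return chosen


def seq(*elements: str) -> Seq:
    """Build a sequence; each string is one itemset of single-character items."""
    return tuple(frozenset(e) for e in elements)


# Toy data for illustration only (not from the paper).
EXAMPLE_DB = [seq("a", "b", "c"), seq("a", "b", "c"), seq("a", "b"), seq("a", "c")]
EXAMPLE_PATTERNS = [seq("a"), seq("b"), seq("c"), seq("a", "b"),
                    seq("a", "c"), seq("b", "c"), seq("a", "b", "c")]

if __name__ == "__main__":
    for d in (0.0, 0.3, 0.5):
        reps = greedy_representatives(EXAMPLE_PATTERNS, EXAMPLE_DB, d)
        print(d, len(reps))  # → 0.0 4, then 0.3 3, then 0.5 1
```

On this toy database the seven frequent patterns shrink to four, three, or one representative as `delta` grows; a larger threshold lets one long pattern stand in for more of its subsequences, at the cost of a looser bound on how far their supports may differ.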
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
Peking University Core Journal (北大核心)
2008, No. 3, pp. 503-507 (5 pages)
Keywords
data mining
sequential pattern
representative sequential pattern
About the author
Wang Tao (王涛), female, born in 1969, Ph.D.; research interests: data mining, special-purpose databases, and software testing. E-mail: imwt@21cn.com