Hierarchical Clustering Method for Web Pages Based on All-Confidence Association Analysis

Cited by: 2


Abstract: To make web page information easier for users to browse, a hierarchical clustering method for web pages based on all-confidence association analysis is proposed. The method represents web documents with the Vector Space Model (VSM), treating each document as a transaction and each word in a document as an item of that transaction. Base clusters are generated from the strong association rules discovered among documents by an association mining algorithm, and a graph partitioning algorithm then completes the hierarchical clustering of the web documents. During rule generation, the all-confidence measure is used to discover strong affinity patterns, so rule generation is not affected by the support threshold setting: strong affinity patterns can be found even when the support threshold is set to zero, and weakly correlated cross-support patterns are effectively eliminated.
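The all-confidence measure mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard definition (the support of an itemset divided by the largest support of any single item in it) and follows the abstract in treating documents as transactions and words as items. The function names, the toy corpus, and the 0.6 threshold are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions (documents) containing every item (word)."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def all_confidence(itemset, transactions):
    """Support of the itemset divided by the largest single-item support."""
    max_item_support = max(support([item], transactions) for item in itemset)
    if max_item_support == 0:
        return 0.0
    return support(itemset, transactions) / max_item_support


def strong_pairs(transactions, min_all_conf):
    """Word pairs whose all-confidence reaches the threshold.

    No support threshold is applied, which corresponds to setting it to zero.
    """
    vocabulary = sorted(set().union(*transactions))
    return [
        (a, b)
        for a, b in combinations(vocabulary, 2)
        if all_confidence((a, b), transactions) >= min_all_conf
    ]


# Hypothetical toy corpus: each document is reduced to its set of words.
docs = [
    frozenset({"web", "page", "cluster"}),
    frozenset({"web", "page"}),
    frozenset({"web", "page", "mining"}),
    frozenset({"web", "page", "suffix", "tree"}),
    frozenset({"web", "page"}),
    frozenset({"web", "suffix", "tree"}),
]

print(strong_pairs(docs, 0.6))
print(all_confidence(("web", "cluster"), docs))
```

In this toy corpus the pair ("suffix", "tree") appears in only a third of the documents, yet it passes the filter because the two words always occur together. The pair ("web", "cluster") joins a very frequent word with a rare one, which is the cross-support situation the abstract describes, and its low all-confidence excludes it. The subsequent graph partitioning step is not sketched here.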

Authors: Shi Qingwei (史庆伟), Zhao Zheng (赵政), Bao Hu (鲍虎)

Affiliation: School of Computer Science and Technology, Tianjin University

Source: Journal of Liaoning Technical University (Natural Science) (indexed in EI, CAS, and the Peking University Core Journals list), 2007, No. 6, pp. 892-894 (3 pages)

Funding: Supported by the Tianjin Science and Technology Development Program fund (07JCZDJC067007)

Keywords: association rules; hierarchical clustering; Web documents; text mining

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]

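About the author: Shi Qingwei (b. 1973), male, from Fuxin, Liaoning; lecturer and doctoral candidate; main research interests: Web mining and information extraction.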



