
Hierarchical Clustering Method for Web Pages Based on All-Confidence Association Analysis (cited by: 2)
Abstract: To make web page information easier for users to browse, this paper proposes a hierarchical clustering method for web pages based on all-confidence association analysis. The method represents web documents with the Vector Space Model (VSM), treating each document as a transaction and the words of a document as the items of that transaction. An association mining algorithm discovers strong association rules among the documents, from which base clusters are generated; a graph partitioning algorithm then completes the hierarchical clustering of the web documents. During rule generation, the all-confidence measure is used to discover strong affinity patterns. As a result, rule generation is not affected by the choice of support threshold: strong affinity patterns are found even when the support threshold is set to zero, and weakly correlated cross-support patterns are effectively eliminated.
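Source: Journal of Liaoning Technical University (Natural Science), 2007, No. 6, pp. 892-894 (3 pages). Indexed in EI, CAS, and the Peking University Core Journals list.
Funding: Supported by the Tianjin Science and Technology Development Program (07JCZDJC067007).
Keywords: association rules; hierarchical clustering; Web documents; text mining
About the author: Shi Qingwei (born 1973), male, from Fuxin, Liaoning; lecturer and doctoral candidate. Main research interests: Web mining and information extraction.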
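
The all-confidence measure named in the abstract can be illustrated with a small sketch. It assumes the standard definition, all-confidence(X) = supp(X) / max over items i in X of supp({i}), and follows the abstract's modeling of documents as transactions and words as items. The function names, the toy documents, and the 0.8 threshold are hypothetical and chosen only for illustration; this is not the paper's implementation, and the graph partitioning step is not shown.

```python
from itertools import combinations


def to_transactions(docs):
    """Treat each document as a transaction whose items are its distinct words."""
    return [frozenset(doc.lower().split()) for doc in docs]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def all_confidence(itemset, transactions):
    """Support of the itemset divided by the largest single-item support in it."""
    max_item_support = max(support([item], transactions) for item in itemset)
    if max_item_support == 0:
        return 0.0
    return support(itemset, transactions) / max_item_support


def strong_pairs(transactions, min_all_conf):
    """Word pairs whose all-confidence reaches the threshold.

    No support threshold is applied (it is effectively zero); the
    all-confidence threshold alone filters out cross-support pairs.
    """
    vocab = sorted(set().union(*transactions))
    result = {}
    for pair in combinations(vocab, 2):
        score = all_confidence(pair, transactions)
        if score >= min_all_conf:
            result[pair] = score
    return result


# Toy corpus (hypothetical): "web" occurs everywhere, "suffix" and "tree"
# are rare but always occur together.
docs = [
    "web mining clustering",
    "web mining",
    "web search",
    "web search engine",
    "web suffix tree",
    "web suffix tree",
]
transactions = to_transactions(docs)
pairs = strong_pairs(transactions, min_all_conf=0.8)
print(pairs)  # {('suffix', 'tree'): 1.0}
```

In this toy corpus the pair ("suffix", "tree") has low support (2 of 6 documents) but all-confidence 1.0, so it is kept without any support threshold. The pair ("web", "suffix") has the same support but all-confidence of only 1/3, because "web" is far more frequent than "suffix"; this is the kind of cross-support pattern the abstract says is eliminated.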

References (7)

  • 1 Shi Qingwei, Zhao Zheng, Chao Ke. A hierarchical clustering method for Chinese web pages based on suffix trees [J]. Journal of Liaoning Technical University (Natural Science), 2006, 25(6): 890-892. Cited by: 11
  • 2 Sanderson M, Croft W B. Deriving concept hierarchies from text [C]. Proceedings of SIGIR, 1999: 206-213.
  • 3 Lawrie D, Croft W B, Rosenberg A. Finding topic words for hierarchical summarization [C]. Proceedings of SIGIR, 2001: 349-357.
  • 4 Zeng H J, He Q C, Chen Z, et al. Learning to cluster Web search results [C]. Proceedings of SIGIR, 2004: 210-217.
  • 5 Song Qinbao, Shen Junyi. Web document clustering algorithm based on association rules [J]. Journal of Software, 2002, 13(3): 417-423. Cited by: 41
  • 6 Xiong H, Tan P N, Kumar V. Mining strong affinity association patterns in data sets with skewed support distribution [C]. Proceedings of the Third IEEE International Conference on Data Mining, 2002: 387-394.
  • 7 Omiecinski E R. Alternative interest measures for mining associations in databases [J]. IEEE Transactions on Knowledge and Data Engineering, 2003, 15(1): 59-67.


