
A Similarity-based Soft Clustering Algorithm for Web Documents (cited by: 6)
Abstract: This paper proposes similarity-based soft clustering (SISC), an efficient soft clustering algorithm for document clustering that is built on a given similarity measure. Experiments comparing SISC with existing hard clustering algorithms such as K-means indicate that SISC is both efficient and effective, and that it is suitable for document clustering. Finally, the paper highlights the upcoming challenges of document mining and the opportunities it offers.
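Source: Computer Engineering (《计算机工程》; indexed in CAS, CSCD, and the Peking University Core Journals list), 2006, No. 2, pp. 59-61 (3 pages)
Funding: Doctoral Program Foundation of the Ministry of Education (20030486045), "Hierarchical Difference Method in Semantic Generation for Remote Sensing Image Databases"
Keywords: Web document mining; Document clustering; Soft clustering; Similarity
About the authors: Jiang Yali (b. 1979), female, master's student; main research interests: geographic information systems, spatial analysis, data mining; E-mail: jylsmile@163.com. Guan Zequn, professor and doctoral supervisor.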

References (7)

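  • 1. Zhou Haofeng, Lou Yubo. Refining Web Authoritative Resource by Frequent Structures[C]. In: Proceedings of the Seventh International Database Engineering and Applications Symposium (IDEAS2003), 2003.
  • 2. Wu Fei, Gardarin G. Gradual Clustering Algorithm[C]. In: Proceedings of Seventh International Conference on Database Systems for Advanced Applications, 2001: 48-55.
  • 3. Lin K I, Kondadadi R. A Similarity-based Soft Clustering Algorithm for Documents[C]. In: Proceedings of Seventh International Conference on Database Systems for Advanced Applications, 2001: 40-47.
  • 4. Wang Jicheng, Pan Jingui, Zhang Fuyan. Research on Web Text Mining Technology[J]. Journal of Computer Research and Development, 2000, 37(5): 513-520. Cited by: 275
  • 5. Yang Jingtao, Wang Xuelin, Hu Yujin. A Similarity-based Document Clustering Algorithm[J]. Journal of Huazhong University of Science and Technology (Natural Science Edition), 2002, 30(12): 59-61. Cited by: 2
  • 6. Mei Xin, Xing Guifen. A Survey of Text Mining Technology[J]. Journal of Jiangsu University (Natural Science Edition), 2003, 24(5): 72-76. Cited by: 29
  • 7. Yang Bin, Meng Zhiqing. A Data Mining Technique for Text Classification[J]. Natural Science Journal of Xiangtan University, 2001, 23(4): 34-37. Cited by: 10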

