Hierarchical text clustering based on the KL criterion
Abstract Clustering has been used as a means of improving both the efficiency and the effectiveness of text retrieval and text categorization. In this paper, we propose a hierarchical clustering algorithm that constructs a set of clusters according to the Kullback-Leibler (KL) criterion, which in essence is minimum conditional entropy clustering. We generalize the minimum conditional entropy criterion by replacing Shannon's entropy with the structural α-entropy. When α = 2, the minimum entropy criterion based on the structural α-entropy equals the error rate of the nearest neighbor method. Experimental results show that the HKLC algorithm performs better than other algorithms on text clustering.
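The structural α-entropy mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the Havrda-Charvát normalization H_α(p) = (Σ p_i^α − 1) / (2^(1−α) − 1), which may differ from the normalization used in the paper; the function names and the `joint` table are hypothetical and chosen only for illustration. Under this assumed normalization, the α = 2 conditional entropy is twice E[1 − Σ_c p(c|x)²], the classical expression for the asymptotic nearest neighbor error rate, so the two coincide up to a constant factor.

```python
import math


def structural_alpha_entropy(probs, alpha):
    """Structural alpha-entropy of a discrete distribution.

    Assumed (Havrda-Charvat) form:
        H_alpha(p) = (sum_i p_i**alpha - 1) / (2**(1 - alpha) - 1)
    The alpha -> 1 limit is the Shannon entropy in bits.
    """
    if abs(alpha - 1.0) < 1e-12:
        return -sum(p * math.log2(p) for p in probs if p > 0)
    return (sum(p ** alpha for p in probs) - 1.0) / (2.0 ** (1.0 - alpha) - 1.0)


def conditional_alpha_entropy(joint, alpha):
    """Expected alpha-entropy of the cluster posterior p(c | x).

    joint[x][c] is a hypothetical joint probability table p(x, c);
    rows index documents (or features), columns index clusters.
    """
    total = 0.0
    for row in joint:
        p_x = sum(row)
        if p_x > 0:
            posterior = [p / p_x for p in row]
            total += p_x * structural_alpha_entropy(posterior, alpha)
    return total


# Toy joint distribution over two items and two clusters (made-up numbers).
joint = [[0.4, 0.1],
         [0.1, 0.4]]
h2 = conditional_alpha_entropy(joint, alpha=2)
```

A clustering that lowers `conditional_alpha_entropy` makes cluster membership more certain given the data, which is the sense in which the criterion is a minimum conditional entropy criterion.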
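Authors Qu Jiao (曲皎), Li Baihua (李白桦)
Source Journal of Liaoning Normal University: Natural Science Edition (《辽宁师范大学学报(自然科学版)》), indexed in CAS and the Peking University Core Journals list, 2008, No. 1, pp. 17-20 (4 pages)
Keywords text clustering; KL criterion; NMI
About the author Qu Jiao (born 1971), female, from Benxi, Liaoning; teacher at Liaoning Normal University; master's degree.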