Abstract
Text clustering has been used to improve both the efficiency and the effectiveness of text retrieval and text categorization. In this paper, we propose a hierarchical clustering algorithm that constructs a set of clusters according to the Kullback-Leibler (KL) criterion; in essence, this is minimum conditional entropy clustering. We generalize the minimum conditional entropy criterion by replacing Shannon's entropy with structural α-entropy. When α = 2, the minimum entropy criterion based on structural α-entropy equals the error probability of the nearest-neighbor method. Experimental results show that the proposed algorithm, KLHC, performs better than other algorithms on text clustering.
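The relationship between structural α-entropy and the nearest-neighbor error can be illustrated numerically. The sketch below is not taken from the paper: it assumes the Havrda-Charvát form of structural α-entropy, H_α(p) = (Σ p_i^α - 1) / (2^(1-α) - 1), and the function names and the toy joint distribution are hypothetical. Under this normalization the α = 2 conditional entropy is exactly twice the asymptotic nearest-neighbor error 1 - Σ p(c|x)², so the two agree up to a constant factor; the paper may normalize differently.

```python
import math


def structural_alpha_entropy(probs, alpha):
    """Structural alpha-entropy in the Havrda-Charvat form (assumed definition).

    For alpha == 1 the limit is Shannon entropy in bits.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if abs(alpha - 1.0) < 1e-12:
        return -sum(p * math.log2(p) for p in probs if p > 0)
    norm = 2.0 ** (1.0 - alpha) - 1.0
    return (sum(p ** alpha for p in probs if p > 0) - 1.0) / norm


def conditional_entropy(joint, alpha):
    """Conditional entropy sum_x p(x) * H_alpha(C | x).

    joint[x][c] holds the joint probability p(x, c).
    """
    total = 0.0
    for row in joint:
        px = sum(row)
        if px > 0:
            posterior = [v / px for v in row]
            total += px * structural_alpha_entropy(posterior, alpha)
    return total


def nn_asymptotic_error(joint):
    """Asymptotic 1-NN error: sum_x p(x) * (1 - sum_c p(c|x)^2)."""
    total = 0.0
    for row in joint:
        px = sum(row)
        if px > 0:
            total += px * (1.0 - sum((v / px) ** 2 for v in row))
    return total


# Hypothetical toy joint distribution over two inputs and two clusters.
joint = [[0.4, 0.1],
         [0.1, 0.4]]

h2 = conditional_entropy(joint, alpha=2)
err = nn_asymptotic_error(joint)
print(h2, err)  # h2 is twice err under this normalization
```

A clustering that lowers this conditional entropy therefore also lowers the nearest-neighbor error on the same assignment, which is one way to read the α = 2 statement in the abstract.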
Source
《辽宁师范大学学报(自然科学版)》
CAS
PKU Core Journal (北大核心)
2008, No. 1, pp. 17-20 (4 pages)
Journal of Liaoning Normal University:Natural Science Edition
About the author
Qu Jiao (曲皎, b. 1971), female, from Benxi, Liaoning; teacher at Liaoning Normal University; master's degree.