Abstract
In text classification, the vector space model (VSM) is commonly used to represent text features. Selecting the feature words that best express a text's topic, and thereby reducing the dimensionality of the feature space and the time and space complexity, is an important problem. To address it, this paper proposes using sectional set fuzzy C-means (S2FCM) clustering for between-class feature dimensionality reduction. Guided by the maximum membership principle, the method preserves the fuzzy clustering effect while speeding up convergence and improving the correctness of feature selection. The algorithm also uses improved formulas for the membership degree and the cluster centers, and determines the initial cluster centers by a non-random method. Experiments show that text classification using the feature terms selected by this method achieves fairly good results.
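The abstract does not give the update formulas, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that the sectional set step hardens a sample's memberships to 0/1 whenever its largest membership reaches a threshold `lam`, that the membership and center updates are those of standard fuzzy C-means (the paper's improved formulas are not reproduced), and that a deterministic farthest-point seeding stands in for the paper's non-random initialization. All function names and parameters (`s2fcm`, `cut`, `init_centers`, `lam`, `m`) are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centers(points, c):
    """Deterministic farthest-point seeding (assumed stand-in, not the paper's method)."""
    centers = [points[0]]
    while len(centers) < c:
        # Pick the point farthest from its nearest already-chosen center.
        nxt = max(points, key=lambda p: min(sq_dist(p, v) for v in centers))
        centers.append(nxt)
    return [list(v) for v in centers]


def memberships(p, centers, m):
    """Standard FCM memberships of point p, with fuzzifier m > 1."""
    d = [sq_dist(p, v) for v in centers]
    if any(x == 0.0 for x in d):
        # The point coincides with one or more centers: share membership among them.
        hits = [1.0 if x == 0.0 else 0.0 for x in d]
        total = sum(hits)
        return [h / total for h in hits]
    inv = [(1.0 / x) ** (1.0 / (m - 1.0)) for x in d]
    total = sum(inv)
    return [v / total for v in inv]


def cut(u, lam):
    """Assumed sectional set rule: harden to 0/1 when the top membership reaches lam."""
    top = max(range(len(u)), key=u.__getitem__)
    if u[top] >= lam:
        return [1.0 if i == top else 0.0 for i in range(len(u))]
    return u


def s2fcm(points, c, m=2.0, lam=0.8, tol=1e-6, max_iter=100):
    """Return (centers, U), where U[j][i] is the membership of point j in cluster i."""
    centers = init_centers(points, c)
    dim = len(points[0])
    U = []
    for _ in range(max_iter):
        U = [cut(memberships(p, centers, m), lam) for p in points]
        new_centers = []
        for i in range(c):
            w = [u[i] ** m for u in U]
            total = sum(w)
            if total == 0.0:
                # Empty cluster: keep the previous center.
                new_centers.append(centers[i])
                continue
            new_centers.append(
                [sum(wj * p[k] for wj, p in zip(w, points)) / total for k in range(dim)]
            )
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, U


if __name__ == "__main__":
    # Toy 2-D vectors standing in for VSM feature vectors.
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centers, U = s2fcm(data, 2)
    print(centers)
```

Under these assumptions, samples that are already close to one center stop contributing fuzzy weight to the other clusters, which is one plausible reading of how a cut on the membership degree could speed up convergence.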
Source
《南昌大学学报(工科版)》
CAS
2008, Issue 1, pp. 87-90 (4 pages)
Journal of Nanchang University (Engineering & Technology)
Funding
Project funded under the Jiangxi Provincial Department of Education program (2006[36])
Keywords
sectional set
feature words
VSM
fuzzy clustering
About the author
Bai Sixue (白似雪, b. 1957), male, professor.