Abstract
Based on the structural characteristics of scientific literature, this paper builds a four-layer mining model and combines it with the K-medoids algorithm to propose a feature selection method for scientific literature. The method first divides a scientific document into four layers according to its structure. It then extracts feature words layer by layer from the first three layers by K-medoids clustering. Finally, it uses the Apriori algorithm to find the maximal frequent itemsets of the fourth layer, which serve as that layer's feature word set. The clustering accuracy of K-medoids depends strongly on the initial centers, so the paper also optimizes how K-medoids chooses them in order to improve feature selection. It first uses information entropy to weight the clustering objects and thereby correct the distance function. It then uses the values of this weighting function to select the optimal initial clustering centers. Experimental results show that the four-layer mining model combined with the optimized K-medoids achieves relatively high accuracy in scientific literature classification.
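The abstract does not give the formulas, so the following Python sketch only illustrates the two mining steps under stated assumptions. It applies the standard entropy-weight method to the feature dimensions and scales a Euclidean distance with those weights, which is one reading of "using information entropy to weight the clustering objects". In place of the paper's weighting-function criterion it uses a hypothetical initialization rule: the object with the smallest total weighted distance is chosen first, and the remaining centers are chosen farthest-first. The clustering itself is a plain alternating K-medoids loop. For the fourth layer, a level-wise Apriori pass keeps only the maximal frequent itemsets. All function names, and the choice of an absolute-count `min_support`, are illustrative and are not taken from the authors.

```python
import math
from itertools import combinations

def entropy_weights(X):
    """Entropy weight per feature column of a non-negative matrix X (rows = objects).
    Standard entropy-weight method; an assumed stand-in for the paper's weighting."""
    n, m = len(X), len(X[0])
    weights = []
    for j in range(m):
        col = [row[j] for row in X]
        total = sum(col)
        if total == 0 or n < 2:
            e = 1.0  # an all-zero column carries no information
        else:
            e = -sum((v / total) * math.log(v / total) for v in col if v > 0) / math.log(n)
        # Low entropy means a more discriminative feature, hence a larger weight.
        weights.append(max(0.0, 1.0 - e))
    s = sum(weights)
    return [w / s for w in weights] if s > 0 else [1.0 / m] * m

def weighted_dist(a, b, w):
    """Euclidean distance with each dimension scaled by its entropy weight."""
    return math.sqrt(sum(wj * (x - y) ** 2 for wj, x, y in zip(w, a, b)))

def init_medoids(D, k):
    """Hypothetical initialization (assumes k <= number of objects):
    the most central object first, then repeatedly the object farthest from all chosen ones."""
    n = len(D)
    medoids = [min(range(n), key=lambda i: sum(D[i]))]
    while len(medoids) < k:
        rest = [i for i in range(n) if i not in medoids]
        medoids.append(max(rest, key=lambda i: min(D[i][c] for c in medoids)))
    return medoids

def k_medoids(X, k, max_iter=100):
    """Alternating K-medoids on the entropy-weighted distance; returns (medoids, labels)."""
    w = entropy_weights(X)
    n = len(X)
    D = [[weighted_dist(X[i], X[j], w) for j in range(n)] for i in range(n)]
    medoids = init_medoids(D, k)
    labels = []
    for _ in range(max_iter):
        # Assign every object to its nearest medoid.
        labels = [min(range(k), key=lambda c: D[i][medoids[c]]) for i in range(n)]
        # Move each medoid to the member with the smallest total in-cluster distance.
        new = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new.append(medoids[c])  # keep the old medoid if its cluster emptied
                continue
            new.append(min(members, key=lambda i: sum(D[i][j] for j in members)))
        if new == medoids:
            break
        medoids = new
    return medoids, labels

def maximal_frequent_itemsets(transactions, min_support):
    """Level-wise Apriori, then keep only itemsets with no frequent proper superset.
    min_support is an absolute transaction count (an assumption)."""
    tx = [frozenset(t) for t in transactions]

    def support(s):
        return sum(1 for t in tx if s <= t)

    items = sorted({i for t in tx for i in t})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = []
    while level:
        frequent.extend(level)
        prev = set(level)
        candidates = set()
        for a, b in combinations(level, 2):
            u = a | b
            # Apriori pruning: every (size-1)-subset of a candidate must itself be frequent.
            if len(u) == len(a) + 1 and all(frozenset(s) in prev for s in combinations(u, len(a))):
                candidates.add(u)
        level = [c for c in candidates if support(c) >= min_support]
    return [s for s in frequent if not any(s < t for t in frequent)]
```

For example, `k_medoids([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]], 2)` places the first three rows in one cluster and the last three in the other. Given the transactions `{a, b, c}`, `{a, b}`, `{a, c}`, `{b, c}`, `{a, b, c}` and a support of 3, `maximal_frequent_itemsets` returns `{a, b}`, `{a, c}` and `{b, c}`. In a real run, each row of `X` might be a term-frequency vector of candidate words from one of the first three layers, and each transaction might be the word set of one fourth-layer text unit. Both encodings are illustrative assumptions.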
Source
Journal of Central China Normal University: Natural Sciences
Indexed in: CAS; Peking University Core Journals
2015, Issue 4, pp. 541-545 (5 pages)
About the author
E-mail: lijunzhou0724@163.com.