Abstract
In the classical k-means clustering algorithm, the number of clusters k must be specified in advance, yet in practice k is difficult to determine accurately. This paper proposes an improved genetic k-means algorithm (IGKM) and constructs a fitness function for evaluating clustering quality. The fitness function is defined as a product of three factors; maximizing it improves compactness (intra-cluster distance) and separation (inter-cluster distance) while keeping the number of clusters as small as possible, with large separation between at least two clusters. Two artificial data sets and three real-life UCI data sets are then used to compare the k-means algorithm (KM), the genetic clustering algorithm (GA), the genetic k-means algorithm (GKM), and IGKM in terms of inter-cluster distance, intra-cluster distance, and classification accuracy. The experiments show that IGKM can obtain the optimal number of clusters k automatically while maintaining high accuracy.
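The abstract does not give the exact form of the fitness function, so the following is only a minimal sketch of what a three-factor product of this kind could look like. The function name `fitness`, the specific factors (1/k, the inverse of the summed point-to-centroid distance, and the largest centroid-to-centroid distance), and the small constant `eps` are all assumptions made for illustration, not the paper's definition.

```python
import math


def fitness(points, labels, eps=1e-12):
    """Hypothetical three-factor fitness: (1/k) * (1/compactness) * separation.

    This is an illustrative assumption, not the function defined in the paper.
    """
    # Group points by cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)

    # Centroid of each cluster (coordinate-wise mean).
    centroids = {
        label: tuple(sum(coords) / len(members) for coords in zip(*members))
        for label, members in clusters.items()
    }

    # Compactness: summed distance from each point to its own centroid
    # (smaller means tighter clusters).
    compactness = sum(
        math.dist(point, centroids[label]) for point, label in zip(points, labels)
    )

    # Separation: largest distance between any two centroids, i.e. a large gap
    # between at least two clusters. With a single cluster this is 0.
    cents = list(centroids.values())
    separation = max(
        (math.dist(a, b) for i, a in enumerate(cents) for b in cents[i + 1:]),
        default=0.0,
    )

    # eps (assumed) avoids division by zero when every point sits on its centroid.
    return separation / (k * (compactness + eps))


# Two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1), (11, 0)]
natural = [0, 0, 0, 1, 1, 1]   # the two natural groups
mixed = [0, 1, 0, 1, 0, 1]     # same k, but points assigned badly
split = [0, 0, 2, 1, 1, 1]     # one group split further, k = 3

for name, labels in [("natural", natural), ("mixed", mixed), ("split", split)]:
    print(name, round(fitness(points, labels), 3))
```

On this toy data the natural two-cluster labeling scores higher than both the mixed labeling and the three-cluster split. That is the behavior such a function would need for a genetic search to settle on k by itself, although a score this simple degenerates when every point becomes its own cluster.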
Source
《数学的实践与认识》
CSCD
Peking University Core Journals (北大核心)
2007, No. 8, pp. 104-111 (8 pages)
Mathematics in Practice and Theory
Funding
National Natural Science Foundation of China (70273044, 70573101)
Ministry of Education Humanities and Social Sciences Fund project (06JA880668