Abstract
Traditional clustering research assumes that the objects and attributes of a data set are independent and identically distributed (IID). Real-world data, however, are often non-IID: attributes interact with one another to a greater or lesser degree. The traditional K-means algorithm selects its initial cluster centers at random, so it is sensitive to that choice, easily falls into local optima, and has low accuracy. The Min_max method addresses this weakness, but both the original and the improved K-means algorithms ignore the interactions between attributes. This paper therefore uses the Pearson correlation coefficient to compute the interactions between attributes and maps them onto the original data set. In addition, the Min_max method is optimized using the dual-domain idea. Experimental results show that the proposed method achieves higher accuracy, better clustering quality, and relatively fewer iterations.
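The abstract names two ingredients without giving formulas: Pearson correlation between attributes, mapped onto the data set, and Min_max selection of initial centers. The sketch below is only one plausible reading, not the authors' implementation. It assumes the mapping is a correlation-weighted sum of the original attributes and that Min_max refers to the common max-min distance seeding, with the first data point taken as the first center. The function names `correlation_matrix`, `map_interactions`, and `max_min_centers` are hypothetical, and the dual-domain optimization is not sketched because the abstract does not describe it.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # constant attribute: treat as uncorrelated (assumption)
    return cov / (norm_x * norm_y)


def correlation_matrix(rows):
    """Attribute-by-attribute Pearson matrix for a list of data rows."""
    cols = list(zip(*rows))
    return [[pearson(ci, cj) for cj in cols] for ci in cols]


def map_interactions(rows, corr):
    """Assumed mapping: each new attribute j is the sum over k of
    row[k] * corr[k][j], so correlated attributes reinforce each other."""
    d = len(corr)
    return [[sum(row[k] * corr[k][j] for k in range(d)) for j in range(d)]
            for row in rows]


def max_min_centers(rows, k):
    """Max-min distance seeding: start from the first point (assumption),
    then repeatedly add the point farthest from its nearest chosen center."""
    centers = [rows[0]]
    while len(centers) < k:
        farthest = max(rows,
                       key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(farthest)
    return centers
```

Under these assumptions, one would compute `corr = correlation_matrix(rows)`, transform the data with `map_interactions(rows, corr)`, and pass the result of `max_min_centers(mapped, k)` to an ordinary K-means loop as its starting centers.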
Authors
PAN Pin-chen (潘品臣)
JIANG He (姜合)
LV Yi-kun (吕奕锟)
School of Computer Science & Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
Peking University Core Journals (北大核心)
2019, No. 6, pp. 1254-1259 (6 pages)
Funding
Supported by the National Natural Science Foundation of China, Youth Fund Project (61502259)
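Author biographies
PAN Pin-chen, male, born in 1994, master's student; research interest: data mining. Corresponding author: JIANG He, male, born in 1964, M.S., professor, CCF member; research interests: data mining and data warehousing; E-mail: jianghe09@126.com. LV Yi-kun, female, born in 1993, master's student; research interest: data mining.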