
Grid clustering algorithm based on density peaks (original title: 基于密度峰值的网格聚类算法)

Cited by: 12
Abstract: The Density Peak Clustering (DPC) algorithm, proposed in 2014, is simple and novel in concept, requires few parameters, needs no iterative solving, and is scalable. Building on DPC, this paper proposes a grid clustering algorithm that processes large-scale data efficiently. First, the N-dimensional space is granulated into disjoint rectangular grid cells. Next, statistics are collected for each cell, and center cells are identified using the DPC idea for finding cluster centers: a center cell is surrounded by cells of lower local density and lies relatively far from any cell of higher local density. Finally, the grid cells close to each center cell are merged with it to produce the clustering result. Simulation experiments on UCI artificial data sets show that the proposed algorithm finds cluster centers quickly and handles large-scale clustering problems effectively and efficiently. Compared with the original DPC algorithm, its running time on the different data sets is reduced to between 1/100 and 1/10 of the original, while the loss in accuracy stays between 5% and 8%.
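The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything the abstract leaves open is an assumption here: the function name `grid_density_peak_clustering` and its parameters `cell_size` and `n_clusters` are hypothetical, a cell's local density is taken to be its point count, centers are picked as the cells with the largest product of density and distance to the nearest denser cell, and the final merge uses the standard DPC rule (each cell joins the cluster of its nearest denser cell) in place of the paper's merging step, whose details are not given.

```python
from collections import Counter
from math import dist, floor


def grid_density_peak_clustering(points, cell_size, n_clusters):
    """Illustrative sketch: density-peak clustering on grid cells, not raw points."""
    # Step 1: map every point to the integer coordinates of its grid cell.
    cell_of = [tuple(floor(x / cell_size) for x in p) for p in points]

    # Step 2: cell statistics. Assumption: local density rho = points in the cell.
    rho = Counter(cell_of)
    # Densest cells first; ties broken by coordinates for determinism.
    cells = sorted(rho, key=lambda c: (-rho[c], c))

    # Step 3: delta = distance to the nearest cell that precedes this one
    # in the density order (i.e. a cell of higher, or tied, density).
    delta, parent = {}, {}
    for i, c in enumerate(cells[1:], start=1):
        nearest = min(cells[:i], key=lambda h: dist(c, h))
        parent[c] = nearest
        delta[c] = dist(c, nearest)
    # The densest cell has no denser neighbour; give it the largest distance.
    top = cells[0]
    delta[top] = max((dist(top, c) for c in cells[1:]), default=0.0)

    # Assumption: centers are the n_clusters cells with the largest rho * delta.
    centers = sorted(cells, key=lambda c: (-rho[c] * delta[c], c))[:n_clusters]
    label = {c: k for k, c in enumerate(centers)}

    # Step 4: in density order, each remaining cell inherits the label
    # of its nearest denser cell, which has already been labelled.
    for c in cells:
        if c not in label:
            label[c] = label[parent[c]]

    return [label[c] for c in cell_of], centers
```

Because distances are computed between occupied cells rather than between all pairs of points, the quadratic cost of DPC applies to the number of cells, which is the kind of saving the abstract reports. The price is that every point in a cell receives the same label, so `cell_size` trades speed against accuracy.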
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2017, Issue 11, pp. 3080-3084 (5 pages)
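Funding: National Natural Science Foundation of China (61572091); Chongqing Graduate Research and Innovation Project (CYB16106); High-end Talent Project (RC2016005); Guizhou Provincial Key Discipline (黔学位办[2013]18号)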
Keywords: density peak; grid granulation; large-scale data; clustering
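About the authors: YANG Jie (杨洁, born 1987), male, from Zunyi, Guizhou, PhD candidate; research interests: granular computing, rough sets, data mining. WANG Guoyin (王国胤, born 1970), male, from Chongqing, professor, PhD, CCF member; research interests: granular computing, soft computing, cognitive computing. WANG Fei (王飞, born 1989), male, from Kaifeng, Henan, master's student; research interests: data mining, granular computing. Corresponding author email: wanggy@ieee.org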
