Abstract
To handle non-convex and other complex cluster shapes in massive data analysis while keeping computation fast, we propose a new fast hierarchical clustering algorithm based on the idea of competition. First, the data are partitioned into first-level sub-clusters according to a given neighborhood radius. Then, on top of this first-level clustering and following the idea of data competition, a criterion is established for increasing the link (association) weight between sub-clusters, based on the data density between them. Finally, link weights are computed for associated sub-clusters according to this criterion, and sub-clusters whose weight reaches the threshold are merged, which resolves complex cases such as non-convex clusters. Simulation experiments show that the clustering accuracy and noise resistance of the algorithm are better than those of the traditional K-means algorithm and the density-based DBSCAN (density-based spatial clustering of applications with noise) algorithm. Because its complexity is low, the algorithm is expected to be well suited to cluster analysis of big data.
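The three stages described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the exact weight-increase criterion, so everything concrete here is an assumption made for illustration: the leader-style first pass, the rule that a point adds one to the link weight of its two nearest sub-cluster centres when the second centre lies within `2 * radius`, and all function names (`first_level`, `link_weights`, `merge`, `competitive_cluster`). It is not the authors' implementation.

```python
import math
from collections import defaultdict


def first_level(points, radius):
    """Stage 1 (assumed leader-style pass): a point joins the first centre
    within `radius`; otherwise it opens a new sub-cluster."""
    centres, labels = [], []
    for p in points:
        for i, c in enumerate(centres):
            if math.dist(p, c) <= radius:
                labels.append(i)
                break
        else:
            centres.append(p)
            labels.append(len(centres) - 1)
    return centres, labels


def link_weights(points, centres, radius):
    """Stage 2 (hypothetical competition rule): each point votes for the pair
    formed by its two nearest centres, provided the second-nearest centre is
    within 2 * radius. Dense regions between sub-clusters gather more votes."""
    weights = defaultdict(int)
    for p in points:
        ranked = sorted((math.dist(p, c), i) for i, c in enumerate(centres))
        if len(ranked) >= 2 and ranked[1][0] <= 2 * radius:
            i, j = sorted((ranked[0][1], ranked[1][1]))
            weights[(i, j)] += 1
    return weights


def merge(n, weights, threshold):
    """Stage 3: union-find merge of sub-clusters whose link weight reaches
    the threshold. Returns one root id per sub-cluster."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), w in weights.items():
        if w >= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(n)]


def competitive_cluster(points, radius, threshold):
    """Run the three stages and return a final cluster id for each point."""
    centres, labels = first_level(points, radius)
    weights = link_weights(points, centres, radius)
    roots = merge(len(centres), weights, threshold)
    return [roots[label] for label in labels]


if __name__ == "__main__":
    # Hypothetical toy data: an elongated chain and a compact, distant group.
    chain = [(0.5 * k, 0.0) for k in range(7)]
    blob = [(10.0, 0.0), (10.5, 0.0), (11.0, 0.0)]
    print(competitive_cluster(chain + blob, radius=1.0, threshold=2))
```

In this sketch the elongated chain is first split into several small sub-clusters and then rejoined because the points between neighbouring centres push their link weight over the threshold, while the distant group stays separate. Raising `threshold` makes merging stricter.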
Source
Information and Control (《信息与控制》)
CSCD
Peking University Core Journal
2017, No. 5, pp. 614-619, 626 (7 pages)
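Funding
Zhejiang Provincial Public Welfare Technology Research Program, Social Development Project (2013C33069)
Zhejiang Provincial Science and Technology Project (2013C33083)
Sanmen County Science and Technology Plan Project (12401)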
Keywords
hierarchical clustering
complex clustering
competition algorithm
link weight
class merging