Abstract
Density clustering algorithms are sensitive to parameter settings and slow to converge. To address these problems, an improved density clustering algorithm is proposed. First, a custom density formula is used to compute the density of each sample and obtain a set of candidate representative points. Next, the candidate with the smallest sum of distances to the other candidate representative points is selected as the first initial cluster center, and the selection of initial centers is completed with the maximum product method. In the center update step, the object closest to the mean of each cluster is taken as that cluster's temporary center, and each sample is assigned to a cluster by the minimum distance method. This step is repeated until convergence. Tests on UCI datasets show that the improved density algorithm is more stable, more accurate, and faster than the K-means algorithm and two other improved algorithms.
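The steps summarized in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the paper's implementation: the abstract does not give the custom density formula, so the density used here (number of neighbours within the mean pairwise distance) and the candidate rule (density at or above the mean density) are assumptions. The maximum product step is read here as choosing each further center to maximize the product of its distances to the centers already chosen. The function names `pairwise_dist`, `select_initial_centers`, and `cluster` are hypothetical.

```python
import numpy as np


def pairwise_dist(a, b):
    """Euclidean distance between every row of a and every row of b."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)


def select_initial_centers(X, k):
    """Return indices of k initial centers chosen from candidate points."""
    n = len(X)
    d = pairwise_dist(X, X)
    # ASSUMPTION: density = neighbours within the mean pairwise distance.
    radius = d.sum() / (n * (n - 1))
    density = (d < radius).sum(axis=1) - 1  # subtract 1 to exclude the point itself
    # ASSUMPTION: candidates are points with at least the mean density.
    cand = np.flatnonzero(density >= density.mean())
    if len(cand) < k:
        raise ValueError("fewer candidate points than clusters")
    dc = d[np.ix_(cand, cand)]
    # First center: candidate with the smallest sum of distances to the others.
    chosen = [int(np.argmin(dc.sum(axis=1)))]
    # Remaining centers: maximize the product of distances to chosen centers.
    # (Already chosen candidates have product 0, so they are not picked again
    # unless every remaining candidate duplicates a chosen one.)
    while len(chosen) < k:
        prod = dc[:, chosen].prod(axis=1)
        chosen.append(int(np.argmax(prod)))
    return cand[chosen]


def cluster(X, k, max_iter=100):
    """Return (labels, center indices) for the sketch described above."""
    centers = select_initial_centers(X, k)
    for _ in range(max_iter):
        # Minimum distance rule: assign each sample to its nearest center.
        labels = np.argmin(pairwise_dist(X, X[centers]), axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size == 0:
                continue  # keep the old center if a cluster is empty
            mean = X[members].mean(axis=0)
            # Temporary center: the member closest to the cluster mean.
            nearest = np.argmin(np.linalg.norm(X[members] - mean, axis=1))
            new_centers[j] = members[nearest]
        if np.array_equal(new_centers, centers):
            break  # centers stopped changing: converged
        centers = new_centers
    return labels, centers
```

Calling `cluster(X, k)` on an `(n_samples, n_features)` array returns one label per sample and the row indices of the final centers. Because every center is an actual sample rather than a mean vector, the result is less affected by outliers than a plain mean update, which is consistent with the stability claim in the abstract.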
Author
段桂芹 (DUAN Guiqin)
Computer And the Information Engineering Institute, Guangdong Songshan Polytechnic College, Shaoguan 512126, China
Source
《智能计算机与应用》 (Intelligent Computer and Applications)
2021, No. 12, pp. 82-86 (5 pages)
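Funding
Guangdong Province Characteristic Innovation Project for Regular Universities (2021KTSCX227)
Shaoguan Science and Technology Plan Project (200811224533986)
Shaoguan Science and Technology Plan Project (210718114531595)
Guangdong Province Key Field Special Project for Regular Universities (2021ZDZX1124)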
Keywords
clustering
density clustering
nearest point to the in-cluster mean
candidate representative points
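About the author
DUAN Guiqin (段桂芹, born 1979), female, master's degree, lecturer. Main research interests: data mining and computer education.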