Abstract
To address the problems that spectral clustering requires a manually set nearest-neighbor parameter when computing the scale parameter and that its clustering results are unstable, this paper treats the initial cluster center values and the scale parameter as decision variables and focuses on the adaptive optimization and improvement of the spectral clustering algorithm. First, taking the reciprocal of the sample neighborhood standard deviation as a measure of the local density of a sample and combining it with the density-peak idea, an initial class center decision value selection method (initial class center decision value algorithm based on density peak, DP_KD) is designed to resolve the instability of clustering results in density-adjusted spectral clustering. Second, the average distance between samples is used to compute the corresponding neighborhood radius, and the scale parameter of each sample is solved adaptively from the sample standard deviation to construct the similarity matrix between samples, achieving an adaptive setting of the nearest-neighbor parameter and eliminating the need to set the scale parameter manually. Then, based on the optimized initial class center decision values and the nearest-neighbor parameter method, the Gaussian kernel function is further adjusted and a density-adjusted spectral clustering algorithm based on neighborhood standard deviation (DSSD) is proposed, which realizes density spectral clustering by constructing an eigenvector space. Finally, the proposed algorithm is compared with other clustering algorithms on multiple datasets. The results show that, compared with other spectral clustering algorithms, the proposed DSSD algorithm not only achieves a better clustering effect but also yields more stable results; in particular, on the DIM512 dataset, whose clusters are internally dense with clearly defined boundaries between them, DSSD partitions the clusters correctly. DSSD improves on the other algorithms by at least 0.0268 in accuracy, 0.0136 in Rand index, and 0.0247 in F-measure, indicating that it not only clusters well but is also better suited to the cluster analysis of large-scale datasets.
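The DP_KD center-selection procedure described in the abstract above (local density as the reciprocal of the neighborhood standard deviation, a density-peak distance, and their product λ as the decision value) can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: the function name, the radius rule (half the mean pairwise distance), and the epsilon guards are assumptions.

```python
import numpy as np

def dp_kd_centers(X, K):
    """Illustrative sketch of DP_KD initial-center selection.

    Density is the reciprocal of the standard deviation of distances
    to neighbors within a radius; the radius rule is an assumption."""
    n = X.shape[0]
    # Pairwise Euclidean distances.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Neighborhood radius derived from the average pairwise distance
    # (the halving factor is an illustrative choice).
    r = D[np.triu_indices(n, 1)].mean() / 2
    # Local density: reciprocal of the neighborhood standard deviation
    # (larger value -> tighter, denser neighborhood).
    rho = np.empty(n)
    for i in range(n):
        nbr = D[i][(D[i] > 0) & (D[i] <= r)]
        rho[i] = 1.0 / (nbr.std() + 1e-12) if nbr.size > 1 else 0.0
    # Density-peak distance: minimum distance to any denser sample;
    # the densest sample gets the maximum distance instead.
    delta = np.empty(n)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]
        delta[i] = D[i, higher].min() if higher.size else D[i].max()
    # Decision value lambda = rho * delta; top-K samples become centers.
    lam = rho * delta
    return np.argsort(lam)[-K:]
```

On two well-separated groups of points, the two largest λ values land on one sample inside each group, which is the behavior the decision value is designed to produce.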
Objective  Clustering partitions given data into clusters without requiring knowledge of the true labels. In other words, clustering is an unsupervised learning process that makes samples within the same cluster as similar as possible while making samples in different clusters as dissimilar as possible. Traditional clustering algorithms fall into five primary types: hierarchical, density-based, grid-based, model-based, and partition-based. However, these traditional models are sensitive to the cluster number K, outliers, and noise points; in addition, the initial cluster centers are set artificially, and the cluster shapes cannot extend arbitrarily. Spectral clustering, which is based on spectral graph theory, transforms the clustering problem into an optimal graph-partitioning problem and has become one of the most widely adopted techniques in domains such as data mining, face recognition, and image segmentation.

Methods  Spectral clustering is simple to implement, robust to high-dimensional data, and flexible in handling sample spaces of diverse shapes; it enables global optimization and often outperforms traditional clustering algorithms. However, existing spectral clustering methods still face challenges such as determining the scale parameter, suboptimal results on multi-scale datasets, and unstable clustering. To address the instability caused by the random selection of initial cluster centers and the manual setting of scale parameters, a density-adjusted spectral clustering algorithm based on the neighborhood standard deviation is proposed that incorporates both global and local data information. The proposed algorithm treats the initial class center values and the scale parameter as decision variables for adaptive optimization: the neighborhood standard deviation is used both to determine the initial cluster centers and to optimize the scale parameter, improving the expression of the Gaussian kernel function. The calculations of the scale parameter, the local sample density, and the density-peak initial cluster centers are all redefined in terms of the neighborhood standard deviation, so the proposed algorithm requires no neighborhood parameter as input and produces stable clustering results. First, an initial class center decision value algorithm based on density peak (DP_KD) is designed to replace the random selection of K initial cluster centers, which is the source of unstable results in density-adjusted spectral clustering. Then, the reciprocal of the sample neighborhood standard deviation is taken as the local density of a sample, where the neighborhood radius of each sample is computed from the average distance between samples, and each sample is assigned the minimum Euclidean distance to any sample of higher density. Finally, the product of the local density and this distance defines the initial class center decision value λ, and the K samples with the largest λ are selected as the initial class centers; together these steps form the DP_KD algorithm.

Results and Discussions  The scale parameter influences the construction of the similarity matrix, which in turn affects the Laplacian matrix and the resulting eigenvector space; in spectral clustering, the similarity matrix is critical to the quality of the final clustering. Therefore, the Euclidean distance between a sample and its p-th nearest neighbor is introduced as that sample's scale parameter, using neighbor information to enhance the similarity matrix and reduce the algorithm's sensitivity to the scale parameter. Building on the optimized initial class center decision values and the nearest-neighbor parameter method, a density-adjusted spectral clustering algorithm based on neighborhood standard deviation (DSSD) is proposed to resolve the instability caused by random initial centers and manually set scale parameters. On multi-scale datasets, the proposed algorithm effectively fine-tunes the similarity between sample points in regions of varying density.

Conclusions  Experimental results indicate that, compared with other spectral clustering algorithms, the proposed algorithm achieves better clustering performance and yields more stable results. Compared with IK_DM, IIEFA, OFMMK-means, classical algorithms such as K-means and K-medoids, and the AP and SNN-MSC algorithms, the DP_KD algorithm performs best across seven UCI datasets: Iris, Wine, Seeds, Yeast, Glass, Ecoli, and Waveform. Four evaluation metrics, Accuracy (ACC), Purity, Rand Index (RI), and F-measure (F1), are employed, and higher values of these indices signify better performance; overall, the clustering effectiveness of DP_KD on the seven UCI datasets surpasses that of the other algorithms. These findings demonstrate the feasibility and effectiveness of integrating DP_KD into density-adjusted spectral clustering as a replacement for classical clustering algorithms. In addition, the DSSD algorithm is compared with four spectral clustering algorithms, ADSC, WDSC, SC, and PRSC, on seven UCI datasets: Sonar, Musk, Breast, Statlog, Madelon, Segmentation, and Waveform, using the ACC index to evaluate the clustering effect. The results demonstrate that DSSD improves the effectiveness of density-adjusted spectral clustering and outperforms other advanced spectral clustering algorithms.
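The adaptive scale parameter and density-adjusted affinity described in the abstract can likewise be sketched. Here σ_i is the Euclidean distance from sample i to its p-th nearest neighbor; the kernel exp(-d²/(σ_i·σ_j)) is the standard self-tuning form and stands in for the paper's adjusted Gaussian kernel, and the default p, the normalized-Laplacian embedding, and the function name are assumptions for illustration.

```python
import numpy as np

def dssd_embedding(X, K, p=7):
    """Illustrative sketch: density-adjusted affinity and spectral embedding.

    sigma_i = distance to the p-th nearest neighbor; the self-tuning
    kernel and Laplacian normalization are standard stand-ins, not the
    paper's exact formulation."""
    n = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Per-sample scale: distance to the p-th nearest neighbor
    # (index 0 of each sorted row is the sample itself).
    sigma = np.sort(D, axis=1)[:, min(p, n - 1)]
    # Density-adjusted Gaussian affinity with per-pair scaling.
    W = np.exp(-D**2 / (np.outer(sigma, sigma) + 1e-12))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1) + 1e-12)
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # Eigenvectors of the K smallest eigenvalues span the embedding;
    # their row-normalized rows are then clustered (e.g. with k-means
    # seeded by DP_KD centers).
    vals, vecs = np.linalg.eigh(L)
    U = vecs[:, :K]
    return U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)
```

For two well-separated groups, rows of the embedding belonging to the same group end up nearly identical while rows from different groups are far apart, which is what makes the final k-means step on the eigenvector space effective.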
Authors
GUO Xiaoyu;LIU Jinjin;CHEN Yajun;LI Haojie;YUAN Peiyan;ZHAO Xiaoyan(College of Computer and Information Engineering,Henan Normal University,Xinxiang 453007,China;Engineering Laboratory of Intellectual Business and Internet of Things Technologies,Xinxiang 453007,China;National Digital Switching System Engineering&Technological R&D Center,Zhengzhou 450000,China)
Source
《工程科学与技术》
Peking University Core Journal (北大核心)
2025, Issue 2, pp. 40-53 (14 pages)
Advanced Engineering Sciences
Funding
National Natural Science Foundation of China (62072159, 61902112);
Henan Province Science and Technology Research Project (222102210011, 232102211061).
Keywords
spectral clustering
density adjustment
neighborhood standard deviation
adaptive
density peak
About the authors
GUO Xiaoyu (1995-), female, master's student. Research interests: data mining. E-mail: 2108283044@stu.htu.edu.cn. Corresponding author: ZHAO Xiaoyan, associate professor, E-mail: zhaoxiaoyan@htu.edu.cn.