Abstract
To address the problems that spectral clustering requires a manually set nearest-neighbor parameter when computing the scale parameter and that its clustering results are unstable, this paper treats the initial cluster center values and the scale parameter as decision variables and focuses on the adaptive optimization and improvement of the spectral clustering algorithm. First, taking the reciprocal of the sample neighborhood standard deviation as a measure of the local density of a sample and combining it with the density-peak idea, an initial class center decision value selection method (initial class center decision value algorithm based on density peak, DP_KD) is designed to resolve the instability of clustering results in density-adjusted spectral clustering. Second, the average distance between samples is used to compute the corresponding neighborhood radius, and the scale parameter of each sample is solved adaptively from the sample standard deviation to construct the similarity matrix between samples, achieving an adaptive setting of the nearest-neighbor parameter and eliminating the need to set the scale parameter manually. Then, based on the optimized initial class center decision values and the nearest-neighbor parameter method, the Gaussian kernel function is further adjusted and a density-adjusted spectral clustering algorithm based on neighborhood standard deviation (DSSD) is proposed, which realizes density spectral clustering by constructing an eigenvector space. Finally, the proposed algorithm is compared with other clustering algorithms on multiple datasets. The results show that, compared with other spectral clustering algorithms, the proposed DSSD algorithm not only achieves a better clustering effect but also yields more stable results; in particular, on the DIM512 dataset, whose clusters are internally dense with clearly defined boundaries between them, DSSD partitions the clusters correctly. DSSD improves on the other algorithms by at least 0.0268 in accuracy, 0.0136 in Rand index, and 0.0247 in F-measure, indicating that it not only clusters well but is also better suited to the cluster analysis of large-scale datasets.
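The DP_KD center-selection procedure described in the abstract above (local density as the reciprocal of the neighborhood standard deviation, a density-peak distance, and their product λ as the decision value) can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: the function name, the radius rule (half the mean pairwise distance), and the epsilon guards are assumptions.

```python
import numpy as np

def dp_kd_centers(X, K):
    """Illustrative sketch of DP_KD initial-center selection.

    Density is the reciprocal of the standard deviation of distances
    to neighbors within a radius; the radius rule is an assumption."""
    n = X.shape[0]
    # Pairwise Euclidean distances.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Neighborhood radius derived from the average pairwise distance
    # (the halving factor is an illustrative choice).
    r = D[np.triu_indices(n, 1)].mean() / 2
    # Local density: reciprocal of the neighborhood standard deviation
    # (larger value -> tighter, denser neighborhood).
    rho = np.empty(n)
    for i in range(n):
        nbr = D[i][(D[i] > 0) & (D[i] <= r)]
        rho[i] = 1.0 / (nbr.std() + 1e-12) if nbr.size > 1 else 0.0
    # Density-peak distance: minimum distance to any denser sample;
    # the densest sample gets the maximum distance instead.
    delta = np.empty(n)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]
        delta[i] = D[i, higher].min() if higher.size else D[i].max()
    # Decision value lambda = rho * delta; top-K samples become centers.
    lam = rho * delta
    return np.argsort(lam)[-K:]
```

On two well-separated groups of points, the two largest λ values land on one sample inside each group, which is the behavior the decision value is designed to produce.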
Objective  Clustering partitions given data into clusters without requiring knowledge of the true labels. In other words, clustering is an unsupervised learning process that makes samples within the same cluster as similar as possible while making samples in different clusters as dissimilar as possible. Traditional clustering algorithms fall into five primary types: hierarchical, density-based, grid-based, model-based, and partition-based. However, these traditional models are sensitive to the cluster number K, outliers, and noise points; in addition, the initial cluster centers are set artificially, and the cluster shapes cannot extend arbitrarily. Spectral clustering, which is based on spectral graph theory, transforms the clustering problem into an optimal graph-partitioning problem and has become one of the most widely adopted techniques in domains such as data mining, face recognition, and image segmentation.

Methods  Spectral clustering is simple to implement, robust to high-dimensional data, and flexible in handling sample spaces of diverse shapes; it enables global optimization and often outperforms traditional clustering algorithms. However, existing spectral clustering methods still face challenges such as determining the scale parameter, suboptimal results on multi-scale datasets, and unstable clustering. To address the instability caused by the random selection of initial cluster centers and the manual setting of scale parameters, a density-adjusted spectral clustering algorithm based on the neighborhood standard deviation is proposed that incorporates both global and local data information. The proposed algorithm treats the initial class center values and the scale parameter as decision variables for adaptive optimization: the neighborhood standard deviation is used both to determine the initial cluster centers and to optimize the scale parameter, improving the expression of the Gaussian kernel function. The calculations of the scale parameter, the local sample density, and the density-peak initial cluster centers are all redefined in terms of the neighborhood standard deviation, so the proposed algorithm requires no neighborhood parameter as input and produces stable clustering results. First, an initial class center decision value algorithm based on density peak (DP_KD) is designed to replace the random selection of K initial cluster centers, which is the source of unstable results in density-adjusted spectral clustering. Then, the reciprocal of the sample neighborhood standard deviation is taken as the local density of a sample, where the neighborhood radius of each sample is computed from the average distance between samples, and each sample is assigned the minimum Euclidean distance to any sample of higher density. Finally, the product of the local density and this distance defines the initial class center decision value λ, and the K samples with the largest λ are selected as the initial class centers; together these steps form the DP_KD algorithm.

Results and Discussions  The scale parameter influences the construction of the similarity matrix, which in turn affects the Laplacian matrix and the resulting eigenvector space; in spectral clustering, the similarity matrix is critical to the quality of the final clustering. Therefore, the Euclidean distance between a sample and its p-th nearest neighbor is introduced as that sample's scale parameter, using neighbor information to enhance the similarity matrix and reduce the algorithm's sensitivity to the scale parameter. Building on the optimized initial class center decision values and the nearest-neighbor parameter method, a density-adjusted spectral clustering algorithm based on neighborhood standard deviation (DSSD) is proposed to resolve the instability caused by random initial centers and manually set scale parameters. On multi-scale datasets, the proposed algorithm effectively fine-tunes the similarity between sample points in regions of varying density.

Conclusions  Experimental results indicate that, compared with other spectral clustering algorithms, the proposed algorithm achieves better clustering performance and yields more stable results. Compared with IK_DM, IIEFA, OFMMK-means, classical algorithms such as K-means and K-medoids, and the AP and SNN-MSC algorithms, the DP_KD algorithm performs best across seven UCI datasets: Iris, Wine, Seeds, Yeast, Glass, Ecoli, and Waveform. Four evaluation metrics, Accuracy (ACC), Purity, Rand Index (RI), and F-measure (F1), are employed, and higher values of these indices signify better performance; overall, the clustering effectiveness of DP_KD on the seven UCI datasets surpasses that of the other algorithms. These findings demonstrate the feasibility and effectiveness of integrating DP_KD into density-adjusted spectral clustering as a replacement for classical clustering algorithms. In addition, the DSSD algorithm is compared with four spectral clustering algorithms, ADSC, WDSC, SC, and PRSC, on seven UCI datasets: Sonar, Musk, Breast, Statlog, Madelon, Segmentation, and Waveform, using the ACC index to evaluate the clustering effect. The results demonstrate that DSSD improves the effectiveness of density-adjusted spectral clustering and outperforms other advanced spectral clustering algorithms.
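The adaptive scale parameter and density-adjusted affinity described in the abstract can likewise be sketched. Here σ_i is the Euclidean distance from sample i to its p-th nearest neighbor; the kernel exp(-d²/(σ_i·σ_j)) is the standard self-tuning form and stands in for the paper's adjusted Gaussian kernel, and the default p, the normalized-Laplacian embedding, and the function name are assumptions for illustration.

```python
import numpy as np

def dssd_embedding(X, K, p=7):
    """Illustrative sketch: density-adjusted affinity and spectral embedding.

    sigma_i = distance to the p-th nearest neighbor; the self-tuning
    kernel and Laplacian normalization are standard stand-ins, not the
    paper's exact formulation."""
    n = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Per-sample scale: distance to the p-th nearest neighbor
    # (index 0 of each sorted row is the sample itself).
    sigma = np.sort(D, axis=1)[:, min(p, n - 1)]
    # Density-adjusted Gaussian affinity with per-pair scaling.
    W = np.exp(-D**2 / (np.outer(sigma, sigma) + 1e-12))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1) + 1e-12)
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # Eigenvectors of the K smallest eigenvalues span the embedding;
    # their row-normalized rows are then clustered (e.g. with k-means
    # seeded by DP_KD centers).
    vals, vecs = np.linalg.eigh(L)
    U = vecs[:, :K]
    return U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)
```

For two well-separated groups, rows of the embedding belonging to the same group end up nearly identical while rows from different groups are far apart, which is what makes the final k-means step on the eigenvector space effective.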
Authors
GUO Xiaoyu;LIU Jinjin;CHEN Yajun;LI Haojie;YUAN Peiyan;ZHAO Xiaoyan(College of Computer and Information Engineering,Henan Normal University,Xinxiang 453007,China;Engineering Laboratory of Intellectual Business and Internet of Things Technologies,Xinxiang 453007,China;National Digital Switching System Engineering&Technological R&D Center,Zhengzhou 450000,China)
Source
《工程科学与技术》
Peking University Core Journal (北大核心)
2025, Issue 2, pp. 40-53 (14 pages)
Advanced Engineering Sciences
Funding
National Natural Science Foundation of China (62072159, 61902112);
Henan Province Science and Technology Research Project (222102210011, 232102211061).
Keywords
spectral clustering
density adjustment
neighborhood standard deviation
adaptive
density peak
About the authors
GUO Xiaoyu (1995-), female, master's student. Research interests: data mining. E-mail: 2108283044@stu.htu.edu.cn. Corresponding author: ZHAO Xiaoyan, associate professor, E-mail: zhaoxiaoyan@htu.edu.cn.