Abstract
Most subspace clustering methods perform poorly on nonlinear data, data in different subspaces are often highly similar, and clustering errors cannot be checked promptly once they occur. To address these problems, an overlapping subspace clustering algorithm based on local weighted least squares regression (LWLSR) is proposed. The k-nearest neighbor (KNN) idea is used to emphasize the local information of the data in place of the nonlinear data structure, and Gaussian weighting is used to select the most similar neighboring data points and obtain the optimal representation coefficients. An overlapping probability model is then employed to identify the overlapping part of the data within the subspaces, and the clustering results are checked again to improve clustering accuracy. Experiments on both artificial and real-world datasets show that the proposed algorithm achieves fairly satisfactory clustering results.
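The abstract does not state the objective function, so the following is only a minimal sketch of how KNN selection and Gaussian weighting could be combined in a least squares regression. It assumes each point is reconstructed from its k nearest neighbors with a ridge-style penalty that grows as the Gaussian similarity shrinks (similar in spirit to locality-constrained coding). The function names and the parameters `k`, `sigma`, and `lam` are hypothetical and are not taken from the paper; the overlapping probability model is not covered.

```python
import numpy as np


def local_weighted_coefficients(X, k=5, sigma=1.0, lam=0.1):
    """Sketch (assumed formulation, not the paper's): for each row X[i], solve
        min_c ||X[i] - B c||^2 + lam * sum_j c_j^2 / w_j
    where B holds the k nearest neighbors of X[i] as columns and
    w_j = exp(-||X[i] - x_j||^2 / (2 sigma^2)) is a Gaussian similarity.
    Returns an (n, n) matrix whose row i is nonzero only at the neighbors of i.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbor

    C = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[:k]                 # indices of the k nearest neighbors
        w = np.exp(-d2[i, nbrs] / (2.0 * sigma ** 2))  # Gaussian similarities in (0, 1]
        B = X[nbrs].T                                # shape (n_features, k)
        # Less similar neighbors get a larger penalty, hence smaller coefficients.
        P = np.diag(1.0 / np.maximum(w, 1e-12))
        C[i, nbrs] = np.linalg.solve(B.T @ B + lam * P, B.T @ X[i])
    return C


def affinity_from_coefficients(C):
    """Symmetric affinity commonly built from representation coefficients."""
    return np.abs(C) + np.abs(C).T


# Small demo on two well-separated synthetic groups (illustrative data only).
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0.0, 0.1, (10, 3)),
                    rng.normal(5.0, 0.1, (10, 3))])
C_demo = local_weighted_coefficients(X_demo, k=3)
A_demo = affinity_from_coefficients(C_demo)
```

The resulting affinity matrix could then be passed to a spectral clustering step, which is the usual pipeline for regression-based subspace clustering; whether the paper does exactly this is not stated in the abstract.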
Source
Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》)
EI
CSCD
Peking University Core Journals
2018, Issue 2, pp. 114-122 (9 pages)
Funding
Supported by the Young Scientists Fund of the National Natural Science Foundation of China (No. 61401185)
Keywords
Overlapping Subspace Clustering
K-Nearest Neighbor
Gaussian Weighting
Overlapping Probability Model
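About the Authors
Qiu Yunfei (邱云飞, corresponding author), Ph.D., professor. Research interests: data mining and intelligent data processing. E-mail: 7415575@qq.com
Fei Bowen (费博雯), Ph.D. candidate. Research interests: data mining and intelligent data processing. E-mail: feibowen2098@163.com
Liu Daqian (刘大千), Ph.D. candidate. Research interests: image and visual information computing, object detection and tracking. E-mail: liudaqianIntu@163.com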