An Enhanced Stream Clustering Algorithm Based on Affinity Propagation (cited by: 3)
Abstract Existing stream clustering algorithms cannot effectively detect and handle outliers in data streams, and incremental stream clustering is inefficient. To address these problems, an enhanced affinity propagation stream clustering algorithm with density-based outlier detection and removal is proposed. Building on the affinity propagation stream clustering algorithm (STRAP), the proposed algorithm introduces an outlier detection and removal mechanism that reduces the impact of outliers on clustering accuracy and efficiency. Affinity propagation is used to cluster the online data stream while data drift, i.e., a change in the distribution of the data stream over time, is monitored. When drift is detected, the density-based local outlier factor (LOF) technique is applied to detect and remove outliers in the reservoir, and the current clusters are re-clustered together with the processed reservoir data to rebuild the dynamic stream model. Experiments on real network data (KDD'99) show that the proposed algorithm reduces the number of re-clustering operations needed to build the dynamic model, improves clustering efficiency, and outperforms STRAP on all three evaluation criteria considered: clustering accuracy, purity, and entropy.
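The reservoir-pruning step described in the abstract can be illustrated with a minimal sketch of the local outlier factor. This is not the authors' implementation: the function names `lof_scores` and `prune_reservoir`, the neighbourhood size `k`, and the cutoff `threshold` are all hypothetical choices made for illustration, and the abstract does not state the parameter values used in the paper. The sketch also simplifies standard LOF by taking exactly `k` neighbours when distances tie.

```python
import math

def lof_scores(points, k=3):
    """Brute-force local outlier factor for each point (O(n^2) distances).

    Simplification (an assumption of this sketch): exactly k neighbours are
    used even when several points tie at the k-distance.
    """
    n = len(points)
    if n <= k:
        raise ValueError("need more than k points")
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbours of each point (excluding itself) and its k-distance.
    neighbours, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_dist.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance.
    # The small constant avoids division by zero for duplicate points.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(1.0 / (sum(reach) / k + 1e-12))

    # LOF: mean density of the neighbours relative to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]

def prune_reservoir(reservoir, k=3, threshold=1.5):
    """Keep only reservoir points whose LOF does not exceed the threshold."""
    scores = lof_scores(reservoir, k)
    return [p for p, s in zip(reservoir, scores) if s <= threshold]

# Toy reservoir: five points in a tight group and one far-away point.
reservoir = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
cleaned = prune_reservoir(reservoir, k=3, threshold=1.5)
```

A score near 1 means a point is about as dense as its neighbours, while a score well above 1 marks it as an outlier. In the workflow the abstract describes, the pruned reservoir would then be re-clustered together with the current clusters; that re-clustering step is not shown here.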
Source Journal of Xi'an Jiaotong University (《西安交通大学学报》), 2017, No. 3, pp. 105-110 (6 pages); indexed in EI, CAS, CSCD, and the Peking University Core Journals list
Funding National Natural Science Foundation of China (61371087, 61531013); National "863 Program" of China (2015AA015702)
Keywords stream clustering; affinity propagation; local outlier factor; outlier removal
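About the authors Zhao Jianlong (赵建龙, born 1992), male, PhD candidate; Qu Hua (曲桦, corresponding author), male, professor and doctoral supervisor.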