
Fast Adaptive Clustering by Synchronization on Large Scale Datasets (cited by: 11)
Abstract The existing synchronization clustering algorithm Sync treats every attribute of every sample as a phase oscillator during synchronization. This gives it a high time complexity and severely limits its use on large-scale datasets. To address this problem, the paper proposes FAKCS (fast adaptive KDE-based clustering by synchronization). FAKCS first compresses the large dataset with a fast compression method built on the reduced set density estimator (RSDE) and the center-constrained minimal enclosing ball (CCMEB) technique. It then performs synchronization clustering on the compressed set, adapting the ε parameter by means of the Davies-Bouldin index and using a newly defined order parameter to measure the degree of local synchronization. The paper also studies the relationship between this order parameter and kernel density estimation (KDE), which shows theoretically what the local synchronization of sample points means in terms of probability density. FAKCS can find clusters of arbitrary shape, number, and density in large-scale datasets without a preset number of clusters. Experiments on image segmentation and on large UCI datasets verify its effectiveness.
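The pipeline the abstract describes can be illustrated with a minimal sketch. It is not the authors' implementation. It omits the RSDE/CCMEB compression step, uses the order parameter of the original Sync algorithm rather than the new one defined in the paper (which this record does not give), and assumes the data are already scaled into [0, π/2] as in Sync. The function names `sync_step`, `sync_cluster`, `davies_bouldin`, and `pick_eps`, and the guard against all-singleton partitions, are hypothetical.

```python
import numpy as np

def sync_step(X, eps):
    """One Sync-style update; returns the moved points and the order parameter."""
    diff = X[None, :, :] - X[:, None, :]      # diff[i, j] = X[j] - X[i]
    dist = np.linalg.norm(diff, axis=2)
    nbr = dist <= eps                          # eps-neighborhood, includes the point itself
    counts = nbr.sum(axis=1)
    # Kuramoto-style coupling, applied to every coordinate separately
    X_new = X + (np.sin(diff) * nbr[:, :, None]).sum(axis=1) / counts[:, None]
    # Order parameter of the original Sync paper (assumption: not the FAKCS one),
    # evaluated on the positions before the update
    r = np.mean((np.exp(-dist) * nbr).sum(axis=1) / counts)
    return X_new, r

def sync_cluster(X, eps, max_iter=50, tol=1e-3):
    """Iterate until local synchronization, then group points that coincide."""
    Z = np.asarray(X, dtype=float).copy()
    for _ in range(max_iter):
        Z, r = sync_step(Z, eps)
        if r > 1.0 - tol:
            break
    labels = -np.ones(len(Z), dtype=int)
    k = 0
    for i in range(len(Z)):
        if labels[i] < 0:
            close = np.linalg.norm(Z - Z[i], axis=1) < 1e-3
            labels[close & (labels < 0)] = k
            k += 1
    return labels

def davies_bouldin(X, labels):
    """Davies-Bouldin index (lower is better); degenerate partitions score inf."""
    ids = np.unique(labels)
    if len(ids) < 2 or len(ids) == len(X):     # hypothetical guard, not from the paper
        return np.inf
    cents = np.array([X[labels == k].mean(axis=0) for k in ids])
    scat = np.array([np.linalg.norm(X[labels == k] - c, axis=1).mean()
                     for k, c in zip(ids, cents)])
    sep = np.linalg.norm(cents[:, None] - cents[None, :], axis=2)
    np.fill_diagonal(sep, np.inf)              # ignore a cluster compared with itself
    return np.mean(np.max((scat[:, None] + scat[None, :]) / sep, axis=1))

def pick_eps(X, candidates):
    """Choose the candidate eps whose clustering has the lowest index."""
    scored = [(davies_bouldin(X, sync_cluster(X, e)), e) for e in candidates]
    return min(scored)[1]

# Two tight, well-separated groups of five points each
off = np.array([[0, 0], [0.02, 0], [0, 0.02], [0.02, 0.02], [0.01, 0.01]])
X = np.vstack([off + 0.3, off + 1.2])
print(sync_cluster(X, 0.3), pick_eps(X, [0.005, 0.3, 2.0]))
```

The sketch computes all pairwise differences, so it costs quadratic time and memory in the number of points. That cost is the reason the abstract gives for compressing the dataset before synchronization.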
Source Journal of Computer Research and Development (indexed in EI, CSCD, and the Peking University Core Journals list), 2014, No. 4, pp. 707-720, 14 pages
Funding National Natural Science Foundation of China (61272210, 61202311); Natural Science Foundation of Jiangsu Province (BK2012552, BK2012209)
Keywords kernel density estimation; minimal enclosing ball; synchronization; reduced set density estimator; clustering
About the authors Ying Wenhao, born in 1979. PhD. Lecturer at the School of Computer Science and Engineering, Changshu Institute of Technology. His research interests cover pattern recognition and intelligent computation (cslgywh@163.com). Xu Min, born in 1980. PhD candidate at the School of Digital Media, Jiangnan University. Her research interests cover pattern recognition and intelligent computation (xum@wxit.edu.cn). Wang Shitong, born in 1964. Professor at the School of Digital Media, Jiangnan University. His research interests cover artificial intelligence, pattern recognition, and bioinformatics (wxwangst@yahoo.com.cn). Deng Zhaohong, born in 1981. PhD. Associate professor at the School of Digital Media, Jiangnan University. His research interests cover fuzzy modeling and intelligent computation (dzh666828@yahoo.com.cn).

References (20)

  • 1. Jain A K, Murty M N, Flynn P J. Data clustering: A review [J]. ACM Computing Surveys, 1999, 31(3): 264-323.
  • 2. Sun Jigui, Liu Jie, Zhao Lianyu. Clustering algorithms research [J]. Journal of Software, 2008(1): 48-61. Cited by: 1083
  • 3. Wang Jun, Wang Shitong, Deng Zhaohong. Several problems in cluster analysis research [J]. Control and Decision, 2012, 27(3): 321-328. Cited by: 196
  • 4. Böhm C, Plant C, Shao J, et al. Clustering by synchronization [C] //Proc of the 16th ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining. New York: ACM, 2010: 583-592.
  • 5. Kim J, Scott C D. L2 kernel classification [J]. IEEE Trans on Pattern Analysis and Machine Intelligence, 2010, 32(10): 1822-1831.
  • 6. Freedman D, Kisilev P. Fast data reduction via KDE approximation [C] //Proc of 2009 Data Compression Conference. Los Alamitos, CA: IEEE Computer Society, 2009: 445-445.
  • 7. Chao H, Girolami M. Novelty detection employing an L2 optimal non-parametric density estimator [J]. Pattern Recognition Letters, 2004, 25(12): 1389-1397.
  • 8. Li Cunhua, Sun Zhihui, Chen Geng, Hu Yun. Kernel density estimation and its application to the construction of clustering algorithms [J]. Journal of Computer Research and Development, 2004, 41(10): 1712-1719. Cited by: 65
  • 9. Zhang Tingxian, Zheng Zhigang. Synchronization in systems of coupled nonlinear oscillators [J]. Acta Physica Sinica, 2004, 53(10): 3287-3292. Cited by: 15
  • 10. Moreno Y, Pacheco A F. Synchronization of Kuramoto oscillators in scale-free networks [J]. Europhysics Letters, 2004, 68(4): 603-609.

Secondary references (79)

  • 1. Deng Zhaohong, Wang Shitong. Robust fuzzy clustering neural networks [J]. Journal of Software, 2005, 16(8): 1415-1422. Cited by: 11
  • 2. Li Jie, Gao Xinbo, Jiao Licheng. A new feature-weighted fuzzy clustering algorithm [J]. Acta Electronica Sinica, 2006, 34(1): 89-92. Cited by: 114
  • 3. Wang Lijuan, Guan Shouyi, Wang Xiaolong, Wang Xizhao. Fuzzy C Mean algorithm based on attribute weights [J]. Chinese Journal of Computers, 2006, 29(10): 1797-1803. Cited by: 46
  • 4. Xu R, Wunsch D. Survey of clustering algorithms [J]. IEEE Trans on Neural Networks, 2005, 16(3): 645-678.
  • 5. Jain A K, Murty M N, Flynn P J. Data clustering: A review [J]. ACM Computing Surveys, 1999, 31(3): 264-323.
  • 6. Jain A K. Data clustering: 50 years beyond K-means [J]. Pattern Recognition Letters, 2010, 31(8): 651-666.
  • 7. Goldberger J, Tassa T. A hierarchical clustering algorithm based on the Hungarian method [J]. Pattern Recognition Letters, 2008, 29(1): 1632-1638.
  • 8. Kumar P, Krishna P R, Bapi R S, et al. Rough clustering of sequential data [J]. Data & Knowledge Engineering, 2007, 3(2): 183-199.
  • 9. Cilibrasi R L, Vitányi P M B. A fast quartet tree heuristic for hierarchical clustering [J]. Pattern Recognition, 2011, 44(3): 662-677.
  • 10. Hathaway R J, Hu Y. Density-weighted fuzzy c-means clustering [J]. IEEE Trans on Fuzzy Systems, 2009, 17(1): 243-252.

Co-citing literature (1385)

Co-cited literature (92)

  • 1. ARTHUR D, VASSILVITSKII S. k-means++: The advantages of careful seeding [C] //Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics, 2007: 1027-1035.
  • 2. ZALIK K R. An efficient k-means clustering algorithm [J]. Pattern Recognition Letters, 2008, 29(9): 1385-1391.
  • 3. CAO F, LIANG J, JIANG G. An initialization method for the k-means algorithm using neighborhood model [J]. Computers and Mathematics with Applications, 2009, 58(3): 474-483.
  • 4. HUBERT L J, ARABIE P. Comparing partitions [J]. Journal of Classification, 1985, 2(1): 193-218.
  • 5. DAVIES D L, BOULDIN D W. A cluster separation measure [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1979, 1(2): 224-227.
  • 6. DUNN J C. A fuzzy relative of the isodata process and its use in detecting compact well-separated clusters [J]. Cybernetics and Systems, 1973, 3(3): 32-57.
  • 7. BEZDEK J C, PAL N R. Some new indexes of cluster validity [J]. IEEE Transactions on Systems, Man, and Cybernetics, 1998, 28(3): 301-315.
  • 8. CORMEN T H, LEISERSON C E, RIVEST R L, STEIN C. Introduction to Algorithms [M]. 3rd Edition. The MIT Press, 2009.
  • 9. CHEN Xin-quan. Weighted clustering and evolutionary analysis of hybrid attributes data streams [J]. Journal of Computers, 2008, 12(3): 60-67.
  • 10. FRANK A, ASUNCION A. UCI Machine Learning Repository [EB/OL]. 2010 [2014-07-02]. http://archive.ics.uci.edu/ml.

Citing literature (11)

Secondary citing literature (60)
