Research on the Optimization and Parallelization of the BIRCH Clustering Algorithm (cited by: 9)
Abstract: To improve clustering quality and address the BIRCH algorithm's shortcomings in clustering precision, this paper proposes that different clusters in the clustering feature tree (CF-tree) should use different thresholds. This noticeably improves the handling of clusters whose volumes differ greatly, which the original algorithm clusters poorly. The paper also studies in depth how to perform fast clustering on a cluster system and proposes several improvements: custom data types, a data-parallel approach, and a non-uniform data partitioning strategy. Experimental results show that these improvements yield fairly good running time and speedup.
Source: Computer Engineering and Design (《计算机工程与设计》), CSCD, Peking University Core Journal, 2007, No. 18, pp. 4345-4346, 4369 (3 pages in total)
Keywords: cluster; data mining; clustering; quality of clustering; parallelism
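About the authors: Zhu Yinghui (朱映辉, born 1977), male, from Meizhou, Guangdong; master's degree; lecturer; research interests: distributed computing and data mining; E-mail: zyh366@163.com. Jiang Yuzhen (江玉珍, born 1977), female, from Chaozhou, Guangdong; master's degree; lecturer; research interests: distributed computing and image processing.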
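
The abstract's central idea, that each cluster in the CF-tree carries its own threshold rather than sharing one global value, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name `ClusteringFeature`, the method names, and the way a threshold is assigned to each cluster are all hypothetical, and the abstract does not say how the per-cluster thresholds are chosen. The sketch only uses the standard BIRCH clustering feature (N, LS, SS) and the radius formula R² = SS/N − ‖LS/N‖².

```python
import math
from dataclasses import dataclass, field


@dataclass
class ClusteringFeature:
    """Standard BIRCH clustering feature (N, LS, SS) with its own threshold.

    The per-cluster `threshold` is an illustrative assumption; the abstract
    does not specify how each cluster's threshold is determined.
    """
    threshold: float                          # radius limit for this cluster only
    n: int = 0                                # number of absorbed points
    ls: list = field(default_factory=list)    # per-dimension linear sum
    ss: float = 0.0                           # sum of squared norms

    def _merged(self, point):
        """Return (n, ls, ss) as they would be after adding `point`."""
        base = self.ls if self.ls else [0.0] * len(point)
        ls = [a + b for a, b in zip(base, point)]
        ss = self.ss + sum(x * x for x in point)
        return self.n + 1, ls, ss

    def radius_with(self, point):
        """Radius the cluster would have if it absorbed `point`."""
        n, ls, ss = self._merged(point)
        r2 = ss / n - sum((v / n) ** 2 for v in ls)
        return math.sqrt(max(r2, 0.0))        # clamp rounding error below zero

    def try_absorb(self, point):
        """Absorb `point` only if the radius stays within this cluster's threshold."""
        if self.radius_with(point) > self.threshold:
            return False
        self.n, self.ls, self.ss = self._merged(point)
        return True


# A compact cluster with a tight threshold rejects a distant point,
# while a large cluster with a looser threshold accepts the same point.
tight = ClusteringFeature(threshold=0.5)
wide = ClusteringFeature(threshold=1.5)
for cf in (tight, wide):
    cf.try_absorb([0.0, 0.0])
print(tight.try_absorb([2.0, 0.0]), wide.try_absorb([2.0, 0.0]))
```

With a single global threshold, one of these two behaviours would have to be given up: a small value fragments large clusters, and a large value merges small ones into their neighbours.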