
Optimization of the k-means clustering algorithm based on KD-tree (基于KD树的k-means聚类算法优化)
Cited by: 6
Abstract Clustering is one of the most basic classification methods in pattern recognition and plays an important role in data analysis across scientific fields. With the emergence of big data, however, cluster analysis faces new challenges in computational complexity and computational cost. Starting from the O(nk) time complexity of the k-means clustering algorithm, and targeting the large number of nearest-neighbor computations in its iterations and the special setting in which they occur, the paper introduces a KD-tree as an index and proposes two methods: an approximate nearest-neighbor algorithm based on a single KD-tree and a cross-search algorithm based on multiple KD-trees. These reduce the time complexity of k-means clustering to O(n log k). Experiments show that the multi-tree cross-search algorithm achieves clustering quality comparable to that of the original k-means algorithm.
Authors XUE Dingwen (薛丁文); LI Jianzhong (李建中) (Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China)
Source Intelligent Computer and Applications (《智能计算机与应用》), 2021, No. 11, pp. 194-197 (4 pages)
Keywords clustering analysis; k-means clustering; KD-tree; approximate nearest neighbor
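About the authors XUE Dingwen (b. 1995), male, PhD candidate; main research interest: clustering analysis of massive data. LI Jianzhong (b. 1950), male, professor and doctoral supervisor; main research interests: massive data computing and wireless sensor networks.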
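
The core idea in the abstract, indexing the k cluster centers with a KD-tree so that each point's nearest-center lookup avoids a full scan of all k centers, can be illustrated with a minimal sketch. This is not the paper's algorithm: the single-tree approximate search and the multi-tree cross search are not specified in the abstract, so the sketch below substitutes an ordinary exact KD-tree nearest-neighbor search. All function names (`build_kdtree`, `nearest`, `kmeans_kdtree`) and the explicit `init_centroids` parameter are hypothetical, chosen for illustration. The tree is rebuilt over the centers in every iteration, and a lookup typically visits about log k nodes in low dimensions, though this is not guaranteed in the worst case.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_kdtree(items, depth=0):
    """Build a KD-tree over (index, point) pairs.

    Each node is a tuple ((index, point), axis, left, right); an empty
    subtree is None. The split axis cycles through the dimensions.
    """
    if not items:
        return None
    axis = depth % len(items[0][1])
    items = sorted(items, key=lambda ip: ip[1][axis])
    mid = len(items) // 2
    return (items[mid], axis,
            build_kdtree(items[:mid], depth + 1),
            build_kdtree(items[mid + 1:], depth + 1))


def nearest(node, query, best=None):
    """Exact nearest-neighbor search (assumption: the paper's searches are
    approximate). Returns (squared distance, index) of the closest point."""
    if node is None:
        return best
    (idx, point), axis, left, right = node
    cand = (sq_dist(query, point), idx)
    if best is None or cand < best:
        best = cand
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, best)
    # Descend into the far side only if the splitting plane is closer
    # than the best match found so far.
    if diff * diff < best[0]:
        best = nearest(far, query, best)
    return best


def kmeans_kdtree(data, init_centroids, iters=20):
    """Lloyd-style k-means whose assignment step queries a KD-tree of centers."""
    centroids = [tuple(c) for c in init_centroids]
    k, dim = len(centroids), len(data[0])
    labels = []
    for _ in range(iters):
        # Index the current centers, then assign each point via the tree.
        tree = build_kdtree(list(enumerate(centroids)))
        new_labels = [nearest(tree, p)[1] for p in data]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: each center moves to the mean of its assigned points.
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for p, c in zip(data, labels):
            counts[c] += 1
            for d in range(dim):
                sums[c][d] += p[d]
        centroids = [
            tuple(s / counts[c] for s in sums[c]) if counts[c] else centroids[c]
            for c in range(k)
        ]
    return centroids, labels


# Usage: two well-separated groups, one initial center taken from each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centers, labels = kmeans_kdtree(points, [points[0], points[3]])
# labels -> [0, 0, 0, 1, 1, 1]
```

Because the search here is exact, the assignments match those of plain k-means with the same initial centers; an approximate search, as described in the abstract, would trade some of that fidelity for speed.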
