Abstract
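Cluster analysis is an important research topic in statistics, pattern recognition, data mining, and related fields, and it has broad application prospects. Inspired by field theory in physics, a hierarchical clustering method based on data fields is proposed. The method introduces the interaction between material particles, together with its field description, into the abstract data space, and achieves self-organized hierarchical aggregation of data objects by simulating their interaction and movement in a virtual data field. Experiments show that the method does not depend on careful selection of user-supplied parameters, can discover non-spherical clusters of arbitrary size and density, is insensitive to noisy data, and converges at an approximately linear rate.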
Clustering is a promising application area for many fields, including statistics, pattern recognition, and data mining. The effectiveness and efficiency of existing clustering techniques, however, are somewhat limited, owing to the huge amounts of data collected in databases. Drawing on field theory in physics, a hierarchical clustering method based on data fields is presented. The basic idea is that a field model is introduced to describe the virtual interaction among data objects in the data space, and the hierarchical partitioning of the original dataset is then performed by iteratively simulating the interaction and movement of the data objects in the field. Experimental results show that the proposed approach not only achieves favorable clustering quality and requires no careful parameter tuning, but also has a time complexity approximately linear in the size of the dataset.
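The abstract does not state the field function or the movement rule, so the following is only a minimal Python sketch under stated assumptions, not the authors' algorithm. It assumes a Gaussian-like potential with a hypothetical influence factor `sigma` and equal object masses, and it stands in a mean-shift style update for the "movement" of an object toward higher potential. The names `potential` and `move_uphill` are illustrative only.

```python
import math


def potential(x, points, masses=None, sigma=1.0):
    """Potential at location x produced by all data objects.

    Assumption: a Gaussian-like kernel m * exp(-(d / sigma)^2); the abstract
    does not specify the actual field function.
    """
    if masses is None:
        # Assumption: every object carries the same mass.
        masses = [1.0 / len(points)] * len(points)
    total = 0.0
    for p, m in zip(points, masses):
        d = math.dist(x, p)
        total += m * math.exp(-((d / sigma) ** 2))
    return total


def move_uphill(x, points, sigma=1.0, steps=50):
    """Move x toward a nearby potential maximum.

    Stand-in technique: a mean-shift style fixed-point update, which is the
    stationarity condition of the Gaussian-like potential above.
    """
    for _ in range(steps):
        weights = [math.exp(-((math.dist(x, p) / sigma) ** 2)) for p in points]
        total = sum(weights)
        if total == 0.0:
            break  # x is too far from every object to feel the field
        x = tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(len(x))
        )
    return x


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    print(potential((0.0, 0.0), pts), potential((2.5, 2.5), pts))
    print(move_uphill((0.5, 0.5), pts), move_uphill((4.5, 4.5), pts))
```

In this sketch, objects that drift to the same potential maximum would be grouped together; repeating the grouping on the merged representatives is one plausible way to obtain a hierarchy, though the abstract does not say how the paper does it.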
Source
Acta Electronica Sinica (《电子学报》)
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2006, No. 2, pp. 258-262 (5 pages)
Funding
National Natural Science Foundation of China (No. 60375016, No. 60496323)
Keywords
cluster analysis
hierarchical clustering
data field
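About the authors
Gan Wenyan (淦文燕): female, born in 1971 in Jiujiang, Jiangxi; postdoctoral researcher. Her main research interests are data mining, digital watermarking, and complex networks. E-mail: wenyangan@163.com.
Li Deyi (李德毅): male, born in 1944 in Zhenjiang, Jiangsu; doctoral supervisor and academician of the Chinese Academy of Engineering. His main research interests are artificial intelligence, data mining, command automation, and intelligent control.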