
Clustering Analysis of Cases from Renowned Veteran TCM Physicians Based on XML Structural Similarity (cited by: 7)

Structure-similarity cluster analysis on herbal instances based on XML
Abstract: To suit the structural characteristics of case records from renowned veteran practitioners of traditional Chinese medicine (TCM), this paper designs a clustering algorithm based on simulated annealing that globally optimizes the clustering of cases stored in a database. For clustering, it proposes a measure for judging whether two cases described in XML are similar; the measure is derived from the general edit distance between trees and is called the XML editing distance. Using the XML editing distance keeps the time complexity of measuring similarity between XML data polynomial, while preserving the semantic information of the nodes in each case's XML document and the ancestor-descendant nesting relationships among those nodes. Finally, experiments on the Tamino database confirm that the simulated-annealing-based case clustering algorithm is feasible and effective in data-mining practice on these case records.
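The two ideas in the abstract, a polynomial-time tree distance and a simulated-annealing search over cluster assignments, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `tree_distance` is a simplified top-down edit distance on ordered trees, not the paper's XML editing distance, and the cost function, cooling schedule, parameter values, and the names `subtree_size`, `clustering_cost`, and `anneal_clusters` are hypothetical.

```python
import math
import random
import xml.etree.ElementTree as ET


def subtree_size(node):
    """Number of elements in the subtree rooted at node."""
    return 1 + sum(subtree_size(child) for child in node)


def tree_distance(a, b):
    """Simplified top-down edit distance between two ordered XML trees.

    Relabeling a node costs 1; inserting or deleting a whole subtree costs
    its size. Tags carry the node semantics and the recursion follows the
    nesting. This is a stand-in, not the paper's XML editing distance.
    """
    relabel = 0 if a.tag == b.tag else 1
    xs, ys = list(a), list(b)
    # Align the two child sequences with a standard edit-distance table.
    dp = [[0] * (len(ys) + 1) for _ in range(len(xs) + 1)]
    for i in range(1, len(xs) + 1):
        dp[i][0] = dp[i - 1][0] + subtree_size(xs[i - 1])
    for j in range(1, len(ys) + 1):
        dp[0][j] = dp[0][j - 1] + subtree_size(ys[j - 1])
    for i in range(1, len(xs) + 1):
        for j in range(1, len(ys) + 1):
            dp[i][j] = min(
                dp[i - 1][j] + subtree_size(xs[i - 1]),   # delete subtree
                dp[i][j - 1] + subtree_size(ys[j - 1]),   # insert subtree
                dp[i - 1][j - 1] + tree_distance(xs[i - 1], ys[j - 1]),
            )
    return relabel + dp[len(xs)][len(ys)]


def clustering_cost(labels, dist):
    """Sum of pairwise distances between items that share a cluster."""
    n = len(labels)
    return sum(dist[i][j]
               for i in range(n) for j in range(i + 1, n)
               if labels[i] == labels[j])


def anneal_clusters(dist, k, t0=5.0, cooling=0.95, steps=2000, seed=0):
    """Search for a low-cost assignment of items to k clusters."""
    rng = random.Random(seed)
    n = len(dist)
    labels = [i % k for i in range(n)]            # arbitrary starting point
    cost = clustering_cost(labels, dist)
    best, best_cost = labels[:], cost
    t = t0
    for _ in range(steps):
        i, new = rng.randrange(n), rng.randrange(k)
        if new == labels[i]:
            continue
        old, labels[i] = labels[i], new           # propose moving one item
        new_cost = clustering_cost(labels, dist)
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature drops.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / t):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = labels[:], cost
        else:
            labels[i] = old                       # reject: undo the move
        t = max(t * cooling, 1e-9)
    return best, best_cost


if __name__ == "__main__":
    # Hypothetical case documents; the element names are made up.
    docs = [
        "<case><patient/><diagnosis/><prescription><herb/><herb/></prescription></case>",
        "<case><patient/><diagnosis/><prescription><herb/></prescription></case>",
        "<case><patient/><symptoms><symptom/></symptoms></case>",
        "<case><patient/><symptoms><symptom/><symptom/></symptoms></case>",
    ]
    trees = [ET.fromstring(d) for d in docs]
    dist = [[tree_distance(x, y) for y in trees] for x in trees]
    print(anneal_clusters(dist, k=2))
```

Accepting some cost-increasing moves early on is what lets the search leave poor local groupings, which is the sense in which annealing aims at a global rather than a greedy optimum. The distance runs in polynomial time because each pair of nodes at matching depth is compared through one alignment table.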
Source: Application Research of Computers (《计算机应用研究》), CSCD, Peking University Core Journal, 2008, No. 2, pp. 365-367 (3 pages)
Funding: National Natural Science Foundation of China (60503024, 60374032); National "Tenth Five-Year Plan" Key Science and Technology Program (2004BA721A01H07)
Keywords: cases of renowned veteran TCM physicians; XML document; XML editing distance (XED); clustering algorithm; simulated annealing algorithm
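About the authors: Ban Xiaojuan (班晓娟, b. 1970), female, from Tianjin, associate professor, Ph.D.; research interests include artificial intelligence and data mining. Ma Ji (马骥, b. 1980), male, from Qinhuangdao, Hebei, master's student; research interest is XML-based data mining (maji_1980@yahoo.com.cn). Yin Yixin (尹怡欣, b. 1957), male, from Cangzhou, Hebei, professor, Ph.D.; research interests are artificial intelligence and intelligent control. Zhang Dezheng (张德政, b. 1963), male, from Beijing, associate professor, Ph.D.; research interests include data mining.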
