Fine-grained Chinese Medical Knowledge: A Representation System and an Annotated Corpus (中文医学细粒度知识表示体系与标注语料库构建)

Cited by: 4
Abstract: To meet the need for fine-grained, shareable, and highly accurate medical knowledge, this paper proposes a knowledge representation system for Chinese medical text that fuses medical knowledge from three data sources: electronic medical records, medical books, and text from professional medical websites. The system defines 9 medical entity types and 60 entity relation types. On this basis, the authors develop a highly operable annotation tool, provide annotated medical text for each source according to a common standard, and construct a publicly available fine-grained annotated corpus with wide coverage and high consistency. Four clinicians annotated the textbook "Diagnostics" (《诊断学》), producing 6526 medical entities and 4229 relations with an inter-annotator agreement of up to 0.974. After the three data sources are fused, the corpus contains 344475 medical entities and 3196787 entity relations, without duplicates. The paper describes the mapping process used for data-source fusion and the annotation rules, analyzes the text characteristics of each data source and summarizes their annotation patterns, and uses application scenarios and text characteristics to show why annotating medical books is necessary. It provides an annotation standard for building Chinese medical corpora and corpus support for Chinese medical entity recognition and relation extraction.
Authors: YANG Yang (杨洋); GUAN Yi (关毅); LI Xue (李雪); JIANG Jingchi (姜京池); SHI Huaizhang (史怀璋); LIU Xiguang (柳曦光) (Department of Computer Science, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China; Department of Neurosurgery, First Hospital of Harbin Medical University, Harbin, Heilongjiang 150030, China; Department of Dermatology, Heilongjiang Provincial Hospital, Harbin, Heilongjiang 150030, China)
Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, PKU Core, 2023, No. 6, pp. 52-66 (15 pages)
Funding: National Natural Science Foundation of China (62006063); Heilongjiang Provincial Postdoctoral Science Foundation (LBH-Z20015).
Keywords: fine-grained annotation standard; multi-source medical text; semantic annotation; corpus construction
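About the authors: YANG Yang (born 1992), Ph.D. candidate; research interests: medical informatics, knowledge engineering, and natural language processing. E-mail: yangyang_hit_wi@163.com. GUAN Yi (born 1970), professor and associate research fellow; research interests: medical informatics, knowledge engineering, and natural language processing. E-mail: guanyi@hit.edu.cn. LI Xue (born 1997), Ph.D. candidate; research interests: medical informatics and natural language processing. E-mail: li20s103245@163.com.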