Automatic Generation of Chinese Entity Relations Based on Bootstrapping (基于Boot Strapping的中文实体关系自动生成)
Cited by: 3

Boot Strapping-based Automatic Chinese Entities Relation Extraction

Abstract: To address the difficulty of building event extraction templates in Chinese information extraction (IE) systems, this paper proposes a simple, practical method for automatically generating entity relations, based on the Bootstrapping idea. A learner is built from a knowledge base of seed words and seed patterns. Using scalar clustering, the seed patterns extract further feature words whose semantic relations resemble those of the seed words. On that basis, a nearest-neighbor principle is applied to generate more extraction patterns, which are added to the knowledge base. The enriched knowledge base lays a foundation for analyzing binary entity relations and makes it possible to generate complex event templates automatically. The method also greatly reduces the effort of building patterns by hand and makes IE systems easier to port.
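The loop described in the abstract can be sketched roughly as follows. This is a minimal Python illustration, not the authors' implementation. The abstract gives no formulas, so everything concrete here is an assumption: the invented toy corpus (English tokens standing in for segmented Chinese text with entity tags), the `(left, word, right)` pattern shape, the sentence-level co-occurrence vectors compared by cosine as a stand-in for scalar clustering, the reading of the nearest-neighbor principle as "compare each candidate with its closest known word", the 0.8 threshold, and all function names.

```python
from collections import Counter
from math import sqrt

ENTITY_TAGS = {"PER", "ORG", "LOC"}  # assumed entity-type tags

# Invented toy corpus: entity mentions are already replaced by type tags.
CORPUS = [
    ["PER", "joined", "ORG", "recently"],
    ["PER", "joined", "ORG", "officially"],
    ["PER", "entered", "ORG", "recently"],
    ["PER", "entered", "ORG", "officially"],
    ["PER", "entered", "ORG", "recently"],
    ["PER", "criticized", "ORG", "publicly"],
]

def context_vector(word, corpus):
    """Count the non-entity words that share a sentence with `word`."""
    vec = Counter()
    for sent in corpus:
        if word in sent:
            vec.update(t for t in sent if t != word and t not in ENTITY_TAGS)
    return vec

def cosine(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def slot_fillers(patterns, corpus):
    """Words found between the (left, right) entity pair of any known pattern."""
    contexts = {(left, right) for left, _, right in patterns}
    found = set()
    for sent in corpus:
        for i in range(1, len(sent) - 1):
            if (sent[i - 1], sent[i + 1]) in contexts and sent[i] not in ENTITY_TAGS:
                found.add(sent[i])
    return found

def bootstrap(seed_words, seed_patterns, corpus, threshold=0.8, max_rounds=5):
    words, patterns = set(seed_words), set(seed_patterns)
    for _ in range(max_rounds):
        accepted = set()
        for cand in slot_fillers(patterns, corpus) - words:
            cand_vec = context_vector(cand, corpus)
            # Assumed nearest-neighbor rule: score against the closest known word.
            best = max(cosine(cand_vec, context_vector(w, corpus)) for w in words)
            if best >= threshold:
                accepted.add(cand)
        if not accepted:
            break  # fixed point: nothing new was learned this round
        words |= accepted
        # Each accepted word contributes patterns from the contexts it occurs in.
        for sent in corpus:
            for i in range(1, len(sent) - 1):
                if sent[i] in accepted:
                    patterns.add((sent[i - 1], sent[i], sent[i + 1]))
    return words, patterns

words, patterns = bootstrap({"joined"}, {("PER", "joined", "ORG")}, CORPUS)
```

On this toy data, `entered` shares its context words with the seed `joined` and is accepted, which adds the pattern `("PER", "entered", "ORG")`. `criticized` fills the same slot but has no context overlap with any known word, so it is rejected. A real system would need a far larger corpus and a principled threshold to limit semantic drift.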

Authors: Zhang Suxiang (张素香), Li Lei (李蕾), Qin Ying (秦颖), Zhong Yixin (钟义信)

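Affiliation: Research Center for Intelligence Science and Technology, School of Information Engineering, Beijing University of Posts and Telecommunications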

Source: Microelectronics & Computer (《微电子学与计算机》), indexed in CSCD and the Peking University Core Journals list, 2006, No. 12, pp. 15-18 (4 pages)

Funding: Major Project of the National 863 Program (2001AA114210)

Keywords: Bootstrapping; seed word; seed pattern; scalar cluster

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

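About the author: Zhang Suxiang (born 1973), PhD candidate and lecturer. Research interests: natural language understanding and machine learning.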
