Abstract
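Owing to the lack of large amounts of annotated data, Chinese medical named entity recognition mainly relies on external resources to improve recognition performance, and incorporating those resources requires considerable time and effective rules. To address the shortage of annotated data, a data augmentation algorithm based on generative adversarial networks is proposed; it automatically generates large amounts of annotated data to improve medical entity recognition performance. Experimental results show that the algorithm outperforms the baseline models used in the experiments, demonstrating its effectiveness for medical entity recognition.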
Chinese clinical named entity recognition plays an important role in recognizing medical entities contained in Chinese electronic medical records. Because large annotated datasets are lacking, most existing methods concentrate on employing external resources to improve the performance of clinical named entity recognition, which requires a lot of time and efficient rules. To address the lack of annotated data, data augmentation with a sequence generative adversarial network is used to generate more varied data based on the entities and non-entities in the training set. Experiments show that when the generated data are used to expand the training set, the proposed named entity recognition system achieves competitive performance compared with state-of-the-art methods, which demonstrates the effectiveness of the data augmentation method.
Authors
王蓬辉
李明正
李思
WANG Peng-hui; LI Ming-zheng; LI Si (School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China)
Source
《北京邮电大学学报》
EI
CAS
CSCD
PKU Core (北大核心)
2020, Issue 5, pp. 84-90 (7 pages)
Journal of Beijing University of Posts and Telecommunications
Funding
National Natural Science Foundation of China (61702047)
Keywords
named entity recognition
data augmentation
sequence generative adversarial network
generative adversarial network
Author information
WANG Peng-hui (b. 1996), male, master's student; corresponding author: LI Si (b. 1985), female, associate professor, E-mail: lisi@bupt.edu.cn.