Abstract
In the medical field, entity recognition can extract valuable information from large-scale electronic medical record text. Chinese Named Entity Recognition (NER) is more difficult because features for locating entity boundaries are lacking and semantic information is extracted incompletely. This paper proposes a model for Chinese electronic medical records that combines Multi-Feature Embedding and Multi-Network Fusion (MFE-MNF). The model embeds multi-granularity features, namely characters, words, radicals, and external knowledge, to extend the feature representation of each character and clarify entity boundaries. The feature vectors are fed separately into two paths, a Bi-directional Long Short-Term Memory (BiLSTM) network and an adaptive graph convolutional network constructed in this paper, to capture contextual and global semantic information comprehensively and to alleviate incomplete semantic information extraction. Experiments on the CCKS2019 and CCKS2020 datasets show that, compared with traditional entity recognition models, the proposed model extracts entities accurately and effectively.
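The two-path design described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not say how the adaptive graph is built or how the two paths are fused, so the sketch assumes a softmax over dot-product similarity for the adjacency matrix and plain concatenation for the fusion. A simple bidirectional tanh RNN stands in for the BiLSTM, all weights are random rather than trained, and the example sentence, the lexicon tags, and every function name are hypothetical.

```python
import math
import random

random.seed(0)  # fixed seed so the toy weights are reproducible

DIM, HIDDEN = 4, 3  # assumed toy sizes, not values from the paper
FEATURES = ("char", "word", "radical", "lexicon")

def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax(row):
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def lookup(symbols, table, dim):
    # Create a random vector the first time a symbol is seen, then reuse it.
    for s in symbols:
        if s not in table:
            table[s] = [random.uniform(-0.1, 0.1) for _ in range(dim)]
    return [table[s] for s in symbols]

def multi_feature_embedding(features, tables, dim):
    # One vector per character: the four feature embeddings concatenated.
    per_feature = [lookup(features[name], tables[name], dim) for name in FEATURES]
    return [sum(vectors, []) for vectors in zip(*per_feature)]

def bidirectional_rnn(x, w_in, w_rec):
    # Stand-in for the BiLSTM path; shares weights across directions for brevity.
    def run(seq):
        h, out = [0.0] * len(w_rec), []
        for v in seq:
            pre = [a + b for a, b in zip(matmul([v], w_in)[0], matmul([h], w_rec)[0])]
            h = [math.tanh(p) for p in pre]
            out.append(h)
        return out
    forward, backward = run(x), run(x[::-1])[::-1]
    return [f + b for f, b in zip(forward, backward)]

def adaptive_adjacency(x):
    # Assumption: edge weights come from pairwise similarity, normalised per row.
    return [softmax(row) for row in matmul(x, transpose(x))]

def graph_conv(x, w):
    # One graph convolution step: ReLU(A X W), where A links every pair of characters.
    hidden = matmul(matmul(adaptive_adjacency(x), x), w)
    return [[max(0.0, v) for v in row] for row in hidden]

def encode(features, tables, w_in, w_rec, w_gcn):
    x = multi_feature_embedding(features, tables, DIM)
    context = bidirectional_rnn(x, w_in, w_rec)  # contextual path
    global_info = graph_conv(x, w_gcn)           # global path
    return [c + g for c, g in zip(context, global_info)]  # fusion by concatenation

# Hypothetical four-character example ("stomach cancer surgery").
example = {
    "char": ["胃", "癌", "手", "术"],
    "word": ["胃癌", "胃癌", "手术", "手术"],
    "radical": ["月", "疒", "手", "木"],
    "lexicon": ["B-disease", "E-disease", "B-operation", "E-operation"],
}
tables = {name: {} for name in FEATURES}
w_in = rand_matrix(len(FEATURES) * DIM, HIDDEN)
w_rec = rand_matrix(HIDDEN, HIDDEN)
w_gcn = rand_matrix(len(FEATURES) * DIM, HIDDEN)
fused = encode(example, tables, w_in, w_rec, w_gcn)
```

Each entry of `fused` holds one character's forward state, backward state, and graph-convolution output side by side. A real tagger would train the weights and pass these fused vectors to a label decoder, which the abstract does not specify.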
Authors
雷松泽
刘博
王瑜菲
单奥奎
LEI Songze; LIU Bo; WANG Yufei; SHAN Aokui (School of Computer Science and Engineering, Xi’an Technological University, Xi’an 710021, China)
Source
《电子与信息学报》
EI
CSCD
PKU Core (北大核心)
2023, No. 8, pp. 3032-3039 (8 pages)
Journal of Electronics & Information Technology
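Funding
Fund of the National and Local Joint Engineering Laboratory of New Network and Detection Control (新型网络与检测控制国家地方联合工程实验室), grant GSYSJ2016008.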
Keywords
Named Entity Recognition (NER)
Multi-feature embedding
Multi-network fusion
Adaptive graph convolutional network
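About the authors
LEI Songze, male, Ph.D., associate professor; research interests include deep learning and pattern recognition. Corresponding author: LIU Bo, female, master's student; research interests include deep learning; liubo0909888@163.com. WANG Yufei, female, master's student; research interests include deep learning. SHAN Aokui, male, master's student; research interests include deep learning.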