Abstract
Recognizing predicate verbs is key to understanding a sentence. Because Chinese predicate verbs are structurally complex, flexible in use, and variable in form, recognizing them is a challenging task in Chinese natural language processing. From the perspective of information extraction, this paper introduces the concepts related to Chinese predicate verb recognition and proposes an annotation method for Chinese predicate verbs. On this basis, it studies a recognition method based on an Attentional-BiLSTM-CRF neural network. The method uses a bidirectional recurrent neural network to capture dependencies within the sentence and then applies an attention mechanism to model the focal role of the sentence. Finally, a conditional random field (CRF) layer returns the highest-scoring labeling path. In addition, to address the requirement that the predicate verb output be unique, a model for predicate verb uniqueness recognition based on a convolutional neural network is proposed. In experiments, the method outperforms the traditional sequence labeling model CRF and reaches an F-score of 76.75% on the Chinese predicate verb data annotated for this paper.
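The final step in the abstract, where the CRF layer returns the highest-scoring labeling path, is usually carried out with Viterbi decoding. The following is a minimal sketch of that decoding step only, not the authors' implementation. The function name `viterbi_decode`, the score layout, and the two-tag scheme in the usage note are assumptions made for illustration; in the described model the per-token scores would come from the BiLSTM and attention layers.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Return the highest-scoring tag sequence for one sentence.

    emissions[t][j]   -- score of tag j at token t (assumed encoder output)
    transitions[i][j] -- score of moving from tag i to tag j
    """
    if not emissions:
        return []
    n_tags = len(emissions[0])
    # Best score of any path ending in each tag at the first token.
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score = []
        pointers = []
        for j in range(n_tags):
            # Best previous tag for reaching tag j at position t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

As a usage illustration with a hypothetical two-tag scheme (0 for non-predicate, 1 for predicate verb), a strongly negative `transitions[1][1]` discourages two adjacent predicate tags, so the decoder keeps only the better-scoring of two neighbouring candidates. This influences the path but does not by itself guarantee a single predicate per sentence, which is the problem the separate uniqueness model in the abstract addresses.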
Authors
李婷
秦永彬
黄瑞章
程欣宇
陈艳平
LI Ting; QIN Yongbin; HUANG Ruizhang; CHENG Xinyu; CHEN Yanping (College of Computer Science and Technology, Guizhou University, Guiyang, 550025, China; Laboratory of Data Fusion and Analysis Application (Guizhou University), Guiyang, 550025, China; Guizhou Intelligent Human-Computer Interaction Engineering Technology Research Center, Guiyang, 550025, China)
Source
《数据采集与处理》
CSCD
Peking University Core Journals (北大核心)
2020, Issue 3, pp. 582-590 (9 pages)
Journal of Data Acquisition and Processing
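Funding
Key Program of the Joint Funds of the National Natural Science Foundation of China (U1836205)
Major Research Plan of the National Natural Science Foundation of China (91746116)
Major Science and Technology Special Program of Guizhou Province (黔科合重大专项字[2017]3002)
Key Program of the Science and Technology Foundation of Guizhou Province (黔科合基础[2020]1Z055)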
Keywords
predicate verb recognition
neural networks
Chinese information extraction
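Author biographies
LI Ting (李婷, 1995-), female, master's student. Research interests: natural language processing and data fusion analysis. E-mail: krystal951028@163.com.
QIN Yongbin (秦永彬, 1980-), male, Ph.D., professor. Research interests: big data governance and applications, multi-source data fusion and applications, and enterprise informatization and e-government.
HUANG Ruizhang (黄瑞章, 1979-), female, Ph.D., associate professor. Research interests: data fusion analysis, text mining, web mining, and knowledge discovery.
CHENG Xinyu (程欣宇, 1978-), male, master's degree, associate professor. Research interests: machine learning, machine vision, software engineering, and network communication.
CHEN Yanping (陈艳平, 1980-), male, Ph.D., associate professor. Research interests: data fusion analysis, natural language processing, and knowledge discovery.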