Abstract
Geographical knowledge graphs, a class of scientific-domain knowledge graphs, have rapidly developed from conceptual discussion and preliminary experiments into an interdisciplinary research hotspot in geographic information science. Geographical named-entity recognition is the foundation of geographical knowledge graph construction and directly affects its efficiency and quality. This paper designs an application management system for geographical knowledge graphs. To address the problems that construction of the system's geographical entity database relies on manually crafted rules and that information extraction is insufficient, it studies geographical named-entity recognition for the graph construction process. First, a geographical knowledge corpus is built by manual annotation. Second, a BERT pre-trained model produces dynamic, context-aware character vectors, a bidirectional gated recurrent unit (BiGRU) extracts global semantic features, and an attention mechanism yields enhanced semantic features. Finally, CRF decoding outputs the globally optimal label sequence with the highest probability, achieving automatic recognition of geographical named entities. Experimental results show that, compared with traditional models such as BiLSTM-CRF and BERT-BiLSTM-CRF, the proposed model based on BERT-BiGRU-CRF and a multi-head attention mechanism performs better on the geographical named-entity recognition task and can effectively support geographical knowledge graph construction.
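The final step described in the abstract, CRF decoding of the globally optimal label sequence, is commonly implemented with the Viterbi algorithm. The following is a minimal pure-Python sketch of that decoding step only, not the authors' implementation. The BIO tag set `TAGS`, the function name `viterbi_decode`, and all score values are hypothetical and chosen for illustration; in the described model the emission scores would come from the BERT, BiGRU, and attention layers, and the transition scores would be learned by the CRF layer.

```python
from typing import List, Tuple

# Hypothetical BIO tag set with one entity type; not taken from the paper.
TAGS = ["O", "B-LOC", "I-LOC"]
NEG = -1e4  # large negative score used to forbid a transition


def viterbi_decode(
    emissions: List[List[float]],
    transitions: List[List[float]],
    start: List[float],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag index path and its total score.

    emissions[t][j]   : score of tag j at position t
    transitions[i][j] : score of moving from tag i to tag j
    start[j]          : score of starting the sequence with tag j
    """
    n_tags = len(start)
    # score[j] = best score of any path that ends in tag j at the current position
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []

    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag for reaching tag j at position t
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)

    # Trace the best path backwards from the best final tag
    last = max(range(n_tags), key=lambda j: score[j])
    best_score = score[last]
    path = [last]
    for pointers in reversed(backpointers):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return path, best_score


# Illustrative scores for a three-character input (made-up numbers).
emissions = [
    [1.0, 0.8, 2.0],
    [0.2, 0.1, 1.5],
    [2.0, 0.1, 0.3],
]
transitions = [
    [0.0, 0.0, NEG],  # O -> I-LOC is forbidden
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
start = [0.0, 0.0, NEG]  # a sequence may not start with I-LOC

path, best_score = viterbi_decode(emissions, transitions, start)
decoded_tags = [TAGS[i] for i in path]
print(decoded_tags)
```

In this toy input, choosing the highest emission score at each position independently would give `I-LOC, I-LOC, O`, which is not a valid BIO sequence. Decoding over the whole sequence with the transition constraints instead yields `B-LOC, I-LOC, O`, which illustrates why a CRF layer is placed on top of the per-character features.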
Authors
Xu Daozhu (徐道柱)
Jin Cheng (金澄)
Ma Chao (马超)
Jiao Yangyang (焦洋洋)
Xu Jian (许剑)
Affiliations: State Key Laboratory of Geographic Information Engineering, Xi'an 710054, China; Xi'an Institute of Surveying and Mapping, Xi'an 710054, China
Source
Cyber Security and Data Governance (《网络安全与数据治理》)
2023, Issue S01, pp. 169-173 (5 pages)
Keywords
geographical knowledge graph
named-entity recognition
BERT pre-trained model
multi-head attention mechanism
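About the authors
Corresponding author: Xu Daozhu (b. 1982), male, PhD, associate research fellow; main research interests: geographic information processing and applications. Jin Cheng (b. 1976), male, PhD, professor-level senior engineer; main research interests: cartography and geographic information engineering. Ma Chao (b. 1988), male, PhD, assistant research fellow; main research interest: digital map production.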