Abstract
[Objective] This study mines the deep relationships among document features to improve author name disambiguation in academic literature. [Methods] Drawing on prior knowledge from standardized knowledge bases such as institutional name authority files, disciplinary classification systems, and thesauri, we design a knowledge-enhanced feature extraction framework. Working from the standardized data, the framework fuses the semantic and relational information of document features through heterogeneous information network embedding to generate high-quality document vector representations, which are then clustered with a hierarchical agglomerative algorithm. [Results] On the test set constructed in this study, the model achieves an F1 score of 89.07%. [Limitations] The quality and scale of the knowledge bases limit the model's accuracy and generalizability in emerging and niche fields. [Conclusions] The proposed method combines expert prior knowledge with the learning capacity of deep learning, providing an effective approach to author name disambiguation in academic literature.
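The final clustering step named in the abstract can be illustrated with a minimal sketch. The abstract does not state the linkage criterion, distance measure, or stopping rule, so the average linkage, cosine distance, and the 0.3 threshold below are assumptions made purely for illustration, and the three-dimensional document vectors are toy values rather than embeddings produced by the authors' framework.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def average_linkage(c1, c2, vectors):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative_clustering(vectors, threshold):
    """Merge the closest pair of clusters until no pair is within `threshold`."""
    clusters = [[i] for i in range(len(vectors))]  # start with one cluster per document
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], vectors)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:  # assumed stopping rule: closest pair is too far apart
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy document vectors (hypothetical): five papers sharing one author name.
doc_vectors = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.8],
    [0.1, 0.9, 0.1],
]
clusters = agglomerative_clustering(doc_vectors, threshold=0.3)
print(clusters)
```

Each resulting cluster is read as the set of papers attributed to one real-world author. A distance threshold is used here instead of a fixed cluster count because the number of distinct authors behind a name is generally unknown in advance; whether the paper makes the same choice is not stated in the abstract.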
Authors
董文佳
孙坦
赵瑞雪
马玮璐
熊赫
鲜国建
Dong Wenjia; Sun Tan; Zhao Ruixue; Ma Weilu; Xiong He; Xian Guojian (Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China; Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs, Beijing 100081, China; Chinese Academy of Agricultural Sciences, Beijing 100081, China; Key Laboratory of Knowledge Mining and Knowledge Services in Agricultural Converging Publishing, Beijing 100081, China)
Source
《数据分析与知识发现》
Peking University Core Journal (北大核心)
2025, Issue 1, pp. 133-144 (12 pages)
Data Analysis and Knowledge Discovery
Funding
This work is one of the research outputs of the National Social Science Fund of China project (Grant No. 20BTQ014).
Keywords
Name Disambiguation
Canonical Knowledge Base
Feature Representation
Deep Learning
Heterogeneous Information Network
About the Authors
Corresponding author: Xian Guojian (鲜国建), ORCID: 0000-0003-4332-1958, E-mail: xianguojian@caas.cn.