Tracking and analyzing data from research projects is critical for understanding research trends and supporting the development of science and technology strategies. However, the data from these projects is often complex and inadequate, making it challenging for researchers to conduct in-depth data mining to improve policies or management. To address this problem, this paper adopts a top-down approach to construct a knowledge graph (KG) for research projects. First, we construct an integrated ontology, called the meta-model integration conceptual reference model, by referring to the metamodels of various architectures. Subsequently, we use dependency parsing to extract knowledge from unstructured textual data and an entity alignment method based on weakly supervised learning to classify the extracted entities, completing the construction of the KG for the research projects. In addition, a knowledge inference model based on representation learning is employed to achieve knowledge completion and improve the KG. Finally, experiments are conducted on the KG for research projects, and the results demonstrate the effectiveness of the proposed method in enriching incomplete data within the KG.
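The abstract above does not name the representation-learning model it uses for knowledge completion. As a rough illustration of the general idea only, the sketch below uses a TransE-style translation score (a swapped-in technique, not necessarily the authors' model): a candidate triple (h, r, t) is considered plausible when the vector h + r lies close to t. The entities, the relation, and the 2-D embeddings are all hypothetical and hand-set; in practice they would be learned.

```python
import math

# Hypothetical, hand-set 2-D embeddings (assumption: real ones are learned).
entity_vec = {
    "ProjectA": (0.0, 0.0),
    "InstituteX": (1.0, 0.0),
    "InstituteY": (0.0, 1.0),
}
relation_vec = {"undertaken_by": (1.0, 0.1)}


def transe_distance(h: str, r: str, t: str) -> float:
    """Euclidean norm of h + r - t; a smaller value means a more plausible triple."""
    hv, rv, tv = entity_vec[h], relation_vec[r], entity_vec[t]
    return math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(hv, rv, tv)))


def rank_tails(h: str, r: str) -> list:
    """Rank every other entity as a candidate tail for the incomplete triple (h, r, ?)."""
    candidates = [e for e in entity_vec if e != h]
    return sorted(candidates, key=lambda t: transe_distance(h, r, t))


# Complete the missing tail of (ProjectA, undertaken_by, ?).
ranking = rank_tails("ProjectA", "undertaken_by")
print(ranking)
```

The top-ranked candidate would be proposed as the missing fact; a real system would typically also apply a score threshold before adding it to the KG.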
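As large language models (LLMs) have developed, they have achieved excellent performance in open domains. However, lacking specialized knowledge, LLMs perform poorly on question-answering tasks in vertical domains, a problem that has attracted wide attention from researchers. Existing work injects domain knowledge into LLMs through a "retrieval, question answering" approach to enhance their performance. However, this approach often retrieves additional noisy data, which degrades LLM performance. To address this problem, a knowledge graph question answering method based on knowledge relevance is proposed. Specifically, noisy data is distinguished from the knowledge needed to answer the question, and under a "retrieval, relevance assessment, question answering" framework the LLM is guided to select appropriate knowledge and produce the correct answer. In addition, Mecha-QA, a knowledge graph question answering dataset for the mechanical domain, is proposed; it covers two subdomains, traditional mechanical manufacturing and additive manufacturing, to advance research on LLMs and knowledge graph question answering in this field. To verify the effectiveness of the proposed method, experiments are conducted on Mecha-QA and on Aero-QA, an aerospace-domain dataset. The results show that the method significantly improves LLM performance on vertical-domain knowledge graph question answering.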
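The three-stage framework described in the abstract above (retrieval, relevance assessment, question answering) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' method: the toy triples are hypothetical, retrieval is naive token matching, and the relevance score is a simple Jaccard token overlap standing in for whatever relevance assessment the paper actually uses. The final LLM call is left out; the sketch stops at the prompt that would be sent.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy knowledge graph (not taken from Mecha-QA).
KG: List[Triple] = [
    ("selective laser melting", "is a", "additive manufacturing process"),
    ("selective laser melting", "uses", "metal powder"),
    ("milling", "is a", "machining process"),
    ("laser cutting", "uses", "a focused beam"),
]


def tokens(text: str) -> set:
    return set(text.lower().split())


def retrieve(question: str, kg: List[Triple]) -> List[Triple]:
    """Stage 1: keep every triple that shares at least one token with the question."""
    q = tokens(question)
    return [t for t in kg if q & tokens(" ".join(t))]


def relevance(question: str, triple: Triple) -> float:
    """Stage 2 (stand-in): Jaccard overlap between question and triple tokens."""
    q, t = tokens(question), tokens(" ".join(triple))
    return len(q & t) / len(q | t)


def build_prompt(question: str, kg: List[Triple], threshold: float = 0.2) -> str:
    """Stage 3 input: only triples at or above the relevance threshold reach the prompt."""
    kept = [t for t in retrieve(question, kg) if relevance(question, t) >= threshold]
    facts = "\n".join(f"({h}, {r}, {o})" for h, r, o in kept)
    return f"Known facts:\n{facts}\nQuestion: {question}\nAnswer:"


question = "what material does selective laser melting use"
prompt = build_prompt(question, KG)
print(prompt)
```

In this toy run the "laser cutting" triple is retrieved because it shares the token "laser" with the question, but the relevance stage filters it out, which is the kind of noise the framework aims to keep away from the model.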
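Knowledge reasoning models struggle to capture multi-level semantic information when modeling the complex semantic features between entities, and they do not account for the fact that individual paths differ in interpretability and therefore in how much weight they should carry toward the correct answer. To address these problems, a knowledge graph (KG) multi-hop reasoning model that fuses path and subgraph features, PS-HAM (Hierarchical Attention Model fusing Path-Subgraph features), is proposed. PS-HAM fuses entity neighborhood information with connecting-path information and explores multi-granularity features for different paths. First, a path-level feature extraction module extracts the connecting paths between each entity pair and uses a hierarchical attention mechanism to capture information at different granularities, which serves as the path-level representation. Second, a subgraph feature extraction module aggregates the neighborhood information of entities through a relational graph convolutional network (RGCN). Finally, a path-subgraph feature fusion module fuses the path-level and subgraph-level feature vectors to perform fused reasoning. Experiments on two public datasets show that PS-HAM improves on both mean reciprocal rank (MRR) and Hit@k (k=1, 3, 10). On MRR, PS-HAM outperforms the MemoryPath model by 1.5 and 1.2 percentage points on the FB15k-237 and WN18RR datasets, respectively. In addition, a parameter study of the subgraph hop count shows that PS-HAM achieves its best reasoning performance on both datasets when the number of subgraph hops is 3.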
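For readers unfamiliar with the two evaluation metrics named in the abstract above, the following sketch shows how MRR and Hit@k are conventionally computed from the rank of the correct entity for each test query. The ranks are hypothetical illustration values, not results from the paper.

```python
from typing import Iterable


def mean_reciprocal_rank(ranks: Iterable[int]) -> float:
    """Average of 1/rank over all test queries (ranks are 1-based)."""
    ranks = list(ranks)
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks: Iterable[int], k: int) -> float:
    """Fraction of queries whose correct entity is ranked within the top k."""
    ranks = list(ranks)
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the correct entity for four test triples.
example_ranks = [1, 2, 4, 20]
mrr = mean_reciprocal_rank(example_ranks)
hits = {k: hits_at_k(example_ranks, k) for k in (1, 3, 10)}
print(f"MRR={mrr:.4f}", hits)
```

Both metrics lie in [0, 1] and are often reported as percentages, which is why improvements such as those in the abstract are stated in percentage points.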
Funding: Supported by the National Natural Science Foundation of China (72101263).