Research on DeepSeek-Empowered Low-Cost Construction of Domain-Specific Knowledge Graphs
Abstract [Purpose/Significance] In recent years, large language models (LLMs) have made major advances in semantic understanding and generation through pre-training on massive text corpora, giving new impetus to knowledge engineering. As a structured knowledge carrier, the knowledge graph is well suited to integrating multi-source heterogeneous data and building industry knowledge systems. Against the background of a paradigm shift in knowledge engineering driven by open-source LLMs such as DeepSeek, this study proposes a low-cost method for constructing domain knowledge graphs based on DeepSeek. It targets the bottlenecks of traditional domain knowledge graph construction: heavy dependence on expert rules, high manual annotation cost, and inefficient processing of multi-source data.
[Method/Process] We build a methodological framework comprising ontology modelling, data fusion, and intelligent extraction. The ontology model is constructed manually on the basis of domain cognitive features, and a multi-source heterogeneous data fusion method gives the data a unified structural representation. The acquired data are processed with ETL, MinerU, and other tools, and the DeepSeek-R1 application programming interface is then invoked for intelligent extraction. By combining DeepSeek with knowledge extraction, we propose a domain knowledge extraction technology system built on enhanced semantic understanding and prompt engineering. Leveraging the textual reasoning ability of the DeepSeek model, the system offers a cost-effective, reusable technical paradigm for constructing domain knowledge graphs and extracting knowledge efficiently.
[Results/Conclusions] We take the construction of a domain knowledge graph for the entire pig industry chain as the empirical case. We define the structure of the industry chain and 21 types of core entities together with their attributes and relationships, achieving knowledge modelling of the pig industry oriented toward smart farming. The method was also used to process and extract knowledge from online and offline resource data. Preliminary experiments show that, under zero-shot conditions, DeepSeek-R1 achieves an F1 score of 0.92 when recognizing the attributes of 161 diseases and 11 types of entities in pig disease control scenarios. The experiments also indicate that the method can be reused for other links in the chain. The constructed knowledge graph will be used to design and validate intelligent application scenarios, with the aim of promoting intelligent information processing in the pig industry. This study provides a collaborative "machine pre-screening, manual refinement" paradigm for domain knowledge graph construction, in which model-based extraction keeps knowledge extraction efficient and manual calibration ensures accuracy. It verifies the knowledge extraction potential of LLMs in vertical domains and offers research value and a practical reference for DeepSeek-enabled low-cost construction of knowledge graphs.
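The zero-shot extraction and F1 evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the entity types, the prompt wording, the JSON answer format, and all function names are hypothetical, and the model call itself is left out so that the example runs offline on a hand-written response.

```python
import json

# Hypothetical entity types: the abstract mentions 11 types for disease
# control scenarios but does not list them.
ENTITY_TYPES = ["Disease", "Symptom", "Pathogen", "Drug"]


def build_zero_shot_prompt(text, entity_types):
    """Compose an instruction-only prompt with no labelled examples."""
    types = ", ".join(entity_types)
    return (
        "Extract entities from the text below.\n"
        f"Allowed entity types: {types}.\n"
        'Answer with a JSON list of objects like {"text": ..., "type": ...}.\n'
        f"Text: {text}"
    )


def parse_entities(response, entity_types):
    """Parse a JSON answer into a set of (text, type) pairs.

    Malformed answers and entities of unknown type are dropped, which is
    where a manual refinement step could take over.
    """
    try:
        items = json.loads(response)
    except json.JSONDecodeError:
        return set()
    if not isinstance(items, list):
        return set()
    return {
        (item["text"], item["type"])
        for item in items
        if isinstance(item, dict)
        and isinstance(item.get("text"), str)
        and item.get("type") in entity_types
    }


def precision_recall_f1(predicted, gold):
    """Entity-level scores using exact match on (text, type)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

In use, the string returned by `build_zero_shot_prompt` would be sent to the model, the reply passed to `parse_entities`, and the result compared with expert-annotated entities through `precision_recall_f1`. Exact matching on text and type is one common scoring convention; the abstract does not state which matching rule the reported F1 of 0.92 uses.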
Authors SHI Zhongyan (史忠艳); LEI Jie (雷洁); SUN Tan (孙坦); ZHAO Ruixue (赵瑞雪); LI Jiao (李娇); HUANG Yongwen (黄永文); XIAN Guojian (鲜国建) (Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081; Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs, Beijing 100081; Key Laboratory of Knowledge Mining and Knowledge Services in Agricultural Converging Publishing, National Press and Publication Administration, Beijing 100081)
Source Journal of Library and Information Science in Agriculture (《农业图书情报学报》), 2025, No. 3, pp. 4-17 (14 pages)
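Funding National Social Science Fund of China, General Program, "Research on Semantic Organization and Association Discovery Services for Multimodal Scientific and Technological Resources" (22BTQ079); Young Elite Scientists Sponsorship Program of the China Association for Science and Technology, "Research on Semantic Recognition and Parsing of Scientific Argumentation in Research Papers" (2022QNRC001).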
Keywords DeepSeek; knowledge extraction; knowledge graph; zero-shot learning; knowledge foundation; swine; whole industry chain
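About the authors SHI Zhongyan, master's student, research interest: knowledge graphs; LEI Jie, PhD, assistant research fellow, research interests: information resource management and knowledge organization; SUN Tan, PhD, research librarian (Level 2), research interest: digital information description and organization; ZHAO Ruixue, PhD, research fellow, research interest: agricultural information management systems; LI Jiao, PhD, associate research fellow, research interests: knowledge graphs and knowledge services; HUANG Yongwen, PhD, research fellow, research interests: knowledge organization and knowledge services; corresponding author: XIAN Guojian, PhD, research fellow, research interests: big data integration and governance, and knowledge graphs. Email: xianguojian@caas.cn.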