Abstract
[Purpose/significance] As the data ecosystem grows more complex and volatile, much genuine and valuable data lies hidden in massive multi-source heterogeneous data. How to efficiently collect and integrate data from multi-source intelligence information so that it effectively supports in-depth analysis has long been a core problem in data science and is important foundational work in information science. [Method/process] First, the multi-source intelligence information environment is analyzed from multiple perspectives to identify its main characteristics under each perspective. Second, an information collection method incorporating the Hook mechanism is studied: the way Hooks act on the information transmission process is explored, and the mechanism is combined with a Python collection framework. Third, a multi-source intelligence information collection framework is designed and built, its mode of implementation and scope of application are determined, and the framework is extended to the knowledge and wisdom levels to achieve value conversion. Finally, the framework is applied to collect data for the different types of multi-source intelligence information. [Result/conclusion] When confronted with complex multi-source intelligence information, the proposed collection framework obtains data accurately and effectively and, through metadata integration, lays a solid data foundation for subsequent knowledge mining research.
Authors
JIN Jialin (靳嘉林)
WANG Yuefen (王曰芬)
LIU Cheng (刘城)
ZOU Bentao (邹本涛)
Affiliations: School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094; School of Management, Tianjin Normal University, Tianjin 300387
Source
Scientific Information Research (《科技情报研究》)
2022, No. 1, pp. 13-22 (10 pages)
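Funding
Major Project of the National Social Science Fund of China, "Research on Data Science Theory and Methods for Knowledge Innovation Services" (No. 16ZDA224).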
Keywords
multi-source intelligence information
data collection
framework design
Hook mechanism
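About the authors
JIN Jialin (ORCID: 0000-0003-2668-6517), male, born 1992, PhD candidate; research interests: knowledge fusion and data science; E-mail: jinjialin9219@163.com. WANG Yuefen (ORCID: 0000-0002-7143-7766), female, born 1963, PhD, professor and doctoral supervisor; research interests: text mining and knowledge management, data science and knowledge services; E-mail: yfwang@njust.edu.cn. LIU Cheng, male, born 1999, bachelor's degree; research interest: data collection; E-mail: njust_liuc@126.com. ZOU Bentao (ORCID: 0000-0002-3972-0705), male, born 1994, PhD candidate; research interests: scientific collaboration and data science; E-mail: zoubentao@njust.edu.cn.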