A Study on the Measurement Methods of Term Discriminative Capacity for Academic Resources (original title: 面向学术资源的术语区分能力的测度方法研究). Cited by: 7
Abstract: Better methods for measuring the quality of indexing terms can effectively improve the retrieval efficiency of IR systems, but the inherent properties of a term are easily affected by document length, which makes it difficult to measure term quality comprehensively. To address this, the paper starts from the intrinsic discriminative property of terms and, drawing on the basic idea of the bag-of-words model, proposes the concept of term discriminative capacity (TDC) together with three methods for calculating it. As experimental data, 900 records, each containing four bibliographic fields, were collected from three sub-databases of Web of Science in order to compute TDC at scale and to observe how the three algorithms differ in practice. The experimental analysis identifies TDC-T as the best method for calculating term discriminative capacity: it behaves stably in several respects and is not affected by the DF value, so it can serve as a new indicator of term quality and is denoted simply as TDC. However, the A&HCI database selected for this study contributes relatively few records, which may cause an imbalance in the calculation results of the other two fields.
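The abstract does not give the TDC formulas, so the following is only a minimal sketch of the background ideas it names, not the paper's TDC-T or either of its other two methods. It assumes that DF denotes document frequency, and it swaps in the classic space-density-based discrimination value (a term is a good discriminator if removing it makes the document space denser). All function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Turn whitespace-tokenised documents into term-count vectors."""
    return [Counter(doc.lower().split()) for doc in docs]


def document_frequency(bags):
    """Count, for each term, the number of documents that contain it (DF)."""
    df = Counter()
    for bag in bags:
        df.update(bag.keys())  # each document contributes at most 1 per term
    return df


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def space_density(bags):
    """Average similarity between each document and the collection centroid."""
    if not bags:
        return 0.0
    totals = Counter()
    for bag in bags:
        totals.update(bag)  # sums term counts across documents
    centroid = {term: count / len(bags) for term, count in totals.items()}
    return sum(cosine(bag, centroid) for bag in bags) / len(bags)


def discrimination_value(bags, term):
    """Density with the term removed minus density with it present.

    Positive: the term spreads documents apart (good discriminator).
    Negative: the term pulls documents together (poor discriminator).
    """
    without = [Counter({t: c for t, c in bag.items() if t != term})
               for bag in bags]
    return space_density(without) - space_density(bags)


# Hypothetical toy collection: "term" occurs in every document.
docs = ["term quality index", "term space density", "term retrieval model"]
bags = bag_of_words(docs)
df = document_frequency(bags)
dv_common = discrimination_value(bags, "term")
dv_rare = discrimination_value(bags, "quality")
```

In this toy collection the word that appears in every document gets a negative value and the word that appears in only one gets a positive value, which is the intuition behind treating discrimination as a measure of term quality. Note that this classic value does vary with DF, whereas the abstract reports that TDC-T does not.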
Authors: Wang Hao (王昊); Tang Huihui (唐慧慧); Zhang Haichao (张海潮); Zhang Jin (张进); Zhang Zixuan (张紫玄). Affiliations: School of Information Management, Nanjing University, Nanjing 210023; Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023; School of Information Studies, University of Wisconsin-Milwaukee, Milwaukee 53201
Source: Journal of the China Society for Scientific and Technical Information (《情报学报》; indexed in CSSCI, CSCD, and the Peking University Core Journals list), 2019, No. 10, pp. 1078-1091 (14 pages)
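Funding: National Natural Science Foundation of China, Youth Science Fund project "Research on the Measurement and Analysis of TSD and TDC for Academic Resources" (71503121); "Jiangsu Young Social Science Talents" talent training program; "Nanjing University Zhongying Young Scholars" talent training program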
Keywords: indexing term; bag-of-words model; term discriminative capacity; term space density; term quality evaluation
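About the authors: Wang Hao, male, born 1981, PhD, doctoral supervisor; main research interests include natural language processing, data mining applications, and ontology learning. Tang Huihui, female, born 1995, master's degree; main research interests include natural language processing; E-mail: mf1714055@smail.nju.edu.cn. Zhang Haichao, female, born 1995, master's degree; main research interests include natural language processing. Zhang Jin, male, born 1959, PhD, doctoral supervisor; main research interests include information retrieval algorithms and search engine evaluation. Zhang Zixuan, female, born 1994, master's degree; main research interests include natural language processing.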