
Disambiguation method of multi-feature fusion based on HowNet sememe and Word2vec word embedding representation (cited by: 7)
Abstract To address the poor quality of word-vector representations for low-frequency words, the tendency of such vectors to conflate distinct senses, and the inability of existing disambiguation models to distinguish polysemous words accurately, a multi-feature fusion disambiguation method based on fused word-vector representations is proposed. The method fuses word vectors built from HowNet sememes with word vectors generated by Word2vec, which supplements the polysemy information of words and improves the representation quality of low-frequency words. First, the cosine similarity between the entity to be disambiguated and each candidate entity is computed to obtain their similarity. Second, a clustering algorithm and the HowNet knowledge base are used to obtain the entity category feature similarity. Third, an improved Latent Dirichlet Allocation (LDA) topic model extracts topic keywords, from which the entity topic feature similarity is computed. Finally, the three feature similarities are combined by weighted fusion to disambiguate polysemous words. Experiments on a test set from the Tibet animal husbandry domain show that the proposed method reaches an accuracy of 90.1%, which is 7.6 percentage points higher than that of a typical graph-model disambiguation method.
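The pipeline summarized in the abstract (vector fusion, cosine similarity, weighted fusion of three feature similarities) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state how the HowNet sememe vector and the Word2vec vector are fused or which weights are used, so the convex combination in `fuse_vectors`, the parameter `alpha`, the weights in `fused_score`, and all toy vectors and similarity values below are hypothetical. The category and topic similarities are passed in as precomputed numbers, since the clustering and improved-LDA steps are not specified in enough detail to reproduce here.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def fuse_vectors(sememe_vec, w2v_vec, alpha=0.5):
    """Assumed fusion rule: convex combination of the two vectors.

    The source only says the two representations are fused; this
    particular rule and the value of alpha are illustrative.
    """
    return [alpha * s + (1.0 - alpha) * w for s, w in zip(sememe_vec, w2v_vec)]


def fused_score(sim_entity, sim_category, sim_topic, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of the three feature similarities (weights are assumed)."""
    w_e, w_c, w_t = weights
    return w_e * sim_entity + w_c * sim_category + w_t * sim_topic


def disambiguate(mention_vec, candidates, weights=(0.4, 0.3, 0.3)):
    """Return the candidate name with the highest fused score.

    candidates maps a name to (vector, category_similarity, topic_similarity).
    """
    def score(name):
        vec, sim_cat, sim_topic = candidates[name]
        return fused_score(cosine(mention_vec, vec), sim_cat, sim_topic, weights)

    return max(candidates, key=score)


# Toy data, invented purely for illustration.
mention = fuse_vectors([1.0, 0.0], [0.8, 0.2])
toy_candidates = {
    "yak (animal)": ([1.0, 0.0], 0.9, 0.8),
    "yak (brand)": ([0.0, 1.0], 0.2, 0.1),
}
best = disambiguate(mention, toy_candidates)
print(best)
```

In a real system the weights would be tuned on held-out data, and the two input vectors would come from a trained Word2vec model and a sememe-based encoder of matching dimensionality.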
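Authors WANG Wei (王伟); ZHAO Erping (赵尔平); CUI Zhiyuan (崔志远); SUN Hao (孙浩) (College of Information Engineering, Xizang Minzu University, Xianyang, Shaanxi 712082, China)
Source Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2021, Issue 8, pp. 2193-2198 (6 pages)
Funding National Natural Science Foundation of China (61762082); Science and Technology Program of the Tibet Autonomous Region (XZ202001ZY0055G).
Keywords disambiguation; sememe; word vector fusion; feature fusion; polysemy
About the authors WANG Wei (born 1996), male, from Yangzhou, Jiangsu; M.S. candidate, CCF member; research interests: natural language processing, knowledge graphs. Corresponding author: ZHAO Erping (born 1976), male, from Binxian, Shaanxi; associate professor, M.S., CCF member; research interests: big data, knowledge graphs; e-mail: xdzep@163.com. CUI Zhiyuan (born 1997), male, from Weifang, Shandong; M.S. candidate, CCF member; research interests: natural language processing, knowledge graphs. SUN Hao (born 1995), male, from Xuzhou, Jiangsu; M.S. candidate, CCF member; research interests: natural language processing, knowledge graphs.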
