
Analysis and Comparison of Chinese Named Entity Recognition Model (cited by: 11)
Abstract: To compare objectively the performance of the main existing Chinese named entity recognition systems and open-source systems, this work builds a Chinese named entity recognition model from a character-based bidirectional long short-term memory recurrent neural network (BiLSTM) followed by a conditional random field (CRF), trained on the MSRA dataset from Microsoft Research Asia. The self-built model, Harbin Institute of Technology's Language Technology Platform (LTP), and Stanford University's CoreNLP natural language processing toolkit are then tested and compared on the MSRA test data. The experiments show that the BiLSTM model performs best on location names; unlike location and person names, its recognition of organization names stays at the same level as the open-source tools. The experiments leave room for improvement in corpus size and experimental design. Future work will focus on the experimental model and on combining domain-specific entities with the sequence labeling problem.
Authors: Zumurantiguli Kuerban; Aishan Wumaier (School of Information Science and Engineering, Xinjiang University, Urumqi 830046; Xinjiang Laboratory of Multi-Language Information Technology, Urumqi 830046)
Source: Modern Computer (《现代计算机》), 2019, No. 14, pp. 3-7 (5 pages)
Funding: National Natural Science Foundation of China (No. 61662077, No. 61262060)
Keywords: Named Entity Recognition; BiLSTM (Bi-directional Long Short Term Memory); LTP (Language Technology Platform); CoreNLP
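About the authors: Zumurantiguli Kuerban (b. 1992), female, from Aksu, Xinjiang; master's degree; research interests: natural language processing and machine translation. Corresponding author: Aishan Wumaier (b. 1981), male, from Urumqi, Xinjiang; Ph.D., associate professor; research interests: natural language processing and machine translation. E-mail: hasan1479@xju.edu.cn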
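The abstract compares systems separately on person, location, and organization names. As a rough illustration of how such a per-type comparison can be scored, the sketch below extracts entity spans from character-level tags and computes precision, recall, and F1 for each type. The BIO tagging scheme, the exact-match criterion, the function names `extract_entities` and `score`, and the toy sentence are all assumptions made for illustration; the record does not state which scheme or metric the paper used.

```python
from collections import Counter


def extract_entities(tags):
    """Return a set of (type, start, end) spans from one BIO tag sequence.

    Assumes tags such as "B-PER", "I-PER", "O" (one per character).
    An I- tag with no matching open span is ignored.
    """
    spans = set()
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        # An open span ends on "O", on a new "B-", or on an "I-" of another type.
        if start is not None and (
            tag == "O" or tag.startswith("B-") or tag[2:] != etype
        ):
            spans.add((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def score(gold_seqs, pred_seqs):
    """Per-type (precision, recall, F1) using exact span and type matches."""
    tp, n_gold, n_pred = Counter(), Counter(), Counter()
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = extract_entities(gold), extract_entities(pred)
        for etype, _, _ in g:
            n_gold[etype] += 1
        for etype, _, _ in p:
            n_pred[etype] += 1
        for etype, _, _ in g & p:
            tp[etype] += 1
    results = {}
    for etype in sorted(set(n_gold) | set(n_pred)):
        prec = tp[etype] / n_pred[etype] if n_pred[etype] else 0.0
        rec = tp[etype] / n_gold[etype] if n_gold[etype] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        results[etype] = (prec, rec, f1)
    return results


# Toy data (hypothetical, not from the paper): 艾 山 在 新 疆 大 学 工 作
gold = [["B-PER", "I-PER", "O", "B-ORG", "I-ORG", "I-ORG", "I-ORG", "O", "O"]]
pred = [["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O", "O", "O", "O"]]

for etype, (p, r, f) in score(gold, pred).items():
    print(f"{etype}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```

In this toy case the predicted tagger finds the person name but labels part of the organization name as a location, so only the PER type scores above zero. Running the same scorer over the outputs of several systems on a shared test set would give the kind of side-by-side, per-type figures the abstract refers to.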
