
Deep Learning Based Chinese Phrasal Paraphrase Extraction (original Chinese title: 基于深度学习的中文短语复述抽取技术研究; cited by 1)
Abstract: Paraphrase extraction is an important branch of natural language processing, and high-quality paraphrase resources substantially improve tasks such as information retrieval, question answering, and machine translation. This paper restricts the task to Chinese phrasal paraphrase extraction. It proposes a sequence labeling model based on 2BiLSTM+CNN+CRF to divide a monolingual Chinese corpus into phrases, and applies several filtering rules to obtain high-quality Chinese phrases. It then proposes a representation-learning method for obtaining candidate paraphrases: Chinese phrase vectors are learned with the BattRAE model, and the semantic distance between phrases is measured with cosine similarity. Phrase pairs are filtered by semantic distance, semantically close pairs are treated as candidate paraphrases, and erroneous candidates are then removed by rules. In the final evaluation, 500 phrasal paraphrases were randomly sampled for manual assessment, yielding a precision of 0.814 and an MRR of 0.826.
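The candidate-generation and evaluation steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: phrase vectors are assumed to be already available (BattRAE training is not reproduced), the similarity `threshold` and `top_k` cut-off are hypothetical parameters, the paper's rule-based filtering is omitted, and MRR follows its standard definition, which may differ in detail from the paper's evaluation protocol. All function names and the toy vectors are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two phrase vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_u * norm_v)


def candidate_paraphrases(
    vectors: Dict[str, Vector], threshold: float, top_k: int
) -> Dict[str, List[Tuple[str, float]]]:
    """For each phrase, keep at most top_k other phrases whose cosine
    similarity is >= threshold, sorted from most to least similar.
    threshold and top_k are hypothetical knobs, not values from the paper."""
    result: Dict[str, List[Tuple[str, float]]] = {}
    for p, vp in vectors.items():
        scored = [(q, cosine_similarity(vp, vq)) for q, vq in vectors.items() if q != p]
        kept = [(q, s) for q, s in scored if s >= threshold]
        kept.sort(key=lambda pair: pair[1], reverse=True)
        result[p] = kept[:top_k]
    return result


def precision(judgements: List[bool]) -> float:
    """Share of sampled candidate pairs that annotators judged to be paraphrases."""
    return sum(judgements) / len(judgements) if judgements else 0.0


def mean_reciprocal_rank(ranked_judgements: List[List[bool]]) -> float:
    """Standard MRR: average over source phrases of 1/rank of the first
    correct candidate in its ranked list (0 if no candidate is correct)."""
    total = 0.0
    for judgements in ranked_judgements:
        for rank, correct in enumerate(judgements, start=1):
            if correct:
                total += 1.0 / rank
                break
    return total / len(ranked_judgements) if ranked_judgements else 0.0


if __name__ == "__main__":
    # Invented 3-d vectors standing in for BattRAE phrase embeddings.
    toy = {
        "tigao xiaolv": [0.9, 0.1, 0.0],       # 提高效率 "improve efficiency"
        "tisheng xiaolv": [0.85, 0.15, 0.05],  # 提升效率 "raise efficiency"
        "jiangdi chengben": [0.1, 0.9, 0.2],   # 降低成本 "cut costs"
    }
    cands = candidate_paraphrases(toy, threshold=0.9, top_k=3)
    print([(q, round(s, 3)) for q, s in cands["tigao xiaolv"]])  # [('tisheng xiaolv', 0.996)]
    print(cands["jiangdi chengben"])  # []
    print(precision([True, True, False, True]))  # 0.75
    print(mean_reciprocal_rank([[False, True], [True], [False, False]]))  # 0.5
```

Ranking each phrase's candidates by similarity is what gives MRR meaning in this setting: precision only asks whether a sampled pair is a true paraphrase, while MRR also rewards placing a correct paraphrase near the top of a phrase's candidate list.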
Authors: YAN Xin (颜欣), ZHANG Yu (张宇), PAN Xiaotong (潘晓彤), LIU Zuopeng (刘作鹏), LIU Ting (刘挺). Affiliations: Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China; Xiaomi AI, Beijing Pinecone Electronics Co., Ltd., Beijing 100085, China.
Source: Journal of Chinese Information Processing (《中文信息学报》; indexed in CSCD and the Peking University Core Journals list), 2021, No. 2, pp. 61-68, 77 (9 pages).
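Funding: National Natural Science Foundation of China (61976068); National Key R&D Program of China, Ministry of Science and Technology (2019YFF0303003).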
Keywords: paraphrase extraction; phrase division; representation learning
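About the authors: YAN Xin (born 1993), master's student; research interests: paraphrase extraction and generation. E-mail: xyan@ir.hit.edu.cn. Corresponding author: ZHANG Yu (born 1972), Ph.D., professor; research interests: natural language processing, question answering systems, personalized information retrieval. E-mail: zhangyu@ir.hit.edu.cn. PAN Xiaotong (born 1984), bachelor's degree, senior engineer; research interests: natural language processing, dialogue systems. E-mail: panxiaotong@xiaomi.com.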