
BERT-CNN: A Hierarchical Patent Classifier Based on Pre-trained Language Model (基于预训练语言模型的BERT-CNN多层级专利分类研究)

Cited by: 25
Abstract: Automatic classification of patent documents is important for intellectual property protection, patent management, and patent information retrieval, and an accurate automatic patent classifier can support patent inventors and patent examiners. This paper takes patent document classification as its research task and proposes BERT-CNN, a hierarchical (multi-level) patent classification model based on a pre-trained language model. The experimental data are the national patent application records published by the State Information Center, China. On this dataset, BERT-CNN reaches an accuracy of 84.3%, well ahead of the other deep learning methods compared, including Convolutional Neural Networks and Recurrent Neural Networks. The feature vectors extracted by BERT represent vocabulary and semantics more effectively than traditional Word2Vec. The paper also discusses how global and local strategies differ in multi-level patent text classification (the authors' English abstract phrases this as the difference between hierarchical and flat strategies).
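Authors: LU Xiaolei (陆晓蕾); NI Bin (倪斌) (School of Foreign Languages and Cultures, Xiamen University, Xiamen, Fujian 361005, China; Xiamen Data Intelligence Academy of ICT, CAS, Xiamen, Fujian 361005, China)
Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list, 2021, No. 11, pp. 70-79 (10 pages)
Funding: Ministry of Education Humanities and Social Sciences Fund (18YJCZH117); Fundamental Research Funds for the Central Universities (20720191053)
Keywords: patent; text classification; BERT
About the authors: LU Xiaolei (born 1988), Ph.D., assistant professor; main research area: language intelligence. E-mail: luxiaolei@xmu.edu.cn. Corresponding author: NI Bin (born 1990), M.S., engineer; main research area: natural language processing. E-mail: nibiner@live.cn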