A dynamic multi-layer semantics perceptron without attention mechanism

Cited by: 2
Abstract The Transformer achieves excellent results on large-scale data sets, but its multi-head attention (MHA) makes the model overly complex, and its performance on small-scale data sets is unsatisfactory. Replacing MHA has been studied with some success in image processing, but remains little explored in natural language processing. To address this, a multi-layer semantics perceptron (MSP) method without attention is first proposed; its core innovation is to replace the MHA in the encoder with a token sequence transformation function, reducing model complexity while obtaining better semantic representations. Second, a dynamic depth control framework (DDCF) is proposed, which automatically optimizes network depth and further reduces model complexity. Finally, combining MSP and DDCF, the dynamic multi-layer semantics perceptron (DMSP) model is proposed. Comparative experiments on several text data sets show that DMSP both improves classification accuracy and effectively reduces model complexity; compared with a Transformer of the same depth, DMSP achieves markedly higher classification accuracy with a much smaller parameter count.
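The record describes MSP as replacing the encoder's multi-head attention with a token sequence transformation function, and DDCF as automatically optimizing network depth, but specifies neither. The following is therefore a minimal, hypothetical PyTorch sketch, not the authors' implementation: it assumes an MLP-Mixer-style token-mixing sublayer in place of attention, and a simple threshold-based early exit as a stand-in for dynamic depth control. All class names (TokenMixingBlock, DynamicDepthStack) and hyper-parameters are illustrative.

```python
# Hypothetical sketch only: the paper's actual MSP transformation and
# DDCF policy are not given in this record.
import torch
import torch.nn as nn


class TokenMixingBlock(nn.Module):
    """Encoder block whose attention sublayer is replaced by a token
    sequence transformation (here: an MLP mixing across positions,
    in the spirit of MLP-Mixer)."""

    def __init__(self, seq_len: int, d_model: int, d_ff: int = 256):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        # Mixes information across token positions instead of attention.
        self.token_mix = nn.Sequential(
            nn.Linear(seq_len, seq_len), nn.GELU(), nn.Linear(seq_len, seq_len)
        )
        self.norm2 = nn.LayerNorm(d_model)
        # Standard position-wise feed-forward sublayer, as in the Transformer.
        self.channel_mix = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):  # x: (batch, seq_len, d_model)
        # Token mixing is applied along the transposed sequence axis.
        y = self.token_mix(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        x = x + y
        return x + self.channel_mix(self.norm2(x))


class DynamicDepthStack(nn.Module):
    """Toy stand-in for dynamic depth control: each block has a halting
    head, and at inference the stack stops once the mean halting
    probability exceeds a threshold, skipping the deeper blocks."""

    def __init__(self, num_blocks: int, seq_len: int, d_model: int,
                 tau: float = 0.5):
        super().__init__()
        self.blocks = nn.ModuleList(
            TokenMixingBlock(seq_len, d_model) for _ in range(num_blocks)
        )
        self.halt = nn.ModuleList(
            nn.Linear(d_model, 1) for _ in range(num_blocks)
        )
        self.tau = tau

    def forward(self, x):
        for block, halt in zip(self.blocks, self.halt):
            x = block(x)
            # Mean-pooled halting score decides whether deeper blocks run.
            p = torch.sigmoid(halt(x.mean(dim=1))).mean()
            if not self.training and p > self.tau:
                break
        return x


# Usage: a batch of 32 sequences, 128 tokens, 64-dim embeddings.
model = DynamicDepthStack(num_blocks=6, seq_len=128, d_model=64)
model.eval()  # early exit is only consulted at inference in this toy setup
out = model(torch.randn(32, 128, 64))
print(out.shape)  # torch.Size([32, 128, 64])
```

In this toy setup the halting head only takes effect in eval mode; how the real DDCF selects depth during training is not described in this record.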
Authors LIU Xiao-yan (刘孝炎); TANG Huan-ling (唐焕玲); WANG Yu-lin (王育林); DOU Quan-sheng (窦全胜); LU Ming-yu (鲁明羽) (School of Computer Science and Technology, Shandong Technology and Business University, Yantai 264005, China; Co-Innovation Center of Shandong Colleges and Universities: Future Intelligent Computing, Yantai 264005, China; Key Laboratory of Intelligent Information Processing in Universities of Shandong, Shandong Technology and Business University, Yantai 264005, China; Information Science and Technology College, Dalian Maritime University, Dalian 116026, China)
Source Control and Decision (《控制与决策》), indexed in EI and CSCD, Peking University Core Journal, 2024, Issue 2, pp. 588-594 (7 pages)
Funding National Natural Science Foundation of China (Grants 61976124, 61976125, 62176140).
Keywords feature representation; semantics perceptron; dynamic depth control; Transformer; text categorization
Author biographies LIU Xiao-yan (1997−), male, master's student; research interests: machine learning, artificial intelligence, and data mining; E-mail: lxy15058247683@aliyun.com. Corresponding author: TANG Huan-ling (1970−), female, professor, PhD; research interests: machine learning, artificial intelligence, and data mining; E-mail: thL01@163.com. WANG Yu-lin (1998−), male, master's student; research interests: machine learning, artificial intelligence, and data mining; E-mail: ylinwang@yeah.net. DOU Quan-sheng (1971−), male, professor, PhD; research interests: machine learning, artificial intelligence, and evolutionary computation; E-mail: li_dou@163.com. LU Ming-yu (1963−), male, professor, doctoral supervisor; research interests: machine learning, artificial intelligence, and data mining; E-mail: lumingyu@dlmu.edu.cn.
