Abstract
Aspect extraction is a key step in sentiment analysis. With the rapid development of the Internet, the volume of short-text data has grown quickly, and organizing and exploiting it has become very important. To address the particular characteristics of short text, this paper proposes WESM, a model for short text. Unlike existing models, WESM introduces a word co-occurrence network to enrich the contextual information of words. For Chinese data, it adopts the cw2vec model, which makes full use of the semantic information of Chinese words. To compensate for the lack of contextual semantics in short texts, it introduces a self-attention mechanism, which enriches the model's contextual semantic information and increases the weight of aspect words, so that the influence of non-aspect words is reduced during word clustering. Compared with traditional aspect extraction algorithms, the model achieves a significant performance improvement.
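The abstract does not say how WESM builds its word co-occurrence network, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes pre-tokenized short texts, a sliding window, and raw co-occurrence counts as edge weights; the function names `build_cooccurrence_network` and `neighbors` and the toy data are hypothetical.

```python
from collections import Counter


def build_cooccurrence_network(docs, window=2):
    """Count how often two distinct words fall inside the same sliding window.

    `window` includes the current token, so window=2 pairs only adjacent
    tokens. Returns a Counter keyed by sorted (word_a, word_b) pairs; the
    count serves as the edge weight (an assumption made for illustration).
    """
    edges = Counter()
    for tokens in docs:
        for i, left in enumerate(tokens):
            # Pair the current token with the tokens that follow it in the window.
            for right in tokens[i + 1 : i + window]:
                if left != right:
                    edges[tuple(sorted((left, right)))] += 1
    return edges


def neighbors(edges, word):
    """Return the words linked to `word`, strongest edge first."""
    linked = Counter()
    for (a, b), weight in edges.items():
        if a == word:
            linked[b] += weight
        elif b == word:
            linked[a] += weight
    return [w for w, _ in linked.most_common()]


# Toy pre-tokenized short texts (hypothetical data, not from the paper).
docs = [
    ["screen", "bright", "battery", "weak"],
    ["battery", "life", "short"],
    ["screen", "bright", "clear"],
]
edges = build_cooccurrence_network(docs, window=2)
```

In a sketch like this, the neighbors of a word give it extra context beyond the single short text it appears in, which is the kind of enrichment the abstract attributes to the co-occurrence network.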
Authors
WU Hangxin (吴杭鑫)
ZHANG Yunhua (张云华)
School of Information, Zhejiang Sci-Tech University, Hangzhou 310018, China
Source
Intelligent Computer and Applications (《智能计算机与应用》)
2021, No. 4, pp. 25-29 (5 pages)
Keywords
Aspect extraction
Word embedding
Self-attention mechanism
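About the authors
WU Hangxin (b. 1994), male, master's student; main research interest: intelligent information processing. ZHANG Yunhua (b. 1965), male, Ph.D., professor, master's supervisor; main research interests: software architecture, software factories, and intelligent information processing.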