Funding: Projects (61573380, 61303185) supported by the National Natural Science Foundation of China; Project (13BTQ052) supported by the National Social Science Foundation of China; Project (2016M592450) supported by the China Postdoctoral Science Foundation; Project (2016JJ4119) supported by the Hunan Provincial Natural Science Foundation of China.
Abstract: With the rise and spread of micro-blogging, sentiment classification of short texts has become a research hotspot, and a number of methods have been developed in the past decade. However, because Chinese and English differ in syntax, semantics and pragmatics, sentiment classification methods that are effective for English Twitter may fail on Chinese micro-blogs. In addition, the colloquialism and conciseness of short Chinese texts introduce further challenges. In this work, a novel two-stage hybrid learning model is proposed for sentiment classification of Chinese micro-blogs. In the first stage, emotional scores are calculated over the whole dataset using an improved Chinese-oriented sentiment dictionary classification method, and data with extremely high or low scores are labeled directly. In the second stage, the remaining data are labeled using an integrated classification method based on a sentiment dictionary, a support vector machine (SVM) and k-nearest neighbor (KNN). An improved feature selection method is adopted to enhance the discriminative power of the selected features. The two-stage hybrid framework makes the proposed method effective for sentiment classification of Chinese micro-blogs. Experiments on the COAE2014 (Chinese Opinion Analysis Evaluation 2014) dataset show that the proposed method outperforms the other schemes compared.
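The two-stage scheme described in this abstract can be illustrated with a minimal sketch. Everything named below is an assumption made for illustration: the toy English lexicon, the thresholds `high` and `low`, and the three rule-based voters that stand in for the dictionary, SVM and KNN classifiers. The paper's improved dictionary, its feature selection method and its actual integration rule are not reproduced here; a plain majority vote is swapped in for the second stage.

```python
from collections import Counter

# Hypothetical toy lexicon; the paper's improved Chinese-oriented
# sentiment dictionary is not reproduced here.
LEXICON = {"good": 1.0, "great": 2.0, "love": 2.0,
           "bad": -1.0, "awful": -2.0, "hate": -2.0}


def lexicon_score(tokens):
    """Sum the polarity of every token found in the lexicon."""
    return sum(LEXICON.get(t, 0.0) for t in tokens)


def two_stage_label(tokens, voters, high=2.0, low=-2.0):
    """Stage 1: label directly when the lexicon score is extreme.
    Stage 2: otherwise take a majority vote over the fallback voters.
    The thresholds `high` and `low` are illustrative assumptions."""
    score = lexicon_score(tokens)
    if score >= high:
        return "positive"
    if score <= low:
        return "negative"
    votes = Counter(voter(tokens) for voter in voters)
    return votes.most_common(1)[0][0]


# Three simple stand-ins for the dictionary, SVM and KNN classifiers.
def by_sign(tokens):
    return "positive" if lexicon_score(tokens) >= 0 else "negative"


def by_negation(tokens):
    return "negative" if "not" in tokens else "positive"


def by_last_word(tokens):
    return "negative" if LEXICON.get(tokens[-1], 0.0) < 0 else "positive"


VOTERS = [by_sign, by_negation, by_last_word]
```

Calling `two_stage_label(["great", "love"], VOTERS)` is resolved in the first stage because the score is extreme, while a mixed input such as `["not", "good", "bad"]` falls through to the vote. In a real system the voters would be trained models rather than hand-written rules.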
Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (2008AA01Z144) and the National Natural Science Foundation of China (60803093, 60975055).
Funding: Projects (61170156, 60933005) supported by the National Natural Science Foundation of China.
Abstract: Sentiment analysis is the computational study of how opinions, attitudes, emotions and perspectives are expressed in language, and is an important task in natural language processing. It is highly valuable for both research and practical applications. This work focuses on the difficulty of constructing sentiment classifiers, which normally need large amounts of labeled in-domain training data, and proposes a novel unsupervised framework that uses Chinese idiom resources to develop a general sentiment classifier. Furthermore, the domain adaptation of the general classifier is improved by taking it as the base of a self-training procedure, which yields a domain-specific self-training sentiment classifier. To validate the framework, several experiments were carried out on a publicly available dataset of Chinese online reviews. The experiments show that the proposed framework is effective and achieves encouraging results. Specifically, the general classifier outperforms two baselines (a naive 50% baseline and a cross-domain classifier), and the bootstrapping self-training classifier approaches the upper-bound domain-specific classifier, with a lowest accuracy of 81.5%, while its performance is more stable and the framework needs no labeled training data.
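The self-training idea in this abstract (a general classifier pseudo-labels unlabeled domain text, and those labels are used to adapt to the domain) can be sketched as follows. This is an illustrative simplification, not the authors' procedure: the seed lexicon stands in for the idiom-based general classifier, the "domain classifier" is just a grown word-polarity lexicon, and the names `seed_label` and `self_train` are hypothetical.

```python
from collections import Counter


def seed_label(tokens, lexicon):
    """Return +1 or -1 from summed word polarities, or None when undecided."""
    total = sum(lexicon.get(t, 0) for t in tokens)
    if total == 0:
        return None
    return 1 if total > 0 else -1


def self_train(docs, seed_words, rounds=3):
    """Grow a domain lexicon from pseudo-labels on unlabeled documents.

    Each round labels the documents with the current lexicon, then assigns
    every non-seed word the sign of the pseudo-labels it co-occurs with.
    Seed polarities stay fixed throughout."""
    lexicon = dict(seed_words)
    for _ in range(rounds):
        counts = Counter()
        for tokens in docs:
            label = seed_label(tokens, lexicon)
            if label is None:
                continue  # skip documents the current lexicon cannot decide
            for t in set(tokens):
                counts[t] += label
        new_lexicon = dict(seed_words)
        for t, c in counts.items():
            if t not in seed_words and c != 0:
                new_lexicon[t] = 1 if c > 0 else -1
        if new_lexicon == lexicon:
            break  # converged: another round would change nothing
        lexicon = new_lexicon
    return lexicon


# Unlabeled toy "domain" documents and a two-word general seed lexicon.
DOCS = [["great", "battery"], ["battery", "lasts"],
        ["awful", "screen"], ["screen", "cracked"]]
SEEDS = {"great": 1, "awful": -1}
DOMAIN_LEXICON = self_train(DOCS, SEEDS)
```

In this toy run the seed words first label the documents that contain them, which gives "battery" and "screen" a polarity; the next round uses those words to label the remaining documents. No labeled training data is used at any point, which is the property the abstract emphasizes.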
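Abstract: As more and more people voice their opinions online, emotionally charged posts are also increasing. The accumulation of negative emotion can cause public opinion to spiral out of control, and accurately identifying the sentiment polarity of posts helps in analyzing the current state of public opinion. Existing aspect-level sentiment analysis has not yet effectively fused syntactic and semantic information, and cannot account for the complementarity of syntactic structure and semantic relevance at the same time. To this end, a model that fuses syntax and semantics for aspect-level sentiment analysis (Aspect-level Sentiment Analysis Models Based on Syntax and Semantics, SS-GCN) is proposed, comprising a syntax analysis module, a semantic analysis module and a fusion module. The text is first fed into a pre-trained BERT model. The syntax analysis module then obtains feature representations of syntactic relations, while the semantic analysis module, equipped with a neighborhood enhancement mechanism, captures feature representations of semantic relevance. Finally, both are fed into the fusion module, where an affine transformation lets the syntactic and semantic information interact and fuse effectively, achieving aspect-level sentiment analysis.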
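Abstract: To address the limitations of existing sentiment classification models in deep sentiment understanding, the unidirectional constraint of traditional attention mechanisms, and class imbalance in natural language processing (NLP), a sentiment classification model named M-BCA (Multi-scale BERT features with Bidirectional Cross Attention) is proposed, which fuses multi-scale BERT (Bidirectional Encoder Representations from Transformers) features with a bidirectional cross-attention mechanism. First, multi-scale features are extracted from the lower, middle and upper layers of BERT to capture the surface information, syntactic information and deep semantic information of the sentence. Second, a three-channel gated recurrent unit (GRU) further extracts deep semantic features, strengthening the model's understanding of the text. Finally, to promote interaction and learning among features of different scales, a bidirectional cross-attention mechanism is introduced to strengthen the interplay among the multi-scale features. In addition, to handle imbalanced data, a data augmentation strategy is designed and a hybrid loss function is adopted to improve the model's learning of minority-class samples. Experimental results show that M-BCA performs well on fine-grained sentiment classification. On multi-class sentiment datasets with imbalanced distributions, its performance is significantly better than that of most baseline models. M-BCA also stands out in classifying minority-class samples: on the NLPCC 2014 and Online_Shopping_10_Cats datasets in particular, its minority-class Macro-Recall is higher than that of all other compared models. The model therefore achieves a significant performance improvement on fine-grained sentiment classification and is suitable for imbalanced datasets.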
Funding: Supported by the National Social Science Foundation Major Project (22&ZD135) and the National Social Science Fund National Emergency Management System Construction Research Project (20VYJ061).
Abstract: With the popularization of social media, public opinion information on emergencies spreads rapidly on the Internet, and the impact of negative public opinion on an event has become more significant. Based on the organizational form of public opinion information, a knowledge graph is used to construct a knowledge base of public opinion risk cases for online emergencies. In the model layer design, an emotion recognition model for negative public opinion information based on the bi-directional long short-term memory (BiLSTM) network is studied, and a linear discriminant analysis (LDA) topic extraction method combined with association rules is proposed to extract and mine the semantics of negative public opinion topics, enabling further in-depth analysis of information topics. Focusing on public health emergencies, knowledge acquisition and knowledge processing of public opinion information are conducted. The experimental results show that the constructed knowledge graph framework can facilitate in-depth analysis of theme evolution in public opinion events, demonstrating its research significance for reducing online public opinion risks.
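Abstract: Because traditional sentiment classification methods for text comments usually ignore the influence of user personality on the classification result, a method based on user personality and semantic-structural features is proposed (User Personality and Semantic-structural Features based Sentiment Classification Method for Text Comments, BF_BiGAC). Building on the ability of the Big Five personality model to express user personality effectively, user personality features are obtained from the comment text by computing scores along the different personality dimensions. Because the bidirectional gated recurrent unit (BiGRU) and the convolutional neural network (CNN) can effectively extract contextual semantic features and local structural features of text, a method for obtaining semantic-structural text features based on BiGRU, CNN and a two-layer attention mechanism is proposed. To distinguish the influence of different feature types, a hybrid attention layer is introduced to fuse the user personality features with the semantic-structural text features, producing the final text vector representation. Comparative experiments on four comment datasets (IMDB, Yelp-2, Yelp-5 and Ekman) show that BF_BiGAC performs well in both classification accuracy (Accuracy) and weighted macro F_1 (F_w). Compared with the sentiment classification method concatenating BiGRU and CNN (BiGRU_CNN), Accuracy improves by 0.020, 0.012, 0.017 and 0.011 respectively; compared with the sentiment classification method concatenating CNN and BiGRU (ConvBiLSTM), F_w improves by 0.022, 0.013, 0.028 and 0.023 respectively. Compared with the pre-trained models BERT and RoBERTa, BF_BiGAC runs more efficiently while maintaining classification accuracy.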
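Abstract: Most existing studies combine recurrent neural networks with attention mechanisms for aspect-level sentiment classification. However, recurrent neural networks cannot be computed in parallel, and their training suffers from truncated backpropagation, vanishing gradients and exploding gradients; traditional attention mechanisms may also assign low attention weights to important sentiment words in a sentence. To address these problems, this paper proposes an aspect-level sentiment classification model that fuses a Transformer with an interactive attention network. First, the pre-trained BERT (bidirectional encoder representation from Transformers) model is used to construct word embedding vectors. A Transformer encoder then encodes the input sentence in parallel. Next, context dynamic mask and context dynamic weight mechanisms are used to focus on the local context that has important semantic relations with the specific aspect term. Experimental results on 5 English datasets and 4 Chinese review datasets show that the proposed model performs best in both accuracy and F1.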