Funding: Projects (61573380, 61303185) supported by the National Natural Science Foundation of China; Project (13BTQ052) supported by the National Social Science Foundation of China; Project (2016M592450) supported by the China Postdoctoral Science Foundation; Project (2016JJ4119) supported by the Hunan Provincial Natural Science Foundation of China.
Abstract: With the rise and spread of micro-blogging, sentiment classification of short texts has become a research hotspot, and several methods have been developed over the past decade. However, because Chinese and English differ in syntax, semantics and pragmatics, sentiment classification methods that are effective for English Twitter may fail on Chinese micro-blogs. In addition, the colloquial and concise nature of short Chinese texts introduces further challenges for sentiment classification. In this work, a novel two-stage hybrid learning model was proposed for sentiment classification of Chinese micro-blogs. In the first stage, emotional scores were calculated over the whole dataset using an improved Chinese-oriented sentiment dictionary method, and data with extremely high or low scores were labeled directly. In the second stage, the remaining data were labeled by an integrated classification method based on the sentiment dictionary, a support vector machine (SVM) and k-nearest neighbors (KNN). An improved feature selection method was adopted to enhance the discriminative power of the selected features. This two-stage hybrid framework makes the proposed method effective for sentiment classification of Chinese micro-blogs. Experiments on the COAE2014 (Chinese Opinion Analysis Evaluation 2014) dataset show that the proposed method outperforms the other schemes compared.
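The two-stage routing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy lexicon `LEXICON`, the thresholds `HIGH`/`LOW`, the Pegasos-style linear SVM, cosine-similarity KNN, and the majority-vote combination (ties broken by the SVM) are all hypothetical stand-ins, since the abstract does not specify the dictionary, the feature selection method, or how the three classifiers are integrated. Inputs are assumed to be pre-segmented token lists.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon standing in for the improved Chinese sentiment dictionary.
LEXICON = {"好": 1.0, "喜欢": 2.0, "棒": 1.5, "差": -1.0, "讨厌": -2.0, "失望": -1.5}
HIGH, LOW = 2.5, -2.5  # assumed stage-1 thresholds for "extreme" scores


def dict_score(tokens):
    """Sum of lexicon polarities over a pre-segmented token list."""
    return sum(LEXICON.get(tok, 0.0) for tok in tokens)


def bow(tokens):
    """Bag-of-words feature vector stored as a sparse Counter."""
    return Counter(tokens)


def dot(w, x):
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_svm(samples, labels, lam=0.01, epochs=20):
    """Linear SVM trained with Pegasos-style subgradient steps (labels are +1/-1)."""
    w = defaultdict(float)
    t = 0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            t += 1
            eta = 1.0 / (lam * t)
            hinge_active = y * dot(w, x) < 1  # margin checked before the update
            for f in list(w):
                w[f] *= 1.0 - eta * lam  # shrink step from the L2 regularizer
            if hinge_active:
                for f, v in x.items():
                    w[f] += eta * y * v
    return w


def cosine(a, b):
    num = sum(a[f] * b[f] for f in a if f in b)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0


def knn_predict(x, samples, labels, k=3):
    """Majority label among the k most cosine-similar training samples."""
    nearest = sorted(zip(samples, labels), key=lambda p: cosine(x, p[0]), reverse=True)[:k]
    return 1 if sum(y for _, y in nearest) > 0 else -1


def classify(tokens, w, samples, labels):
    score = dict_score(tokens)
    # Stage 1: extreme dictionary scores are labeled directly.
    if score >= HIGH:
        return 1
    if score <= LOW:
        return -1
    # Stage 2: majority vote of dictionary sign, SVM and KNN.
    x = bow(tokens)
    svm_vote = 1 if dot(w, x) >= 0 else -1
    dict_vote = (score > 0) - (score < 0)  # 0 means the dictionary abstains
    total = dict_vote + svm_vote + knn_predict(x, samples, labels)
    return svm_vote if total == 0 else (1 if total > 0 else -1)
```

Usage: build `samples = [bow(t) for t in train_tokens]`, train `w = train_svm(samples, labels)`, then call `classify(tokens, w, samples, labels)`. Routing only the mid-range scores to the ensemble keeps the cheap dictionary decision for clear-cut posts, which is the design rationale the abstract gives for the two stages.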
Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (2008AA01Z144) and the National Natural Science Foundation of China (60803093, 60975055).
Funding: Projects (61170156, 60933005) supported by the National Natural Science Foundation of China.
Abstract: Sentiment analysis is the computational study of how opinions, attitudes, emotions, and perspectives are expressed in language; it has become an important task in natural language processing and is highly valuable for both research and practical applications. This work focuses on the difficulty of constructing sentiment classifiers, which normally require large amounts of labeled in-domain training data, and proposes a novel unsupervised framework that uses Chinese idiom resources to develop a general sentiment classifier. Furthermore, domain adaptation is improved by taking the general classifier as the base of a self-training procedure, yielding a domain-specific self-training sentiment classifier. To validate the unsupervised framework, several experiments were carried out on a publicly available dataset of Chinese online reviews. The experiments show that the proposed framework is effective and achieves encouraging results. Specifically, the general classifier outperforms two baselines (a naïve 50% baseline and a cross-domain classifier), and the bootstrapping self-training classifier approaches the upper-bound domain-specific classifier, with a lowest accuracy of 81.5%; its performance is more stable, and the framework needs no labeled training data.
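The self-training procedure described above can be sketched as a generic bootstrapping loop. This is a hedged illustration only: the idiom lexicon `IDIOM_POLARITY`, the choice of multinomial Naive Bayes as the domain classifier, and the log-probability `margin` used to select confident pseudo-labels are all assumptions, because the abstract names neither the domain classifier nor the selection rule.

```python
import math
from collections import Counter

# Hypothetical idiom polarity lexicon standing in for the Chinese idiom resources.
IDIOM_POLARITY = {"物超所值": 1, "赞不绝口": 1, "名不副实": -1, "大失所望": -1}


def general_classify(tokens):
    """Seed (general) classifier: sign of summed idiom polarities; 0 = undecided."""
    s = sum(IDIOM_POLARITY.get(t, 0) for t in tokens)
    return (s > 0) - (s < 0)


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over labels +1/-1."""

    def fit(self, docs, labels):
        self.counts = {1: Counter(), -1: Counter()}
        self.priors = Counter(labels)
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc)
        self.vocab = set().union(*self.counts.values())
        return self

    def log_probs(self, doc):
        n_docs = sum(self.priors.values())
        out = {}
        for y in (1, -1):
            total = sum(self.counts[y].values()) + len(self.vocab)
            lp = math.log((self.priors[y] + 1) / (n_docs + 2))
            lp += sum(math.log((self.counts[y][t] + 1) / total) for t in doc)
            out[y] = lp
        return out

    def predict(self, doc):
        lp = self.log_probs(doc)
        return 1 if lp[1] >= lp[-1] else -1


def self_train(unlabeled, rounds=3, margin=1.0):
    """Seed labels from the general classifier, then add confident pseudo-labels."""
    labeled = [(d, general_classify(d)) for d in unlabeled if general_classify(d) != 0]
    if not labeled:
        raise ValueError("seed classifier labeled nothing")
    pool = [d for d in unlabeled if general_classify(d) == 0]
    model = NaiveBayes().fit([d for d, _ in labeled], [y for _, y in labeled])
    for _ in range(rounds):
        confident, rest = [], []
        for d in pool:
            lp = model.log_probs(d)
            gap = lp[1] - lp[-1]
            (confident if abs(gap) >= margin else rest).append((d, 1 if gap > 0 else -1))
        if not confident:
            break  # nothing new is confident enough; stop bootstrapping
        labeled += confident
        pool = [d for d, _ in rest]
        model = NaiveBayes().fit([d for d, _ in labeled], [y for _, y in labeled])
    return model
```

No labeled domain data is used anywhere: the only supervision comes from the idiom lexicon, which matches the framework's stated goal of needing no labeled training dataset.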
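Abstract: To address problems in mainstream deep learning models for text classification, such as incomplete feature extraction and missing positional and structural information, a hybrid neural network model for short-text sentiment classification that incorporates sentiment clusters is proposed (sentiment clustering and fusion of multiple neural networks, SCMN). The method first generates word vectors with the pre-trained BERT model (bidirectional encoder representations from Transformers) and performs sentiment-cluster clustering and sentiment weight enhancement; it then uses a bidirectional long short-term memory network (BiLSTM) with an attention mechanism to capture the contextual features of the text; finally, a capsule network (CapsNet) extracts local semantic features that carry sentence-structure information and performs the classification. On a public dataset and a self-crawled dataset, the model was compared with mainstream deep learning classification models, and ablation experiments were conducted on its components. Experimental results show that, compared with other methods, the proposed model improves precision by 5.5% on average, confirming that the individual components bring effective gains to the model and improve text sentiment classification.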
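Two of the pipeline pieces named above can be sketched in plain Python: dot-product attention pooling over per-token BiLSTM states, and the capsule-network squash nonlinearity with routing-by-agreement as introduced by Sabour et al. (2017). This is a hedged sketch: the query vector, the number of routing iterations, and the use of this particular routing variant are assumptions, since the abstract does not give SCMN's layer configuration. Class prediction would take the output capsule with the longest vector.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def attention_pool(hidden, query):
    """Dot-product attention over per-token BiLSTM states; returns (context, weights)."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden]
    weights = softmax(scores)
    dim = len(hidden[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
    return context, weights


def squash(s, eps=1e-9):
    """CapsNet squash: keeps the direction of s and maps its length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq + eps)
    return [scale * x for x in s]


def route(u_hat, iterations=3):
    """Routing-by-agreement; u_hat[i][j] is input capsule i's prediction for output j."""
    n_in, n_out = len(u_hat), len(u_hat[0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits
    v = []
    for r in range(iterations):
        c = [softmax(row) for row in b]  # coupling coefficients per input capsule
        v = []
        for j in range(n_out):
            dim = len(u_hat[0][j])
            s = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            v.append(squash(s))
        if r < iterations - 1:
            for i in range(n_in):
                for j in range(n_out):
                    # Agreement = dot product between prediction and current output.
                    b[i][j] += sum(a * o for a, o in zip(u_hat[i][j], v[j]))
    return v
```

Because squash bounds every output length below 1, the length of each output capsule can be read as a class score.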
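Abstract: As more and more people express their opinions online, emotionally charged posts are becoming increasingly common, and an accumulation of negative emotion may cause public opinion to spiral out of control; accurately identifying the sentiment polarity of posts enables effective analysis of the current state of public opinion. Current aspect-level sentiment analysis has not yet effectively fused syntactic and semantic information, and cannot simultaneously account for the complementarity of syntactic structure and semantic relevance. To this end, an aspect-level sentiment analysis model that fuses syntax and semantics is proposed (Aspect-level Sentiment Analysis Models Based on Syntax and Semantics, SS-GCN), comprising a syntactic analysis module, a semantic analysis module and a fusion module. First, the text is fed into a pre-trained BERT model; the syntactic analysis module then obtains feature representations of syntactic relations, while the semantic analysis module, equipped with a neighborhood enhancement mechanism, captures feature representations of semantic relevance. Finally, both representations are fed into the fusion module, where an affine transformation lets the syntactic and semantic information interact and fuse effectively to perform aspect-level sentiment analysis.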
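A graph-convolution layer over a syntactic adjacency matrix and an affine exchange between two feature streams can be sketched as below. This is an assumption-laden illustration, not SS-GCN itself: building the adjacency from a dependency parse, row-normalizing with self-loops, and the form `softmax(H1 W H2^T) H2` for the affine interaction are common choices in GCN-based aspect models, but the abstract does not state SS-GCN's exact formulas or its neighborhood enhancement mechanism.

```python
import math


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def softmax_rows(m):
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(x - mx) for x in row]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out


def gcn_layer(adj, h, w):
    """One GCN layer: ReLU(D^-1 (A + I) H W), with self-loops and row normalization."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    hw = matmul(h, w)
    out = []
    for i in range(n):
        deg = sum(a_hat[i])
        out.append([max(0.0, sum(a_hat[i][j] * hw[j][d] for j in range(n)) / deg)
                    for d in range(len(hw[0]))])
    return out


def affine_exchange(h1, h2, w):
    """Assumed affine interaction: each row of h1 attends over h2 via h1 W h2^T."""
    scores = matmul(matmul(h1, w), transpose(h2))
    return matmul(softmax_rows(scores), h2)
```

In a full model one would typically run `affine_exchange` in both directions (syntax attending to semantics and vice versa) and then pool the aspect-token rows before the classifier.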
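Abstract: To address the limitations of existing sentiment classification models in deep emotional understanding, the unidirectional constraint of traditional attention mechanisms, and class imbalance in natural language processing (NLP), a sentiment classification model that fuses multi-scale BERT (Bidirectional Encoder Representations from Transformers) features with a bidirectional cross-attention mechanism, M-BCA (Multi-scale BERT features with Bidirectional Cross Attention), is proposed. First, multi-scale features are extracted from the low, middle and high layers of BERT to capture the surface, syntactic and deep semantic information of the sentence text. Second, a three-channel gated recurrent unit (GRU) further extracts deep semantic features, strengthening the model's understanding of the text. Finally, to promote interaction and learning between features at different scales, a bidirectional cross-attention mechanism is introduced to strengthen the interplay between the multi-scale features. In addition, to handle imbalanced data, a data augmentation strategy is designed and a hybrid loss function is adopted to optimize the model's learning of minority-class samples. Experimental results show that M-BCA performs well on fine-grained sentiment classification tasks: on multi-class sentiment datasets with imbalanced distributions, it significantly outperforms most baseline models. M-BCA also performs particularly well on minority-class samples; on the NLPCC 2014 and Online_Shopping_10_Cats datasets, its minority-class Macro-Recall leads all other compared models. The model therefore achieves a significant performance improvement on fine-grained sentiment classification and is well suited to imbalanced datasets.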
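The bidirectional cross-attention and the hybrid loss described above can be sketched as follows. Both parts rest on assumptions: the cross-attention uses plain scaled dot-product attention with keys equal to values (no learned projections), and the hybrid loss is taken to be a weighted sum of cross-entropy and focal loss with hypothetical parameters `alpha` and `gamma`, since the abstract does not say which losses M-BCA combines.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def cross_attend(queries, keys_values):
    """Scaled dot-product attention: each query row attends over keys_values rows."""
    d = len(queries[0])
    out = []
    for q in queries:
        w = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys_values])
        out.append([sum(wi * kv[j] for wi, kv in zip(w, keys_values)) for j in range(d)])
    return out


def bidirectional_cross_attention(low, high):
    """Low-level features attend to high-level ones and vice versa (assumed symmetric form)."""
    return cross_attend(low, high), cross_attend(high, low)


def hybrid_loss(probs, target, alpha=0.5, gamma=2.0):
    """Assumed hybrid: alpha * cross-entropy + (1 - alpha) * focal loss for one sample."""
    p = probs[target]
    ce = -math.log(p)
    focal = -((1.0 - p) ** gamma) * math.log(p)  # down-weights easy, confident samples
    return alpha * ce + (1.0 - alpha) * focal
```

The focal term shrinks the loss of well-classified (often majority-class) samples by the factor `(1 - p) ** gamma`, which is one standard way to shift learning toward minority classes.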
Funding: Supported by the National Social Science Foundation Major Project (22&ZD135) and the National Social Science Fund National Emergency Management System Construction Research Project (20VYJ061).
Abstract: With the popularization of social media, public opinion information about emergencies spreads rapidly on the Internet, and the impact of negative public opinion on an event has become more significant. Based on the organizational form of public opinion information, a knowledge graph is used to construct a knowledge base of public opinion risk cases for online emergencies. In the model-layer design, an emotion recognition model for negative public opinion information based on a bi-directional long short-term memory (BiLSTM) network is studied, and a latent Dirichlet allocation (LDA) topic extraction method combined with association rules is proposed to extract and mine the semantics of negative public opinion topics, enabling further in-depth analysis of information topics. Focusing on public health emergencies, knowledge acquisition and knowledge processing of public opinion information are conducted, and the experimental results show that the constructed knowledge graph framework facilitates in-depth analysis of the topic evolution of public opinion events, which is significant for reducing online public opinion risks.
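The association-rule step that follows topic extraction can be sketched as pairwise rule mining over per-post topic keywords. This is a hedged sketch: it assumes the topic keywords for each post have already been produced by an LDA model (not shown), restricts mining to item pairs rather than full Apriori, and uses hypothetical `min_support` and `min_confidence` thresholds.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Mine rules x -> y over keyword pairs; returns sorted (x, y, support, confidence)."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t))  # deduplicate so each post counts once per item
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), c in pair_counts.items():
        support = c / n
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = c / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)
```

Each transaction here is the keyword list assigned to one negative post; a rule such as `(x, y, 0.5, 1.0)` means that half of the posts mention both keywords and every post mentioning `x` also mentions `y`.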