With the rise and spread of micro-blogging, sentiment classification of short texts has become a research hotspot, and a number of methods have been developed over the past decade. However, because Chinese and English differ in syntax, semantics and pragmatics, sentiment classification methods that are effective for English Twitter may fail on Chinese micro-blogs. In addition, the colloquialism and conciseness of short Chinese texts introduce further challenges. In this work, a novel two-stage hybrid learning model is proposed for sentiment classification of Chinese micro-blogs. In the first stage, emotional scores are calculated over the whole dataset using an improved Chinese-oriented sentiment dictionary classification method, and data with extremely high or low scores are labeled directly. In the second stage, the remaining data are labeled using an integrated classification method based on a sentiment dictionary, a support vector machine (SVM) and k-nearest neighbor (KNN). An improved feature selection method is adopted to enhance the discriminative power of the selected features. The two-stage hybrid framework makes the proposed method effective for sentiment classification of Chinese micro-blogs. Experiments on the COAE2014 (Chinese Opinion Analysis Evaluation 2014) dataset show that the proposed method outperforms other schemes.
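The two-stage routing described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy lexicon, the thresholds `hi` and `lo`, the simple majority vote, and the stand-in classifiers `svm_stub` and `knn_stub` are all assumptions made for the example. The abstract does not say how the three second-stage classifiers are combined.

```python
from collections import Counter

# Hypothetical toy lexicon (word -> polarity weight); not the paper's dictionary.
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "awful": -2.0}


def dictionary_score(tokens):
    """Sum the lexicon weights of the tokens; unknown words contribute 0."""
    return sum(LEXICON.get(t, 0.0) for t in tokens)


def majority_vote(labels):
    """Return the most common label among the classifier outputs."""
    return Counter(labels).most_common(1)[0][0]


def two_stage_classify(tokens, classifiers, hi=2.0, lo=-2.0):
    """Return (label, stage). Thresholds hi/lo are illustrative assumptions."""
    score = dictionary_score(tokens)
    # Stage 1: extreme dictionary scores are labeled directly.
    if score >= hi:
        return "positive", 1
    if score <= lo:
        return "negative", 1
    # Stage 2: the remaining texts go to an ensemble (assumed: majority vote).
    votes = [clf(tokens) for clf in classifiers]
    return majority_vote(votes), 2


def dict_clf(tokens):
    """Dictionary-based voter: sign of the lexicon score."""
    return "positive" if dictionary_score(tokens) >= 0 else "negative"


def svm_stub(tokens):
    """Placeholder for a trained SVM (hypothetical rule for illustration)."""
    return "negative" if "not" in tokens else "positive"


def knn_stub(tokens):
    """Placeholder for a trained KNN classifier (hypothetical rule)."""
    return "negative" if "not" in tokens else "positive"


ensemble = [dict_clf, svm_stub, knn_stub]
print(two_stage_classify(["great", "good"], ensemble))
print(two_stage_classify(["not", "good"], ensemble))
```

In this sketch the first call is resolved in stage 1 because its dictionary score reaches the upper threshold, while the second falls between the thresholds and is decided by the ensemble. In practice the stubs would be replaced by classifiers trained on the selected features.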
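The uncertainty and complexity of the stock market make stock prediction a challenging task. Given the potential value of financial text for stock prediction, sentiment features are extracted from online financial news using a lexicon-based method and a BERT bidirectional long short-term memory model (bidirectional encoder representations from transformers-bidirectional long short-term memory, BERT-BiLSTM), and a stock index prediction model that fuses sentiment features with stock trading features is constructed. The experiments compare the models' predictive ability before and after fusing sentiment features and examine how predictive ability differs across models and time horizons. The results show that sentiment features extracted by both the lexicon-based method and deep learning improve the stock index prediction accuracy of every model. The LSTM model performs better than the other experimental models both before and after sentiment features are fused. Further analysis of time spans shows that the prediction model is better at predicting the rise and fall of the stock index over shorter time spans. To verify the practical value of the prediction model, backtests are run on the CSI 300 Index in bull, bear and range-bound markets, combining the LSTM model with the deep Q-network (DQN) principle and comparing a traditional moving-average strategy with the same strategy combined with the DQN reinforcement learning algorithm. The backtest results show that, compared with a single traditional trading strategy, the stock index prediction model that combines a traditional trading strategy with deep learning methods maintains a positive Sharpe ratio and cumulative return in bull, bear and range-bound markets and effectively controls maximum drawdown, showing stronger market adaptability and profitability.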
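For readers unfamiliar with the backtest metrics named in this abstract, the sketch below computes cumulative return, Sharpe ratio and maximum drawdown from a series of per-period simple returns. The annualization factor of 252 trading days, the per-period risk-free rate and the use of the sample standard deviation are assumptions of this example; the abstract does not state which conventions the study uses.

```python
import math


def cumulative_return(returns):
    """Compound per-period simple returns into a total return."""
    total = 1.0
    for r in returns:
        total *= 1.0 + r
    return total - 1.0


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period returns.

    risk_free is a per-period rate. Uses the sample standard deviation,
    so it needs at least two returns with non-zero variance.
    """
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    var = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough decline of the equity curve, as a positive fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


# Illustrative returns only; these are not data from the study.
example = [0.10, -0.50, 0.20]
print(cumulative_return(example), max_drawdown(example))
```

A strategy that "controls maximum drawdown" in the sense of the abstract is one for which `max_drawdown` stays small relative to the baseline strategy over the same period.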
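To address the limitations of existing sentiment classification models in deep sentiment understanding, the unidirectional constraint of traditional attention mechanisms, and class imbalance in natural language processing (NLP), a sentiment classification model named M-BCA (Multi-scale BERT features with Bidirectional Cross Attention) is proposed, which fuses multi-scale BERT (Bidirectional Encoder Representations from Transformers) features with a bidirectional cross attention mechanism. First, multi-scale features are extracted from the lower, middle and upper layers of BERT to capture the surface, syntactic and deep semantic information of sentence text. Second, a three-channel gated recurrent unit (GRU) further extracts deep semantic features, strengthening the model's understanding of text. Finally, to promote interaction and learning between features at different scales, a bidirectional cross attention mechanism is introduced, enhancing the interplay among the multi-scale features. In addition, to address imbalanced data, a data augmentation strategy is designed and a hybrid loss function is used to improve the model's learning of minority-class samples. Experimental results show that M-BCA performs well on fine-grained sentiment classification tasks. On multi-class sentiment datasets with imbalanced distributions, it performs significantly better than most baseline models. M-BCA also stands out in classifying minority-class samples: on the NLPCC 2014 and Online_Shopping_10_Cats datasets in particular, its minority-class Macro-Recall leads all other compared models. The model thus achieves a significant performance improvement on fine-grained sentiment classification and is suitable for imbalanced datasets.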
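The idea of bidirectional cross attention mentioned in this abstract can be illustrated with a small, dependency-free sketch in which two feature sequences attend to each other. This is only an assumption-laden toy: it uses plain scaled dot-product attention without learned projection matrices, and it does not reproduce M-BCA's BERT layer selection, GRU channels or fusion step, none of which are specified in the abstract.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention(queries, keys_values):
    """Each query vector attends over keys_values (used as both keys and values).

    Assumes all vectors share the same dimension d and that there are
    no learned projections (a simplification for illustration).
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in keys_values
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * v[j] for w, v in zip(weights, keys_values)) for j in range(d)]
        )
    return out


def bidirectional_cross_attention(a, b):
    """A attends to B and B attends to A; returns both updated sequences."""
    return cross_attention(a, b), cross_attention(b, a)


# Two hypothetical feature sequences, e.g. from different BERT layers.
low_level = [[1.0, 0.0], [0.0, 1.0]]
high_level = [[1.0, 1.0]]
a_out, b_out = bidirectional_cross_attention(low_level, high_level)
print(a_out, b_out)
```

Each output sequence keeps the length of its own input while mixing in information from the other sequence, which is what makes the attention "bidirectional" in this sketch.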
Funding: Projects (61573380, 61303185) supported by the National Natural Science Foundation of China; Project (13BTQ052) supported by the National Social Science Foundation of China; Project (2016M592450) supported by the China Postdoctoral Science Foundation; Project (2016JJ4119) supported by the Hunan Provincial Natural Science Foundation of China.