Abstract
With the rapid development of the Internet, society has entered the era of big data. Text data, as a carrier of human knowledge, is of great significance to human progress and development, so using large numbers of unlabeled samples to improve the accuracy of text sentiment classification has become increasingly important. This paper applies the clustering kernel method from semi-supervised learning to sentiment classification and proposes a clustering-kernel-based semi-supervised sentiment classification algorithm (CKSVM). A weighted undirected graph is built over the labeled and unlabeled samples, the clustering kernel is computed from it, and the resulting kernel function is used to train an SVM sentiment classifier. The method incorporates the information carried by unlabeled samples directly into the kernel, so it does not need multiple classifiers and makes effective use of the unlabeled samples. Experimental results show that CKSVM achieves clearly higher classification accuracy than semi-supervised sentiment classification algorithms based on Self-learning SVM and Co-training SVM, and that it adapts well to different data sets.
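The kernel construction summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract gives no formulas, so the Gaussian edge weights, the symmetric graph normalization, the polynomial transfer function on the eigenvalues, and the names `cluster_kernel`, `sigma` and `power` are all assumptions chosen for illustration, following one common way a clustering kernel is built.

```python
import numpy as np


def cluster_kernel(X, sigma=1.0, power=2):
    """Illustrative clustering kernel over ALL samples (labeled and unlabeled).

    Assumptions (not taken from the paper): Gaussian edge weights,
    symmetric normalization, polynomial transfer function.
    """
    X = np.asarray(X, dtype=float)

    # Weighted undirected graph: Gaussian affinity between every pair of samples.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))

    # Symmetric normalization L = D^{-1/2} W D^{-1/2}, with D the degree matrix.
    degrees = W.sum(axis=1)
    L = W / np.sqrt(np.outer(degrees, degrees))

    # Eigendecomposition of the symmetric matrix L.
    eigvals, eigvecs = np.linalg.eigh(L)

    # Transfer function on the spectrum: raising eigenvalues in [0, 1] to a
    # power shrinks the small ones, which emphasizes the cluster structure.
    eigvals = np.clip(eigvals, 0.0, None)  # remove tiny negative round-off
    L_new = (eigvecs * eigvals ** power) @ eigvecs.T

    # Rescale so that every sample has unit self-similarity.
    diag = np.diag(L_new)
    return L_new / np.sqrt(np.outer(diag, diag))
```

Because the kernel is computed over labeled and unlabeled samples together, the unlabeled data shapes the similarity between labeled points. One way to use it, assuming scikit-learn is available, is to pass the labeled-by-labeled sub-block of the matrix to `SVC(kernel="precomputed")` for training and the unlabeled-by-labeled sub-block for prediction.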
Source
Computer Technology and Development (《计算机技术与发展》)
2016, No. 12, pp. 87-91, 95 (6 pages)
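Funding
National Natural Science Foundation of China (61070234, 61071167, 61501251)
Scientific Research Start-up Fund for Introduced Talent, Nanjing University of Posts and Telecommunications (NY214191)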
Keywords
semi-supervised learning
clustering kernel
graph
sentiment classification
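About the authors
Zheng Wenjing (郑文静, born 1990), female; research interests: machine learning and sentiment classification.
Li Lei (李雷), Ph.D., professor; research interests: intelligent signal processing, nonlinear analysis and computational intelligence, and machine learning.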