
Research on Microblog Advertisement Filtering Model Based on Text Content Analysis (cited by: 2)
Abstract: To address the large volume of advertisements on microblog platforms such as Sina and Tencent, this paper proposes a microblog advertisement filtering model. Data preprocessing converts the collected raw microblog data into clean data that a computer can process easily. In the preprocessing stage, the stop word list is improved according to the characteristics of microblog text in order to raise precision. A classifier based on the support vector machine is then built and trained on the data, and through continuous learning and feedback it achieves good classification results. Experimental results show that the model filters advertisements with an accuracy above 90%, which is better than the keyword-based method.
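The pipeline summarized in the abstract (stop word filtering, a vector space representation, and a support vector machine classifier) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and not taken from the paper: the tiny English training set, the `STOP_WORDS` set, the function names, the TF-IDF weighting, and the training procedure (a bias-free linear SVM fitted by stochastic subgradient descent on the L2-regularized hinge loss). Real microblog text in Chinese would also need a word segmenter in place of the whitespace split.

```python
import math
import random
from collections import Counter

# Hypothetical stop word list; the paper's improved list is not reproduced here.
STOP_WORDS = {"the", "a", "to", "is", "and", "at", "on", "with", "my"}

def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    tokens = (t.strip(".,!?:;") for t in text.lower().split())
    return [t for t in tokens if t and t not in STOP_WORDS]

def fit_idf(docs):
    """Smoothed inverse document frequency for each term in the training set."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def vectorize(tokens, idf):
    """Sparse, L2-normalized TF-IDF vector; terms unseen in training are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}

def score(weights, vec):
    """Dot product between the weight vector and a sparse document vector."""
    return sum(weights.get(t, 0.0) * v for t, v in vec.items())

def train_linear_svm(vectors, labels, lr=0.1, lam=0.01, epochs=100, seed=0):
    """Bias-free linear SVM via stochastic subgradient descent on hinge loss."""
    rng = random.Random(seed)
    weights = {}
    order = list(range(len(vectors)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = vectors[i], labels[i]
            violated = y * score(weights, x) < 1.0
            for t in weights:  # shrink step from the L2 regularizer
                weights[t] *= 1.0 - lr * lam
            if violated:  # hinge loss is active, so move toward y * x
                for t, v in x.items():
                    weights[t] = weights.get(t, 0.0) + lr * y * v
    return weights

def is_ad(text, idf, weights):
    """Classify a post as an advertisement when its SVM score is positive."""
    return score(weights, vectorize(tokenize(text), idf)) > 0.0

# Toy labeled posts (assumed data): +1 = advertisement, -1 = ordinary post.
TRAIN = [
    ("Buy cheap watches now, free shipping!", 1),
    ("Limited discount: order today and save 50%", 1),
    ("Click the link to win a free phone", 1),
    ("Best price sale, buy now", 1),
    ("Had lunch with friends at the park", -1),
    ("Watching the football match tonight", -1),
    ("Rainy morning, reading a good book", -1),
    ("My cat is sleeping on the sofa", -1),
]

docs = [tokenize(text) for text, _ in TRAIN]
idf = fit_idf(docs)
weights = train_linear_svm([vectorize(d, idf) for d in docs],
                           [label for _, label in TRAIN])

print(is_ad("Free discount, buy today!", idf, weights))
print(is_ad("Reading a book at the park", idf, weights))
```

In this sketch the stop word list acts before vectorization, so changing it changes which terms reach the classifier at all. That is one plausible reading of why the abstract ties the improved stop word list to precision.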
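Authors: Gao Junbo (高俊波), Mei Bo (梅波)
Source: Computer Engineering (《计算机工程》), indexed in CAS and CSCD, 2014, No. 5, pp. 17-20 (4 pages)
Funding: Scientific Research Fund of Shanghai Maritime University (20100093)
Keywords: microblog; text processing; vector space model; Support Vector Machine (SVM); text classification; advertisement filtering
About the authors: Gao Junbo (b. 1972), male, associate professor, Ph.D.; main research interests: computational intelligence and data mining. Mei Bo, master's student.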

