
Naive Bayesian Text Classifier Based on the Boosting Mechanism (cited by: 3)

Naive Bayesian Classifier Using Boosting Mechanism
Abstract: The Naive Bayesian classifier is an effective text categorization method, but because it is a highly stable learner, its performance is hard to improve through Boosting. The central problem in using a Naive Bayesian classifier as the base classifier for Boosting is therefore how to break its stability. Three methods for destabilizing the Naive Bayesian learner are proposed. The first changes the samples in the training set; the second uses committees built from randomly selected attributes; the third builds a different feature-term set in each Boosting iteration by applying a different text feature extraction method. Experiments show that each method has its own advantages and disadvantages, but all are more accurate and efficient than the original Naive Bayesian classifier.
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2005, No. 8, pp. 31-33, 67 (4 pages)
Funding: Supported by the National 973 Key Basic Research Development Program of China (Grant No. G1998030414)
Keywords: Boosting; Naive Bayesian classifier; text categorization; text mining; data mining
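The abstract's idea of destabilizing the base learner with a different attribute subset in each round can be illustrated with a minimal sketch. This is not the paper's algorithm, which the record does not reproduce: it assumes a weighted multinomial Naive Bayes with Laplace smoothing, a SAMME-style multiclass AdaBoost weight update, and a uniformly random vocabulary subset per round. All function names, parameters, and the toy data are hypothetical, and at least two classes are assumed.

```python
import math
import random
from collections import defaultdict

def train_nb(docs, labels, weights, features):
    """Weighted multinomial Naive Bayes restricted to the given feature set."""
    classes = sorted(set(labels))
    prior = {c: 0.0 for c in classes}
    counts = {c: defaultdict(float) for c in classes}
    totals = {c: 0.0 for c in classes}
    for doc, y, w in zip(docs, labels, weights):
        prior[y] += w
        for tok in doc:
            if tok in features:
                counts[y][tok] += w   # sample weight acts as a fractional count
                totals[y] += w
    total_w = sum(weights)
    prior = {c: prior[c] / total_w for c in classes}
    return classes, prior, counts, totals, features

def predict_nb(model, doc):
    """Return the class with the highest log-posterior (ties go to the first class)."""
    classes, prior, counts, totals, features = model
    v = len(features)
    best, best_score = None, -math.inf
    for c in classes:
        score = math.log(prior[c] + 1e-12)
        for tok in doc:
            if tok in features:
                # Laplace (add-one) smoothing over the selected features
                score += math.log((counts[c].get(tok, 0.0) + 1.0) / (totals[c] + v))
        if score > best_score:
            best, best_score = c, score
    return best

def boost_nb(docs, labels, rounds=10, subset_frac=0.5, seed=0):
    """SAMME-style boosting; each round trains on a random feature subset."""
    rng = random.Random(seed)
    vocab = sorted({tok for doc in docs for tok in doc})
    k = len(set(labels))
    n = len(docs)
    weights = [1.0] * n               # kept normalized so that they sum to n
    ensemble = []
    for _ in range(rounds):
        size = max(1, int(subset_frac * len(vocab)))
        features = set(rng.sample(vocab, size))
        model = train_nb(docs, labels, weights, features)
        preds = [predict_nb(model, d) for d in docs]
        err = sum(w for w, p, y in zip(weights, preds, labels) if p != y) / sum(weights)
        if err >= 1.0 - 1.0 / k:
            continue                  # no better than chance: discard this round
        err = max(err, 1e-10)         # avoid division by zero for a perfect round
        alpha = math.log((1.0 - err) / err) + math.log(k - 1)
        ensemble.append((alpha, model))
        # Increase the weight of misclassified documents, then renormalize.
        weights = [w * math.exp(alpha) if p != y else w
                   for w, p, y in zip(weights, preds, labels)]
        scale = n / sum(weights)
        weights = [w * scale for w in weights]
    return ensemble

def predict_boosted(ensemble, doc):
    """Weighted vote of the base classifiers."""
    votes = defaultdict(float)
    for alpha, model in ensemble:
        votes[predict_nb(model, doc)] += alpha
    return max(votes, key=votes.get)

# Toy usage with hypothetical data: documents are lists of tokens.
docs = [["ball", "goal", "team"], ["team", "score", "goal"],
        ["vote", "party", "election"], ["election", "vote", "law"]]
labels = ["sports", "sports", "politics", "politics"]
ensemble = boost_nb(docs, labels, rounds=5)
print(predict_boosted(ensemble, ["goal", "team"]))
```

Setting `subset_frac=1.0` makes every round see the full vocabulary, which reduces the sketch to plain boosting of a stable Naive Bayes learner. Lower values inject the kind of variation between rounds that the abstract argues is needed.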

References (7)

  • 1. Schapire R, Singer Y. BoosTexter: A system for multiclass multi-label text categorization[J]. Machine Learning, 1998, 39(2/3): 135-168.
  • 2. Ting K M, Zheng Z. Improving the performance of boosting for Naive Bayesian classification[C]. In: Proceedings of the Third Pacific-Asia Conference on Methodologies for Knowledge Discovery and Data Mining, 1999: 296-305.
  • 3. McCallum A, Nigam K. A Comparison of Event Models for Naive Bayesian Text Classification[C]. In: Proc of the AAAI/ICML-98 Workshop on Learning for Text Categorization, 1998: 41-48.
  • 4. Dietterich T G, Kong E B. Machine Learning Bias, Statistical Bias, and Statistical Variance of Decision Tree Algorithms[R]. Technical Report, Dept of Computer Science, Oregon State University, Corvallis, Oregon, 1995.
  • 5. Ali K M. Learning Probabilistic Relational Concept Descriptions[D]. PhD diss, Dept of Info and Computer Science, Univ of California, Irvine, 1996.
  • 6. Zheng Z, Webb G I. Stochastic Attribute Selection Committees[R]. Technical Report (TR C98/08), School of Computing and Mathematics, Deakin University, Australia, 1998.
  • 7. Lili Diao, Mingyu Lu, Yuchang Lu et al. Using Boosting Mechanism to Refine the Threshold of VSM-based Similarity in Text Classification[C]. In: Proc of the 4th World Congress on Intelligent Control and Automation 2002 (WCICA'02), 2002: 2326-2329.

