Abstract
The naive Bayesian classifier is an effective method for text categorization, but because it is a highly stable learner, its performance is difficult to improve through Boosting. The main problem in using a naive Bayesian classifier as the base classifier in Boosting is therefore how to break this stability. Three methods for breaking the stability of the naive Bayesian learner are proposed. The first changes the samples in the training set, the second uses randomly selected groups of features, and the third builds a different feature-word set in each Boosting iteration by applying a different text feature extraction method. Experiments show that each method has its own advantages and disadvantages, but all three are more accurate and efficient than the original naive Bayesian classifier.
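The second idea, training each boosting round on a randomly selected group of features, can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract gives no algorithmic details, so the AdaBoost-style weight update, the Laplace smoothing, the binary labels (+1 / -1), and all function names (`train_nb`, `predict_nb`, `boost_nb`, `predict_boost`) are assumptions made for illustration only.

```python
import math
import random

def train_nb(X, y, w, feats):
    """Weighted multinomial naive Bayes restricted to the feature indices in `feats`."""
    classes = sorted(set(y))
    prior = {c: 0.0 for c in classes}
    counts = {c: {f: 1.0 for f in feats} for c in classes}  # Laplace smoothing
    for xi, yi, wi in zip(X, y, w):
        prior[yi] += wi
        for f in feats:
            counts[yi][f] += wi * xi[f]
    model = {}
    for c in classes:
        total = sum(counts[c].values())
        log_like = {f: math.log(counts[c][f] / total) for f in feats}
        model[c] = (math.log(prior[c] + 1e-12), log_like)
    return model

def predict_nb(model, x):
    """Return the class with the highest log posterior score."""
    def score(c):
        log_prior, log_like = model[c]
        return log_prior + sum(x[f] * lp for f, lp in log_like.items())
    return max(model, key=score)

def boost_nb(X, y, rounds=10, subset_size=3, seed=0):
    """AdaBoost-style loop (assumed); each round sees a random feature subset."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        feats = rng.sample(range(d), subset_size)  # perturb the stable learner
        model = train_nb(X, y, w, feats)
        preds = [predict_nb(model, xi) for xi in X]
        err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
        if err >= 0.5:
            continue  # no better than chance on the weighted sample; skip it
        err = max(err, 1e-10)  # avoid division by zero for a perfect round
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, model))
        # Raise the weight of misclassified documents, lower the rest.
        w = [wi * math.exp(alpha if p != yi else -alpha)
             for wi, p, yi in zip(w, preds, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict_boost(ensemble, x):
    """Weighted vote of the base classifiers."""
    votes = {}
    for alpha, model in ensemble:
        c = predict_nb(model, x)
        votes[c] = votes.get(c, 0.0) + alpha
    return max(votes, key=votes.get)
```

Here each document is assumed to be a list of term counts over a fixed vocabulary. Calling `boost_nb(X, y)` returns a list of `(alpha, model)` pairs, and `predict_boost(ensemble, x)` classifies a new count vector. Drawing a fresh feature subset per round is what makes successive base classifiers differ, which reweighting alone tends not to achieve for a stable learner.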
Source
《计算机工程与应用》
CSCD
Peking University Core Journal
2005, Issue 8, pp. 31-33, 67 (4 pages)
Computer Engineering and Applications
Funding
Supported by the National 973 Key Basic Research Development Program of China (Grant No. G1998030414)