Abstract
To strike a balance between improving text categorization effectiveness and increasing categorization speed, this paper jointly considers how several factors affect the categorization result: the depth, degree of balance, and construction method of the SVM decision tree, the number of samples within each class, and the similarity between classes. On this basis it proposes an algorithm for constructing SVM decision trees for multi-class categorization of massive text collections. Text categorization experiments on a large-scale corpus show that the algorithm improves categorization effectiveness to some extent while greatly reducing both training and testing time. The method is feasible and highly adaptable.
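The abstract does not spell out the construction procedure. As a rough illustration only, the Python sketch below shows the general shape of a binary SVM decision tree: the set of classes is split recursively into two groups, and each internal node would host one binary SVM that separates its two groups. The split rule used here (seed the groups with the least similar pair of class centroids under cosine similarity, then assign each remaining class to the more similar seed under a strict size cap) and all names (`split_classes`, `build_tree`, `count_svms`, the toy centroids) are hypothetical assumptions, not the authors' algorithm. The sketch also ignores within-class sample counts and does not train the SVMs themselves.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two class-centroid vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


@dataclass
class Node:
    classes: List[str]
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    # Hypothetical: an internal node would host one binary SVM trained on
    # documents of left.classes versus documents of right.classes.

    def is_leaf(self) -> bool:
        return self.left is None


def split_classes(labels: List[str], centroids: Dict[str, Vector]) -> Tuple[List[str], List[str]]:
    """Assumed split rule: similarity-driven, but strictly balanced in class count."""
    # Seed the two groups with the least similar pair of classes.
    a, b = min(
        ((x, y) for i, x in enumerate(labels) for y in labels[i + 1:]),
        key=lambda p: cosine(centroids[p[0]], centroids[p[1]]),
    )
    left, right = [a], [b]
    cap = math.ceil(len(labels) / 2)  # no group may exceed ceil(n/2) classes
    rest = [c for c in labels if c not in (a, b)]
    # Place the classes with the clearest preference first.
    rest.sort(key=lambda c: -abs(cosine(centroids[c], centroids[a])
                                 - cosine(centroids[c], centroids[b])))
    for c in rest:
        prefers_left = cosine(centroids[c], centroids[a]) >= cosine(centroids[c], centroids[b])
        # Go left if preferred and there is room, or if the right group is full.
        if (prefers_left and len(left) < cap) or len(right) >= cap:
            left.append(c)
        else:
            right.append(c)
    return left, right


def build_tree(labels: List[str], centroids: Dict[str, Vector]) -> Node:
    """Recursively split the class set until each leaf holds a single class."""
    if len(labels) == 1:
        return Node([labels[0]])
    left, right = split_classes(labels, centroids)
    return Node(list(labels), build_tree(left, centroids), build_tree(right, centroids))


def depth(node: Node) -> int:
    """Number of binary SVMs evaluated on the longest root-to-leaf path."""
    return 0 if node.is_leaf() else 1 + max(depth(node.left), depth(node.right))


def count_svms(node: Node) -> int:
    """Number of internal nodes, i.e. binary SVMs that would need training."""
    return 0 if node.is_leaf() else 1 + count_svms(node.left) + count_svms(node.right)


if __name__ == "__main__":
    # Toy centroids (hypothetical), e.g. averaged term-weight vectors per class.
    toy = {"sports": [1.0, 0.0, 0.0], "football": [0.9, 0.1, 0.0],
           "finance": [0.0, 1.0, 0.0], "stocks": [0.0, 0.9, 0.1]}
    tree = build_tree(list(toy), toy)
    print(tree.left.classes, tree.right.classes)  # ['sports', 'football'] ['finance', 'stocks']
    print(count_svms(tree), depth(tree))  # 3 2
```

Because every split in this sketch is capped at ⌈n/2⌉ classes, a tree over K classes has depth ⌈log₂K⌉ and K−1 internal nodes: K−1 binary SVMs are trained (versus K(K−1)/2 for one-vs-one), and a test document passes through at most ⌈log₂K⌉ of them. The paper instead weighs balance against inter-class similarity and other factors; the abstract does not state how.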
Source
Journal of Lanzhou University of Technology (《兰州理工大学学报》)
CAS
Peking University Core Journal
2012, No. 3, pp. 98-101 (4 pages)
Funding
Natural Science Foundation of Henan Province (No. 112300410301)
Keywords
text categorization
support vector machine
decision tree
multi-category classifier
About the author
Fang Ying (方莹, born 1977), female, native of Shangqiu, Henan; Ph.D. candidate and lecturer.