Abstract
To select a suitable pruning method for decision tree pruning, four widely used pruning methods were compared in terms of computational complexity, pruning strategy, error estimation and theoretical basis, through theoretical analysis and a worked example on a classification and regression tree. Compared with pessimistic error pruning (PEP), minimum error pruning (MEP) is less accurate and produces larger trees. Reduced error pruning (REP) is one of the simplest pruning methods, but it requires a separate pruning data set. At similar accuracy, cost-complexity pruning (CCP) produces smaller trees than REP. In practice, REP can be chosen when the training data set is abundant; when the training data set is limited and high pruning accuracy is required, PEP is a better choice.
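To illustrate the bottom-up idea behind REP mentioned in the abstract, the following is a minimal Python sketch, not the paper's implementation: it assumes an illustrative binary-tree node structure, and the names Node, predict, errors and prune_rep are hypothetical. A subtree is replaced by a leaf whenever the leaf misclassifies no more samples of a held-out pruning set than the subtree does.

from dataclasses import dataclass
from typing import Optional, List, Tuple

@dataclass
class Node:
    feature: Optional[int] = None      # split feature index (None marks a leaf)
    threshold: float = 0.0             # split threshold
    label: int = 0                     # majority class seen at this node during training
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.feature is None

def predict(node: Node, x: List[float]) -> int:
    # Route a sample down the tree until a leaf is reached.
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label

def errors(node: Node, data: List[Tuple[List[float], int]]) -> int:
    # Misclassification count of the (sub)tree on the pruning set.
    return sum(1 for x, y in data if predict(node, x) != y)

def prune_rep(node: Node, prune_set: List[Tuple[List[float], int]]) -> Node:
    # Bottom-up pass: prune the children first, then test whether replacing
    # this subtree by a single leaf does not increase error on the pruning set.
    if node.is_leaf():
        return node
    node.left = prune_rep(node.left,
                          [(x, y) for x, y in prune_set if x[node.feature] <= node.threshold])
    node.right = prune_rep(node.right,
                           [(x, y) for x, y in prune_set if x[node.feature] > node.threshold])
    leaf = Node(label=node.label)
    if errors(leaf, prune_set) <= errors(node, prune_set):
        return leaf          # prune: the leaf is at least as accurate here
    return node

For example, prune_rep(root, pruning_set) returns the pruned tree, where pruning_set is held-out data disjoint from the training set; this separate set is exactly the requirement of REP noted in the abstract.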
Source
Journal of Southwest Jiaotong University (《西南交通大学学报》), 2005, No. 1, pp. 44-48 (5 pages)
Indexed in: EI, CSCD, Peking University Core Journals (北大核心)
Keywords
data mining
decision tree
post-pruning
pessimistic error pruning (PEP)
minimum error pruning (MEP)
reduced error pruning (REP)
cost-complexity pruning (CCP)