Abstract
To improve the classification performance on minority-class samples in imbalanced data sets, this paper constructs an oversampling mathematical model for imbalanced data sets from two aspects: sample resampling and classifier optimization. A minority-class oversampling algorithm designed for imbalanced data distributions is applied to the minority-class samples. Taking minority-class samples as centers, the algorithm generates new virtual (synthetic) minority-class samples that reduce the uneven distribution of the data. The processed minority-class samples are combined with the majority-class samples to form a new training set, which is used to train a hybrid-kernel ε-SVM classifier optimized by the entropy method. The test set is then fed into the trained classifier to classify the samples of the imbalanced data set. Experimental results show that the model achieves an F-value above 0.8 on the minority-class samples, indicating good classification performance, and the authors conclude that it can cope with imbalanced sample distributions.
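The abstract names the building blocks of the model but not their exact formulas. The Python sketch below (standard library only) illustrates common textbook versions of each, under stated assumptions. `smote_like` uses the well-known SMOTE interpolation rule (a synthetic point on the segment between a minority sample and one of its k nearest minority neighbours), which we assume resembles, but may not match, the paper's oversampling algorithm. `hybrid_kernel` is an assumed RBF plus polynomial mixture with weight `lam`. `entropy_weights` is the generic entropy weight method; how the paper uses it to optimize the ε-SVM is not stated in the abstract, so any link between its output and `lam` is hypothetical. `f_value` is the usual F-measure on the minority class with β = 1. All function names and default parameters are illustrative.

```python
import math
import random


def smote_like(minority, n_new, k=5, seed=0):
    """SMOTE-style oversampling (assumed stand-in for the paper's algorithm).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample (the "center") and one of its k nearest minority
    neighbours. Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        center = minority[i]
        # Other minority samples, sorted by Euclidean distance to the center.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(center, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([c + gap * (q - c) for c, q in zip(center, neighbour)])
    return synthetic


def entropy_weights(matrix):
    """Generic entropy weight method.

    Rows are m >= 2 alternatives, columns are non-negative indicators
    (each column must have a positive sum). Returns one weight per column;
    columns whose values vary more receive larger weights.
    """
    m, n = len(matrix), len(matrix[0])
    divergences = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        ps = [x / total for x in column]
        entropy = -sum(p * math.log(p) for p in ps if p > 0) / math.log(m)
        divergences.append(1.0 - entropy)
    s = sum(divergences)
    if s <= 1e-12:  # every column is (numerically) uniform: fall back to equal weights
        return [1.0 / n] * n
    return [d / s for d in divergences]


def hybrid_kernel(x, z, lam=0.5, gamma=1.0, degree=2, coef0=1.0):
    """Assumed hybrid kernel: lam * RBF + (1 - lam) * polynomial, 0 <= lam <= 1."""
    rbf = math.exp(-gamma * math.dist(x, z) ** 2)
    poly = (sum(a * b for a, b in zip(x, z)) + coef0) ** degree
    return lam * rbf + (1.0 - lam) * poly


def f_value(y_true, y_pred, positive=1, beta=1.0):
    """F-measure for the positive (minority) class; beta=1 gives the usual F1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    minority = [[1.0, 1.0], [1.5, 1.2], [0.8, 1.6], [1.2, 0.7]]
    print(len(smote_like(minority, n_new=4, k=2)))  # 4
    print(entropy_weights([[0.9, 0.7], [0.8, 0.9], [0.85, 0.6]]))
    print(hybrid_kernel([1.0, 0.0], [1.0, 0.0], lam=0.5))  # 2.5
    print(f_value([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # precision = recall = 2/3
```

In a full pipeline the synthetic points would be appended to the minority class before training, and the kernel would be handed to an SVM solver (for example as a precomputed Gram matrix); this sketch deliberately does not implement the solver itself.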
Authors
YANG Si-di; WANG Ya-ling (Manzhouli College of Inner Mongolia University, Hulunbeier, Neimenggu 021400, China)
Source
Computer Simulation (《计算机仿真》), 2021, Issue 5, pp. 472-476 (5 pages)
Listed as a Peking University core journal (北大核心)
Funding
Inner Mongolia University 2020 university-level undergraduate teaching reform research and construction project (NDJ2094).
Keywords
Imbalance
Data set
Oversampling
Mathematical model
Entropy method
Minority samples
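About the authors
YANG Si-di (born 1980), male, Daur, from Hulunbeier, Inner Mongolia; M.S.; lecturer; main research interest: computational mathematics. WANG Ya-ling (born 1985), female, Han, from Siping, Jilin; M.S.; lecturer; main research interest: pure mathematics.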