An Oversampling Method for Imbalanced Data Based on the Three-Way Decision Model (基于三支决策的不平衡数据过采样方法)

Cited by: 31
Abstract: Sampling is an effective way to address the classification of imbalanced data. Drawing on three-way decision theory, this paper divides the samples into three regions according to their distribution: the positive region, the boundary region and the negative region. The minority-class samples in the boundary region and in the negative region are then oversampled in different ways, which yields an oversampling algorithm for imbalanced data based on three-way decisions (the TWD-IDOS algorithm). Experimental results show that, with the C4.5, KNN and CART classifiers, the proposed algorithm effectively handles two-class classification of imbalanced data and outperforms the oversampling algorithms from the literature on Recall, F-value and AUC.
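The abstract does not give the partition rule or the per-region sampling strategy, so the following is only a minimal sketch of the general idea, not the TWD-IDOS algorithm itself. It assumes a k-nearest-neighbour rule for the three regions and SMOTE-style interpolation for the synthetic samples; the function names `three_way_regions` and `oversample`, the thresholds `alpha` and `beta`, and the counts `n_bnd` and `n_neg` are all hypothetical.

```python
import numpy as np


def three_way_regions(X, y, minority=1, k=5, alpha=0.7, beta=0.3):
    """Label each minority sample 'POS', 'BND' or 'NEG'.

    Assumed rule (not from the paper): p is the share of minority labels
    among the k nearest neighbours; p >= alpha -> POS, p <= beta -> NEG,
    anything in between -> BND.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    regions = {}
    for i in np.flatnonzero(y == minority):
        dist = np.linalg.norm(X - X[i], axis=1)
        dist[i] = np.inf  # exclude the sample itself
        neighbours = np.argsort(dist)[:k]
        p = np.mean(y[neighbours] == minority)
        regions[i] = "POS" if p >= alpha else ("NEG" if p <= beta else "BND")
    return regions


def oversample(X, y, regions, minority=1, n_bnd=2, n_neg=1, seed=0):
    """Add synthetic minority samples for BND and NEG samples only.

    Assumed strategy: interpolate towards the nearest other minority
    sample, across the full segment for BND samples and only the half
    nearest the original sample for NEG samples.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    minority_idx = np.flatnonzero(y == minority)
    synthetic = []
    for i, region in regions.items():
        others = minority_idx[minority_idx != i]
        if region == "POS" or others.size == 0:
            continue  # safe samples are left alone
        nearest = others[np.argmin(np.linalg.norm(X[others] - X[i], axis=1))]
        count, max_gap = (n_bnd, 1.0) if region == "BND" else (n_neg, 0.5)
        for _ in range(count):
            gap = rng.uniform(0.0, max_gap)
            synthetic.append(X[i] + gap * (X[nearest] - X[i]))
    if not synthetic:
        return X, y
    X_new = np.vstack([X, np.array(synthetic)])
    y_new = np.concatenate([y, np.full(len(synthetic), minority)])
    return X_new, y_new
```

Treating the negative region more conservatively than the boundary region is a design choice of this sketch: minority samples surrounded by the majority class may be noise, so fewer synthetic points are placed near them.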
Source: Acta Electronica Sinica (《电子学报》), 2018, Issue 1, pp. 135-144 (10 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心).
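Funding: National Natural Science Foundation of China (No.61309014, No.61379114, No.61472056); Humanities and Social Sciences Planning Project of the Ministry of Education (No.15XJA630003); Chongqing Basic and Frontier Research Program (No.cstc2013jcyjA40063, No.cstc2014jcyjA40049); Science and Technology Research Project of the Chongqing Municipal Education Commission (No.KJ1500416).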
Keywords: three-way decision; neighborhood rough set; boundary sampling; imbalanced data; SMOTE
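About the authors: Hu Feng (胡峰), male, born in July 1978 in Tianmen, Hubei; professor and master's supervisor. He received a Bachelor of Science from Chongqing University in 2000, a Master of Engineering from Wuhan University in 2003 and a Doctor of Engineering from Southwest Jiaotong University in 2011, and now teaches at Chongqing University of Posts and Telecommunications. His main research interests are data mining, rough sets and granular computing. E-mail: hufeng@cqupt.edu.cn. Wang Lei (王蕾), male, born in 1989 in Dezhou, Shandong; master's student at Chongqing University of Posts and Telecommunications. His main research interests are data mining, three-way decisions and rough sets.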