
Enhancement and Extension of Feature Selection Using Forest Optimization Algorithm (cited by: 10)
Abstract: Feature selection is an important data preprocessing method that not only mitigates the curse of dimensionality but also improves the generalization ability of learning algorithms. A wide variety of methods have been applied to feature selection; among them, approaches based on evolutionary computation have attracted growing attention in recent years and achieved some success. Recent studies show that feature selection using forest optimization algorithm (FSFOA) offers better classification performance and dimensionality reduction ability. However, the randomness of its initialization phase and the manually set parameters of its global seeding phase limit the accuracy and the dimensionality reduction ability of the algorithm. The algorithm is also inherently weak at handling high-dimensional data. This study gives an initialization strategy based on the information gain ratio, generates the global seeding parameter automatically by borrowing the idea of the temperature control function in simulated annealing, defines a fitness function that incorporates the dimensionality reduction rate, and applies a greedy algorithm to the high-quality forest obtained. Together these form the feature selection algorithm EFSFOA (enhanced feature selection using forest optimization algorithm). In addition, for high-dimensional data, an ensemble feature selection scheme is used to build an ensemble feature selection framework suited to EFSFOA, so that it can handle high-dimensional feature selection problems effectively. Comparative experiments verify that EFSFOA clearly improves on FSFOA in both classification accuracy and dimensionality reduction rate, and that its high-dimensional data processing capability extends to 100 000 dimensions. Compared with other efficient evolutionary-computation-based feature selection methods proposed in recent years, EFSFOA remains highly competitive.
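The abstract names two quantities without giving their formulas: an information gain ratio used for initialization and a fitness function that combines accuracy with the dimensionality reduction rate. The following is a minimal Python sketch of how such quantities are commonly computed, not the paper's own definitions. The function names, the weighted-sum form of the fitness, the weight `alpha`, and the definition of reduction rate as 1 minus the fraction of selected features are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (base 2) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain ratio of one discrete feature with respect to the labels."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Expected label entropy after splitting on the feature's values.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)
    # A constant feature has zero split information; score it as 0.
    return info_gain / split_info if split_info > 0 else 0.0


def fitness(accuracy, n_selected, n_total, alpha=0.9):
    """Hypothetical fitness: weighted sum of accuracy and reduction rate.

    The weighted-sum form and the default alpha are assumptions, not
    values taken from the paper.
    """
    reduction_rate = 1.0 - n_selected / n_total
    return alpha * accuracy + (1.0 - alpha) * reduction_rate
```

Under these assumptions, features with a higher `gain_ratio` could be favored when seeding the initial forest, and `fitness` rewards a tree that keeps accuracy high while selecting fewer features.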
Authors: 刘兆赓 (LIU Zhao-Geng); 李占山 (LI Zhan-Shan); 王丽 (WANG Li); 王涛 (WANG Tao); 于海鸿 (YU Hai-Hong). Affiliations: College of Software, Jilin University, Changchun 130012, China; College of Computer Science and Technology, Jilin University, Changchun 130012, China; College of Computer Science and Engineering, Changchun University of Technology, Changchun 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering (Jilin University), Ministry of Education, Changchun 130012, China.
Source: Journal of Software (《软件学报》), 2020, No. 5, pp. 1511-1524 (14 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
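Funding: National Natural Science Foundation of China (61672261); Natural Science Foundation of Jilin Province (20180101043JC); Special Fund for Industrial Technology Research and Development of the Jilin Provincial Development and Reform Commission (2019C053-9).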
Keywords: enhanced feature selection using forest optimization algorithm (EFSFOA); high-dimensional; feature selection; evolutionary computation
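About the authors: LIU Zhao-Geng (born 1993), male, from Yishui, Shandong, master's student; research interest: machine learning. LI Zhan-Shan (born 1966), male, Ph.D., professor, doctoral supervisor, CCF professional member; research interests: constraint optimization and constraint solving, machine learning, model-based diagnosis, intelligent planning and scheduling. WANG Li (born 1994), female, master's student; research interest: machine learning. WANG Tao (born 1969), female, associate professor; research interests: constraint optimization and constraint solving, machine learning. Corresponding author: YU Hai-Hong (born 1975), male, Ph.D., lecturer; research interests: constraint optimization and constraint solving, big data and data mining, intelligent planning and scheduling; E-mail: yuhh@jlu.edu.cn.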