
Survey on Stability of Feature Selection (times cited: 38)

Abstract: With the development of big data and the wide application of machine learning, the volume of data in every industry is growing massively. High dimensionality is one of the most important characteristics of such data, and feature selection is a preprocessing method for reducing its dimensionality. The stability of feature selection is an important research topic in this area: it refers to the robustness of the selected features with respect to small perturbations of the training samples. Improving stability helps to identify relevant features, increases experts' confidence in the results, and further reduces the cost of acquiring the original data. This paper reviews existing methods for improving the stability of feature selection, classifies them, and analyzes and compares the characteristics and range of application of each category. It then summarizes the evaluation work on feature selection stability, analyzes the performance of stability measures through experiments, and compares the effectiveness of four ensemble approaches. Finally, it discusses the limitations of current work and points out directions for future research.
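To make the notion of stability concrete, the sketch below estimates it in one common way: run a feature selector on several bootstrap resamples of the training data, then average the pairwise Jaccard similarity (|A ∩ B| / |A ∪ B|) of the selected subsets. A value of 1.0 means every run picked exactly the same features. This is an illustrative assumption, not necessarily one of the measures the survey evaluates. The selector `top_k_by_covariance` and the helper names are hypothetical stand-ins for any real feature selection method.

```python
import random
from itertools import combinations
from typing import Callable, List, Sequence, Set

Row = Sequence[float]
# A selector maps (samples, targets) to the set of selected feature indices.
Selector = Callable[[List[Row], List[float]], Set[int]]


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard index |A & B| / |A | B| of two selected-feature subsets."""
    union = a | b
    if not union:
        return 1.0  # two empty selections are treated as identical
    return len(a & b) / len(union)


def average_pairwise_stability(subsets: Sequence[Set[int]]) -> float:
    """Mean Jaccard similarity over all unordered pairs of subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two feature subsets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def top_k_by_covariance(X: List[Row], y: List[float], k: int = 2) -> Set[int]:
    """Hypothetical toy selector: keep the k features whose absolute
    covariance with the target is largest."""
    n = len(X)
    y_mean = sum(y) / n
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        c_mean = sum(col) / n
        cov = sum((c - c_mean) * (t - y_mean) for c, t in zip(col, y)) / n
        scores.append((abs(cov), j))
    scores.sort(reverse=True)
    return {j for _, j in scores[:k]}


def bootstrap_subsets(X: List[Row], y: List[float], select: Selector,
                      n_runs: int = 10, seed: int = 0) -> List[Set[int]]:
    """Perturb the training set by bootstrap resampling (sampling rows
    with replacement) and record the subset selected on each resample."""
    rng = random.Random(seed)
    n = len(X)
    subsets = []
    for _ in range(n_runs):
        idx = [rng.randrange(n) for _ in range(n)]
        subsets.append(select([X[i] for i in idx], [y[i] for i in idx]))
    return subsets
```

Usage: `average_pairwise_stability(bootstrap_subsets(X, y, top_k_by_covariance))` returns a score in [0, 1]. Plain Jaccard is not corrected for the overlap expected by chance, which is one reason the literature proposes several alternative stability measures.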
Authors: LIU Yi, CAO Jian-Jun, DIAO Xing-Chun, ZHOU Xing (College of Command Information Systems, PLA University of Science and Technology, Nanjing 210007, China; The 63rd Institute, National University of Defense Technology, Nanjing 210007, China)
Source: Journal of Software (《软件学报》), 2018, Issue 9, pp. 2559-2579 (21 pages); indexed in EI, CSCD, and the Peking University core journal list
Funding: National Natural Science Foundation of China (61371196); China Postdoctoral Science Foundation (201003797)
Keywords: high dimensional data; feature selection; stability; stability measures; ensemble selection; evolutionary algorithms
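Author biographies: LIU Yi (1990-), male, born in Bengbu, Anhui, Ph.D. candidate; main research interests: data governance, evolutionary algorithms. CAO Jian-Jun (1975-), male, Ph.D., associate research fellow, CCF senior member; main research interests: data governance, evolutionary algorithms. Corresponding author: CAO Jian-Jun, E-mail: jianjuncao@yeah.net. DIAO Xing-Chun (1964-), male, research fellow, doctoral supervisor; main research interest: data engineering. ZHOU Xing (1988-), male, Ph.D., engineer; main research interests: data mining, data engineering.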