Abstract
In feature selection on high-dimensional, small-sample data, changes in the samples make the selected features unstable. To address this, a new feature selection algorithm is proposed. First, the mutual information between features is computed to determine how strongly they are related, and the features are divided into groups according to that strength. After grouping, particle swarm optimization (PSO) is used to select features, and a small fraction of the particles are randomly perturbed to keep the swarm from becoming trapped in a local optimum. The selected features are then integrated to obtain the final feature subset. In experiments on five public datasets, the algorithm reduces the feature dimension by 77.5% on average. Compared with existing methods, stability improves by 4.0% on average.
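The grouping step can be illustrated with a minimal sketch. The abstract does not state how features are discretized, how groups are formed from the mutual-information values, or what threshold is used, so everything below is an assumption for illustration: the function names `mutual_information` and `group_features`, the greedy "join the first matching group" rule, and the threshold of 0.5 nats are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[a] * py[b]))
    return mi


def group_features(columns, threshold):
    """Greedy grouping (an assumed rule): a feature joins the first group
    whose representative (first member) shares MI >= threshold with it;
    otherwise it starts a new group."""
    groups = []
    for j, col in enumerate(columns):
        for g in groups:
            if mutual_information(columns[g[0]], col) >= threshold:
                g.append(j)
                break
        else:
            groups.append([j])
    return groups


# Toy data: 4 already-discretized features over 8 samples (one list per feature).
features = [
    [0, 0, 1, 1, 0, 0, 1, 1],  # f0
    [1, 1, 0, 0, 1, 1, 0, 0],  # f1: complement of f0, fully dependent on it
    [0, 1, 0, 1, 0, 1, 0, 1],  # f2: independent of f0
    [0, 1, 0, 1, 0, 1, 0, 1],  # f3: copy of f2
]
groups = group_features(features, threshold=0.5)
print(groups)  # [[0, 1], [2, 3]]
```

In this toy case the strongly related pairs (f0, f1) and (f2, f3) land in separate groups. A selection method such as PSO could then operate on the groups, for example by choosing among the members of each group, though how the paper couples grouping with PSO is not specified in the abstract.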
Authors
余肖生
江川
陈鹏
YU Xiaosheng; JIANG Chuan; CHEN Peng (College of Computer and Information, Three Gorges University, Yichang 443002)
Source
《计算机与数字工程》
2022, No. 5, pp. 1047-1052 (6 pages)
Computer & Digital Engineering
Funding
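Supported by the National Key Research and Development Program project "Research on Urban Safety Risk Assessment and Emergency Support Technology" (Grant No. 2016YFC0802500).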
Keywords
feature selection
stability
PSO algorithm
feature grouping
About the authors
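YU Xiaosheng, male, Ph.D., associate professor; research interest: healthcare big data analytics. JIANG Chuan, male, master's student; research interest: big data analytics technology. CHEN Peng, male, Ph.D., professor; research interest: big data analytics technology.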