Abstract
Feature selection (FS) is an effective data preprocessing method that mitigates the curse of dimensionality caused by data redundancy by selecting, from high-dimensional data, a subset of features with high relevance and low redundancy. Many computational methods have been applied to the FS problem; among them, feature selection models based on the teaching and learning-based optimization algorithm (TLBO) have attracted growing attention because of their efficient global search capability. However, as data sizes keep growing, the limitations of these algorithms, such as model instability, low model accuracy, and poor local search ability, have increasingly hindered research progress. To address these problems, this paper proposes a hybrid evolutionary Wrapper algorithm model that integrates TLBO with a local search (LS) method (Teaching and Learning-Based Optimization-Local Search algorithm, TLBOLS). First, because the traditional TLBO cannot be applied directly to feature selection, the algorithm converts the real-valued encoding into a binary encoding in the initialization phase. Then, to preserve population diversity, a worst-individual restart mechanism is introduced in the teaching phase, and different TF values are used for the two roles, learner and teacher, during the evolution of the class, yielding a binary teaching-learning feature selection algorithm (Binary Teaching and Learning-Based Optimization-Local Search algorithm, BTLBOLS). Subsequently, a local search method combining multiple operations with variable neighborhood search is proposed, which gradually strengthens the perturbation and improves the quality of individuals across the whole population. To optimize the feature selection results, BTLBOLS uses a comprehensive evaluation metric as the objective function that guides the overall evolutionary process. Forty-five high-dimensional cancer gene expression datasets are used for testing, and BTLBOLS is compared with ten feature selection algorithms. The experimental results show that, relative to the other algorithms, BTLBOLS has an advantage in both classification accuracy and number of selected features, and effectively improves classification performance.
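The pipeline outlined in the abstract (binary encoding, a teaching phase with worst-individual restart, local search, and a composite objective) can be illustrated with a minimal Python sketch. This is not the authors' BTLBOLS implementation. The abstract does not give the formulas, so the following are all assumptions made for illustration: the sigmoid transfer function, the weight `alpha`, the composite objective `alpha * accuracy + (1 - alpha) * feature reduction`, the textbook TLBO choice of TF drawn from {1, 2}, the position clamp, and a single-bit-flip local search in place of the multi-operation variable neighborhood search. The learner phase is omitted, and `toy_score` is a hypothetical stand-in for the classifier accuracy that a real wrapper would compute.

```python
import math
import random

def toy_score(mask):
    """Hypothetical accuracy proxy: share of 3 'informative' features selected."""
    return sum(mask[:3]) / 3.0

def fitness(mask, score_fn, alpha=0.9):
    """Assumed composite objective: weighted score plus feature-reduction reward."""
    if not any(mask):
        return 0.0  # an empty feature subset cannot be evaluated
    reduction = 1.0 - sum(mask) / len(mask)
    return alpha * score_fn(mask) + (1.0 - alpha) * reduction

def binarize(position, rng):
    """Sigmoid transfer function: real-valued position -> 0/1 feature mask."""
    return [1 if rng.random() < 1.0 / (1.0 + math.exp(-x)) else 0 for x in position]

def local_search(mask, fit, score_fn, rng, flips=10):
    """Single-bit-flip hill climbing; a flip is kept only if fitness improves."""
    mask = mask[:]
    for _ in range(flips):
        j = rng.randrange(len(mask))
        mask[j] ^= 1
        new_fit = fitness(mask, score_fn)
        if new_fit > fit:
            fit = new_fit
        else:
            mask[j] ^= 1  # undo the flip
    return mask, fit

def btlbo_sketch(score_fn, dim, pop_size=10, iters=30, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    masks = [binarize(p, rng) for p in pos]
    fits = [fitness(m, score_fn) for m in masks]
    b = max(range(pop_size), key=fits.__getitem__)
    best_mask, best_fit = masks[b][:], fits[b]
    history = [best_fit]
    for _ in range(iters):
        # Teaching phase: move each learner toward the teacher (current best).
        t = max(range(pop_size), key=fits.__getitem__)
        teacher = pos[t][:]
        mean = [sum(p[j] for p in pos) / pop_size for j in range(dim)]
        for i in range(pop_size):
            tf = rng.choice((1, 2))  # textbook TLBO teaching factor
            cand = [max(-6.0, min(6.0, pos[i][j] + rng.random() * (teacher[j] - tf * mean[j])))
                    for j in range(dim)]
            cand_mask = binarize(cand, rng)
            cand_fit = fitness(cand_mask, score_fn)
            if cand_fit > fits[i]:  # greedy acceptance
                pos[i], masks[i], fits[i] = cand, cand_mask, cand_fit
        # Worst-individual restart: re-initialize the weakest learner at random.
        w = min(range(pop_size), key=fits.__getitem__)
        pos[w] = [rng.uniform(-1, 1) for _ in range(dim)]
        masks[w] = binarize(pos[w], rng)
        fits[w] = fitness(masks[w], score_fn)
        # Local search on the current best mask (its real-valued position is
        # left unchanged, which is a simplification of this sketch).
        b = max(range(pop_size), key=fits.__getitem__)
        masks[b], fits[b] = local_search(masks[b], fits[b], score_fn, rng)
        if fits[b] > best_fit:
            best_mask, best_fit = masks[b][:], fits[b]
        history.append(best_fit)
    return best_mask, best_fit, history

if __name__ == "__main__":
    mask, fit, history = btlbo_sketch(toy_score, dim=12, seed=1)
    print(mask, round(fit, 3))
```

In an actual wrapper setting, `score_fn` would train and validate a classifier on the columns selected by `mask`, for example with cross-validated accuracy. The global best is tracked separately from the population so that the restart step can never discard it.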
Authors
GAO Hui-min (高慧敏)
WANG Yun-he (王云鹤)
BIAN Chuang (卞闯)
LI Xiang-tao (李向涛)
School of Artificial Intelligence, Jilin University, Changchun, Jilin 130000, China; School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China
Source
Acta Electronica Sinica (《电子学报》)
Indexed in: EI, CAS, CSCD, Peking University Core Journals
2023, No. 6, pp. 1619-1636 (18 pages)
Funding
National Natural Science Foundation of China (No. 62076109).
Keywords
teaching and learning-based optimization algorithm
local search
new Wrapper hybrid feature selection algorithm
feature selection
classification
gene expression data
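About the authors
GAO Hui-min, female, born in March 1997 in Datong, Shanxi. Master's student at the School of Artificial Intelligence, Jilin University. Her main research interests are evolutionary computation and feature selection.
WANG Yun-he, female, born in April 1991 in Cangzhou, Hebei. Lecturer at the School of Artificial Intelligence and Data Science, Hebei University of Technology. Her main research interests are intelligent computing and machine learning.
BIAN Chuang, male, born in July 1996 in Changchun, Jilin. Master's student at the School of Artificial Intelligence, Jilin University. His main research interests are bioinformatics and feature selection.
Corresponding author: LI Xiang-tao, male, born in April 1987 in Huai'an, Jiangsu. Professor at the School of Artificial Intelligence, Jilin University. His main research interests are intelligent computing, evolutionary data mining, constrained optimization, multi-objective optimization, and their applications. E-mail: lixt314@jlu.edu.cn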