Abstract
In multi-label learning, dimensionality reduction is an important and challenging task, and feature selection is an efficient technique for it. This paper proposes a multi-label label-specific feature selection method based on neighborhood rough set theory. The method guarantees in theory that the selected label-specific features are strongly correlated with their corresponding labels, which improves the reduction results. First, a rough set reduction algorithm removes redundant attributes, yielding the label-specific features of each label while keeping classification ability unchanged. Then, building on the concepts of neighborhood accuracy and neighborhood roughness, the dependency and attribute significance measures based on neighborhood rough sets are redefined, and the related properties of the model are discussed. Finally, a multi-label label-specific feature selection model based on neighborhood rough sets is constructed, and the corresponding feature selection algorithm for multi-label classification is implemented. Simulation experiments on several public datasets show that the algorithm is effective.
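To make the pipeline in the abstract concrete, here is a minimal Python sketch of label-specific feature selection driven by a neighborhood rough set dependency. It is an illustration under stated assumptions, not the authors' algorithm: it uses the classical neighborhood dependency (the fraction of samples whose δ-neighborhood is label-consistent) rather than the paper's redefined measure built on neighborhood accuracy and roughness, which the abstract does not spell out. The function names, the radius `delta`, the stopping threshold `eps`, and the toy data are all hypothetical.

```python
from math import sqrt

def neighborhood(X, i, feats, delta):
    """Indices of samples within Euclidean distance delta of sample i,
    measured only on the feature subset feats."""
    return [j for j in range(len(X))
            if sqrt(sum((X[i][f] - X[j][f]) ** 2 for f in feats)) <= delta]

def dependency(X, y, feats, delta):
    """Classical neighborhood dependency (assumed here, not the paper's
    redefined measure): share of samples whose neighborhood contains
    only samples with the same value of the binary label y."""
    consistent = sum(
        all(y[j] == y[i] for j in neighborhood(X, i, feats, delta))
        for i in range(len(X)))
    return consistent / len(X)

def select_label_specific(X, y, delta=0.2, eps=1e-9):
    """Greedy forward selection for a single label: repeatedly add the
    feature with the largest significance (dependency gain) and stop
    when no remaining feature raises the dependency."""
    selected = []
    best = dependency(X, y, selected, delta)
    remaining = list(range(len(X[0])))
    while remaining:
        gains = [(dependency(X, y, selected + [f], delta) - best, f)
                 for f in remaining]
        # Prefer the largest gain; break ties by the smaller feature index.
        gain, f = max(gains, key=lambda t: (t[0], -t[1]))
        if gain <= eps:
            break
        selected.append(f)
        remaining.remove(f)
        best += gain
    return selected

def select_multi_label(X, Y, delta=0.2):
    """Run the per-label selection on every column of label matrix Y."""
    return {k: select_label_specific(X, [row[k] for row in Y], delta)
            for k in range(len(Y[0]))}

# Hypothetical toy data: feature 0 separates label 0, feature 1
# separates label 1, and feature 2 is constant (uninformative).
X = [[0.0, 0.0, 0.5],
     [0.1, 1.0, 0.5],
     [0.9, 0.1, 0.5],
     [1.0, 0.9, 0.5]]
Y = [[0, 0],
     [0, 1],
     [1, 0],
     [1, 1]]

result = select_multi_label(X, Y, delta=0.2)
print(result)  # {0: [0], 1: [1]}
```

On this toy data each label ends up with its own single feature, which is the sense in which the selected features are "label-specific". In practice features are usually normalized to [0, 1] before a fixed `delta` is applied, and the quadratic neighborhood search would need an index or vectorization for real datasets.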
Source
Computer Science (《计算机科学》)
CSCD
PKU Core Journal
2018, Issue 1, pp. 173-178 (6 pages)
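Funding
Supported by the National Natural Science Foundation of China (61772176, 61402153, 61370169, 61602158), the China Postdoctoral Science Foundation (2016M602247), the Key Scientific and Technological Project of Henan Province (162102210261), the Key Scientific and Technological Project of Xinxiang City (CXGG17002), and the Ph.D. Research Start-up Fund of Henan Normal University (qd15132).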
Keywords
Multi-label learning
Neighborhood rough set
Label-specific feature
Feature selection
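About the authors
Sun Lin (born 1979), male, Ph.D., associate professor, CCF member; research interests include granular computing, data mining, and bioinformatics; E-mail: sunlin@htu.edu.cn (corresponding author). Pan Junfang (born 1994), female, master's student; research interests include multi-label learning and data mining. Zhang Xiaoyu (born 1993), female, master's student; research interest: granular computing. Wang Wei (born 1975), male, Ph.D., lecturer; research interest: bioinformatics. Xu Jiucheng (born 1964), male, Ph.D., professor, CCF senior member; research interests include granular computing, data mining, and bioinformatics.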