Abstract
To improve dimensionality reduction of high-dimensional data and the discriminative power of the resulting low-dimensional representation, a principal component analysis algorithm based on intra-class and inter-class distance (IOPCA) is proposed. The information entropy of each attribute is computed and compared against an entropy threshold to screen the features of the data matrix. PCA is then modified to combine maximization of the inter-class distance with minimization of the intra-class distance, and the modified algorithm is used to reduce the dimension of the data. The reduced data are classified with the KNN and SVM algorithms. In simulations against the PCA, E-PCA, and LDA algorithms, the proposed algorithm improves the dimension reduction result and also effectively improves the discriminative performance of the low-dimensional data.
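The abstract names two steps without giving formulas: entropy-based feature screening and a PCA variant driven by intra-class and inter-class distance. The following is a minimal sketch of one way these steps could look, not the paper's IOPCA implementation. Several choices are assumptions made only for illustration: equal-width binning to estimate entropy, keeping attributes whose entropy is at or above the threshold, and the objective `Sb - alpha * Sw` (between-class scatter minus weighted within-class scatter). The function names and the `bins` and `alpha` parameters are hypothetical.

```python
import numpy as np

def attribute_entropy(column, bins=10):
    """Shannon entropy (in bits) of one attribute after equal-width binning."""
    counts, _ = np.histogram(column, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

def entropy_filter(X, threshold, bins=10):
    """Indices of attributes whose entropy is at least `threshold`.

    Assumption: high-entropy attributes are the ones retained.
    """
    H = np.array([attribute_entropy(X[:, j], bins) for j in range(X.shape[1])])
    return np.flatnonzero(H >= threshold)

def scatter_matrices(X, y):
    """Within-class (Sw) and between-class (Sb) scatter matrices."""
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        class_mean = Xc.mean(axis=0)
        centered = Xc - class_mean
        Sw += centered.T @ centered
        diff = (class_mean - overall_mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    return Sw, Sb

def class_aware_projection(X, y, k, alpha=1.0):
    """Top-k eigenvectors of Sb - alpha * Sw (hypothetical objective).

    Directions with large eigenvalues spread the class means apart while
    keeping samples of the same class close together.
    """
    Sw, Sb = scatter_matrices(X, y)
    eigvals, eigvecs = np.linalg.eigh(Sb - alpha * Sw)  # symmetric matrix
    order = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, order]
```

A typical use under these assumptions would be `keep = entropy_filter(X, threshold)`, then `W = class_aware_projection(X[:, keep], y, k)`, then `Z = X[:, keep] @ W`, with `Z` passed to a KNN or SVM classifier as the abstract describes.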
Authors
张素智
陈小妮
杨芮
李鹏辉
蔡强
ZHANG Su-zhi; CHEN Xiao-ni; YANG Rui; LI Peng-hui; CAI Qiang (School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China; Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing Technology and Business University, Beijing 100048, China)
Source
《计算机工程与设计》
PKU Core journal
2020, Issue 8, pp. 2177-2183 (7 pages)
Computer Engineering and Design
Funding
Beijing Key Laboratory Open Project Fund (BKBD-2017KF08)
National Natural Science Foundation of China (61802353).
Keywords
information entropy
intra-class distance
inter-class distance
principal component analysis
data dimension reduction
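About the authors
ZHANG Su-zhi (b. 1965), male, from Jiaozuo, Henan; Ph.D., professor, CCF member; research interests: Web databases, distributed computing, and heterogeneous system integration. CHEN Xiao-ni (b. 1993), female, from Handan, Hebei; master's student, CCF student member; research interests: big data mining and analysis. YANG Rui (b. 1994), female, from Zhumadian, Henan; master's student, CCF student member; research interests: big data analysis and mining. LI Peng-hui (b. 1993), male, from Sanmenxia, Henan; master's student; research interest: big data mining. CAI Qiang (b. 1969), male, from Chongqing; Ph.D., professor; research interests: computer graphics, scientific visualization, intelligent information processing, and food safety information technology. E-mail: zhsuzhi@zzuli.edu.cn.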