Abstract
To improve coarse-classification accuracy on large data sets, an SVM-Kd-tree tree-structured coarse classification method based on cluster analysis is proposed. First, the training set is partitioned into two clusters by k-means according to its feature distribution, and the class composition of each cluster is analyzed; samples of any class that appears in both clusters are set aside. The remaining samples in the two clusters are then used to train a binary SVM classifier, which serves as the root node of the tree. Each cluster is merged with the set-aside samples to form the left and right children, and child nodes are built iteratively in the same way until a termination condition is met, at which point a Kd-tree is trained on the samples of each leaf node. Experimental results show that the iteratively built tree-structured method reduces the average time for training a single SVM by 61.9774% and achieves accuracy 0.03% higher than a Kd-tree using the same number of neighbors. For coarse classification of large-scale data sets, iteratively constructing a combined classifier through cluster analysis takes less time and gives higher accuracy.
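The node-splitting step described in the abstract can be illustrated with a small sketch. This is one possible reading of the abstract, not the authors' implementation: the function names `two_means` and `split_node` are hypothetical, the k-means is a plain Lloyd's iteration written with the standard library, and it is assumed that the binary SVM at each node learns to predict the cluster side and that each child receives its cluster's samples plus all set-aside samples. SVM training, the termination condition, and the leaf Kd-trees are omitted.

```python
import random
from collections import defaultdict


def two_means(points, iters=20, seed=0):
    """Plain 2-cluster k-means (Lloyd's algorithm); returns a 0/1 label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, 2)  # two distinct samples as initial centers
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min((0, 1), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in (0, 1):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def split_node(points, classes, seed=0):
    """One split: cluster, find classes present in both clusters, build child sets.

    Returns (shared, svm_train, children), where
      shared    - set of classes whose samples fall into both clusters (set aside),
      svm_train - (point, cluster_label) pairs from the remaining classes
                  (assumption: the node's binary SVM is trained on these),
      children  - {0: [...], 1: [...]} lists of (point, class) pairs; each child holds
                  its cluster's samples plus every set-aside sample (assumption).
    """
    labels = two_means(points, seed=seed)

    # Class analysis: record which clusters each class appears in.
    clusters_of_class = defaultdict(set)
    for y, l in zip(classes, labels):
        clusters_of_class[y].add(l)
    shared = {y for y, ls in clusters_of_class.items() if len(ls) == 2}

    svm_train = [(p, l) for p, y, l in zip(points, classes, labels) if y not in shared]
    children = {
        c: [(p, y) for p, y, l in zip(points, classes, labels) if l == c or y in shared]
        for c in (0, 1)
    }
    return shared, svm_train, children
```

Calling `split_node` recursively on each child's samples would grow the tree; in a fuller version, a stopping rule (for example, a minimum node size, which the abstract does not specify) would decide when a node becomes a leaf.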
Authors
胡素黎
黄丰喜
刘晓英
HU Su-li; HUANG Feng-xi; LIU Xiao-ying (Beijing Xitui Technology Co., Ltd., Beijing 100026, China)
Source
Software Guide (《软件导刊》)
2020, No. 4, pp. 111-114 (4 pages)
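Funding
Special Project for the Transformation of Scientific and Technological Achievements, Qinghai Provincial Department of Science and Technology (2017-SF-160).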
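About the authors
Corresponding author: HU Su-li (b. 1988), female, M.S., engineer at Beijing Xitui Technology Co., Ltd.; research interest: pattern recognition. HUANG Feng-xi (b. 1984), male, M.S., engineer at Beijing Xitui Technology Co., Ltd.; research interests: computational mathematics and machine learning. LIU Xiao-ying (b. 1967), male, Ph.D., engineer at Beijing Xitui Technology Co., Ltd.; research interest: biometric recognition.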