Abstract
Lacquer-film data from ancient lacquerware are class-imbalanced and small in sample size, and traditional machine learning algorithms classify them poorly. To address these problems, an improved SMOTE oversampling method is proposed that rebalances the sample distribution of the lacquer-film data. The method compares the Euclidean distances between samples of each class to remove noisy data from the synthetic samples, and then classifies with the random forest algorithm from ensemble learning, which improves classification accuracy on the minority class. Experiments on UCI data sets show that the improved oversampling method performs better, raising the F1-score and AUC by more than 2% and 5%, respectively. In comparative experiments that combine the improved oversampling method with machine learning algorithms, random forest achieves higher accuracy; in dating ancient lacquerware, its F1-score and AUC reach 87.76% and 89.34%, respectively.
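The oversampling step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact filtering rule, so the rule used here (keep a synthetic sample only if it lies closer to the minority-class centroid than to the majority-class centroid) is an assumption, as are the function names, the parameter `k`, and the toy data. The sketch covers only sample generation and noise removal; the random forest classification step is omitted.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def smote_with_distance_filter(minority, majority, n_new, k=2, seed=0):
    """Generate up to n_new synthetic minority samples (SMOTE-style
    interpolation) and drop those judged noisy by a distance test.

    Assumed filter, not taken from the paper: a candidate is kept only
    if it is nearer to the minority centroid than to the majority one.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    c_min, c_maj = centroid(minority), centroid(majority)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: euclidean(base, p),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        candidate = tuple(b + gap * (m - b) for b, m in zip(base, mate))
        if euclidean(candidate, c_min) < euclidean(candidate, c_maj):
            synthetic.append(candidate)
    return synthetic


# Toy data (hypothetical): (4.0, 4.0) is a minority outlier near the majority.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (4.0, 4.0)]
majority = [(4.0, 5.0), (5.0, 4.0), (5.0, 5.0), (4.5, 4.5)]
synthetic = smote_with_distance_filter(minority, majority, n_new=20)
print(len(synthetic), "synthetic samples kept out of 20 candidates")
```

Candidates interpolated toward the outlier at (4.0, 4.0) tend to fall on the majority side of the centroid test and are discarded, which is the kind of noise removal the abstract describes. The retained samples would then be added to the training set before fitting a classifier.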
Authors
ZHANG Lan-bin (张岚斌)
XU Guo-qing (徐国庆)
LI Lan (李澜)
School of Computer Science & Engineering, Wuhan Institute of Technology, Wuhan 430205, China; Hubei Provincial Museum, Wuhan 430077, China
Source
Software Guide (《软件导刊》)
2021, No. 1, pp. 84-88 (5 pages)
Funding
Natural Science Foundation of Hubei Province (2014CFB786)
10th Graduate Education Innovation Fund of Wuhan Institute of Technology (CX2018210)
Keywords
lacquer film on ancient lacquerware
oversampling
ensemble learning
random forest
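About the authors
ZHANG Lan-bin (born 1994), male, is a master's student at the School of Computer Science and Engineering, Wuhan Institute of Technology; his research interest is machine learning. Corresponding author: XU Guo-qing (born 1974), male, is an associate professor and master's supervisor at the School of Computer Science and Engineering, Wuhan Institute of Technology; his research interests are the Internet of Things and image processing.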