Abstract
Objective To build an optimized random forest model for classifying and predicting the unqualified indicators found in food samples. Methods Unqualified-sample records from the 2015 to 2019 food safety sampling inspections, published on the official website of the Shandong Food and Drug Administration, were collected and passed through several data preprocessing steps. A random forest model for classifying food unqualified indicators was then built using hyperparameter grid search with 10-fold cross-validation. The parameters of the conventional random forest model were optimized, and the optimized model was compared with decision tree (DT), logistic regression (LR) and gradient boosting decision tree (GBDT) classifiers. Results After parameter optimization, the random forest model predicted unqualified indicators in food with an accuracy of 89.4%, which was 11.0% higher than DT, 9.0% higher than LR and 8.1% higher than GBDT. Conclusion The optimized random forest model can accomplish the classification and prediction of food unqualified indicators and has broad application prospects.
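The method summarized above (an exhaustive grid search over random forest hyperparameters scored by 10-fold cross-validation, followed by a comparison with DT, LR and GBDT) can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic (`make_classification`) because the Shandong inspection records are not reproduced here, and the parameter grid, number of classes and sample size are hypothetical choices.

```python
# Minimal sketch, assuming scikit-learn is installed. Synthetic data stand in
# for the inspection records (hypothetical: 4 classes play the role of
# unqualified-indicator categories).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(
    n_samples=400, n_features=20, n_informative=10,
    n_classes=4, random_state=0,
)

# 10-fold stratified cross-validation, shared by the search and the baselines.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Hypothetical search space; the abstract does not list the actual grid.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 8],
}

# Grid search: every parameter combination is scored by 10-fold CV accuracy,
# and the best one is refit on all data.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid, cv=cv, scoring="accuracy",
)
search.fit(X, y)

# Baselines evaluated on the same folds with default settings.
results = {"RF (grid-searched)": search.best_score_}
baselines = {
    "RF (default)": RandomForestClassifier(random_state=0),
    "DT": DecisionTreeClassifier(random_state=0),
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "GBDT": GradientBoostingClassifier(random_state=0),
}
for name, model in baselines.items():
    results[name] = cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()

print("best params:", search.best_params_)
for name, acc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} {acc:.3f}")
```

`search.best_score_` is the mean cross-validated accuracy of the selected grid point, so using the same folds for both selection and reporting makes it slightly optimistic; nested cross-validation would give a cleaner comparison. Accuracies obtained on synthetic data say nothing about the 89.4% reported above.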
Authors
LIU Yu-Hang, QU Yuan, JIANG Jia-Ming, ZONG Wan-Li, ZHU Xi-Jun
(College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266061, China; Weihai Food and Drug Inspection and Testing Center, Weihai 264210, China)
Source
Journal of Food Safety and Quality
Indexed in: CAS; Peking University Core Journals
2021, No. 18, pp. 7467-7472 (6 pages)
Funding
Shandong Province Demonstration Base for Industry-Education Integration in Joint Graduate Training (2020-19).
Keywords
food safety data
decision tree
random forest
parameter optimization
hyperparameter grid search
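About the authors
Corresponding author: QU Yuan, master's student, research interest: smart healthcare. E-mail: 2595989958@qq.com. LIU Yu-Hang, master's student, research interest: data analysis. E-mail: 564275986@qq.com.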