Abstract
Objective: To explore how well machine learning algorithms and their derivatives classify medical datasets, in order to better identify the value of computers in assisting medical diagnosis. Methods: Taking the Pima Indians diabetes dataset as an example, machine learning models were built on the WEKA platform, including NaiveBayes (based on Bayes' theorem), Bagging (based on ensemble learning), and J48 (tree-based), for a total of 21 algorithms in six categories. The predictive performance of the resulting models was evaluated along multiple dimensions and with multiple indicators. Results: The five algorithms with the smallest root mean squared error (RMSE) and root relative squared error (RRSE) were, in order, Logistic, LMT, RotationForest, RandomForest, and Bagging. The classification accuracy of LMT, SMO, Logistic, NaiveBayes, and RotationForest all exceeded 76%, and their true positive rates were all above 76%. The ROC curves showed that, for every algorithm except SMO, the area under the curve was above 0.82. Conclusion: Six algorithms classified this diabetes dataset well, namely LMT, SMO, Logistic, NaiveBayes, RotationForest, and Bagging; all showed high accuracy and predictive value.
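The indicators named in the abstract (accuracy, true positive rate, RMSE, RRSE, and area under the ROC curve) can be illustrated with a minimal sketch for a binary classifier. This is not the authors' WEKA workflow: the function name `evaluate_binary`, the 0.5 threshold, and the toy labels and scores are assumptions made for illustration only. RRSE is taken here relative to a baseline that always predicts the positive-class prior, and WEKA's own computation may differ in detail.

```python
import math


def evaluate_binary(y_true, p_pos, threshold=0.5):
    """Return accuracy, TPR, RMSE, RRSE and AUC for a binary classifier.

    y_true: list of 0/1 labels (1 = positive class).
    p_pos:  list of predicted probabilities for the positive class.
    """
    n = len(y_true)
    if n == 0 or n != len(p_pos):
        raise ValueError("y_true and p_pos must be non-empty and the same length")
    n_pos = sum(y_true)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    # Hard predictions at the chosen threshold (an assumption, not from the paper).
    y_pred = [1 if p >= threshold else 0 for p in p_pos]
    correct = sum(1 for t, h in zip(y_true, y_pred) if t == h)
    true_pos = sum(1 for t, h in zip(y_true, y_pred) if t == 1 and h == 1)

    # Squared error of the predicted probability against the 0/1 label.
    sq_err = sum((p - t) ** 2 for t, p in zip(y_true, p_pos))

    # Baseline for RRSE: always predict the positive-class prior.
    prior = n_pos / n
    base_sq_err = sum((prior - t) ** 2 for t in y_true)

    # AUC as the fraction of (positive, negative) pairs ranked correctly,
    # counting ties as half (the Mann-Whitney formulation).
    pos_scores = [p for t, p in zip(y_true, p_pos) if t == 1]
    neg_scores = [p for t, p in zip(y_true, p_pos) if t == 0]
    wins = sum(
        1.0 if ps > ns else 0.5 if ps == ns else 0.0
        for ps in pos_scores
        for ns in neg_scores
    )

    return {
        "accuracy": correct / n,
        "tpr": true_pos / n_pos,
        "rmse": math.sqrt(sq_err / n),
        "rrse": math.sqrt(sq_err / base_sq_err),
        "auc": wins / (n_pos * n_neg),
    }


# Toy data, made up for illustration; not taken from the study.
metrics = evaluate_binary([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
print(metrics)
```

An RRSE below 1 means the model's squared error is smaller than that of the prior-only baseline, which is why smaller RMSE and RRSE values rank an algorithm higher.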
Authors
ZHANG Ying (张颖)
DOU Yi-feng (窦一峰)
ZHANG Ying; DOU Yi-feng (Department of Urology, People's Hospital of Baodi District, Tianjin 301800, China; Network Information Center, People's Hospital of Baodi District, Tianjin 301800, China)
Source
Journal of Medical Information (《医学信息》)
2021, Issue 6, pp. 32-35 (4 pages)
Keywords
Medical data
Algorithm
Diabetes
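About the authors
ZHANG Ying (b. October 1992), female, from Tianjin, bachelor's degree, senior nurse, mainly engaged in hospital nursing work. Corresponding author: DOU Yi-feng (b. August 1992), male, from Tianjin, master's degree, junior engineer, mainly engaged in medical statistics and data mining.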