Breast cancer survival prediction based on ensemble learning
Abstract: This study predicts the 5-year survival status of breast cancer patients and analyzes the factors that influence it. First, breast cancer data from 2004-2010 were selected from the SEER database, and the selected features were preprocessed. Second, at the data level, SMOTE oversampling was applied to address class imbalance; at the algorithm level, LightGBM, CatBoost, and GBDT were compared on their ability to predict 5-year survival status. Finally, the influencing factors were ranked by importance and interpreted using SHAP values. The 5-year survival prediction model constructed in this paper outperforms a single model, with accuracy, AUC, recall, precision, and F1-score of 0.9060, 0.8443, 0.9837, 0.9160, and 0.9487, respectively. The 5-year survival status of breast cancer is strongly associated with tumor size, the total number of lymph nodes examined, the number of positive lymph nodes, estrogen receptor (ER) status, progesterone receptor (PR) status, and age. The important features selected by the model are consistent with current clinical findings, so the model can provide technical support for clinical prognosis prediction.
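As a rough illustration of two ideas mentioned in the abstract, the sketch below shows SMOTE-style oversampling (each synthetic minority sample is interpolated between a minority point and one of its nearest minority neighbours) and the F1-score as the harmonic mean of precision and recall. It is a minimal standard-library sketch, not the authors' implementation: the function names `smote_oversample` and `f1_score`, the neighbour count `k`, and the toy data points are all assumptions made for illustration, and no SEER data is used.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    `minority` is a list of equal-length numeric tuples (hypothetical toy data).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def f1_score(precision, recall):
    """F1-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Toy, made-up minority-class points (NOT SEER data).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_oversample(minority, n_new=6, k=2)

# Consistency check on the precision and recall reported in the abstract.
f1 = f1_score(precision=0.9160, recall=0.9837)
print(len(new_points), f1)
```

Plugging the reported precision (0.9160) and recall (0.9837) into the harmonic-mean formula gives a value close to the reported F1-score of 0.9487; any small difference is attributable to rounding of the published figures.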
Authors: ZHANG Jijie; QIN Qinghong; LIU Xueping; WANG Kangquan; WEI Wei (College of Science, Guangxi University of Science and Technology, Liuzhou 545006, China; Affiliated Cancer Hospital, Guangxi Medical University, Nanning 530021, China; Medical School, Guangxi University of Science and Technology, Liuzhou 545005, China)
Source: Journal of Guangxi University of Science and Technology, 2022, No. 1, pp. 101-109 (9 pages)
Funding: Supported by the Guangxi Natural Science Foundation (2019GXNSFAA245067).
Keywords: SEER database; breast cancer; ensemble learning; prognosis prediction
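About the authors: ZHANG Jijie, master's student. Corresponding author: LIU Xueping, PhD, associate researcher; research interest: pharmacology; E-mail: 100000774@gxust.edu.cn.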