Abstract
                
                    Objective: To develop a cardiovascular disease identification model based on the BERT pretrained model, to investigate the application value of natural language processing in medical information processing, and to provide a new technical approach for intelligent outpatient triage.
                    Methods: A two-center retrospective study collected medical data from 6,200 patients who met the inclusion criteria at the First and Second Affiliated Hospitals of Wannan Medical College. Data preprocessing consisted of manual correction of data-entry errors, k-nearest-neighbor (KNN) imputation of missing values, and removal of definitive diagnostic information. Using the BERT-base-Chinese pretrained model as the base, text and structured data were concatenated in a "label + content" format and encoded; the model was then fine-tuned for the classification task, and its recognition capability was compared with that of a lightweight generative large language model (LLM).
                    Results: The fine-tuned BERT model performed well on the cardiovascular disease identification task, with accuracy, precision, recall, and F1-score all equal to 0.98 and an area under the ROC curve (AUC) approaching 1. By contrast, the lightweight generative LLM reached an accuracy of only 0.53 under zero-shot learning.
                    Conclusion: A BERT model fine-tuned on local medical data can accurately identify cardiovascular diseases, indicating considerable application potential for natural language processing in cardiovascular disease identification.
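The "label + content" concatenation described in the Methods can be illustrated with a minimal sketch. The abstract does not list the actual fields, separators, or label wording, so the field names, the `:` and `;` delimiters, and the `serialize_record` helper below are all hypothetical, not the authors' implementation.

```python
from typing import Mapping


def serialize_record(record: Mapping[str, object], sep: str = ";") -> str:
    """Join every non-missing field of a record as 'label:content'.

    Hypothetical helper: the delimiters and field names are assumptions,
    since the abstract only states that a "label + content" format was used.
    """
    parts = []
    for label, value in record.items():
        if value is None or value == "":
            # Skip empty fields here; in the study, missing values were
            # filled by KNN imputation before this step.
            continue
        parts.append(f"{label}:{value}")
    return sep.join(parts)


# Illustrative record mixing structured values and free text (made-up data).
example = {
    "age": 67,
    "sex": "male",
    "systolic_bp": 152,
    "chief_complaint": "chest tightness for 3 days",
}

text = serialize_record(example)
print(text)
# age:67;sex:male;systolic_bp:152;chief_complaint:chest tightness for 3 days
```

A string built this way could then be passed to a tokenizer for a model such as BERT-base-Chinese and used as input to a sequence classifier; that step is omitted here because the abstract gives no details of the fine-tuning setup.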
    
    
                Authors
                    GAN Weipeng (干伟鹏)
                    WANG Peipei (王培培)
                    ZHANG Mingchao (张明超)
                    GE Tao (葛涛)
                    YANG Lingfei (杨凌飞)
                    YE Mingquan (叶明全)
                (Department of Cardiovascular Medicine, the Second Affiliated Hospital of Wannan Medical College, Wuhu 241001, Anhui, China)
     
    
    
                Source
                    Chinese Journal of Health Informatics and Management (《中国卫生信息管理杂志》)
                    2025, No. 4, pp. 625-632 (8 pages)
     
            
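                Funding
                    Ministry of Education fund, "Research on cardiovascular disease risk assessment and health management driven by health and medical big data" (22YJAZH134)
                    Key Project of Philosophy and Social Sciences Research in Anhui Provincial Universities, "Research on condition assessment and early warning for patients with cardiovascular disease based on emergency and first-aid big data" (2023AH051729)
                    Wuhu Science and Technology Program project, "Evaluating the effect of digital collaborative management on rehabilitation outcomes in patients with coronary heart disease: a randomized controlled study" (WHWJ2023y015)
                    Wannan Medical College Young and Middle-aged Research Fund, "Research on an out-of-hospital collaborative management model for coronary heart disease in the digital context" (WK2023ZQNS24)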
            
    
                Keywords
                    natural language processing
                    cardiovascular diseases
                    triage
                    predictive model
                
     
    
    
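                About the authors
GAN Weipeng (b. 1997), male, master's degree, resident physician and senior artificial intelligence application engineer; research interests: diagnosis and treatment in cardiovascular medicine, intelligent medicine; E-mail: ganweipeng@wnmc.edu.cn. Corresponding author: YE Mingquan (b. 1973), male, PhD, vice president, professor; research interests: intelligent medical engineering, data mining, and health care; E-mail: ymq@wnmc.edu.cn.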