Abstract
Objective This study aims to assess the value of the fourth-generation Generative Pre-trained Transformer (GPT-4) model in Evidence-based Medical Imaging Clinical Appropriateness (EB-MICA) evaluation.
Methods Based on published conceptual reviews and the "Consensus on Clinical Applicability Assessment of Imaging Examinations in Tinnitus Patients (2022 Edition)", a set of 44 questions was designed. These comprised (1) general questions related to EB-MICA, and specialized otology imaging questions covering (2) recommendations for rational imaging examinations for tinnitus and (3) assessments of the value of different imaging examination methods. The questions were entered into the GPT-4 model and the generated answers were recorded; imaging professionals then evaluated the answers on four aspects: overall quality, accuracy, professionalism, and language fluency.
Results The answers generated by the GPT-4 model achieved an overall quality score rate of 58.2% (128/220), accuracy and professionalism score rates of 57.3% (126/220) and 58.6% (129/220), respectively, and a language fluency score rate of 100.0% (220/220). For general EB-MICA questions, the overall quality, accuracy, and professionalism score rates of the generated answers were all 93.3% (14/15). For specialized otology imaging questions, however, performance was poor: the overall quality score rates were only 50.0% (10/20) for rational examination recommendations and 56.2% (104/185) for evaluation of the value of different imaging examinations. The core problem is that the model cannot clearly prioritize among imaging examination choices, usually struggles to recommend a rational imaging examination, and fails to assess accurately the clinical value of different examinations.
Conclusion The answers generated by the GPT-4 model are well organized and excel in language fluency, and they offer useful reference value in general knowledge domains. However, the model's application in the specialized field of otology imaging examinations shows clear limitations.
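The score rates in the Results can be checked with a few lines of arithmetic. The sketch below assumes that each score rate is simply points obtained divided by maximum points, as the fractions in parentheses suggest; the abstract does not describe the scoring scheme itself. The names `score_rate`, `overall`, and `quality_by_category` are illustrative and not taken from the study.

```python
# Counts copied from the abstract: (points obtained, maximum points).
overall = {
    "overall quality": (128, 220),
    "accuracy": (126, 220),
    "professionalism": (129, 220),
    "language fluency": (220, 220),
}

# Overall quality broken down by question category.
quality_by_category = {
    "general EB-MICA questions": (14, 15),
    "rational examination recommendations": (10, 20),
    "value of different imaging examinations": (104, 185),
}


def score_rate(obtained: int, maximum: int) -> float:
    """Assumed definition: percentage of the maximum score, to one decimal."""
    return round(100 * obtained / maximum, 1)


for name, (obtained, maximum) in {**overall, **quality_by_category}.items():
    print(f"{name}: {score_rate(obtained, maximum)}% ({obtained}/{maximum})")

# The category counts should add up to the overall quality counts.
total_obtained = sum(o for o, _ in quality_by_category.values())
total_maximum = sum(m for _, m in quality_by_category.values())
print(total_obtained, total_maximum)
```

Under this assumption the three category counts sum to the overall quality figure of 128/220, so the reported percentages are internally consistent.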
Authors
王郅翔
李佳
任鹏玲
蔡林坤
王星皓
孙婧
王振常
吕晗
Wang Zhixiang; Li Jia; Ren Pengling; Cai Linkun; Wang Xinghao; Sun Jing; Wang Zhenchang; Lyu Han (Department of Ultrasound, Beijing Friendship Hospital, Capital Medical University, Beijing 100050, China; Accurate and Intelligent Imaging Laboratory, Beijing Institute of Clinical Medicine, Medical Imaging Center, Beijing Friendship Hospital, Capital Medical University, Beijing 100050, China; School of Biological and Medical Engineering, Beihang University, Beijing 100191, China; Key Laboratory of Knowledge Mining and Service for Medical Journals, Beijing 100052, China)
Source
《数字医学与健康》
2024, Issue 1, pp. 31-36 (6 pages)
DIGITAL MEDICINE AND HEALTH
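Funding
National Natural Science Foundation of China (62171297, 61931013)
Beijing Hospitals Authority Key Medical Specialty Development Program (ZYLX202101)
Beijing Municipal Health Commission, Beijing Key Specialty Program for Major Epidemic Prevention and Control (document no. 京卫医[2021]135号).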
Keywords
GPT-4 Model
Natural Language Processing
Tinnitus
Medical Imaging
Evidence-based Medicine
Author information
Corresponding author: Lyu Han (吕晗), Email: chrislvhan@126.com.