Abstract
                
                Cancer is a serious global health problem and a major cause of human death. Conventional cancer treatments often risk impairing the function of vital organs. Anticancer peptides (ACPs) are considered among the most promising therapeutic agents against common human cancers because of their small size, high specificity, and low toxicity. However, ACP identification still relies largely on laboratory experiments, which are expensive and time-consuming. To address this problem, we propose pLM4ACP, an ACP prediction model based on machine learning and protein language models. The model uses the protein language model ProtT5 to extract features from ACP sequences and feeds the extracted features into a support vector machine (SVM) classifier, which is then optimized and evaluated. On the independent test set, the model achieved an accuracy (ACC) of 0.763, an F1-score of 0.767, a Matthews correlation coefficient (MCC) of 0.527, and an area under the curve (AUC) of 0.827, outperforming existing models. By building an efficient ACP prediction model on protein language models, this study can advance the application of artificial intelligence in biomedicine and promote the development of precision medicine and computational biology.
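As a reading aid for the four reported metrics, the following is a minimal sketch of how ACC, F1-score, MCC, and AUC can be computed for a binary ACP/non-ACP classifier. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration and have no connection to the pLM4ACP data or results.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 = ACP and 0 = non-ACP."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def f1_score(tp, tn, fp, fn):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def auc(y_true, scores):
    """Area under the ROC curve as the probability that a random positive
    scores higher than a random negative (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (hypothetical, for illustration only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

counts = confusion_counts(y_true, y_pred)
acc_value = accuracy(*counts)
f1_value = f1_score(*counts)
mcc_value = mcc(*counts)
auc_value = auc(y_true, scores)
```

On this toy input the sketch gives ACC = 0.75, F1 = 0.75, MCC = 0.5, and AUC = 0.8125. ACC, F1, and MCC depend on the chosen threshold, whereas AUC is computed from the raw scores and is threshold-independent.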
    
    
                Authors
                    刘奕彤
                    陈文欣
                    李娟娟
                    迟雪
                    马香
                    唐燕琼
                    李宏
                LIU Yitong; CHEN Wenxin; LI Juanjuan; CHI Xue; MA Xiang; TANG Yanqiong; LI Hong (School of Life and Health Sciences, Hainan University, Haikou 570228, Hainan, China)
     
    
    
                Source
                    Chinese Journal of Biotechnology (《生物工程学报》)
                    Peking University Core Journal (北大核心)
                    2025, Issue 8, pp. 3252-3261 (10 pages)
     
            
                Funding
                    National Natural Science Foundation of China (32460244)
                    Hainan Provincial Natural Science Foundation (322RC589)
            
    
                Keywords
                    anticancer peptides
                    prediction model
                    protein language models
                    machine learning
                    bioinformatics
                
     
    
    
                About the authors
Corresponding author: LI Hong, E-mail: lihongbio@hainanu.edu.cn