Abstract
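Developing computational models that predict effective treatment strategies from the genomic information of cancer patients is a key challenge in precision medicine. In recent years, a number of international organizations have released multi-level genomic characterization data for hundreds of cell lines. By combining such omics data with the drug sensitivity of tumor cell lines in vitro, researchers can dissect the molecular mechanisms of anticancer drugs and translate them into the personalized diagnosis and treatment strategies that precision medicine requires. Artificial intelligence algorithms based on big data have built a new bridge between genomics and drug response and have advanced the development of algorithms for predicting drug sensitivity in tumor cells. This paper first summarizes the publicly available genomic characterization datasets and then presents application cases of genomic characterization data and artificial intelligence algorithms, including machine learning, network-based and multimodal neural network algorithms, in predicting the drug sensitivity of cancer cells. Network-based prediction methods and multimodal deep learning methods facilitate the systematic integration and application of multiomics data, can overcome the limitations of traditional machine learning methods in drug response prediction, and represent the direction in which drug sensitivity research is heading.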
The development of computational methods that predict effective therapeutic strategies from the genomic information of patients is a key challenge in precision medicine. Since the beginning of the 21st century, next-generation sequencing (NGS) has opened up new possibilities for personalized medicine. Many organizations and agencies around the world have made extensive molecular characterization of hundreds of cancer cell lines publicly available. For example, the National Cancer Institute 60 Human Cancer Cell Line Screen (NCI-60), the Cancer Cell Line Encyclopedia (CCLE) and Genomics of Drug Sensitivity in Cancer (GDSC) have provided large-scale omics data, such as genomic, transcriptomic and epigenomic data, characterizing cancer cell lines, and The Cancer Genome Atlas (TCGA) has molecularly characterized over 20,000 primary cancers from patients. Combined with the drug response data of cancer cell lines, multiomics data can be used to analyse the mechanisms of action of anticancer drugs, and these insights can be incorporated into precision medicine strategies. Over several decades, artificial intelligence (AI) technologies based on big data have revolutionized bioinformatics. AI has built a bridge between genomics and drug sensitivity by promoting the development of predictive models for the drug response of cancer cell lines. The 2012 NCI-DREAM drug prediction challenge has been particularly influential, as the innovative applications of machine learning that emerged from it laid the groundwork for later studies. However, classic machine learning models still struggle to achieve good predictive performance because they are poorly suited to the systematic integration of high-dimensional multiomics data. Therefore, network-based approaches, including link prediction and network representation, have become mainstream methods for drug response prediction. On the one hand, network-based approaches do not face the "small n, large p" problem, since the multiomics features are either represented in a gene/protein network or embedded in similarity networks between cell lines. On the other hand, introducing gene regulatory networks (GRNs) and protein-protein interactions (PPIs) into the predictive model can provide a functional context for the integration of genomic data and thereby improve the performance of drug response prediction. In addition to network-based approaches, multimodal deep learning models can systematically integrate multiomics data by treating each omics type as a separate modality. Generally, there are three feature fusion methods in deep neural networks: input-level feature fusion (early fusion), intermediate feature fusion and decision-level fusion (late fusion). Intermediate feature fusion is predominant in drug response prediction studies: features are learned separately for each type of omics data and then integrated into one unified representation that serves as the input to a classifier or a regressor. Moreover, features of drug structures can be supplied as an additional input modality to improve performance. In brief, we summarize the characteristics of publicly accessible genomic databases and discuss the trends of artificial intelligence applications in drug sensitivity prediction for cancer cell lines, including machine learning, networks and multimodal deep neural networks.
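The intermediate feature fusion described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the architecture of any study covered by the review. The modality names (`expression`, `mutation`, `drug`), the layer sizes, the class name `IntermediateFusionModel` and the input values are all assumptions made for illustration, and the weights are random and untrained, so the output is not a meaningful sensitivity estimate.

```python
import random


def init_layer(rng, n_in, n_out):
    """Return (weights, bias) for a dense layer with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer):
    """Apply a dense layer: one output value per weight row."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


class IntermediateFusionModel:
    """Hypothetical model: one encoder per modality, then a shared regressor."""

    def __init__(self, modality_dims, hidden_dim=4, seed=0):
        rng = random.Random(seed)
        self.dims = dict(modality_dims)
        # Each modality gets its own encoder, so features are learned separately.
        self.encoders = {
            name: init_layer(rng, dim, hidden_dim) for name, dim in self.dims.items()
        }
        # The regressor head sees the concatenated (fused) representation.
        self.head = init_layer(rng, hidden_dim * len(self.dims), 1)

    def fuse(self, sample):
        """Encode every modality and concatenate the hidden vectors."""
        fused = []
        for name, dim in self.dims.items():
            if len(sample[name]) != dim:
                raise ValueError(f"{name}: expected {dim} features")
            fused.extend(relu(dense(sample[name], self.encoders[name])))
        return fused

    def predict(self, sample):
        """Map the fused representation to a single response value."""
        return dense(self.fuse(sample), self.head)[0]


# Illustrative feature sizes and values (assumed, not taken from any dataset).
model = IntermediateFusionModel({"expression": 6, "mutation": 5, "drug": 3})
sample = {
    "expression": [0.2, 1.5, -0.3, 0.8, 0.0, 2.1],  # e.g. normalized expression
    "mutation": [0.0, 1.0, 0.0, 0.0, 1.0],          # e.g. binary mutation calls
    "drug": [0.4, 0.1, 0.9],                        # e.g. drug structure features
}
fused = model.fuse(sample)
prediction = model.predict(sample)
print(len(fused), prediction)
```

In this sketch, early fusion would instead concatenate the raw feature vectors before a single encoder, and late fusion would train one predictor per modality and combine their outputs. Only the point at which the modalities meet differs.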
Authors
李叙潼
吴小龙
万晓喆
钟飞盛
崔晨
陈颖佳
陈立凡
陈凯先
蒋华良
郑明月
Xutong Li; Xiaolong Wu; Xiaozhe Wan; Feisheng Zhong; Chen Cui; Yingjia Chen; Lifan Chen; Kaixian Chen; Hualiang Jiang; Mingyue Zheng (Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; School of Pharmacy, University of Chinese Academy of Sciences, Beijing 100049, China; School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China)
Source
《科学通报》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2020, Issue 32, pp. 3551-3561 (11 pages)
Chinese Science Bulletin
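Funding
National Natural Science Foundation of China (81773634)
National Science and Technology Major Project (2018ZX09711002)
Strategic Priority Research Program of the Chinese Academy of Sciences (XDA12050201)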
Keywords
drug sensitivity
machine learning
network
multimodal
Author information
Corresponding authors: Hualiang Jiang, E-mail: hljiang@simm.ac.cn; Mingyue Zheng, E-mail: myzheng@simm.ac.cn.