Abstract
Objective: To identify a genetic marker, based on multi-omics data, that predicts the prognosis of patients with ovarian cancer.
Methods: We downloaded RNA-Seq, SNP and CNV data, together with clinical follow-up information, from the TCGA database and randomly divided the samples into a training set and a test set. The GSE17260 dataset from GEO was used as an external validation set. Prognosis-related genes, copy-number-altered genes and mutated genes were screened in the training set. After these genes were integrated, a random forest algorithm was used for feature selection, yielding a robust biomarker. On this basis, a gene-based prognostic model was established and then validated in the test set and the external validation set.
Results: We obtained 2097 prognosis-related genes, 447 copy-number-amplified genes, 1069 copy-number-deleted genes and 654 significantly mutated genes. Random forest feature selection on the integrated gene set yielded five feature genes (PSMB1, COL6A6, SLC22A2, KLHL23 and CD3G), some of which have been reported to be associated with tumor progression. A prognostic risk assessment model based on this 5-gene signature was then established by Cox regression analysis. The model stratified patients by risk in the training set, the test set and the external validation set, and the 5-gene signature showed strong robustness and was independent of other clinical factors. GSEA further showed that the pathways enriched for the 5-gene signature are significantly associated with the pathways and biological processes involved in the occurrence and development of ovarian cancer.
Conclusion: In this study, a 5-gene signature was constructed as a new prognostic marker for predicting the survival of patients with ovarian cancer.
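The Cox-based risk model summarized above is commonly applied by computing each patient's linear predictor (the sum of each gene's coefficient times its expression) and splitting patients into high- and low-risk groups. The sketch below illustrates that idea in plain Python under stated assumptions: the coefficients in `COEFFICIENTS` are hypothetical placeholders (not the study's fitted values), the median cut-off is an assumed stratification rule, Harrell's concordance index is shown as one common way to check how well risk scores rank survival (the abstract does not say which metric was used), and the helper names and toy patient data are invented for illustration.

```python
from statistics import median

# Hypothetical Cox regression coefficients for the five signature genes.
# Placeholder values for illustration only, NOT the study's fitted values.
COEFFICIENTS = {
    "PSMB1": 0.40,
    "COL6A6": 0.25,
    "SLC22A2": 0.30,
    "KLHL23": -0.20,
    "CD3G": -0.35,
}


def risk_score(expression, coefficients=COEFFICIENTS):
    """Cox linear predictor: sum over genes of coefficient * expression."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def stratify(scores):
    """Label each score 'high' or 'low' risk, splitting at the median (assumed cut-off)."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


def harrell_c_index(times, events, scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i has an observed event and a
    strictly shorter follow-up time than patient j. It is concordant when
    patient i also has the higher risk score; tied scores count as 0.5.
    Tied follow-up times are ignored in this simple version.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


# Toy, made-up data for four patients: expression values, follow-up time in
# months, and whether death was observed. Not taken from TCGA or GSE17260.
PATIENTS = [
    {"PSMB1": 2.0, "COL6A6": 1.0, "SLC22A2": 1.0, "KLHL23": 0.0, "CD3G": 0.0},
    {"PSMB1": 1.0, "COL6A6": 1.0, "SLC22A2": 1.0, "KLHL23": 1.0, "CD3G": 1.0},
    {"PSMB1": 0.0, "COL6A6": 0.0, "SLC22A2": 0.0, "KLHL23": 1.0, "CD3G": 1.0},
    {"PSMB1": 0.5, "COL6A6": 0.0, "SLC22A2": 0.0, "KLHL23": 0.0, "CD3G": 0.0},
]
TIMES = [10, 20, 40, 15]
EVENTS = [True, True, False, True]

scores = [risk_score(p) for p in PATIENTS]
labels = stratify(scores)
c_index = harrell_c_index(TIMES, EVENTS, scores)
print(labels)             # ['high', 'high', 'low', 'low']
print(round(c_index, 3))  # 0.833
```

In this toy example, five of the six comparable pairs are ranked correctly (the fourth patient has an earlier death but a lower score than the second), giving a concordance of 5/6. A real analysis would fit the coefficients on the training set and then apply the same fixed formula and cut-off to the test and validation sets.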
Author information
Correspondence to: Gang Cao, School of Pharmacy, Zhejiang Chinese Medical University, Hangzhou, Zhejiang, China. E-mail: caogang33@163.com