Abstract
Objective: To identify prognosis-related genes in colon cancer using gene expression data and clinical information from the TCGA and GEO databases, and to construct and evaluate a prognostic model for colon cancer. Methods: Colon cancer gene expression matrices (GSE44076, GSE28000 and GSE39582) were downloaded from the GEO database, and the colon cancer mRNA expression matrix and clinical information were downloaded from the TCGA database. Differential expression analysis was performed on the three GEO datasets with GEO2R, the online analysis tool provided by NCBI, and on the TCGA dataset with the R package limma; the differentially expressed genes common to all datasets were retained. A prognostic model was built through univariate regression, LASSO regression and multivariate regression analysis, and a nomogram was then constructed by combining the model with clinical characteristics to evaluate its performance comprehensively. Results: A prognostic model for colon cancer was successfully constructed. The area under the ROC curve of the model was 0.628 at 3 years, 0.678 at 4 years and 0.730 at 5 years. The Wilcoxon test showed that a higher risk score was associated with a higher T stage (P=0.049), N stage (P=0.0015), M stage (P=0.003) and pathological stage (P=0.0019). A nomogram combining the prognostic risk score with age, sex and pathological stage increased the C-index of the model from 0.63 to 0.74. Conclusion: The colon cancer prognostic model constructed in this study has potential value for recurrence-risk stratification and tumor staging assessment in patients with colon cancer.
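The risk score and C-index reported in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not list the model genes, coefficients or software used for scoring, so the gene names (`GENE_A`, `GENE_B`), coefficients, expression values and follow-up data below are hypothetical. The sketch assumes the usual form of such models, a risk score equal to the coefficient-weighted sum of gene expression values, with patients split into high- and low-risk groups at the cohort median, and it computes Harrell's concordance index in plain Python.

```python
from itertools import combinations
from statistics import median


def risk_score(expression, coefficients):
    """Linear predictor: sum of coefficient * expression over the model genes."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the cohort median score."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


def harrell_c_index(times, events, scores):
    """Harrell's C: share of comparable pairs in which the patient with the
    shorter observed time also has the higher risk score.

    A pair is comparable when the shorter time ends in an observed event.
    Tied scores count as half-concordant; tied times are skipped for simplicity.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier patient was censored, so the ordering is unknown
        comparable += 1
        if scores[first] > scores[second]:
            concordant += 1.0
        elif scores[first] == scores[second]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical coefficients and expression values, for illustration only.
coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}
patients = [
    {"GENE_A": 4.0, "GENE_B": 2.0},
    {"GENE_A": 2.0, "GENE_B": 4.0},
    {"GENE_A": 3.0, "GENE_B": 2.0},
    {"GENE_A": 1.0, "GENE_B": 2.0},
]
times = [12, 60, 30, 48]             # follow-up in months (hypothetical)
events = [True, False, True, True]   # True = event observed, False = censored

scores = [risk_score(p, coefficients) for p in patients]
groups = split_by_median(scores)
c_index = harrell_c_index(times, events, scores)
print(scores, groups, round(c_index, 3))
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported change from 0.63 to 0.74 should be read. In practice a survival analysis library would handle ties and censoring more carefully than this sketch does.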
Authors
CAO Li-chao (操利超); BA Ying (巴颖); LU Xiao-ping (卢晓萍); ZHANG He-zi (张核子)
Shenzhen Nucleus Gene Technology Co., Ltd., Shenzhen 518071, Guangdong, China
Source
Journal of Medical Information (《医学信息》), 2021, Issue 24, pp. 27-32 (6 pages)
Funding
Shenzhen Sustainable Development Special Project (grant nos. 深科技创新〔2020〕180号, 专2019N002).
Keywords
Colon cancer
Prognostic model
Bioinformatics
Pathological staging
Recurrence
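About the authors
CAO Li-chao (born April 1984), male, from Huanggang, Hubei; master's degree; engineer. His research focuses on predictive diagnosis and prognostic models for colorectal cancer based on multi-omics sequencing data and clinical indicators. Corresponding author: ZHANG He-zi (born April 1972), male, from Yongzhou, Hunan; master's degree; engineer. His research focuses on early cancer screening technology and precision-medicine applications of ctDNA.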