Abstract
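Objective To explore, based on the LASSO-Cox model, how four methods perform in controlling the FDR (false discovery rate) and how well they select variables: cross validation, the pcvl method (penalized cross-validated log-likelihood), the EBIC criterion (extended Bayesian information criterion), and stability selection. Methods A simulation study evaluated the FDR and PSR (positive select rate) of each method under different censoring proportions, different degrees of correlation among covariates, and different sparsity levels of the regression coefficients. DLBCL data were downloaded from GEO to analyze the association between genes and prognosis. Results The simulations showed that, across censoring proportions, covariate correlation levels, and sparsity levels, stability selection controlled the FDR better than the other methods while keeping relatively high variable selection power. The EBIC criterion performed well when correlation was low and the covariates were sparse, and gave conservative results when the sample size was small. The pcvl method rarely missed variables with true effects, but its FDR remained high. In the real data analysis, the EBIC criterion selected only one gene; most of the genes selected by stability selection were statistically significant and overlapped strongly with the results of the other methods. Conclusion In LASSO-Cox-based survival analysis of high-dimensional data, stability selection controls the FDR well and also has relatively high variable selection power.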
Objective To compare the CV method (cross validation), the pcvl method (penalized cross-validated log-likelihood), the EBIC criterion (extended Bayesian information criterion), and the stability selection approach in terms of FDR (false discovery rate) control and variable selection performance under the LASSO-Cox model. Methods In a simulation study, we evaluated how the censoring proportion of the survival data, the linear correlation between covariates, and the sparsity of the regression coefficients affect the FDR and the positive select rate (PSR) of each method. We then used a DLBCL data set from GEO to identify prognostic genes in a real data analysis. Results Across the censoring proportions, correlation levels, and sparsity scenarios considered, stability selection controlled the FDR better and more stably than the other methods, and its power was also relatively high. The EBIC performed well when the correlation between covariates was low and the coefficients were sparse, but it was conservative when the sample size was small. The pcvl method rarely missed important variables, but its FDR was still relatively high. In the real data analysis, the EBIC identified only one gene. Most of the genes identified by stability selection were statistically significant and were highly consistent with the results of the other methods. Conclusion Under the LASSO-Cox model, stability selection controls the FDR better than the other methods and has relatively high power in the survival analysis of high-dimensional data.
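The abstract names the two evaluation metrics, FDR and PSR, without giving formulas. The sketch below shows how they might be computed in a simulation, assuming the usual definitions: the false discovery proportion of one replicate is the share of selected variables whose true coefficient is zero, the FDR is its average over replicates, and the PSR is the share of truly nonzero coefficients that were selected. The function names and the toy replicates are hypothetical and are not taken from the paper.

```python
from typing import Iterable, List, Sequence, Tuple


def fdp_and_psr(selected: Iterable[int], truly_active: Iterable[int]) -> Tuple[float, float]:
    """Return (false discovery proportion, positive select rate) for one replicate.

    Assumed definitions (not stated in the abstract):
      FDP = selected variables with zero true coefficient / all selected variables
      PSR = selected variables with nonzero true coefficient / all nonzero variables
    An empty selection is given FDP = 0, a common convention.
    """
    selected_set = set(selected)
    active_set = set(truly_active)
    true_positives = len(selected_set & active_set)
    false_positives = len(selected_set - active_set)
    fdp = false_positives / len(selected_set) if selected_set else 0.0
    psr = true_positives / len(active_set) if active_set else 0.0
    return fdp, psr


def simulated_fdr_and_psr(
    selections: Sequence[Iterable[int]], truly_active: Iterable[int]
) -> Tuple[float, float]:
    """Average the per-replicate FDP and PSR over all simulation replicates."""
    active = list(truly_active)
    results: List[Tuple[float, float]] = [fdp_and_psr(s, active) for s in selections]
    n = len(results)
    mean_fdp = sum(r[0] for r in results) / n
    mean_psr = sum(r[1] for r in results) / n
    return mean_fdp, mean_psr


# Toy illustration: variables 0..3 have nonzero coefficients (hypothetical data).
truly_active = [0, 1, 2, 3]
selections = [
    [0, 1, 2, 7],  # one false discovery, three of four active variables found
    [0, 1],        # conservative selection: no false discovery, half found
    [],            # nothing selected
]
fdr, psr = simulated_fdr_and_psr(selections, truly_active)
print(f"FDR = {fdr:.3f}, PSR = {psr:.3f}")
```

A method that selects very few variables, as the EBIC does in the second toy replicate, scores well on FDR but poorly on PSR, which is why the two metrics are reported together.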
Authors
许树红
董晓强
陶然
高雪
高倩
虞明星
王彤
Xu Shuhong; Dong Xiaoqiang; Tao Ran (Department of Health Statistics, School of Public Health, Shanxi Medical University (030001), Taiyuan)
Source
《中国卫生统计》
CSCD
Peking University Core Journals (北大核心)
2018, Issue 3, pp. 322-329 (8 pages)
Chinese Journal of Health Statistics
Funding
National Natural Science Foundation of China (81473073)
About the authors
Corresponding author: Wang Tong (王彤), E-mail: tongwang@sxmu.edu.cn