Objective: To develop novel strategies for identifying relevant molecular signatures of complex human diseases based on identical-by-descent (IBD) profiles and genomic context. Methods: In the proposed strategies, we define four relevancy criteria for mapping SNP-phenotype relationships: point-wise IBD mean difference, averaged IBD difference over a window, Z curve, and averaged slope over a window. Results: We applied these criteria, together with a permutation test, to 100 simulated replicates for two hypothetical American populations to extract SNPs relevant to alcoholism from the sib-pair IBD profiles of pedigrees. The proposed strategies successfully identified most of the simulated true loci. Conclusion: This data-mining exercise suggests that the IBD statistic and genomic context can serve as informative features for locating the genes underlying complex human diseases. Compared with the classical Haseman-Elston sib-pair regression method, the proposed strategies are more efficient for large-scale genomic mining.
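A minimal Python sketch of the first two criteria and the permutation test, under stated assumptions: each sib pair's IBD profile is a list of IBD sharing proportions at consecutive SNPs, the point-wise IBD mean difference is taken between affected and unaffected sib pairs, the window average is a centred moving average truncated at the ends, and the test is one-sided (excess sharing in affected pairs). The function names pointwise_ibd_diff, window_average and permutation_pvalues are hypothetical, not the authors' code. The Z curve and averaged-slope criteria are not sketched because their definitions are not given here.

```python
import random
from statistics import mean

# Assumption: each sib pair's IBD profile is a list of IBD sharing
# proportions (e.g. 0, 0.5, 1 or an estimate in [0, 1]) at consecutive SNPs.


def pointwise_ibd_diff(affected, unaffected):
    """Per-SNP mean IBD sharing of affected pairs minus that of unaffected pairs."""
    n_snps = len(affected[0])
    return [
        mean(pair[j] for pair in affected) - mean(pair[j] for pair in unaffected)
        for j in range(n_snps)
    ]


def window_average(values, half_width):
    """Centred moving average over 2*half_width+1 SNPs, truncated at the ends."""
    out = []
    for j in range(len(values)):
        lo, hi = max(0, j - half_width), min(len(values), j + half_width + 1)
        out.append(mean(values[lo:hi]))
    return out


def permutation_pvalues(affected, unaffected, n_perm=1000, seed=0):
    """One-sided per-SNP permutation p-values for the point-wise IBD difference."""
    rng = random.Random(seed)
    observed = pointwise_ibd_diff(affected, unaffected)
    pooled = affected + unaffected  # new list, so inputs are not reordered
    n_aff = len(affected)
    exceed = [0] * len(observed)
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly re-assign affected/unaffected labels
        perm = pointwise_ibd_diff(pooled[:n_aff], pooled[n_aff:])
        for j, (obs, stat) in enumerate(zip(observed, perm)):
            if stat >= obs:
                exceed[j] += 1
    # add-one correction keeps p-values strictly positive
    return [(e + 1) / (n_perm + 1) for e in exceed]
```

Shuffling the pooled pair list re-labels pairs as affected or unaffected at random, so each per-SNP p-value estimates how often a difference at least as large as the observed one arises by chance.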
Objective: To extract SNPs relevant to alcoholism using the sib-pair IBD profiles of pedigrees. Methods: We used the ensemble decision approach, a supervised learning approach based on decision forests, to locate alcoholism-relevant SNPs in genome-wide SNP data. Results: Application to a large, publicly available dataset of 100 simulated replicates for three American populations (http://www.gaworkshop.org/) shows that the proposed approach successfully located all of the simulated true loci. Conclusion: The numerical results establish the proposed decision forest analysis as a powerful and practical alternative for large-scale family-based association studies.
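To illustrate the decision-forest idea, here is a deliberately simplified sketch, not the authors' ensemble decision approach: a forest of depth-1 trees (decision stumps), each fitted on a bootstrap sample of individuals and a random subset of features, with features ranked by their accumulated Gini impurity decrease. It assumes rows of X are individuals (or sib pairs), columns are SNP-level features such as genotype codes or IBD sharing, and y is binary affection status; gini, best_stump and forest_importance are hypothetical names.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_stump(X, y, features):
    """Best single-threshold split (feature, threshold, impurity decrease) among `features`."""
    n, base = len(y), gini(y)
    best = (None, None, 0.0)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            gain = base - (len(left) * gini(left) + len(right) * gini(right)) / n
            if gain > best[2]:
                best = (f, t, gain)
    return best


def forest_importance(X, y, n_trees=200, mtry=None, seed=0):
    """Mean impurity decrease per feature over a forest of bagged stumps."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    mtry = mtry or max(1, int(p ** 0.5))  # features tried per tree
    importance = [0.0] * p
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of rows
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        feats = rng.sample(range(p), mtry)  # random subset of candidate features
        f, _, gain = best_stump(Xb, yb, feats)
        if f is not None:
            importance[f] += gain
    return [v / n_trees for v in importance]
```

Real decision forests grow deeper trees; stumps keep the sketch short while preserving bagging, random feature subsets, and importance-based ranking of SNPs.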
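Gene chip (microarray) technology provides a powerful tool for studying disease heterogeneity. Current methods based on conventional cluster analysis generally use a large number of the genes on the chip as features to discover disease subtypes; they therefore do not account for the fact that the many irrelevant genes among these features can mask meaningful partitions of the disease samples. To overcome this shortcoming, we propose a heterogeneity analysis method based on coupled two-way clustering (Heterogeneous Analysis Based on Coupled Two-Way Clustering, HCTWC), which searches for meaningful gene clusters in order to reveal the intrinsic partitions of the samples. The method was applied to a diffuse large B-cell lymphoma (DLBCL) microarray dataset. Clustering the DLBCL samples using the identified gene clusters as features revealed two DLBCL subtypes with survival rates of 55% and 25%, respectively (P < 0.05). HCTWC is therefore effective in resolving disease heterogeneity.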
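One pass of the coupling step can be sketched as follows: cluster the genes, then re-cluster the samples using only the genes in each gene cluster as features. This is a minimal sketch under assumptions: matrix[g][s] holds the expression of gene g in sample s, k-means with farthest-first initialisation is swapped in as the clustering routine (the clustering algorithm used by HCTWC is not specified here), and the iteration and cluster-selection steps of the full method are omitted. The names kmeans and coupled_step are hypothetical.

```python
def _sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=50):
    """Plain k-means with deterministic farthest-first initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        # next seed: the point farthest from all seeds chosen so far
        far = max(points, key=lambda p: min(_sqdist(p, c) for c in centers))
        centers.append(list(far))
    labels = []
    for _ in range(n_iter):
        # assignment step: nearest center for every point
        new_labels = [min(range(k), key=lambda c: _sqdist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # update step: move each non-empty cluster's center to its mean
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def coupled_step(matrix, gene_k, sample_k):
    """Cluster genes, then cluster samples on each gene cluster separately.

    Returns {gene_cluster_id: (gene_indices, sample_labels)}.
    """
    gene_labels = kmeans(matrix, gene_k)  # genes are points in sample space
    n_samples = len(matrix[0])
    results = {}
    for c in range(gene_k):
        genes = [g for g, lab in enumerate(gene_labels) if lab == c]
        if not genes:
            continue
        # samples are points in the subspace spanned by this gene cluster only
        sample_points = [[matrix[g][s] for g in genes] for s in range(n_samples)]
        results[c] = (genes, kmeans(sample_points, sample_k))
    return results
```

Clustering samples on a small, coherent gene cluster avoids the masking effect described above, since the irrelevant genes no longer contribute to the sample distances.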