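In recent years, ISO-seq data from single-molecule sequencing have been increasingly used to predict novel transcript isoforms because of their very long reads. However, most studies use only full-length reads and discard the considerable useful information carried by non-full-length reads, so the data are not fully exploited. To address this problem, this paper retains the non-full-length reads and proposes two models that simultaneously predict isoform structures and estimate their expression proportions: Dirichlet sampling for isoform detection and prediction (DSIDP) and Markov chain for isoform detection and prediction (MCIDP). Both models build the candidate isoform set from full-length reads and use both full-length and non-full-length reads to estimate isoform expression proportions. DSIDP aligns all reads to the candidate isoform set and uses Dirichlet sampling to resolve reads that map to multiple isoforms. MCIDP uses a Markov chain to model alternative splicing between the exons of a gene, and it can also predict isoforms for which the data contain no full-length read. Both models are validated on simulated and real data.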
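The two ideas named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the flat Dirichlet prior, the Gibbs-style alternation, and the first-order transition model are all assumptions chosen for illustration. The first part assigns each multi-mapped read to one compatible isoform and then resamples the isoform proportions from a Dirichlet posterior; the second part estimates exon-to-exon transition probabilities from observed exon chains, so that an exon path never seen as a whole read can still receive a non-zero probability.

```python
import random
from collections import Counter


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet(alphas) sample by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def gibbs_isoform_proportions(read_hits, n_isoforms, n_iter=500,
                              burn_in=100, prior=1.0, seed=0):
    """Estimate isoform proportions from possibly multi-mapped reads.

    read_hits[i] lists the isoform indices compatible with read i
    (hypothetical input format, assumed for this sketch).
    """
    rng = random.Random(seed)
    theta = [1.0 / n_isoforms] * n_isoforms
    totals = [0.0] * n_isoforms
    kept = 0
    for it in range(n_iter):
        counts = [0] * n_isoforms
        # Assign each read to one compatible isoform, weighted by theta.
        for hits in read_hits:
            weights = [theta[k] for k in hits]
            chosen = rng.choices(hits, weights=weights, k=1)[0]
            counts[chosen] += 1
        # Resample proportions from the Dirichlet posterior.
        theta = sample_dirichlet([prior + c for c in counts], rng)
        if it >= burn_in:
            totals = [t + x for t, x in zip(totals, theta)]
            kept += 1
    return [t / kept for t in totals]


def exon_transition_probs(exon_chains):
    """First-order transition probabilities between consecutive exons."""
    counts = {}
    for chain in exon_chains:
        for a, b in zip(chain, chain[1:]):
            counts.setdefault(a, Counter())[b] += 1
    return {a: {b: n / sum(c.values()) for b, n in c.items()}
            for a, c in counts.items()}


def path_probability(exons, trans):
    """Probability of an exon path, conditional on its first exon."""
    p = 1.0
    for a, b in zip(exons, exons[1:]):
        p *= trans.get(a, {}).get(b, 0.0)
    return p


# Six reads unique to isoform 0, two unique to isoform 1, four ambiguous.
read_hits = [[0]] * 6 + [[1]] * 2 + [[0, 1]] * 4
props = gibbs_isoform_proportions(read_hits, n_isoforms=2)

# Exon chains from reads; the path (1, 2, 4) is never observed as a whole.
chains = [(1, 2, 3), (1, 2, 3), (1, 3), (2, 4)]
trans = exon_transition_probs(chains)
novel = path_probability((1, 2, 4), trans)
```

In this toy example the ambiguous reads are shared between the two isoforms according to the current proportion estimate, and the unobserved path (1, 2, 4) gets probability 2/3 × 1/3 from the transitions 1→2 and 2→4. A real model would also need to handle read start and end positions, alignment quality, and transcript termination, none of which are modeled here.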
Recently, Wang et al. systematically explored the transcription landscape in diploid cotton Gossypium arboreum. In the study, they integrated four high-throughput sequencing techniques, namely PacBio sequencing, strand-specific RNA sequencing (ssRNA-seq), cap analysis of gene expression sequencing (CAGE-seq), and poly(A) sequencing (PolyA-seq), to profile the transcriptome of G. arboreum. They developed a pipeline, IGIA, to construct accurate gene structure annotation based on the updated genome of G. arboreum and the multi-strategy RNA-seq data. Their study revealed some intriguing phenomena and potential novel mechanisms in the regulation of RNA transcription in plants, and also provided valuable resources for further functional genomic research in cotton.