Abstract
Genes are the material basis of heredity. All phenomena of life, including birth, growth, disease, aging, and death, are related to genes. Gene sequencing is one way to read life. With the development of next-generation high-throughput sequencing technology, terabytes or more of sequence data are generated every day. Interpreting these large-scale, complex, high-dimensional data is harder than acquiring them; it is a critical step in current biological research and has great practical significance. Storing, processing, and analyzing massive high-throughput sequencing data pose a great challenge to current computer systems and computing models. Drawing on a survey, in particular a case study of BGI (Beijing Genomics Institute), this paper discusses the current status of high-throughput sequencing data analysis, its problems, and the measures taken by various parties. Meeting the challenges posed by high-throughput sequencing data, however, will still require close collaboration across fields and long-term, in-depth research.
Source
Journal of Integration Technology (《集成技术》)
2012, No. 3, pp. 20-24 (5 pages)
Keywords
genome
high-throughput sequencing
data analysis
cloud computing
workflow