Bioinformatics–Trying to Swim in a Sea of Data

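We live in an exciting era: the genomic era. Scientific progress now lets us peer into the secrets of life, including our own. At the turn of the century, humans deciphered the human genome, and this genetic codebook of 3 billion characters now lies open before us. Meanwhile, genome data from other organisms pour out of automated sequencers in an endless stream, piling up mountain-high into a vast ocean. This flood of biological information is written in a special "genetic language": the four base characters of DNA (A, T, G and C) and the 20 amino-acid characters of proteins (A, R, N, D, C, Q, E, G, H, I, L, K, M, F, P, S, T, W, Y and V).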
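To make the two "genetic alphabets" concrete, here is a minimal Python sketch. The names `DNA_ALPHABET`, `PROTEIN_ALPHABET`, `uses_alphabet` and `bits_per_symbol` are hypothetical, chosen only for illustration. The storage estimate assumes a simple 2-bit code per base with every base equally likely, and it ignores compression and file-format overhead.

```python
import math

# The four DNA base characters and the 20 one-letter amino-acid codes listed above.
DNA_ALPHABET = frozenset("ATGC")
PROTEIN_ALPHABET = frozenset("ARNDCQEGHILKMFPSTWYV")


def uses_alphabet(seq: str, alphabet: frozenset) -> bool:
    """True if every character of seq (case-insensitive) belongs to alphabet."""
    return set(seq.upper()) <= alphabet


def bits_per_symbol(alphabet: frozenset) -> float:
    """Information per symbol (log2 of alphabet size), assuming equally likely symbols."""
    return math.log2(len(alphabet))


# "3 billion characters" for the human genome, as stated in the text.
HUMAN_GENOME_CHARS = 3_000_000_000

print(uses_alphabet("GATTACA", DNA_ALPHABET))     # True
print(uses_alphabet("MKWVTF", DNA_ALPHABET))      # False
print(uses_alphabet("MKWVTF", PROTEIN_ALPHABET))  # True
# 3e9 bases x 2 bits / 8 bits per byte, expressed in megabytes
print(HUMAN_GENOME_CHARS * bits_per_symbol(DNA_ALPHABET) / 8 / 1e6)  # 750.0
```

A short string such as "GATTACA" would also pass the protein check, because A, C, G and T are amino-acid codes too. Alphabet membership alone cannot always distinguish DNA from protein.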
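In its special human-genome issue of 16 February 2001, Science ran an accompanying article titled "Bioinformatics–Trying to Swim in a Sea of Data" (Roos DS. Bioinformatics–Trying to swim in a sea of data. Science, 2001, 291:1260–1261). The article asks: "We are swimming in a rapidly rising sea of data…how do we keep from drowning?" A small boat might save us! Bioinformatics is just such a "boat", and we have already fitted it with advanced electronics such as satellite positioning systems. Perhaps in the near future we will build an unsinkable aircraft carrier… Bioinformatics is a young discipline, but young as it is, it is full of challenges and opportunities, and it is fascinating.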

Science 16 February 2001:
Vol. 291, no. 5507, pp. 1260–1261
DOI: 10.1126/science.291.5507.1260

COMPUTATIONAL BIOLOGY:
Bioinformatics–Trying to Swim in a Sea of Data

David S. Roos*

Advances in many areas of genomics research are heavily rooted in engineering technology, from the capillary electrophoresis units used in large-scale DNA sequencing projects, to the photolithography and robotics technology used in chip manufacture, to the confocal imaging systems used to read those chips, to the beam and detector technology driving high-throughput mass spectroscopy. Further advances in (for example) materials science and nanotechnology promise to improve the sensitivity and cost of these technologies greatly in the near future. Genomic research makes it possible to look at biological phenomena on a scale not previously possible: all genes in a genome, all transcripts in a cell, all metabolic processes in a tissue.  

One feature that all of these approaches share is the production of massive quantities of data. GenBank, for example, now accommodates >10¹⁰ nucleotides of nucleic acid sequence data and continues to more than double in size every year. New technologies for assaying gene expression patterns, protein structure, protein-protein interactions, etc., will provide even more data. How to handle these data, make sense of them, and render them accessible to biologists working on a wide variety of problems is the challenge facing bioinformatics–an emerging field that seeks to integrate computer science with applications derived from molecular biology. We are swimming in a rapidly rising sea of data…how do we keep from drowning?  
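The growth figure in the paragraph above can be made concrete with a small Python sketch. This is an illustration, not part of Roos's article. Because the text says the database "more than" doubles each year, the model gives a lower bound on size and an upper bound on elapsed time. The starting value of 10¹⁰ comes from the quote, and the function names `min_size_after` and `max_years_to_reach` are hypothetical.

```python
# Lower-bound model: the archive at least doubles every year,
# starting from the ">10^10 nucleotides" quoted in the text.
START_NUCLEOTIDES = 10**10


def min_size_after(years: int, start: int = START_NUCLEOTIDES) -> int:
    """Smallest possible size after `years` years of at-least-doubling growth."""
    return start * 2**years


def max_years_to_reach(target: int, start: int = START_NUCLEOTIDES) -> int:
    """Most years needed to reach `target` if the size at least doubles yearly."""
    years, size = 0, start
    while size < target:
        size *= 2
        years += 1
    return years


print(min_size_after(5))           # 320000000000 (3.2 x 10^11)
print(max_years_to_reach(10**12))  # 7, since 2^7 = 128 >= 100
```

The loop uses exact integer arithmetic, which avoids floating-point rounding at the boundary that a `log2`-based formula could run into.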

……
……
……
  
The “postgenomic era” holds phenomenal promise for identifying the mechanistic bases of organismal development, metabolic processes, and disease, and we can confidently predict that bioinformatics research will have a dramatic impact on improving our understanding of such diverse areas as the regulation of gene expression, protein structure determination, comparative evolution, and drug discovery. The availability of virtually complete data sets also makes negative data informative: by mapping entire pathways, for example, it becomes interesting to ask not only what is present, but also what is absent. As the potential of genomics-scale studies becomes more fully appreciated, it is likely that genomics research will increasingly come to be viewed as indistinguishable from biology itself. But such research is only possible if data remain available not only for examination, but also to build upon. It is hard to swim in a sea of data while bound and gagged!
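The point about negative data ("not only what is present, but also what is absent") amounts to a set difference between what a pathway requires and what a genome contains. The following Python sketch is a hypothetical illustration, not Roos's method. The step and gene identifiers are invented placeholders, not real annotations, and each step is assumed to depend on a single gene.

```python
# Hypothetical sketch: reading "negative data" from a (virtually) complete genome.
# All identifiers below are invented placeholders, not real gene annotations.


def missing_steps(pathway: dict[str, str], genome_genes: set[str]) -> list[str]:
    """Return pathway steps whose required gene is absent from the genome.

    pathway maps each step name to the single gene assumed to catalyze it.
    The result is only meaningful if genome_genes is (virtually) complete;
    in a partial data set, absence may just mean "not yet sequenced".
    """
    return [step for step, gene in pathway.items() if gene not in genome_genes]


example_pathway = {"step_1": "gene_A", "step_2": "gene_B", "step_3": "gene_C"}
complete_genome = {"gene_A", "gene_C", "gene_X"}

print(missing_steps(example_pathway, complete_genome))  # ['step_2']
```

The docstring's caveat restates the argument in the paragraph above: an empty slot in the pathway becomes informative only once the data set is virtually complete.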

———————————————————    
The author is at the Department of Biology and Genomics Institute, University of Pennsylvania, Philadelphia, PA 19104, USA. E-mail: droos@sas.upenn.edu

http://www.sciencemag.org/cgi/content/full/291/5507/1260#affiliation
