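In its special issue on the human genome (16 February 2001), Science ran an article titled "Bioinformatics: Trying to Swim in a Sea of Data" (Roos DS. Bioinformatics–Trying to swim in a sea of data. Science, 2001, 291: 1260–1261). The article asks: "We are swimming in a rapidly rising sea of data…how do we keep from drowning?" A small boat may be just what saves us! Bioinformatics is the "small boat" we have found, and we have already fitted it with advanced electronics such as satellite positioning. Perhaps in the not-too-distant future, humanity will build an unsinkable aircraft carrier… Bioinformatics is a young discipline, but young as it is, it is full of challenges and opportunities, and it is endlessly fascinating.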
Science 16 February 2001:
Vol. 291, no. 5507, pp. 1260–1261
Bioinformatics–Trying to Swim in a Sea of Data
David S. Roos*
Advances in many areas of genomics research are heavily rooted in engineering technology, from the capillary electrophoresis units used in large-scale DNA sequencing projects, to the photolithography and robotics technology used in chip manufacture, to the confocal imaging systems used to read those chips, to the beam and detector technology driving high-throughput mass spectrometry. Further advances in (for example) materials science and nanotechnology promise to improve the sensitivity and cost of these technologies greatly in the near future. Genomic research makes it possible to look at biological phenomena on a scale not previously possible: all genes in a genome, all transcripts in a cell, all metabolic processes in a tissue.
One feature that all of these approaches share is the production of massive quantities of data. GenBank, for example, now accommodates >10¹⁰ nucleotides of nucleic acid sequence data and continues to more than double in size every year. New technologies for assaying gene expression patterns, protein structure, protein-protein interactions, etc., will provide even more data. How to handle these data, make sense of them, and render them accessible to biologists working on a wide variety of problems is the challenge facing bioinformatics–an emerging field that seeks to integrate computer science with applications derived from molecular biology. We are swimming in a rapidly rising sea of data…how do we keep from drowning?
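The growth claim above can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that GenBank starts at exactly 10¹⁰ nucleotides and exactly doubles each year. The text says "more than" on both counts, so the numbers it produces are lower bounds under that assumption, not forecasts. The function name genbank_lower_bound is hypothetical.

```python
def genbank_lower_bound(years: int, start: int = 10**10, factor: int = 2) -> int:
    """Lower-bound size (in nucleotides) after `years` years of growth.

    Assumption (illustrative only): a starting size of exactly 10**10
    nucleotides and exact doubling each year. The article says ">10^10"
    and "more than double", so real values would be at least this large.
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    return start * factor**years


# Print the hypothetical lower bound for each of the next five years.
for y in range(0, 6):
    print(y, f"{genbank_lower_bound(y):.1e}")
```

Even under these conservative assumptions, five years of doubling multiplies the starting volume by 2⁵ = 32, i.e. at least 3.2 × 10¹¹ nucleotides.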
The “postgenomic era” holds phenomenal promise for identifying the mechanistic bases of organismal development, metabolic processes, and disease, and we can confidently predict that bioinformatics research will have a dramatic impact on improving our understanding of such diverse areas as the regulation of gene expression, protein structure determination, comparative evolution, and drug discovery. The availability of virtually complete data sets also makes negative data informative: by mapping entire pathways, for example, it becomes interesting to ask not only what is present, but also what is absent. As the potential of genomics-scale studies becomes more fully appreciated, it is likely that genomics research will increasingly come to be viewed as indistinguishable from biology itself. But such research is only possible if data remain available not only for examination, but also to build upon. It is hard to swim in a sea of data while bound and gagged!
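The point about negative data can be illustrated with a simple set difference. Once a reference pathway is fully mapped and a genome is essentially complete, the genes that cannot be found become informative in their own right. The pathway and gene names below (REFERENCE_PATHWAY, geneA, and so on) and the helper pathway_status are hypothetical placeholders, not real annotations. A real analysis would also have to account for annotation errors and incomplete assemblies.

```python
# Hypothetical reference pathway: genes expected for a complete pathway.
REFERENCE_PATHWAY = {"geneA", "geneB", "geneC", "geneD", "geneE"}


def pathway_status(reference: set[str], genome_genes: set[str]) -> dict[str, set[str]]:
    """Split a reference pathway into genes present in and absent from a genome.

    The "absent" set is only meaningful if `genome_genes` is (nearly) complete;
    with a partial data set, absence may simply mean "not yet sequenced".
    """
    return {
        "present": reference & genome_genes,  # intersection: genes found
        "absent": reference - genome_genes,   # difference: genes missing
    }


# Hypothetical gene content of a fully sequenced genome.
genome = {"geneA", "geneB", "geneD", "geneX", "geneY"}
status = pathway_status(REFERENCE_PATHWAY, genome)
print("present:", sorted(status["present"]))
print("absent:", sorted(status["absent"]))
```

Here the interesting result is the "absent" set (geneC and geneE): with a complete genome, their absence is itself a finding rather than a gap in the data.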
The author is at the Department of Biology and Genomics Institute, University of Pennsylvania, Philadelphia, PA 19104, USA.