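A Detailed Explanation of the FASTQ Format
Note that this post is about the FASTQ format, not the FASTA format; for FASTA, see "A Detailed Explanation of the FASTA Format". FASTQ is another of the commonly used sequence formats. Here is a brief introduction: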
A FASTQ file normally uses four lines per sequence. Line 1 begins with a ‘@’ character and is followed by a sequence identifier and an optional description (like a FASTA title line). Line 2 is the raw sequence letters. Line 3 begins with a ‘+’ character and is optionally followed by the same sequence identifier (and any description) again. Line 4 encodes the quality values for the sequence in Line 2, and must contain the same number of symbols as letters in the sequence.
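In short, a FASTQ record usually spans four lines. The first line starts with '@', followed by the sequence identifier and description, much like a FASTA title line (which starts with '>' instead). The second line is the sequence itself. The third line starts with '+' and may optionally repeat the description. The fourth line holds the quality values for the sequence on the second line (that is, the sequencing quality of each base), and it has exactly as many characters as the sequence.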
An example of the FASTQ format:
@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
For example, a FASTQ record as seen at NCBI looks like this:
@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
+SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC
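The four-line layout described above can be read with a few lines of Python. This is only a minimal sketch, assuming every record occupies exactly four lines (no wrapped sequences); the names `FastqRecord` and `read_fastq` and the short `read1` record are made up for illustration and do not come from any library.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    """One FASTQ record (hypothetical helper type for this sketch)."""
    identifier: str
    description: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, assuming four lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@"):
            raise ValueError("line 1 of a record must start with '@'")
        if not plus.startswith("+"):
            raise ValueError("line 3 of a record must start with '+'")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        # Everything up to the first space is the identifier.
        identifier, _, description = header[1:].partition(" ")
        yield FastqRecord(identifier, description, sequence, quality)


# A short made-up record, used only to exercise the parser.
example = "@read1 hypothetical example\nGATTACA\n+\nIIII9IG\n"
records = list(read_fastq(io.StringIO(example)))
print(records[0].identifier, records[0].sequence)
```

The length check mirrors the rule that line 4 must contain as many symbols as line 2; the two records shown above should pass it as well.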
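The quality values themselves are computed by the sequencing software. I do not fully understand the algorithm, so I will not go into it here. Note also that there is more than one FASTQ variant: files from different sources differ somewhat, for example Illumina 1.0 FASTQ and Sanger FASTQ. These differences only matter in fairly special cases.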
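As a rough illustration of what the quality line encodes: in the Sanger variant each character stands for a Phred score Q, stored as the ASCII code of the character minus 33, and Q relates to the estimated error probability p by Q = -10 * log10(p). The sketch below assumes that Sanger-style offset of 33. Older Illumina variants use an offset of 64, and the earliest Solexa-style files use a different score formula altogether, so this code would not apply to them as written. The function names are hypothetical.

```python
from typing import List


def decode_quality(quality: str, offset: int = 33) -> List[int]:
    """Convert a quality string to Phred scores (offset 33 is an assumption)."""
    scores = [ord(ch) - offset for ch in quality]
    if any(score < 0 for score in scores):
        raise ValueError("character below the chosen offset; wrong FASTQ variant?")
    return scores


def error_probability(score: int) -> float:
    """Estimated probability that a base call with this Phred score is wrong."""
    return 10 ** (-score / 10)


# '!' is ASCII 33, '+' is 43, '5' is 53, 'I' is 73.
scores = decode_quality("!+5I")
print(scores)
print([error_probability(q) for q in scores])
```

Under this assumption the long runs of 'I' in the NCBI record above would correspond to a score of 40, that is, an estimated error rate of about 1 in 10,000 per base.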
To convert between FASTQ and FASTA, GenBank, and other formats, see "BioPerl Guide: Converting Sequence Formats".
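For the simplest of these conversions, FASTQ to FASTA, only the header and sequence lines are needed and the quality line is dropped. The following is a plain-Python sketch rather than the BioPerl approach from the guide. It assumes four lines per record, and `fastq_to_fasta` and the sample record are hypothetical names used only for illustration.

```python
import io
from typing import TextIO


def fastq_to_fasta(fastq: TextIO, fasta: TextIO, width: int = 60) -> int:
    """Write each FASTQ record as FASTA; return the number of records written."""
    count = 0
    while True:
        header = fastq.readline()
        if not header:
            break  # end of input
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = fastq.readline().rstrip("\r\n")
        fastq.readline()  # '+' line, not needed for FASTA
        fastq.readline()  # quality line, discarded
        if not header.startswith("@"):
            raise ValueError("FASTQ header must start with '@'")
        # FASTA title lines start with '>' instead of '@'.
        fasta.write(">" + header[1:] + "\n")
        # Wrap long sequences at a fixed width, as FASTA files commonly do.
        for start in range(0, len(sequence), width):
            fasta.write(sequence[start:start + width] + "\n")
        count += 1
    return count


source = io.StringIO("@read1 hypothetical example\nGATTACA\n+\nIIII9IG\n")
target = io.StringIO()
written = fastq_to_fasta(source, target)
print(target.getvalue(), end="")
```

The reverse direction is not possible without inventing quality values, which is one reason a dedicated toolkit is usually the better choice for real data.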
Here is an article that introduces how FASTQ quality scores are calculated:
http://jchoo1986.blog.sohu.com/147415763.html