A Detailed Explanation of the FASTQ Format

Posted on 22 July 2009 by 柳城

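Note that this post is about the FASTQ format, not the FASTA format. For FASTA, see "A Detailed Explanation of the FASTA Format". FASTQ is another of the common sequence formats, and what follows is a brief introduction to it.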

A FASTQ file normally uses four lines per sequence. Line 1 begins with a '@' character and is followed by a sequence identifier and an optional description (like a FASTA title line). Line 2 is the raw sequence letters. Line 3 begins with a '+' character and is optionally followed by the same sequence identifier (and any description) again. Line 4 encodes the quality values for the sequence in Line 2, and must contain the same number of symbols as letters in the sequence.

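In short, a FASTQ record normally has four lines. The first starts with '@' and carries the sequence description, just like a FASTA title line. The second is the sequence itself. The third starts with '+' and may repeat the description. The fourth holds the quality values for the sequence on the second line (that is, the sequencing quality of each base) and has exactly as many characters as the sequence.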

An example of the FASTQ format:

@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
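The four-line layout described above can be read with a few lines of Python. This is only a minimal sketch: the names `FastqRecord` and `parse_fastq` are made up for illustration, and it assumes every record uses exactly four lines (no wrapped sequences) and that blank lines are only separators.

```python
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    identifier: str   # text after '@' up to the first space
    description: str  # remainder of the header line (may be empty)
    sequence: str     # raw sequence letters (line 2)
    quality: str      # one quality character per base (line 4)


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from FASTQ text that uses exactly four lines per record."""
    # Assumption: blank lines are only separators and can be ignored.
    rows = [line.rstrip("\r\n") for line in lines if line.strip()]
    if len(rows) % 4 != 0:
        raise ValueError("expected four lines per record")
    for i in range(0, len(rows), 4):
        header, sequence, plus, quality = rows[i:i + 4]
        if not header.startswith("@"):
            raise ValueError(f"line 1 must start with '@': {header!r}")
        if not plus.startswith("+"):
            raise ValueError(f"line 3 must start with '+': {plus!r}")
        # The '+' line is either bare or repeats the header text.
        if plus[1:] and plus[1:] != header[1:]:
            raise ValueError("'+' line does not repeat the '@' header")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        identifier, _, description = header[1:].partition(" ")
        yield FastqRecord(identifier, description, sequence, quality)


# Usage with a short made-up record:
example = ["@r1 demo read\n", "ACGT\n", "+\n", "II9I\n"]
for record in parse_fastq(example):
    print(record.identifier, len(record.sequence))
```

With a real file, the same function can be fed an open file handle, since iterating over a text file yields its lines.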

For example, a FASTQ record as seen at NCBI looks like this:

@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
+SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC
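In the NCBI record above, the '+' line repeats the '@' header and the header ends with a `length=36` token. A rough consistency check for records of this shape might look like the following. The helper names `declared_length` and `check_record` are hypothetical, and treating a trailing `length=N` token as the declared read length is an assumption based only on this one example.

```python
import re
from typing import List, Optional


def declared_length(header: str) -> Optional[int]:
    """Return N from a trailing 'length=N' token, or None if there is none."""
    match = re.search(r"\blength=(\d+)\s*$", header)
    return int(match.group(1)) if match else None


def check_record(header: str, sequence: str, plus: str, quality: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    # A non-empty '+' line is expected to repeat the '@' line's text.
    if plus[1:] and plus[1:] != header[1:]:
        problems.append("'+' line does not repeat the '@' header")
    # Assumption: 'length=N' in the header states the number of bases.
    expected = declared_length(header)
    if expected is not None and expected != len(sequence):
        problems.append(f"header says length={expected}, sequence has {len(sequence)}")
    if len(sequence) != len(quality):
        problems.append("sequence and quality lengths differ")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to report everything that is wrong with a record in one pass.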

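As for the quality values, they are computed by the base-calling software. I do not fully understand the algorithms involved, so I will not go into them here. Note also that there is more than one FASTQ format: files from different sources differ slightly, mainly in how the quality scores are encoded as characters, for example Illumina 1.0 FASTQ and Sanger FASTQ. These are fairly special cases.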
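To make the quality line a little more concrete, here is a small sketch of how its characters map to numbers. It assumes the commonly documented conventions rather than anything stated in this post: Sanger FASTQ stores a Phred score Q as the character with ASCII code Q + 33, where Q = -10 * log10(p) and p is the estimated error probability, while Solexa / Illumina 1.0 FASTQ stores a Solexa score, defined on the odds p / (1 - p), with an offset of 64. The function names are made up for illustration.

```python
import math
from typing import List

SANGER_OFFSET = 33  # assumed: Sanger FASTQ, Phred score + 33
SOLEXA_OFFSET = 64  # assumed: Solexa / Illumina 1.0 FASTQ, Solexa score + 64


def decode_quality(quality: str, offset: int = SANGER_OFFSET) -> List[int]:
    """Turn a quality string into integer scores by subtracting the ASCII offset."""
    return [ord(char) - offset for char in quality]


def phred_to_error_probability(q: float) -> float:
    """Invert Q = -10 * log10(p) to get the estimated error probability p."""
    return 10 ** (-q / 10)


def solexa_to_phred(q_solexa: float) -> float:
    """Convert a Solexa score (based on p / (1 - p)) to a Phred score."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


# Under the Sanger assumption, '!' is the lowest score and 'I' is 40.
print(decode_quality("!I9"))  # [0, 40, 24]
print(phred_to_error_probability(40))
```

Under these assumptions, the run of 'I' characters in the NCBI example corresponds to Phred 40, an estimated error rate of about 1 in 10,000 per base. Decoding the same string with the wrong offset gives nonsense (often negative) scores, which is why knowing the variant matters.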

For converting between FASTQ and FASTA, GenBank, and other formats, see "BioPerl Guide: Converting Sequence Formats".
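The guide mentioned above uses BioPerl. Purely as an illustration of what the simplest direction involves, here is a library-free Python sketch (not BioPerl, and not the method from that guide) that converts FASTQ to FASTA by keeping the header and sequence and dropping the '+' and quality lines. The function name `fastq_to_fasta` and the 60-column wrap width are arbitrary choices, and four-line records are assumed.

```python
def fastq_to_fasta(fastq_text: str, width: int = 60) -> str:
    """Convert four-line FASTQ text to FASTA, discarding quality values."""
    # Assumption: blank lines are only separators between records.
    rows = [line for line in fastq_text.splitlines() if line.strip()]
    if len(rows) % 4 != 0:
        raise ValueError("expected four lines per record")
    out = []
    for i in range(0, len(rows), 4):
        header, sequence, plus, _quality = rows[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at: {header!r}")
        # FASTA title lines start with '>' instead of '@'.
        out.append(">" + header[1:])
        # Wrap the sequence to a fixed width, as FASTA files commonly do.
        out.extend(sequence[j:j + width] for j in range(0, len(sequence), width))
    return "".join(line + "\n" for line in out)


print(fastq_to_fasta("@r1 demo read\nACGTACGT\n+\nIIIIIIII\n", width=4), end="")
```

The reverse direction cannot be done faithfully, because FASTA carries no quality information; any FASTA-to-FASTQ conversion has to invent placeholder quality values.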


Comments on "A Detailed Explanation of the FASTQ Format"

  1. gainover says:

    Here is an article introducing how FASTQ quality values are calculated:
    http://jchoo1986.blog.sohu.com/147415763.html

