用BioPerl解析blast的代码


用BioPerl解析blast的代码。相关的文章请看:Bioperl:使用和解析 BLAST

用法: perl blast.pl out.blast out.parser

out.blast是blast的结果文件。out.parser是解析后输出的结果保存文件。

如:另存为文件名blast.pl

#! /usr/bin/perl -w
use Bio::SearchIO;
my $cutoff = "0.001";
my $file = "$ARGV[0]";
my $in = new Bio::SearchIO(-format => 'blast',
-file => $file);
open (OUT,">$ARGV[1]");
while( my $r = $in->next_result ) {
  print OUT "Query is: ", $r->query_name, " ",
  $r->query_description," ",$r->query_length,"aa\n";
  print OUT "Matrix was ", $r->get_parameter('matrix'), "\n";
    while( my $h = $r->next_hit ) {
      last if $h->significance > $cutoff;
      print OUT "Hit is ", $h->name, "\n";
         while( my $hsp = $h->next_hsp ) {
            print OUT " HSP Len is ", $hsp->length('total'), " ",
             " E-value is ", $hsp->evalue, " Bit score ",
             $hsp->score, " \n",
             " Query loc: ",$hsp->query->start, " ",
             $hsp->query->end," ",
             " Sbject loc: ",$hsp->hit->start, " ",
             $hsp->hit->end,"\n";
          }
  }
}
close (OUT);

它打印的结果是像下面这个样子的(如果你熟悉 BLAST 就很容易理解每一项的意思):

Query is: BOSS_DROME Bride of sevenless protein
precursor. 896 aa
Matrix was BLOSUM62
Hit is F35H10.10
HSP Len is 315 E-value is 4.9e-11 Bit score 182
Query loc: 511 813 Sbject loc: 1006 1298
HSP Len is 28 E-value is 1.4e-09 Bit score 39
Query loc: 508 535 Sbject loc: 427 454


《 “用BioPerl解析blast的代码” 》 有 5 条评论

  1. 博主貌似生物信息专业人士
    能否帮忙写个脚本将两个多序列的FASTA文件互为database进行BLAST啊,多谢了
    继续关注中 [呲牙]

  2. 我就是想用NCBI的COG 对我的基因组的所有蛋白进行下分类,现在不知怎么搞,能否提供下解决方法啊