用BioPerl解析blast的代码
用BioPerl解析blast的代码。相关的文章请看:Bioperl:使用和解析 BLAST
用法: perl blast.pl out.blast out.parser
out.blast是blast的结果文件。out.parser是解析后输出的结果保存文件。
如:另存为文件名blast.pl
#! /usr/bin/perl -w
use Bio::SearchIO;
my $cutoff = "0.001";
my $file = "$ARGV[0]";
my $in = new Bio::SearchIO(-format => 'blast',
-file => $file);
open (OUT,">$ARGV[1]");
while( my $r = $in->next_result ) {
print OUT "Query is: ", $r->query_name, " ",
$r->query_description," ",$r->query_length,"aa\n";
print OUT "Matrix was ", $r->get_parameter('matrix'), "\n";
while( my $h = $r->next_hit ) {
last if $h->significance > $cutoff;
print OUT "Hit is ", $h->name, "\n";
while( my $hsp = $h->next_hsp ) {
print OUT " HSP Len is ", $hsp->length('total'), " ",
" E-value is ", $hsp->evalue, " Bit score ",
$hsp->score, " \n",
" Query loc: ",$hsp->query->start, " ",
$hsp->query->end," ",
" Sbject loc: ",$hsp->hit->start, " ",
$hsp->hit->end,"\n";
}
}
}
close (OUT);
它打印的结果是像下面这个样子的(如果你熟悉 BLAST 就很容易理解每一项的意思):
Query is: BOSS_DROME Bride of sevenless protein
precursor. 896 aa
Matrix was BLOSUM62
Hit is F35H10.10
HSP Len is 315 E-value is 4.9e-11 Bit score 182
Query loc: 511 813 Sbject loc: 1006 1298
HSP Len is 28 E-value is 1.4e-09 Bit score 39
Query loc: 508 535 Sbject loc: 427 454
有点相关的文章
- Bioperl:使用和解析 BLAST (1.000)
- Bioperl:解析blast结果 (1.000)
- Bioperl:对序列数据进行操作 (0.860)
- Bioperl:Genbank Seq parsing script (0.860)
- Bioperl:把Genbank格式的序列转换为基因结构图 (0.860)
- Blast2.2.21版本的一个BUG (RANDOM - 0.141)
THANKS VERY MUCH!
我仍然只是路过一下
博主貌似生物信息专业人士
能否帮忙写个脚本将两个多序列的FASTA文件互为database进行BLAST啊,多谢了
继续关注中 [呲牙]
这不是很简单嘛。那需要写脚本啊。 [大兵]
我就是想用NCBI的COG 对我的基因组的所有蛋白进行下分类,现在不知怎么搞,能否提供下解决方法啊