DDBJ/EMBL/GenBank Accession Number Naming Rules


The formats of GenBank accession numbers are:

Nucleotide: 1 letter + 5 numerals, or 2 letters + 6 numerals
Protein: 3 letters + 5 numerals
WGS: 4 letters + 2 numerals for the WGS assembly version + 6-8 numerals
MGA: 5 letters + 7 numerals
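Because each format differs in its number of leading letters, the four rules above can be checked mechanically with regular expressions. The Python sketch below encodes only the letter and numeral counts listed here; the function name `classify_accession` is hypothetical, and it assumes a bare accession string with no version suffix or other decoration.

```python
import re
from typing import Optional

# Patterns transcribed from the format rules above. The letter counts
# (1-2, 3, 4, 5) never overlap, so at most one pattern can match.
ACCESSION_FORMATS = [
    ("nucleotide", re.compile(r"(?:[A-Z]\d{5}|[A-Z]{2}\d{6})")),
    ("protein", re.compile(r"[A-Z]{3}\d{5}")),
    # WGS: 4 letters + 2-digit assembly version + 6-8 digits (8-10 digits in total)
    ("wgs", re.compile(r"[A-Z]{4}\d{2}\d{6,8}")),
    ("mga", re.compile(r"[A-Z]{5}\d{7}")),
]


def classify_accession(accession: str) -> Optional[str]:
    """Return 'nucleotide', 'protein', 'wgs' or 'mga', or None if nothing matches.

    Hypothetical helper: assumes a bare accession without a version suffix.
    Surrounding whitespace is stripped and letters are upper-cased first.
    """
    acc = accession.strip().upper()
    for kind, pattern in ACCESSION_FORMATS:
        if pattern.fullmatch(acc):
            return kind
    return None
```

For example, `classify_accession("AF123456")` returns `"nucleotide"` and `classify_accession("AAAA01000001")` returns `"wgs"`. Note that this checks shape only; it says nothing about whether the prefix is actually assigned.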

Distribution of accession prefixes across the databases:

Nucleotide Accession Prefixes

Prefix | Database | Type
BA, DF, DG | DDBJ | CON division
AN | EMBL | CON division
CH, CM, DS, EM, EN, EP, EQ, FA, GG, GL | NCBI | CON division
C, AT, AU, AV, BB, BJ, BP, BW, BY, CI, CJ, DA, DB, DC, DK, FS | DDBJ | EST
F | EMBL | EST
H, N, T, R, W, AA, AI, AW, BE, BF, BG, BI, BM, BQ, BU, CA, CB, CD, CF, CK, CN, CO, CV, CX, DN, DR, DT, DV, DY, EB, EC, EE, EG, EH, EL, ES, EV, EW, EX, EY, FC, FD, FE, FF, FG, FK, FL, GD, GE, GH, GO | GenBank | EST
D, AB | DDBJ | Direct submissions
V, X, Y, Z, AJ, AM, FM | EMBL | Direct submissions
U, AF, AY, DQ, EF, EU, FJ, GQ | GenBank | Direct submissions
AP | DDBJ | Genome project data
BS | DDBJ | Chimpanzee genome data
AL, BX, CR, CT, CU | EMBL | Genome project data
AE, CP, CY | GenBank | Genome project data
AG, DE, DH, FT | DDBJ | GSS
B, AQ, AZ, BH, BZ, CC, CE, CG, CL, CW, CZ, DU, DX, ED, EI, EJ, EK, ER, ET, FH, FI | GenBank | GSS
AK | DDBJ | cDNA projects
AC, DP | GenBank | HTGS
E, BD, DD, DI, DJ, DL, DM, FU | DDBJ | Patents
A, AX, CQ, CS, FB, GM, GN | EMBL | Patents (nucleotide only)
I, AR, DZ, EA, GC, GP | GenBank | Patents (nucleotide)
G, BV, GF | GenBank | STS
BR | DDBJ | TPA
BN | EMBL | TPA
EZ | GenBank | TSA
S | GenBank | From journal scanning
AD | GenBank | From GSDB
AH | GenBank | Segmented set header
AS | GenBank | Other (not currently being used)
BC | GenBank | MGC project
BK | GenBank | TPA
BL, GJ, GK | GenBank | TPA CON division
BT | GenBank | FLI-cDNA projects
J, K, L, M | GenBank | From GSDB direct submissions
N | GenBank and DDBJ | N0-N2 were used initially by both groups but have been removed from circulation; N2-N9 are ESTs
AAAA-AZZZ | GenBank | WGS
BAAA-BZZZ | DDBJ | WGS
CAAA-CZZZ | EMBL | WGS
DAAA-DZZZ | GenBank | WGS TPA
AAAAA-AZZZZ | DDBJ | MGA
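A prefix table like this maps naturally onto a dictionary lookup on the leading letters of an accession. The sketch below is hypothetical: `NUCLEOTIDE_PREFIXES` holds only a small excerpt of the rows above and would need the remaining rows added for real use. Following the WGS and MGA range rows, a 4-letter prefix is treated as a WGS project code and a 5-letter prefix as an MGA code, each identified by its first letter; `split_wgs` applies the WGS format rule (4 letters + 2-digit assembly version + 6-8 digits).

```python
import re
from typing import Optional, Tuple

# Excerpt of the nucleotide prefix table above: prefix -> (database, type).
# Hypothetical and incomplete; add the remaining rows as needed.
NUCLEOTIDE_PREFIXES = {
    "D": ("DDBJ", "Direct submissions"),
    "AB": ("DDBJ", "Direct submissions"),
    "X": ("EMBL", "Direct submissions"),
    "AJ": ("EMBL", "Direct submissions"),
    "U": ("GenBank", "Direct submissions"),
    "AF": ("GenBank", "Direct submissions"),
    "AY": ("GenBank", "Direct submissions"),
    "AC": ("GenBank", "HTGS"),
    "BC": ("GenBank", "MGC project"),
}

# WGS project codes (4 letters): the first letter selects the database.
WGS_FIRST_LETTER = {
    "A": ("GenBank", "WGS"),
    "B": ("DDBJ", "WGS"),
    "C": ("EMBL", "WGS"),
    "D": ("GenBank", "WGS TPA"),
}

# MGA codes (5 letters): only the AAAAA-AZZZZ range is listed, assigned to DDBJ.
MGA_FIRST_LETTER = {"A": ("DDBJ", "MGA")}

# Leading letters followed by digits, nothing else.
_LETTERS_DIGITS = re.compile(r"([A-Z]+)(\d+)")


def lookup_nucleotide(accession: str) -> Optional[Tuple[str, str]]:
    """Return (database, type) for a nucleotide accession, or None if unknown."""
    m = _LETTERS_DIGITS.fullmatch(accession.strip().upper())
    if m is None:
        return None
    letters = m.group(1)
    if len(letters) == 4:
        return WGS_FIRST_LETTER.get(letters[0])
    if len(letters) == 5:
        return MGA_FIRST_LETTER.get(letters[0])
    return NUCLEOTIDE_PREFIXES.get(letters)


def split_wgs(accession: str) -> Optional[Tuple[str, str, str]]:
    """Split a WGS accession into (project code, assembly version, serial number)."""
    m = re.fullmatch(r"([A-Z]{4})(\d{2})(\d{6,8})", accession.strip().upper())
    return m.groups() if m else None
```

For instance, `split_wgs("AAAA01000001")` gives `("AAAA", "01", "000001")`, and `lookup_nucleotide` on the same string reports GenBank WGS because the project code starts with A.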


Protein Accession Prefixes

Prefix | Database | Type
BAA-BZZ | DDBJ | Protein ID
CAA-CZZ | EMBL | Protein ID
AAA-AZZ | GenBank | Protein ID
AAE | GenBank | Protein ID for patents (note that some patent proteins also use AAA and AAC)
FAA-FZZ | DDBJ | TPA Protein ID
DAA-DZZ | GenBank | TPA Protein ID
GAA-GZZ | DDBJ | WGS Protein ID
EAA-EZZ | GenBank | WGS Protein ID
HAA-HZZ | GenBank | TPA WGS Protein ID
O | Swiss-Prot | Protein
P | Swiss-Prot (Geneva) | Protein
Q | Swiss-Prot (Hinxton) | Protein
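The protein ranges are defined by the first letter of a 3-letter prefix, with AAE as an exact-match exception inside the GenBank AAA-AZZ range. The hypothetical `lookup_protein` below sketches that rule. It assumes Swiss-Prot accessions are recognized by their first character (O, P or Q) alone, since the table gives no further format for them, and it cannot tell the AAA/AAC patent proteins apart from ordinary GenBank proteins, because the prefix alone does not carry that information.

```python
import re
from typing import Optional, Tuple

# First letter of a 3-letter protein prefix -> (database, type), per the table above.
PROTEIN_FIRST_LETTER = {
    "A": ("GenBank", "Protein ID"),
    "B": ("DDBJ", "Protein ID"),
    "C": ("EMBL", "Protein ID"),
    "D": ("GenBank", "TPA Protein ID"),
    "E": ("GenBank", "WGS Protein ID"),
    "F": ("DDBJ", "TPA Protein ID"),
    "G": ("DDBJ", "WGS Protein ID"),
    "H": ("GenBank", "TPA WGS Protein ID"),
}

# Exact 3-letter prefixes that take precedence over the first-letter ranges.
PROTEIN_EXACT = {"AAE": ("GenBank", "Protein ID for patents")}

# Single-letter Swiss-Prot prefixes (assumption: only the first character is checked).
SWISSPROT = {
    "O": ("Swiss-Prot", "Protein"),
    "P": ("Swiss-Prot (Geneva)", "Protein"),
    "Q": ("Swiss-Prot (Hinxton)", "Protein"),
}


def lookup_protein(accession: str) -> Optional[Tuple[str, str]]:
    """Return (database, type) for a protein accession, or None if unknown."""
    acc = accession.strip().upper()
    # No 3-letter range in the table starts with O, P or Q, so this check is safe.
    if acc[:1] in SWISSPROT:
        return SWISSPROT[acc[:1]]
    m = re.fullmatch(r"([A-Z]{3})\d{5}", acc)  # 3 letters + 5 numerals
    if m is None:
        return None
    prefix = m.group(1)
    if prefix in PROTEIN_EXACT:
        return PROTEIN_EXACT[prefix]
    return PROTEIN_FIRST_LETTER.get(prefix[0])
```

Checking the exact-match table before the first-letter ranges is what lets AAE override the broader AAA-AZZ row.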

For the RefSeq accession naming rules, see: http://www.liucheng.name/?p=379

Original source: http://www.ncbi.nlm.nih.gov/Sequin/acc.html

