NCBI News





Third Party Annotation Database Debuts at GenBank

As the amount of publicly available sequence data rapidly increases, third party annotation will become increasingly important. The Third Party Annotation (TPA) database, created by GenBank and its international partners, the DNA Data Bank of Japan (DDBJ) and the European Bioinformatics Institute (EBI), accepts third party annotation of genomic sequence or computationally derived/predicted sequences. TPA submissions must use sequence data that is already represented in GenBank, and the analysis on which the annotations are based must appear in a peer-reviewed scientific journal. Those wishing to add a feature annotation, such as a gene, to an unannotated genomic sequence, or to combine two or more records, such as a set of ESTs, into a longer transcript sequence, can submit their analysis or assembly to the TPA database. Trace data or Whole Genome Shotgun (WGS) sequences may be used as the basis of a TPA submission, but data from secondary sources, such as NCBI Reference Sequences, or primary data from proprietary databases may not be used.

Third parties can submit annotations using either Sequin or BankIt. If using BankIt, choose “NO” when asked whether the submission is primary data; this initiates the TPA option. Those making TPA submissions via Sequin should indicate this in their email message to NCBI and provide accession numbers for the primary sequence(s) used in their analysis. Instructions for making TPA submissions are found at:

  www.ncbi.nlm.nih.gov/Genbank/index.html

TPA records can be located with Entrez using the TPA term within the Properties field; for example:

  TPA [prop]
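For readers who prefer to run this query programmatically, the following is a minimal sketch that only builds a search URL for NCBI's E-utilities `esearch` service. The use of E-utilities, the `nuccore` database name, the `retmax` value, and the helper name `build_tpa_search_url` are assumptions for illustration and are not part of the article; no network request is made.

```python
from urllib.parse import urlencode

# Assumption: the public E-utilities esearch endpoint is used for the query.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_tpa_search_url(extra_term: str = "", retmax: int = 20) -> str:
    """Build an esearch URL for records carrying the TPA property.

    extra_term is an optional Entrez expression ANDed with TPA[prop].
    """
    term = "TPA[prop]"
    if extra_term:
        term = f"{term} AND {extra_term}"
    # Assumption: "nuccore" is the nucleotide database to search.
    params = {"db": "nuccore", "term": term, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_tpa_search_url())
    print(build_tpa_search_url("Trypanosoma brucei[orgn]"))
```

The second call narrows the search to one organism by combining the property term with an organism field, in the same way terms are combined in the Entrez search box.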

As of November 2002, there were 104 TPA records in the Entrez database. An example of a TPA record is shown below. The “Primary” field shows how the sequence in the TPA record was constructed from existing database sequences. In the case below, four GenBank database sequences were combined to produce the sequence upon which the submission is based. For instance, bases 1 through 503 in the TPA sequence were derived from bases 3 through 505 in GenBank sequence AQ655575.1.

Locus       BK000167   561 bp   DNA   linear   INV   19-OCT-2002
Definition  TPA: Trypanosoma brucei GRIP domain containing protein gene, partial cds.
Accession   BK000167
Version     BK000167.1  GI:24137384
Keywords    Third Party Annotation; TPA.
. . . . . .
PRIMARY     TPA_SPAN   PRIMARY_IDENTIFIER   PRIMARY_SPAN   COMP
            1-503      AQ655575.1           3-505
            323-561    AL465640.1           1-239
            335-561    AQ638516.1           1-227
            404-561    AZ213060.1           435-592        c
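The coordinate mapping described by the PRIMARY block above can be sketched in a few lines of Python. This is an illustrative sketch, not an NCBI tool: the names `PrimaryRow`, `parse_primary_rows`, and `sources_for` are hypothetical, and the handling of the `c` (complement) flag assumes that a complemented primary span is read in the reverse direction, so the first TPA base of that row pairs with the last base of the primary span.

```python
from typing import List, NamedTuple, Tuple


class PrimaryRow(NamedTuple):
    tpa_start: int
    tpa_end: int
    identifier: str
    src_start: int
    src_end: int
    complement: bool


def parse_primary_rows(text: str) -> List[PrimaryRow]:
    """Parse whitespace-separated rows: TPA_SPAN PRIMARY_IDENTIFIER PRIMARY_SPAN [c]."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        tpa_start, tpa_end = (int(x) for x in fields[0].split("-"))
        src_start, src_end = (int(x) for x in fields[2].split("-"))
        complement = len(fields) > 3 and fields[3] == "c"
        rows.append(
            PrimaryRow(tpa_start, tpa_end, fields[1], src_start, src_end, complement)
        )
    return rows


def sources_for(rows: List[PrimaryRow], tpa_pos: int) -> List[Tuple[str, int]]:
    """Return (identifier, base position) for every primary sequence covering tpa_pos."""
    hits = []
    for r in rows:
        if r.tpa_start <= tpa_pos <= r.tpa_end:
            offset = tpa_pos - r.tpa_start
            # Assumption: a complemented span is traversed from its end backwards.
            src_pos = r.src_end - offset if r.complement else r.src_start + offset
            hits.append((r.identifier, src_pos))
    return hits


# Rows copied from the PRIMARY block of BK000167.
PRIMARY_BLOCK = """
1-503 AQ655575.1 3-505
323-561 AL465640.1 1-239
335-561 AQ638516.1 1-227
404-561 AZ213060.1 435-592 c
"""

rows = parse_primary_rows(PRIMARY_BLOCK)

if __name__ == "__main__":
    print(sources_for(rows, 1))    # base 1 is covered by one primary sequence
    print(sources_for(rows, 503))  # base 503 is covered by all four
```

Each row's TPA span and primary span have the same length (for example, 1-503 and 3-505 both cover 503 bases), which is what makes the simple offset arithmetic possible.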






NCBI News | Fall/Winter 2002