BankIt Submission Help: Feature Table File

The first line of the feature table contains the following basic information:

>Feature Sequence_ID

The sequence identifier (Sequence_ID) must match the label used to identify each table's corresponding sequence in the nucleotide FASTA file.
Subsequent lines of the table list the features.
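
The matching requirement can be checked before submission. The following is a minimal Python sketch, not a BankIt tool: the function names are hypothetical, and it assumes the Sequence_ID is the first whitespace-delimited token after ">" on each FASTA definition line.

```python
def fasta_ids(fasta_text):
    """Return the set of sequence IDs (first token after '>') in FASTA text."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def table_ids(table_text):
    """Return the Sequence_IDs from '>Feature Sequence_ID' header lines."""
    ids = []
    for line in table_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == ">Feature":
            ids.append(parts[1])
    return ids


def unmatched_table_ids(fasta_text, table_text):
    """List table Sequence_IDs that have no matching FASTA sequence."""
    known = fasta_ids(fasta_text)
    return [seq_id for seq_id in table_ids(table_text) if seq_id not in known]
```

An empty list from unmatched_table_ids means every table header has a corresponding sequence in the FASTA text.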

Prepare the feature table file in a text editor and save it as plain ASCII text (not .rtf or .doc).

Format for a feature table:

Each feature is shown on a separate line.

Multiple nucleotide intervals for a feature are on subsequent lines.

Qualifier(s) describing a feature are on the line(s) below that feature and its intervals.

Each column is separated by a tab.

As shown in the examples below:

Line 1:
Column 1: Start location (first nucleotide) of a feature
Column 2: Stop location (last nucleotide) of a feature
Column 3: Feature name (for example, 'CDS' or 'mRNA' or 'rRNA' or 'gene' or 'exon')

Line 2:
Column 4: Qualifier name (for example, 'product' or 'number' or 'gene' or 'note')
Column 5: Qualifier value

Note in the examples below that 'gene' is both a Feature and a Qualifier and must be entered in two separate columns.
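
Because the columns must be separated by real tab characters, it can be easier to generate the table from a script than to type it by hand. The following Python sketch builds the lines described above. The function names and the input structure are illustrative assumptions, not part of BankIt.

```python
def format_feature(intervals, feature, qualifiers=()):
    """Build the tab-delimited lines for one feature.

    intervals  -- list of (start, stop) pairs; strings, so "<1" or ">1050" work
    feature    -- feature name, for example "CDS"
    qualifiers -- iterable of (name, value) pairs
    """
    lines = []
    for i, (start, stop) in enumerate(intervals):
        if i == 0:
            # Columns 1-3: start, stop, feature name
            lines.append(f"{start}\t{stop}\t{feature}")
        else:
            # Additional intervals carry only start and stop
            lines.append(f"{start}\t{stop}")
    for name, value in qualifiers:
        # Columns 1-3 stay empty; name and value go in columns 4 and 5
        lines.append(f"\t\t\t{name}\t{value}")
    return lines


def format_table(sequence_id, features):
    """Return the full table text for one sequence."""
    lines = [f">Feature {sequence_id}"]
    for intervals, feature, qualifiers in features:
        lines.extend(format_feature(intervals, feature, qualifiers))
    return "\n".join(lines) + "\n"
```

For example, format_table("Seq1", [([("<1", ">1050")], "gene", [("gene", "ATH1")])]) produces a header line, a feature line with 'gene' in column 3, and a qualifier line with 'gene' in column 4.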

The examples below show sample tables and illustrate a number of points about the table format.

>Feature Seq1
<1    >1050    gene
                        gene          ATH1
<1    1009    CDS
                        product       acid trehalase
                        product       Athlp
                        codon_start   2
<1    >1050    mRNA
                        product       acid trehalase

>Feature Seq2
2626  2590    tRNA
2570  2535
                        product       tRNA-Phe

>Feature Seq3
1080  1210  CDS
1275  1315
                        product       actin
                        note          alternatively spliced
1055  1210  mRNA
1275  1340
                        product       actin
1055  1340  gene
                        gene          ACT
1055  1079  5'UTR
1316  1340  3'UTR

Features that are on the complementary strand, such as the tRNA-Phe, are indicated by reversing the interval locations.
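
The strand convention can be expressed as a small Python helper. This is a sketch with a hypothetical function name. It assumes integer positions, and it treats a single-nucleotide interval (start equal to stop) as plus strand because the coordinates alone cannot tell the two strands apart.

```python
def strand(start, stop):
    """Return '-' when the interval is reversed (start > stop), else '+'."""
    return "-" if start > stop else "+"
```

Applied to the Seq2 example, strand(2626, 2590) and strand(2570, 2535) both return '-', while strand(1080, 1210) from Seq3 returns '+'.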

Locations of partial (incomplete) features are indicated with a ">" or "<" next to the number. In the Seq1 example, the gene, CDS and mRNA all begin upstream of the start of the nucleotide sequence. The "<" symbol indicates that they are 5' partial features, and the ">" symbol indicates that the gene and mRNA are also 3' partial. Furthermore, for the protein to translate correctly, the correct reading frame must be indicated with the qualifier "codon_start" on the CDS. There is no need to indicate codon_start on complete CDSs: if no codon_start is provided, translation is assumed to start at the first nucleotide of the interval.
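
A script that reads feature tables needs to separate the partial symbol from the number. The following Python sketch shows one way to do that and to apply the codon_start default described above. The function names and the dictionary of qualifiers are assumptions made for illustration.

```python
def parse_position(token):
    """Split a location token into (position, is_partial).

    A leading "<" or ">" marks the location as partial.
    """
    partial = token[0] in "<>"
    number = token[1:] if partial else token
    return int(number), partial


def codon_start(qualifiers):
    """Return codon_start from a qualifier dict, defaulting to 1.

    The default of 1 reflects translation starting at the first
    nucleotide of the interval when no codon_start is given.
    """
    return int(qualifiers.get("codon_start", 1))
```

For the Seq1 CDS, parse_position("<1") reports a partial start at position 1, parse_position("1009") reports a complete stop, and the qualifier codon_start 2 overrides the default reading frame.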

If a feature contains multiple intervals, like the spliced tRNA-Phe, each interval is listed on a separate line by its start and stop positions, before any qualifier lines.

Gene features are always a single interval, and their location should cover the intervals of all the relevant features (for example: CDS plus 5'UTR plus 3'UTR).
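
The single gene interval can be derived from the intervals of the features it covers. The following is a minimal Python sketch with a hypothetical function name. It assumes plain integer positions (any "<" or ">" already removed) and that all intervals are on the same strand.

```python
def gene_span(intervals):
    """Return one (start, stop) interval covering all given intervals."""
    positions = [p for interval in intervals for p in interval]
    low, high = min(positions), max(positions)
    # A reversed first interval means the complementary strand,
    # so the covering interval is reversed as well.
    first_start, first_stop = intervals[0]
    if first_start > first_stop:
        return high, low
    return low, high
```

With the Seq3 intervals for the CDS, 5'UTR and 3'UTR, gene_span returns (1055, 1340), which matches the gene line in that example.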

If a protein has more than one name, each name can be listed as a separate product qualifier on the CDS in the table. The value of the first product qualifier will become the /product on the CDS in the flatfile, and any additional product qualifiers will be shown as a /note on the CDS in the flatfile. All CDS features must have at least one product.
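
The mapping from table product qualifiers to flatfile qualifiers can be sketched in Python as follows. The function name is hypothetical, and the sketch only shows which qualifier each name ends up under. The exact wording of the resulting /note in the flatfile is not specified here.

```python
def flatfile_qualifiers(products):
    """Map the product names of one CDS, in table order, to flatfile qualifiers."""
    if not products:
        # Every CDS feature needs at least one product
        raise ValueError("a CDS must have at least one product")
    result = [("product", products[0])]
    # Any further product names are shown as notes
    result += [("note", name) for name in products[1:]]
    return result
```

For the Seq1 CDS, the input ["acid trehalase", "Athlp"] gives "acid trehalase" as the product and "Athlp" as a note.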

A flatfile /note can be added to any feature using the qualifier 'note' in the table.
