BankIt accepts features as a five-column, tab-delimited
table file. The feature table specifies the location and type of each feature,
and BankIt processes the feature intervals and translates any CDS features into
proteins.
The feature table format allows different kinds of features (e.g., gene,
mRNA, coding region, tRNA) and qualifiers (e.g., /product, /note) to be
annotated. The valid features and qualifiers are restricted to those approved
by the International Nucleotide Sequence Database Collaboration.
The first line of the feature table contains the following basic information:
>Feature Sequence_ID
The sequence identifier (Sequence_ID) must match the label used to identify each
table's corresponding sequence in the nucleotide FASTA file.
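A mismatched identifier is easy to miss by eye, so a quick local check before submission can help. The Python sketch below is a hypothetical helper, not part of BankIt. It assumes that each FASTA label is the first whitespace-delimited token after ">" on a definition line, and that every table header has the form ">Feature Sequence_ID".

```python
from __future__ import annotations


def fasta_ids(fasta_text: str) -> set[str]:
    """Collect the label (first token after '>') from each FASTA definition line."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.add(parts[0])
    return ids


def table_ids(table_text: str) -> list[str]:
    """Collect the Sequence_ID from each '>Feature Sequence_ID' header line."""
    ids = []
    for line in table_text.splitlines():
        if line.startswith(">Feature"):
            parts = line.split()
            if len(parts) >= 2:
                ids.append(parts[1])
    return ids


def unmatched_table_ids(fasta_text: str, table_text: str) -> list[str]:
    """Return table IDs with no matching FASTA label (an empty list means all match)."""
    known = fasta_ids(fasta_text)
    return [seq_id for seq_id in table_ids(table_text) if seq_id not in known]
```

For example, a FASTA file labelled Seq1 and Seq2 paired with tables for Seq1 and Seq3 would report ["Seq3"] as unmatched.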
Subsequent lines of the table list the features.
Prepare the feature table file in a text editor and save it as plain ASCII
text (not .rtf or .doc).
Format for a feature table:
- Each feature is shown on a separate line.
- Multiple nucleotide intervals for a feature are on subsequent lines.
- Qualifier(s) describing a feature are on the line(s) below that feature and its intervals.
- Each column is separated by a tab.
The columns are laid out as follows, as shown in the examples below:
Line 1 (feature line):
Column 1: Start location (first nucleotide) of a feature
Column 2: Stop location (last nucleotide) of a feature
Column 3: Feature name (for example, 'CDS', 'mRNA', 'rRNA', 'gene' or 'exon')
Line 2 (qualifier line, with columns 1-3 left empty):
Column 4: Qualifier name (for example, 'product', 'number', 'gene' or 'note')
Column 5: Qualifier value
Note in the examples below that 'gene' is both a feature and a qualifier,
so it must be entered in two separate columns: as the feature name in column 3
and as the qualifier name in column 4.
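To make the column layout concrete, the Python sketch below builds table text from simple tuples. The function names format_feature and format_table and their argument shapes are illustrative assumptions, not a BankIt tool. Locations are passed as strings so that partial markers such as "<1" can be included. Each feature line fills columns 1-3, each additional interval fills only columns 1-2, and each qualifier line leaves columns 1-3 empty so the qualifier name and value land in columns 4 and 5.

```python
from __future__ import annotations


def format_feature(
    intervals: list[tuple[str, str]],
    feature: str,
    qualifiers: list[tuple[str, str]] | None = None,
) -> str:
    """Return the tab-delimited lines for one feature (illustrative helper)."""
    first_start, first_stop = intervals[0]
    # Feature line: column 1 = start, column 2 = stop, column 3 = feature name.
    lines = [f"{first_start}\t{first_stop}\t{feature}"]
    # Any further intervals fill columns 1-2 only.
    for start, stop in intervals[1:]:
        lines.append(f"{start}\t{stop}")
    # Qualifier lines leave columns 1-3 empty, so they begin with three tabs.
    for name, value in qualifiers or []:
        lines.append(f"\t\t\t{name}\t{value}")
    return "\n".join(lines)


def format_table(seq_id: str, features: list[tuple]) -> str:
    """Prefix the '>Feature Sequence_ID' line and join the feature blocks."""
    blocks = [format_feature(*feature) for feature in features]
    return "\n".join([f">Feature {seq_id}", *blocks]) + "\n"
```

For instance, format_table("Seq1", [([("<1", ">1050")], "gene", [("gene", "ATH1")])]) yields a header line followed by the gene feature line and a qualifier line whose first three columns are empty. When saving the result, opening the file with encoding="ascii" makes Python raise UnicodeEncodeError on any non-ASCII character, which is one way to keep the file plain ASCII text.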
The examples below show sample tables and illustrate a number of points
about the table format.
>Feature Seq1
<1	>1050	gene
			gene	ATH1
<1	1009	CDS
			product	acid trehalase
			product	Ath1p
			codon_start	2
<1	>1050	mRNA
			product	acid trehalase
>Feature Seq2
2626	2590	tRNA
2570	2535
			product	tRNA-Phe
>Feature Seq3
1080	1210	CDS
1275	1315
			product	actin
			note	alternatively spliced
1055	1210	mRNA
1275	1340
			product	actin
1055	1340	gene
			gene	ACT
1055	1079	5'UTR
1316	1340	3'UTR
- Features on the complementary strand, such as the tRNA-Phe, are indicated by reversing the interval locations (the start is greater than the stop).
- Locations of partial (incomplete) features are indicated with a "<" or
">" next to the number. In the Seq1 example, the gene, CDS and mRNA all
begin upstream of the start of the nucleotide sequence.
The "<" symbol indicates that they are 5' partial features, and the ">" symbol
indicates that the gene and mRNA are 3' partial.
Because the CDS is 5' partial, its reading frame must be indicated with the
qualifier "codon_start" so that the protein translates correctly; in Seq1,
codon_start 2 means translation begins at the second nucleotide of the CDS
interval. There is no need to indicate codon_start on complete CDSs, because
translation is assumed to start at the first nucleotide of the interval if no
codon_start is provided.
- If a feature contains multiple intervals, like the spliced tRNA-Phe, each
interval is listed on a separate line by its start and stop positions, before
any qualifier lines.
- Gene features are always a single interval, and their location should cover
the intervals of all the relevant features (for example: CDS plus 5'UTR plus 3'UTR).
- If a protein has more than one name, each can be listed as a separate
product qualifier on the CDS in the table. The value of the first product
qualifier becomes the /product on the CDS in the flatfile, and any additional
product qualifiers are shown as a /note on the CDS in the flatfile. All CDS
features must have at least one product.
- A flatfile /note can be added to any feature using the "note" qualifier in
the table.
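The conventions in the list above can be read back out of a table mechanically. The Python sketch below is a hypothetical, simplified reader, not BankIt's own parser. It assumes well-formed, tab-separated input and only records the "<" or ">" marker found on a feature's first start and last stop, without interpreting markers on complementary-strand features. It treats a reversed first interval as the complementary strand, defaults codon_start to 1, rejects a CDS with no product and a gene with more than one interval, and maps the first product to /product and any further products to /note (listing them before other notes is an assumption).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Feature:
    key: str  # column 3, e.g. "gene", "CDS", "tRNA"
    intervals: list[tuple[int, int]] = field(default_factory=list)
    start_marker: str = ""  # '<' or '>' on the first interval's start, if any
    stop_marker: str = ""  # '<' or '>' on the last interval's stop, if any
    qualifiers: list[tuple[str, str]] = field(default_factory=list)

    @property
    def complement(self) -> bool:
        # Reversed interval locations (start > stop) mark the complementary strand.
        start, stop = self.intervals[0]
        return start > stop


def _position(text: str) -> tuple[int, str]:
    """Split a location such as '<1' or '>1050' into (number, marker)."""
    marker = text[0] if text[0] in "<>" else ""
    return int(text.lstrip("<>")), marker


def parse_table(text: str) -> dict[str, list[Feature]]:
    """Parse '>Feature' tables into {Sequence_ID: [Feature, ...]} (simplified)."""
    tables: dict[str, list[Feature]] = {}
    current: list[Feature] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith(">Feature"):
            current = tables.setdefault(line.split()[1], [])
            continue
        cols = line.split("\t")
        if cols[0] == "" and len(cols) >= 4:
            # Qualifier line: columns 1-3 empty, name in column 4, value in column 5.
            value = cols[4] if len(cols) >= 5 else ""
            current[-1].qualifiers.append((cols[3], value))
            continue
        start, start_marker = _position(cols[0])
        stop, stop_marker = _position(cols[1])
        if len(cols) >= 3 and cols[2]:
            # A feature name in column 3 starts a new feature.
            current.append(Feature(cols[2], start_marker=start_marker))
        # Lines with only columns 1-2 add another interval to the current feature.
        current[-1].intervals.append((start, stop))
        current[-1].stop_marker = stop_marker
    return tables


def summarize(feature: Feature) -> dict[str, object]:
    """Apply the rules listed above to one parsed feature (illustrative only)."""
    if feature.key == "gene" and len(feature.intervals) != 1:
        raise ValueError("a gene feature must be a single interval")
    products = [v for k, v in feature.qualifiers if k == "product"]
    notes = [v for k, v in feature.qualifiers if k == "note"]
    summary: dict[str, object] = {"key": feature.key, "complement": feature.complement}
    if feature.key == "CDS":
        if not products:
            raise ValueError("every CDS needs at least one product")
        summary["product"] = products[0]
        # Extra product names become /note (placing them first is an assumption).
        notes = products[1:] + notes
        # codon_start defaults to 1: translation starts at the interval's first nucleotide.
        offset = int(dict(feature.qualifiers).get("codon_start", "1")) - 1
        first = feature.intervals[0][0]
        summary["translation_begins_at"] = first - offset if feature.complement else first + offset
    if notes:
        summary["note"] = notes
    return summary
```

Parsing the Seq1 table and summarizing its CDS should report "acid trehalase" as the product, the second product name as a note, and translation beginning at position 2; the Seq2 tRNA parses into two reversed intervals and is flagged as complementary.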