Evidence Viewer Facilitates Analysis of NCBI Human Gene Models
NCBI has an ongoing program to assemble and annotate the human genome, incorporating
updates as new and revised genome data is deposited in public resources.
As part of the process, NCBI generates gene models based primarily on
alignment of mRNA sequences to the human genomic assembly. These alignments
are used as evidence of the intron/exon organization of a gene, as annotated
on the contigs. NCBI has developed an Evidence Viewer so that users can
see the alignment evidence for the gene models when mRNA is used in this
way.
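As a rough illustration of how aligned mRNA blocks imply an intron/exon structure, the sketch below treats each aligned block as an exon and each unaligned genomic gap between consecutive blocks as an intron. The function name `introns_from_exons` and the coordinates are hypothetical, chosen only for illustration; this is not NCBI's annotation pipeline.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive genomic coordinates


def introns_from_exons(exons: List[Interval]) -> List[Interval]:
    """Return the genomic gaps between consecutive aligned blocks (exons).

    Assumes each block is one exon from an mRNA-to-genomic alignment and
    that blocks do not overlap. Adjacent blocks with no gap yield no intron.
    """
    ordered = sorted(exons)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end + 1:
            introns.append((prev_end + 1, next_start - 1))
    return introns


# Made-up coordinates for three aligned blocks on a contig.
example_exons = [(100, 200), (500, 650), (900, 1000)]
example_introns = introns_from_exons(example_exons)
```

With the made-up blocks above, the implied introns are the two gaps between the three exons.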
Links to the Evidence Viewer (ev) are now provided in LocusLink and the
Human Genome Map Viewer when gene models are presented in an output report.
The model sequences are designated with accession numbers beginning with
XM_ for nucleotide and XP_ for protein sequences.
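The accession prefixes that appear in this article can be told apart with a simple lookup, sketched below. The table covers only the prefixes the article itself names (XM_, XP_, NM_, NT_); the helper name `classify_accession` is hypothetical, and any other accession, such as a GenBank record, falls through to "other".

```python
# Prefix meanings as described in this article; not an exhaustive list.
ACCESSION_PREFIXES = {
    "XM_": "model mRNA",
    "XP_": "model protein",
    "NM_": "RefSeq mRNA",
    "NT_": "genomic contig",
}


def classify_accession(accession: str) -> str:
    """Classify an accession number by its prefix (hypothetical helper)."""
    normalized = accession.strip().upper()
    for prefix, kind in ACCESSION_PREFIXES.items():
        if normalized.startswith(prefix):
            return kind
    return "other"


# The accessions discussed for the WRN gene in this article.
wrn_accessions = ["NT_007993", "AF091214", "NM_000553", "XM_015858"]
wrn_kinds = {acc: classify_accession(acc) for acc in wrn_accessions}
```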
In Figure 1, the Evidence Viewer graphic shows a genomic contig from NCBI's
human genome assembly (NT_007993, Build 25) aligned to a GenBank mRNA
sequence (AF091214) for the human WRN helicase gene involved in Werner
Syndrome (OMIM number 277700). Also aligned are an NCBI mRNA Reference
Sequence (NM_000553) and an NCBI mRNA Model Sequence (XM_015858). Not
included in the figure is the detailed base-by-base ev alignment view,
which follows the graphical overview in the report.

Figure 1: Evidence Viewer Display for human WRN helicase gene.
The 35 exons of the WRN gene, implied by the mRNA to genomic sequence
alignments, are shown as vertical tick marks along the gene. Single nucleotide
mismatches between the mRNA sequences and the corresponding genomic sequence,
as well as insertions and deletions, are marked on the mismatches
and indels scales immediately below the alignments.
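The kind of information shown on the mismatches and indels scales can be approximated by scanning a gapped pairwise alignment column by column. The sketch below assumes two equal-length aligned strings with "-" marking gaps; the function name `scan_alignment` and the toy sequences are hypothetical, and this is not the Evidence Viewer's own code.

```python
from typing import List, Tuple


def scan_alignment(genomic: str, mrna: str) -> Tuple[List[int], List[int], List[int]]:
    """Return (mismatches, insertions, deletions) as 0-based alignment columns.

    An insertion is a base present only in the mRNA (gap in the genomic row);
    a deletion is a base present only in the genomic row (gap in the mRNA row).
    """
    if len(genomic) != len(mrna):
        raise ValueError("aligned sequences must have the same length")
    mismatches, insertions, deletions = [], [], []
    for col, (g, m) in enumerate(zip(genomic.upper(), mrna.upper())):
        if g == "-" and m == "-":
            continue  # column carries no information
        if g == "-":
            insertions.append(col)
        elif m == "-":
            deletions.append(col)
        elif g != m:
            mismatches.append(col)
    return mismatches, insertions, deletions


# Toy alignment: one mismatch, one insertion, one deletion.
toy_result = scan_alignment("ACGT-ACGT", "ACCTTAC-T")
```

Reporting alignment columns rather than genomic coordinates keeps the sketch short; mapping columns back to contig positions would require tracking the gaps in the genomic row.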
In the case of the WRN gene, the NCBI mRNA RefSeq NM_000553 was constructed
from the GenBank mRNA record AF091214, then aligned to the genomic sequence
to produce the NCBI-generated mRNA Model Sequence XM_015858. The single
gene model currently given on the Genes_sequence map of the Map Viewer
is based on the mRNA alignment, shown in Figure 1, between the RefSeq
and genomic sequence.
Figure 1 indicates that the WRN gene has a distinct 5' exon cluster separated
from the rest of the gene by an intron. It is interesting to speculate
that this exon cluster may code for a distinct domain module in the WRN
helicase protein. In fact, it does! See the article
on DART in this issue to learn the identity of this protein domain.
The simplest case is illustrated here, where a single mRNA aligns to a
single place in the genome. However, there can also be multiple or overlapping
models. This can be due to a number of reasons, including splice variants,
paralogous genes, or inaccuracies in the draft sequence or assembly. The
Evidence Viewer helps researchers analyze the alternative models presented,
for example by showing where the mismatches are so that they can decide
how to interpret the evidence.
