Evidence Viewer Facilitates Analysis of NCBI Human Gene Models

NCBI has an ongoing program to assemble and annotate the human genome, incorporating updates as new and revised genome data are deposited in public resources. As part of this process, NCBI generates gene models based primarily on the alignment of mRNA sequences to the human genomic assembly. These alignments serve as evidence for the intron/exon organization of a gene, as annotated on the contigs. NCBI has developed an Evidence Viewer so that users can see the alignment evidence behind the gene models when mRNA is used in this way.

Links to the Evidence Viewer (ev) are now provided in LocusLink and the Human Genome Map Viewer when gene models are presented in an output report. The model sequences are designated with accession numbers beginning with XM_ for nucleotide and XP_ for protein sequences.
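The accession prefixes mentioned in this article can be told apart with a simple lookup. The following is a minimal sketch, not an NCBI tool: the function name `classify_accession` and the table `ARTICLE_PREFIXES` are hypothetical, and the table covers only the prefixes that appear in this article.

```python
# Prefix meanings as used in this article; this is not a complete list.
ARTICLE_PREFIXES = {
    "NT_": "genomic contig",
    "NM_": "mRNA Reference Sequence",
    "XM_": "mRNA Model Sequence",
    "XP_": "protein Model Sequence",
}


def classify_accession(accession: str) -> str:
    """Return the sequence type implied by the first three characters."""
    prefix = accession.strip()[:3].upper()
    # Anything without a listed prefix (for example, a GenBank record
    # such as AF091214) falls through to the default.
    return ARTICLE_PREFIXES.get(prefix, "other")


for acc in ["NT_007993", "NM_000553", "XM_015858", "AF091214"]:
    print(acc, "->", classify_accession(acc))
```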

In Figure 1, the Evidence Viewer graphic shows a genomic contig from NCBI’s human genome assembly (NT_007993, Build 25) aligned to a GenBank mRNA sequence (AF091214) for the human WRN helicase gene involved in Werner Syndrome (OMIM number 277700). Also aligned are an NCBI mRNA Reference Sequence (NM_000553) and an NCBI mRNA Model Sequence (XM_015858). Not included in the figure is the detailed base-by-base ev alignment view, which follows the graphical overview in the report.


Figure 1: Evidence Viewer Display for human WRN helicase gene.


The 35 exons of the WRN gene, implied by the mRNA-to-genomic sequence alignments, are shown as vertical tick marks along the gene. Single-nucleotide mismatches between the mRNA sequences and the corresponding genomic sequence, as well as insertions and deletions, are marked on the “mismatches” and “indels” scales immediately below the alignments.
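To make the idea concrete, the sketch below shows one way that exon blocks from an mRNA-to-genomic alignment imply introns, and how mismatches and indels could be tallied from a gapped alignment. It is an illustration under stated assumptions, not the Evidence Viewer's actual method: the function names, the coordinates, and the short sequences are all hypothetical and are not taken from the WRN alignment.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (genomic_start, genomic_end), 1-based, inclusive


def introns_from_exons(exons: List[Block]) -> List[Block]:
    """Treat each unaligned genomic gap between consecutive exon blocks as an intron."""
    ordered = sorted(exons)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end + 1:
            introns.append((prev_end + 1, next_start - 1))
    return introns


def count_differences(mrna: str, genomic: str) -> Tuple[int, int]:
    """Count (mismatches, indel columns) in two gapped strings of equal length."""
    if len(mrna) != len(genomic):
        raise ValueError("aligned strings must have the same length")
    mismatches = indels = 0
    for a, b in zip(mrna.upper(), genomic.upper()):
        if a == "-" or b == "-":
            indels += 1      # gap in either sequence: insertion or deletion
        elif a != b:
            mismatches += 1  # both are bases, but they differ
    return mismatches, indels


# Hypothetical exon blocks on a contig.
exons = [(100, 200), (500, 650), (900, 1000)]
print(introns_from_exons(exons))

# Hypothetical gapped alignment of one exon.
print(count_differences("ACGT-A", "ACCTGA"))
```

Counting gap columns separately from base substitutions mirrors the separate “mismatches” and “indels” scales described above.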

In the case of the WRN gene, the NCBI mRNA RefSeq NM_000553 was constructed from the GenBank mRNA record AF091214, then aligned to the genomic sequence to produce the NCBI-generated mRNA Model Sequence XM_015858. The single gene model currently given on the Genes_sequence map of the Map Viewer is based on the mRNA alignment, shown in Figure 1, between the RefSeq and the genomic sequence.

Figure 1 indicates that the WRN gene has a distinct 5' exon cluster separated from the rest of the gene by an intron. It is interesting to speculate that this exon cluster may code for a distinct domain module in the WRN helicase protein. In fact, it does! See the article on DART in this issue to learn the identity of this protein domain.

The simplest case is illustrated here, where a single mRNA aligns to a single place in the genome. However, there can also be multiple or overlapping models. These can arise for a number of reasons, including splice variants, paralogous genes, or inaccuracies in the draft sequence or assembly. The Evidence Viewer helps researchers analyze the alternative models presented, for example by showing where the mismatches are so that they can decide how to interpret the evidence.





NCBI News | Fall 2001