
About Us


Our Data Model

What is NCBI Virus?

NCBI Virus is an integrative, value-added resource designed to support retrieval, display and analysis of a curated collection of virus sequences and large sequence datasets. We are a community portal for viral sequence data, and our goal is to increase the usability of data archived in GenBank and other NCBI repositories.

Our mission is to enable researchers:

1) to find sequences and sequence datasets of interest more easily by filtering data on normalized metadata, and

2) to use virus sequence data more effectively by creating custom data reports and exporting those reports in various formats for use outside of NCBI Virus.
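The two-step workflow described above (narrow the data by filtering on normalized metadata, then export a custom report) can be sketched as follows. This is a minimal illustration under stated assumptions, not the NCBI Virus implementation: the `VirusRecord` fields, the placeholder accessions `SEQ001` to `SEQ003`, and the helper names `filter_records` and `export_csv` are all hypothetical, and CSV stands in for just one of the possible export formats.

```python
import csv
import io
from dataclasses import asdict, dataclass


# Hypothetical record layout: field names are illustrative, not the NCBI Virus schema.
@dataclass
class VirusRecord:
    accession: str
    species: str
    host: str
    country: str
    collection_year: int


def filter_records(records, **criteria):
    """Keep only records whose normalized fields equal every criterion given."""
    return [
        r for r in records
        if all(getattr(r, field) == value for field, value in criteria.items())
    ]


def export_csv(records, columns):
    """Write the chosen columns of each record to CSV text (a simple custom report)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))  # fields not listed in `columns` are dropped
    return buffer.getvalue()


if __name__ == "__main__":
    records = [
        VirusRecord("SEQ001", "Zika virus", "Homo sapiens", "Brazil", 2016),
        VirusRecord("SEQ002", "Zika virus", "Aedes aegypti", "Brazil", 2016),
        VirusRecord("SEQ003", "Dengue virus 2", "Homo sapiens", "Viet Nam", 2015),
    ]
    hits = filter_records(records, species="Zika virus", host="Homo sapiens")
    print(export_csv(hits, ["accession", "country", "collection_year"]), end="")
```

Filtering works with simple equality here only because the metadata values are assumed to be normalized already; with raw free-text values, the same query would miss records that spell a host or country differently.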

This is a work in progress, and we welcome your feedback! Please contact us using the contact form, or use the Feedback link to send us your comments and suggestions.


Data Model, Data Types, and Data Flow

NCBI Virus combines machine processing of all records from the International Nucleotide Sequence Database Collaboration (INSDC) databases with human curation to provide high-quality virus sequences, with standardized metadata for a subset of these sequences (read more in our publication).

We use manual and machine curation to validate viral sequence data and normalize sequence and sample attributes (metadata). This data is then made available through a custom search interface that supports selection of data based on a variety of properties.

When a sequence is submitted to GenBank or another INSDC database, the authors describe the sample it was isolated from, for example its collection date and country, its host, and its isolation source. We have standardized this metadata, so instead of searching for every similar term (and its misspellings), you can filter by a single term. You can read more in the Help topic "Search for sequences by virus name or taxonomy group."
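The idea of collapsing many free-text variants into one standardized term can be sketched as a lookup against a controlled vocabulary. This is a hypothetical illustration only: the vocabulary `CANONICAL_COUNTRIES`, the synonym table `SYNONYMS`, and the function `normalize_country` are invented for the example, and the fuzzy fallback uses Python's standard `difflib.get_close_matches` as a stand-in technique for catching misspellings. It does not describe how NCBI Virus curators or pipelines actually normalize metadata.

```python
import difflib

# Hypothetical controlled vocabulary and synonym table, for illustration only.
CANONICAL_COUNTRIES = ["United States", "Viet Nam", "United Kingdom"]
SYNONYMS = {
    "usa": "United States",
    "us": "United States",
    "u.s.a.": "United States",
    "vietnam": "Viet Nam",
    "uk": "United Kingdom",
}
# Lowercased canonical name -> canonical spelling.
_BY_LOWER = {name.lower(): name for name in CANONICAL_COUNTRIES}


def normalize_country(raw, cutoff=0.8):
    """Map a free-text country value to a canonical term, or None if nothing matches."""
    key = " ".join(raw.split()).lower()  # collapse runs of whitespace, ignore case
    if key in _BY_LOWER:
        return _BY_LOWER[key]
    if key in SYNONYMS:
        return SYNONYMS[key]
    # Fallback: fuzzy string similarity catches simple misspellings.
    close = difflib.get_close_matches(key, list(_BY_LOWER), n=1, cutoff=cutoff)
    return _BY_LOWER[close[0]] if close else None


if __name__ == "__main__":
    for raw in ["  USA ", "vietnam", "Untied States", "Atlantis"]:
        print(repr(raw), "->", normalize_country(raw))
```

Once every record carries the canonical value, a single filter term such as "United States" matches records originally entered as "USA", "U.S.A." or a misspelled form. Unmatched values return `None` so that they can be routed to review instead of being guessed.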

Currently, the NCBI Virus database includes data from the following sequence groups:

Find out more about NCBI Virus features and how to get started on our Help page.

In September 2023, we removed Protein Data Bank (PDB) nucleotide records from NCBI Virus search results. PDB records are typically very short and accompany three-dimensional protein structures that are available in the NCBI Structure database. PDB records are still searchable through other resources, such as RCSB PDB and the NCBI Nucleotide database.



Meet the Team

The NCBI Virus Team is part of the National Center for Biotechnology Information (NCBI). We focus on developing free virus-related resources, work that spans computational development and design as well as the curation of virus sequences and large sequence datasets.



Our Publications

Cite Us!

If you would like to credit the NCBI Virus resource in your work, include the URL https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/ in a website citation formatted according to your publisher's recommendations.

Example:

NCBI Virus [Internet]. Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information; 2004 – [cited YYYY MM DD]. Available from: https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/
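If you generate references programmatically, the example citation above can be filled in from a date object. This is a small sketch that assumes the "[cited YYYY MM DD]" placeholder is filled with zero-padded numeric fields exactly as the template shows; the function name `ncbi_virus_citation` is hypothetical. Publishers that follow NLM style may instead expect an abbreviated month name, so check your publisher's recommendations before relying on this format.

```python
from datetime import date

# Citation text copied from the example above; only the cited date varies.
CITATION_TEMPLATE = (
    "NCBI Virus [Internet]. Bethesda (MD): National Library of Medicine (US), "
    "National Center for Biotechnology Information; 2004 – [cited {cited}]. "
    "Available from: https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/"
)


def ncbi_virus_citation(cited_on: date) -> str:
    """Fill the [cited YYYY MM DD] placeholder with zero-padded numeric fields."""
    return CITATION_TEMPLATE.format(cited=cited_on.strftime("%Y %m %d"))


if __name__ == "__main__":
    print(ncbi_virus_citation(date.today()))
```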

For more information on citing other NCBI services and databases, see "How do I cite NCBI services and databases?"


Virus Variation Resource - improved response to emergent viral outbreaks. Hatcher EL, Zhdanov SA, Bao Y, Blinkova O, Nawrocki EP, Ostapchuck Y, Schaffer AA, Brister JR. Nucleic Acids Res. 2017 Jan 4;45(D1):D482-D490. doi: 10.1093/nar/gkw1065. Epub 2016 Nov 28.
Minimum Information about an Uncultivated Virus Genome (MIUViG). Roux S, Adriaenssens EM, Dutilh BE, Koonin EV, Kropinski AM, et al. Nat Biotechnol. 2018 Dec 17. doi: 10.1038/nbt.4306.
Overlapping genes and the proteins they encode differ significantly in their sequence composition from non-overlapping genes. Pavesi A, Vianelli A, Chirico N, Bao Y, Blinkova O, et al. PLoS One. 2018 Oct 19;13(10):e0202513.
How to Name and Classify Your Phage: An Informal Guide. Adriaenssens E, Brister JR. Viruses. 2017 Apr 3;9(4). pii: E70.
Consensus statement: Virus taxonomy in the age of metagenomics. Simmonds P, Adams MJ, Benkő M, Breitbart M, Brister JR, et al. Nat Rev Microbiol. 2017 Mar;15(3):161-168.
NCBI will no longer make taxonomy identifiers for individual influenza strains on January 15, 2018. Hatcher E, Bao Y, Amedeo P, Blinkova O, Cochrane G, et al. PeerJ Preprints.
NCBI viral genomes resource. Brister JR, Ako-Adjei D, Bao Y, Blinkova O. Nucleic Acids Res. 2015 Jan;43(Database issue):D571-7.
HIV-1, human interaction database: current status and new features. Ako-Adjei D, Fu W, Wallin C, Katz KS, Song G, Darji D, et al. Nucleic Acids Res. 2015 Jan;43(Database issue):D566-70.
The Influenza Virus Resource at the National Center for Biotechnology Information. Bao Y, Bolotov P, Dernovoy D, Kiryutin B, Zaslavsky L, et al. J Virol. 2008 Jan;82(2):596-601.
Visualization of large influenza virus sequence datasets using adaptively aggregated trees with sampling-based subscale representation. Zaslavsky L, Bao Y, Tatusova TA. BMC Bioinformatics. 2008;9:237.
Accelerating the neighbor-joining algorithm using the adaptive bucket data structure. Zaslavsky L, Tatusova T. Bioinformatics Research and Applications. Lecture Notes in Computer Science. Springer-Verlag; 2008;4983:122-133.
Multiresolution approaches to representation and visualization of large influenza virus sequence datasets. Zaslavsky L, Bao Y, Tatusova T. IEEE International Conference on Bioinformatics and Biomedicine. 2007.
FLAN: a web server for influenza virus genome annotation. Bao Y, Bolotov P, Dernovoy D, Kiryutin B, Tatusova T. Nucleic Acids Res. 2007 Jul 1;35(Web Server issue):W280-4.
An Adaptive Resolution Tree Visualization of Large Influenza Virus Sequence Datasets. Zaslavsky L, Bao Y, Tatusova T. Bioinformatics Research and Applications. Lecture Notes in Computer Science. Springer-Verlag; 2007;4463:192-202.
Large-scale sequencing of human influenza reveals the dynamic nature of viral genome evolution. Ghedin E, Sengamalay NA, Shumway M, Zaborsky J, Feldblyum T, et al. Nature. 2005 Oct 20;437(7062):1162-6.
Whole-genome analysis of human influenza A virus reveals multiple persistent lineages and reassortment among recent H3N2 viruses. Holmes EC, Ghedin E, Miller N, Taylor J, Bao Y, et al. PLoS Biol. 2005 Sep;3(9):e300.
Virus Variation Resource - recent updates and future directions. Brister JR, Bao Y, Zhdanov SA, Ostapchuck Y, Chetvernin V, et al. Nucleic Acids Res. 2014 Jan;42(Database issue):D660-5. doi: 10.1093/nar/gkt1268. Epub 2013 Dec 4.
The virus variation resources at the National Center for Biotechnology Information: dengue virus. Resch W, Zaslavsky L, Kiryutin B, Rozanov M, Bao Y, Tatusova TA. BMC Microbiol. 2009 Apr 2;9:65.
