Introduction

The National Center for Biotechnology Information (NCBI) creates and maintains a set of databases that archive, process, display, and report information related to human germline and somatic variants. These databases, primarily the Database of Short Genetic Variations (dbSNP) and the Database of Genomic Structural Variations (dbVar), represent almost 2 billion submitted human variants. The primary roles of both databases are to process submissions, archive the data, annotate the variants on the genome and on NCBI Reference Sequences (RefSeqs), and distribute the data worldwide. The data are important for studying the basis of human diseases to improve diagnosis, treatment, and prevention, and for research in a variety of fields such as species diversity, evolution, and conservation.

Submissions are accepted in various formats, including VCF for reporting the large numbers of variations generated by high-throughput sequencing (HTS) projects across multiple populations, along with a wide variety of associated data such as genotype and allele frequency data. Each submitted variant is assigned a database identifier (ss# in dbSNP or nsv#/esv# in dbVar) that can be cited in publications, allows cross-referencing to other databases and linking to related data, facilitates annotation, and promotes data exchange. Submissions are then processed to aggregate information from multiple submitters (rs# in dbSNP), to calculate locations and functional consequences on RefSeqs, and to integrate with other NCBI resources including Gene, PubMed, Nucleotide, Protein, and Genome.

dbSNP and dbVar data are updated during regular build cycles with annotations on new assemblies and RefSeqs, and the data are distributed in diverse ways: Entrez searches, study-specific reports, annotation on the genome, Sequence Viewer, and FTP downloads in BED, VCF, and other formats.
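To make the identifier scheme concrete, the following is a minimal Python sketch that maps the identifier prefixes mentioned above (ss#, rs#, nsv#, esv#) to their database. The function name `classify_accession` and the returned dictionary layout are assumptions made for illustration; they are not part of any NCBI tool or API.

```python
import re

# Prefixes as named in the text above; the mapping itself is illustrative.
PREFIX_TO_DATABASE = {
    "ss": "dbSNP",   # submitted variant
    "rs": "dbSNP",   # aggregated reference variant
    "nsv": "dbVar",
    "esv": "dbVar",
}

# A prefix followed by digits only, e.g. "rs123" or "esv42".
ACCESSION_RE = re.compile(r"^(nsv|esv|ss|rs)(\d+)$", re.IGNORECASE)


def classify_accession(accession):
    """Return the database, prefix, and number for an identifier, or None."""
    match = ACCESSION_RE.match(accession.strip())
    if match is None:
        return None
    prefix = match.group(1).lower()
    return {
        "database": PREFIX_TO_DATABASE[prefix],
        "prefix": prefix,
        "number": int(match.group(2)),
    }
```

A call such as `classify_accession("rs123")` reports dbSNP as the database, while any string that does not match one of the four prefixes returns `None` instead of a guess.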

NIH Genomic Data Sharing

If you're funded by NIH, please consider complying with the NIH Genomic Data Sharing (GDS) policy, which took effect on January 25, 2015. If you're NOT funded by NIH, we would still hope you follow the spirit of the policy and submit to dbSNP or dbVar, which are trusted GDS repository partners.

The table below highlights the features of dbSNP and dbVar and their differences.

Database

dbSNP: https://www.ncbi.nlm.nih.gov/snp

dbVar: https://www.ncbi.nlm.nih.gov/dbvar/

Description

The SNP database (commonly known as dbSNP) contains short human nucleotide variations.

The dbVar database contains large human genomic structural variation data generated mostly by published studies. Variants are typically longer than 50 nucleotides (in contrast to dbSNP).

Variation Type

dbSNP: small variations (<= 50 bp)

  • Single nucleotide variation (SNV)
  • Short multi-nucleotide changes (MNV)
  • Small deletions or insertions
  • Retrotransposable element insertions

dbVar: large variations (> 50 bp)

  • Copy number variants (CNV)
  • Large deletions and insertions
  • Inversions
  • Translocations
  • Mobile elements
  • More…
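The size boundary between the two databases can be sketched as a simple rule. This is only an illustration of the 50 bp threshold stated above; the function name `suggest_database` is hypothetical, and an actual submission decision also depends on variant type and on the submission guidelines, not on length alone.

```python
SIZE_THRESHOLD_BP = 50  # boundary stated above: <= 50 bp dbSNP, > 50 bp dbVar


def suggest_database(variant_length_bp):
    """Suggest a target database from variant length alone (a simplification)."""
    if variant_length_bp < 1:
        raise ValueError("variant length must be at least 1 bp")
    if variant_length_bp <= SIZE_THRESHOLD_BP:
        return "dbSNP"
    return "dbVar"
```

Under this rule, a 50 bp deletion is suggested for dbSNP and a 51 bp deletion for dbVar.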

Accession

dbSNP:

  • Submitted SNP (ss#) - submitted variant based on asserted location or flanking sequences
  • Reference SNP (rs#) - non-redundant set of variations based on clustering of submitted SNPs (ss) of the same variant type and sequence position

dbVar:

  • Study (std#) - unit of submission, usually corresponds to the data output of a publication
  • Variant call (ssv#) - all independent experimental observations of structural variation
  • Variant region (sv#) - regions of the genome containing aggregated structural variation, i.e. calls
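The idea behind the Reference SNP (rs#) accession, clustering submitted records that share a variant type and sequence position, can be illustrated with a short grouping sketch. This is not dbSNP's actual pipeline; the function name `cluster_submissions` and the record fields (`ss_id`, `chrom`, `pos`, `variant_type`) are hypothetical names chosen for the example.

```python
from collections import defaultdict


def cluster_submissions(submissions):
    """Group submitted records that share chromosome, position, and variant type.

    Each record is a dict with the hypothetical keys
    "ss_id", "chrom", "pos", and "variant_type".
    """
    clusters = defaultdict(list)
    for record in submissions:
        key = (record["chrom"], record["pos"], record["variant_type"])
        clusters[key].append(record["ss_id"])
    return dict(clusters)
```

Each resulting group corresponds conceptually to one non-redundant record: two submissions at the same position with the same variant type fall into one group, while a submission of a different variant type at that position forms its own group.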

Data Aggregation

dbSNP, data by RS:

  • Submitted SNP (ss) information
    • Submitter contact and publications
    • Variation Data – alleles, genotype, and frequency
    • Experimental methods and conditions
  • Genomic positions on different assembly versions
  • ClinVar clinical assertions

dbVar, data by SV and SSV:

  • Submitter contact and publications
  • Method
  • Genotype and Frequency
  • Genomic positions on different assembly versions
  • ClinVar clinical assertions

Linked Resources

dbSNP and dbVar both link to:

  • ClinVar
  • dbGaP
  • BioProject
  • BioSample
  • Gene
  • PubMed
  • Genome
  • Nucleotide
  • Protein
  • Taxonomy
  • External collaborators

Annotation

  • RS are annotated on all available latest genomic assemblies and RefSeq sequences (mRNA, Protein, and RefSeqGene)
  • SV and SSV are annotated on all available latest genomic assemblies

Access Policy

dbSNP: Open - all variant data, including genotype and frequency data and associated metadata, are available without restrictions on the website and by FTP.

dbVar: Open - all variant data, including genotype and frequency data and associated metadata, are available without restrictions on the website and by FTP.

WEB: https://www.ncbi.nlm.nih.gov/dbvar/

FTP: ftp://ftp.ncbi.nlm.nih.gov/pub/dbVar/data/

Submission Guidelines

dbSNP: https://www.ncbi.nlm.nih.gov/projects/SNP/how_to_submit.html

dbVar: https://www.ncbi.nlm.nih.gov/dbvar/content/submission/

Submission Limitations

dbSNP and dbVar DO NOT accept:

  • Synthetic mutations
  • Variations ascertained from cross-species alignments and analysis
  • Personal human data, due to current NIH policy, unless the participant is enrolled in a study with institutional oversight
  • Bacterial variant sequences, which can be submitted to SRA (https://www.ncbi.nlm.nih.gov/sra/) or as alignments to GenBank PopSet (http://www.ncbi.nlm.nih.gov/popset/)
  • Human variations with an asserted relationship to disease or other phenotypes. These should be submitted to ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/docs/submit/). However, dbSNP and dbVar staff members will help broker such submissions.

Contact

dbSNP: snp-sub@ncbi.nlm.nih.gov

dbVar: dbvar@ncbi.nlm.nih.gov

Last updated: 2019-06-25T17:38:26Z