Submit Sequences
Why should I submit virus sequences?
The scientific community depends on sharing data and results so that researchers can build on each other’s work. GenBank and the Sequence Read Archive rely on contributors to keep the databases as comprehensive, current, and accurate as possible. NCBI provides timely, accurate processing and biological review of new entries and of updates to existing entries, and is ready to assist authors who have new data to submit.
Beyond providing the accession numbers needed to publish manuscripts, sharing sequence data is critical for epidemiological studies, for understanding viral pathogenesis, and for following the FAIR (Findable, Accessible, Interoperable, Reusable) data-sharing principles.
How do I submit virus sequences?
Before you submit your data, determine your submission type, then choose the submission tool appropriate for that type.
SARS-CoV-2 sequences
For SARS-CoV-2, GenBank has built a focused resource to make submitting easier at https://submit.ncbi.nlm.nih.gov/sarscov2/. All you need is the sequence in FASTA format and a table of the source metadata.
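As an illustration of preparing those two inputs, here is a minimal Python sketch that writes a FASTA file and a tab-delimited source metadata table. The column names (Sequence_ID, country, host, isolation_source, collection_date), the `Sample` class, the helper functions, and the 60-character line width are assumptions for illustration only; check the column headers against the template the submission site provides before uploading. Collection dates are written as DD-Mon-YYYY, with the day or month dropped when it is not known.

```python
from dataclasses import dataclass
from typing import List, Optional

# Month abbreviations for DD-Mon-YYYY; a fixed tuple avoids locale-dependent strftime("%b").
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def format_collection_date(year: int, month: Optional[int] = None,
                           day: Optional[int] = None) -> str:
    """Return DD-Mon-YYYY, Mon-YYYY, or YYYY depending on which parts are known."""
    if month is None:
        return f"{year:04d}"
    mon = MONTHS[month - 1]
    if day is None:
        return f"{mon}-{year:04d}"
    return f"{day:02d}-{mon}-{year:04d}"


@dataclass
class Sample:  # hypothetical container for one sequence and its source metadata
    seq_id: str
    sequence: str
    country: str
    host: str
    isolation_source: str
    year: int
    month: Optional[int] = None
    day: Optional[int] = None


def to_fasta(samples: List[Sample], width: int = 60) -> str:
    """One '>' definition line per sequence, then the sequence in fixed-width lines."""
    lines = []
    for s in samples:
        lines.append(f">{s.seq_id}")
        seq = s.sequence.upper()
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def to_source_table(samples: List[Sample]) -> str:
    """Tab-delimited table; the first column must match the FASTA sequence IDs."""
    header = ["Sequence_ID", "country", "host", "isolation_source", "collection_date"]
    rows = ["\t".join(header)]
    for s in samples:
        date = format_collection_date(s.year, s.month, s.day)
        rows.append("\t".join([s.seq_id, s.country, s.host, s.isolation_source, date]))
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    demo = Sample("seq1", "acgt" * 20, "USA: Bethesda, MD", "Homo sapiens",
                  "nasal swab", 2020, 3, 5)
    print(to_fasta([demo]))
    print(to_source_table([demo]))
```

Keeping both files generated from the same list of records is one simple way to make sure every sequence ID in the FASTA file has a matching row in the metadata table.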
All other viruses
- For influenza A, B, or C virus, norovirus, and dengue virus, use the Submission Portal, an online wizard that provides immediate validation and feedback and adds coding region and gene annotation for you. We plan to add other viruses in the future.
- For all other complete or partial viral genomes, use BankIt or tbl2asn.
- For unassembled sequence reads from a viral sample, a metagenome, or a metatranscriptome, use the Sequence Read Archive (SRA). SRA accepts genetic data and the associated quality scores produced by next-generation sequencing technologies.
What do I need in order to submit sequences?
The process and data formats vary somewhat depending on the submission tool you use, but in general you will need:
- Information on the authors submitting the data
- Associated publication information, even if the manuscript is not yet published
- Nucleotide sequence(s) in FASTA format
- Annotation of genes, coding regions, and any other relevant information
  - This is NOT needed for SARS-CoV-2, influenza, norovirus, or dengue virus submissions; GenBank adds it automatically
  - For BankIt and tbl2asn, you can provide annotation as a 5-column feature table, described in NCBI's feature table documentation
- Source metadata, either as a tab-delimited plain-text file or entered manually
  - Scientific name for the virus: select the most appropriate taxonomy for the organism or metagenome in the NCBI Taxonomy Browser
  - Geographic location where the virus was isolated (e.g., USA: Bethesda, MD)
  - Host and/or physical environment from which the virus was isolated
  - Description of how the sample was taken (called “isolate source,” for example, nasal swab or serum)
  - Complete collection date, including month and day if known, in the format DD-Mon-YYYY (e.g., 05-Mar-2020)
  - Isolate or strain designation
  - Serotype or genotype, if appropriate
  - Segment name or number, if appropriate
  - BioSample records can be created to organize additional metadata. They are optional but a useful way to describe your samples in more detail in your GenBank records. The BioSample documentation explains more, and there you can also browse pre-made templates called "packages", such as "SARS-CoV-2: clinical or host-associated; version 1.0" or "Virus; version 1.0".
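For BankIt and tbl2asn submissions, the 5-column feature table has a simple layout: a ">Feature" line naming the sequence, then one line per feature giving start, stop, and feature key (columns 1 to 3), each followed by its qualifier lines, which leave the first three columns empty and give the qualifier key and value (columns 4 and 5). The Python sketch below writes that layout. The `Feature` class, the `to_feature_table` function, and the example gene and product names are hypothetical, and the sequence ID is assumed to match the FASTA definition line of the sequence being annotated.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Feature:  # hypothetical representation of one annotated feature
    start: int
    stop: int
    key: str  # feature key, e.g. "gene" or "CDS"
    qualifiers: List[Tuple[str, str]] = field(default_factory=list)


def to_feature_table(seq_id: str, features: List[Feature]) -> str:
    """Render features as a 5-column, tab-delimited feature table."""
    lines = [f">Feature {seq_id}"]
    for f in features:
        # Columns 1-3: start, stop, feature key.
        lines.append(f"{f.start}\t{f.stop}\t{f.key}")
        for qual_key, qual_value in f.qualifiers:
            # Columns 4-5: qualifier key and value; first three columns left empty.
            lines.append(f"\t\t\t{qual_key}\t{qual_value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    features = [
        Feature(1, 90, "gene", [("gene", "N")]),
        Feature(1, 90, "CDS", [("product", "nucleocapsid protein")]),
    ]
    print(to_feature_table("seq1", features), end="")
```

Emitting qualifiers as (key, value) pairs rather than a dictionary keeps their order and allows a qualifier such as "note" to appear more than once on the same feature.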