Download a SARS-CoV-2 genome data package

Download sequences for SARS-CoV-2 GenBank genomes by taxon or lineage

Download a SARS-CoV-2 GenBank genome data package by taxon name or accession. The default data package includes genome sequences and primary metadata. Options are available to include CDS and protein FASTA sequences, as well as annotation and BioSample metadata. Refer to the datasets command-line interface (CLI) reference for all available flags and subcommands.
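
As a rough sketch of requesting the optional data types, the snippet below passes a comma-separated list to an --include flag. The flag name and its values (genome, cds, protein, annotation, biosample) are assumptions based on common datasets CLI usage, and the output filename is made up for illustration; check the CLI reference before relying on them.

```shell
# Data types to request, comma-separated (assumed --include values;
# confirm against the datasets CLI reference).
include="genome,cds,protein,annotation,biosample"
accession="NC_045512.2"
outfile="sars_cov_2_full.zip"   # hypothetical output name

# Only attempt the download when the datasets CLI is installed.
if command -v datasets >/dev/null 2>&1; then
    datasets download virus genome accession "$accession" \
        --include "$include" --filename "$outfile"
else
    echo "datasets CLI not found on PATH; skipping download" >&2
fi
```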

If you want to download a virus data package for all SARS-CoV-2 genomes, we recommend using the datasets CLI to request a cached virus data package. These packages are highly compressed and allow for a faster, more reliable download. Cached packages are available only for the following sets:

  1. All SARS-CoV-2 GenBank genomes
  2. Human host only
  3. Human host only and complete
  4. Complete only
  5. Annotated only
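
The sets listed above might map onto CLI flags roughly as follows. This sketch only builds and prints the commands rather than running them, since each one requests a very large package. The --host and --annotated flags and all output filenames other than sars_cov_2.zip and sars_cov_2_complete.zip are assumptions; verify them against the CLI reference.

```shell
# Common prefix for every request.
base="datasets download virus genome taxon SARS-CoV-2"

# One command per cached set (--host and --annotated are assumed flag names).
cmd_all="$base --filename sars_cov_2.zip"
cmd_human="$base --host human --filename sars_cov_2_human.zip"
cmd_human_complete="$base --host human --complete-only --filename sars_cov_2_human_complete.zip"
cmd_complete="$base --complete-only --filename sars_cov_2_complete.zip"
cmd_annotated="$base --annotated --filename sars_cov_2_annotated.zip"

# Print the commands for review; copy one into the terminal to run it.
printf '%s\n' "$cmd_all" "$cmd_human" "$cmd_human_complete" \
    "$cmd_complete" "$cmd_annotated"
```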

Download a cached virus data package of all SARS-CoV-2 genomes by taxon

You can use the organism name or NCBI Taxonomy ID (2697049).

datasets download virus genome taxon SARS-CoV-2 --filename sars_cov_2.zip

Download a cached virus data package of all SARS-CoV-2 complete genomes by taxon

datasets download virus genome taxon SARS-CoV-2 --complete-only --filename sars_cov_2_complete.zip

Download a custom set of SARS-CoV-2 genomes by accession(s)

For multiple accessions, list them on the command line, separated by spaces. Alternatively, use the --inputfile flag and provide a text file with one accession per line.

datasets download virus genome accession NC_045512.2
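
A minimal sketch of the input-file approach: write one accession per line to a text file, then pass that file with --inputfile. The file names accessions.txt and sars_cov_2_custom.zip are hypothetical, and MN908947.3 is included only as a second example accession.

```shell
# Write one accession per line (file name is an arbitrary choice).
printf '%s\n' "NC_045512.2" "MN908947.3" > accessions.txt

# Only attempt the download when the datasets CLI is installed.
if command -v datasets >/dev/null 2>&1; then
    datasets download virus genome accession \
        --inputfile accessions.txt --filename sars_cov_2_custom.zip
else
    echo "datasets CLI not found on PATH; skipping download" >&2
fi
```

The same two genomes could also be requested by listing both accessions directly after the accession subcommand, separated by a space.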

Download by SARS-CoV-2 lineage

Download SARS-CoV-2 GenBank genomes for a specific lineage, as classified by pangolin.

datasets download virus genome taxon SARS-CoV-2 --lineage P.1 --filename SARS-CoV-2-P.1.zip
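
To fetch several lineages in one go, a simple loop over the command above is one option. This is a sketch, not a documented workflow: the lineage list beyond P.1 is an illustrative choice, and each lineage is written to its own zip file following the naming pattern used above.

```shell
# Lineages to download (P.1 is from the example above; the others are
# illustrative choices).
lineages="P.1 B.1.1.7 B.1.351"
planned=""

for lineage in $lineages; do
    outfile="SARS-CoV-2-${lineage}.zip"
    planned="${planned}${outfile} "
    # Only attempt the download when the datasets CLI is installed.
    if command -v datasets >/dev/null 2>&1; then
        datasets download virus genome taxon SARS-CoV-2 \
            --lineage "$lineage" --filename "$outfile"
    fi
done

echo "Output files: $planned"
```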

Generated May 16, 2024