Download a SARS-CoV-2 protein data package

Download sequences and metadata for selected SARS-CoV-2 proteins

Download a protein data package for one or more SARS-CoV-2 proteins by protein name. By default, the data package includes protein sequences and primary metadata. Options are available to also include CDS sequences, annotation metadata, and BioSample metadata. Refer to the datasets command-line interface (CLI) reference for all available flags and subcommands.

Download cached protein data package by protein name

Download a virus protein data package by providing one or more space-delimited protein names.

datasets download virus protein S
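Because protein names are space-delimited, several proteins can be requested in one command. The following is a minimal sketch of a wrapper that previews the command before running it. The function name download_proteins and the DRY_RUN variable are hypothetical helpers for illustration and are not part of the datasets CLI.

```shell
# Hypothetical helper: preview or run a protein download.
# DRY_RUN defaults to 1, so the command is only printed.
# Set DRY_RUN=0 to invoke the datasets CLI for real.
download_proteins() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # "$*" joins the protein names with spaces, as the CLI expects
        echo "datasets download virus protein $*"
    else
        datasets download virus protein "$@"
    fi
}

# Preview a request for the S, M and E proteins
preview=$(download_proteins S M E)
echo "$preview"
```

Previewing first makes it easy to check the list of protein names before starting a potentially large download.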

Choosing which data files to include in the data package

Virus protein data packages contain protein sequences and primary metadata by default. Use --include with one or more comma-separated terms to add data files or to restrict the data package to metadata only. For a full list of available data files, see the datasets reference.

Here are a few examples of using the --include flag to choose which data files to include in the data package.

Get protein and CDS sequences for protein M of the SARS-CoV-2 reference genome:

datasets download virus protein M --refseq --include protein,cds

Get protein sequences and the annotation report for the proteins M, S and E of the SARS-CoV-2 reference genome:

datasets download virus protein M S E --refseq --include protein,annotation

Get a data package with only the virus data report (metadata):

datasets download virus protein M --refseq --include none
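Note that the examples above pass --include terms as a single comma-separated value, while protein names are separated by spaces. A small sketch of assembling that value in a script follows. The join_include function is a hypothetical helper, and the command is only printed rather than executed.

```shell
# Hypothetical helper: join its arguments with commas,
# e.g. "protein cds" becomes "protein,cds".
join_include() {
    # The subshell keeps the IFS change from leaking into the caller
    (IFS=,; printf '%s\n' "$*")
}

include_terms=$(join_include protein cds)

# Assemble the command shown in the first example above
include_cmd="datasets download virus protein M --refseq --include $include_terms"
echo "$include_cmd"
```

Only the terms used on this page (protein, cds, annotation, none) appear here; consult the datasets reference for the full list.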

Filtering by properties

When downloading a virus protein data package for SARS-CoV-2, you can filter the results by different properties, including the following:

  • reference status
  • annotation status
  • geographic location
  • completeness
  • release date
  • update date
  • host

Get protein M data for the SARS-CoV-2 reference genome:

datasets download virus protein M --refseq

Get protein M data for SARS-CoV-2 genomes isolated from dogs:

datasets download virus protein M --host dog

Get protein M data for SARS-CoV-2 genomes by geographic location (here, Brazil):

datasets download virus protein M --geo-location Brazil

Get protein M data for genomes released after January 1, 2021:

datasets download virus protein M --released-after 01/01/2021
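The examples above apply one filter at a time. The sketch below assembles a command that applies several of them together. It assumes the filter flags can be combined in a single invocation, which this page does not state, so check the CLI reference before relying on it. The variable names are illustrative, and the command is printed rather than executed.

```shell
# Illustrative values taken from the single-filter examples above
protein="M"
host="dog"
location="Brazil"
released_after="01/01/2021"

# Assumption: these filter flags may be combined in one command.
# Values containing spaces would need extra quoting before execution.
filter_cmd="datasets download virus protein $protein --host $host --geo-location $location --released-after $released_after"
echo "$filter_cmd"
```

Keeping each filter in its own variable makes it simple to drop or change one property without retyping the whole command.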
Generated May 21, 2024