taxon

Download a genome data package by taxon (NCBI Taxonomy ID, scientific or common name at any tax rank)

Name

datasets download genome taxon - Download a genome data package by taxon (NCBI Taxonomy ID, scientific or common name at any tax rank)

Synopsis

datasets download genome taxon <taxon> [flags]

Description

Download a genome data package by taxon (NCBI Taxonomy ID, scientific or common name at any tax rank). Genome data packages may include genome, transcript and protein sequences, annotation and one or more data reports. Data packages are downloaded as a zip archive.

The default genome data package includes the following files:

  • <accession>_<assembly_name>_genomic.fna (genomic sequences)
  • assembly_data_report.jsonl (data report with genome assembly and annotation metadata)
  • dataset_catalog.json (a list of files and file types included in the data package)
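As a quick check after unpacking, the files listed above can be counted with a small shell helper. This is only a sketch: `summarize_package` is a hypothetical name, and it assumes the archive has already been extracted into a directory and that the files may sit in nested folders (the exact folder layout is not stated on this page, so the helper searches recursively).

```shell
# Summarize an extracted genome data package.
# $1: directory the zip archive was extracted into (assumption: files may be nested).
summarize_package() {
  pkg_dir="$1"

  # The data report is JSON Lines, so each non-empty line is one record.
  report=$(find "$pkg_dir" -type f -name 'assembly_data_report.jsonl' | head -n 1)
  if [ -n "$report" ]; then
    echo "assemblies: $(grep -c . "$report")"
  fi

  # Genomic FASTA files end in _genomic.fna.
  echo "fasta files: $(find "$pkg_dir" -type f -name '*_genomic.fna' | wc -l | tr -d ' ')"
}
```

For example, after `unzip ncbi_dataset.zip -d genome_pkg`, running `summarize_package genome_pkg` prints the two counts.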

Examples

  datasets download genome taxon human --chromosomes 21 --include none
  datasets download genome taxon "bos taurus"
  datasets download genome taxon human --preview
  datasets download genome taxon 10116 --include rna,protein
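The examples above can be combined into a two-step pattern: preview the request first, then download it under a custom file name. The wrapper below is a sketch, not part of the tool; `preview_then_download` is a hypothetical name, and it uses only the `--preview` and `--filename` flags described under Options. It stops only if the preview command itself fails; it does not inspect the preview output.

```shell
# Preview a taxon request, then download it under a custom file name.
# $1: taxon (NCBI Taxonomy ID or name), $2: output zip name.
# Any further arguments (for example: --include rna,protein) are passed through unchanged.
preview_then_download() {
  taxon="$1"
  out_zip="$2"
  shift 2

  # Step 1: show information about the requested data package.
  datasets download genome taxon "$taxon" "$@" --preview || return 1

  # Step 2: run the same request again, this time writing the archive.
  datasets download genome taxon "$taxon" "$@" --filename "$out_zip"
}
```

A call such as `preview_then_download 10116 rat.zip --include rna,protein` runs both steps with identical filters, so the preview describes the same request that is then downloaded.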

Options

      --annotated                 Limit to annotated genomes
      --api-key string            Specify an NCBI API key
      --assembly-level string     Limit to genomes at one or more assembly levels (comma-separated):
                                    * chromosome
                                    * complete
                                    * contig
                                    * scaffold
                                     (default "[]")
      --assembly-source string    Limit to 'RefSeq' (GCF_) or 'GenBank' (GCA_) genomes (default "all")
      --assembly-version string   Limit to 'latest' assembly accession version or include 'all' (latest + previous versions)
      --chromosomes strings       Limit to a specified, comma-delimited list of chromosomes, or 'all' for all chromosomes
      --debug                     Emit debugging info
      --dehydrated                Download a dehydrated zip archive including the data report and locations of data files (use the rehydrate command to retrieve data files).
      --exclude-atypical          Exclude atypical assemblies
      --exclude-multi-isolate     Exclude assemblies from multi-isolate projects
      --filename string           Specify a custom file name for the downloaded data package (default "ncbi_dataset.zip")
      --from-type                 Only return records with type material
      --help                      Print detailed help about a datasets command
      --include string(,string)   Specify the data files to include (comma-separated).
                                    * genome:     genomic sequence
                                    * rna:        transcript
                                    * protein:    amino acid sequences
                                    * cds:        nucleotide coding sequences
                                    * gff3:       general feature file
                                    * gtf:        gene transfer format
                                    * gbff:       GenBank flat file
                                    * seq-report: sequence report file
                                    * none:       do not retrieve any sequence files
                                     (default [genome])
      --mag string                Limit to metagenome assembled genomes (only) or remove them from the results (exclude) (default "all")
      --no-progressbar            Hide progress bar
      --preview                   Show information about the requested data package
      --reference                 Limit to reference genomes
      --released-after string     Limit to genomes released on or after a specified date (MM/DD/YYYY)
      --released-before string    Limit to genomes released on or before a specified date (MM/DD/YYYY)
      --search strings            Limit results to genomes with specified text in the searchable fields:
                                  species and infraspecies, assembly name and submitter.
                                  To search multiple strings, use the flag multiple times.
      --tax-exact-match           Exclude sub-species when a species-level taxon is specified
      --version                   Print version of datasets
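Several of the filters above can be given in one command. The sketch below wraps one such combination and checks the date format before calling the tool. `download_recent_reference` is a hypothetical helper name, the shape check on the date is deliberately loose (it does not reject impossible dates such as 19/39/2020), and it is an assumption that the filters narrow the result set together.

```shell
# Request a dehydrated package of reference genomes at chromosome or complete
# level, released on or after a given date.
# $1: taxon, $2: date in MM/DD/YYYY form.
download_recent_reference() {
  taxon="$1"
  since="$2"

  # Loose shape check only: two digits, slash, two digits, slash, four digits.
  case "$since" in
    [0-1][0-9]/[0-3][0-9]/[0-9][0-9][0-9][0-9]) ;;
    *) echo "expected MM/DD/YYYY, got: $since" >&2; return 2 ;;
  esac

  datasets download genome taxon "$taxon" \
    --reference \
    --assembly-level chromosome,complete \
    --released-after "$since" \
    --dehydrated
}
```

For example, `download_recent_reference "bos taurus" 01/01/2020` issues a single filtered request. Because the archive is dehydrated, the data files still have to be retrieved afterwards with the rehydrate command, which is documented separately.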
Generated May 21, 2024