taxon
Download a genome data package by taxon (NCBI Taxonomy ID, scientific or common name at any tax rank)
Name
datasets download genome taxon - Download a genome data package by taxon (NCBI Taxonomy ID, scientific or common name at any tax rank)
Synopsis
datasets download genome taxon <taxon> [flags]
Description
Download a genome data package by taxon (NCBI Taxonomy ID, scientific or common name at any tax rank). Genome data packages may include genome, transcript and protein sequences, annotation and one or more data reports. Data packages are downloaded as a zip archive.
The default genome data package includes the following files:
- <accession>_<assembly_name>_genomic.fna (genomic sequences)
- assembly_data_report.jsonl (data report with genome assembly and annotation metadata)
- dataset_catalog.json (a list of files and file types included in the data package)
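The file list above can be checked after the zip archive has been extracted. The following is a minimal sketch, not part of the datasets CLI: the function name summarize_package is hypothetical, and it assumes the caller passes the directory that holds the extracted data files, with genomic FASTA files possibly sitting in subdirectories of it.

```shell
#!/bin/sh
# summarize_package is a hypothetical helper, not a datasets subcommand.
# Usage: summarize_package <directory-with-extracted-data-files>
summarize_package() (
    data_dir=$1
    # Both report files are listed as part of the default data package.
    for name in assembly_data_report.jsonl dataset_catalog.json; do
        if [ ! -f "$data_dir/$name" ]; then
            echo "missing file: $data_dir/$name" >&2
            exit 1
        fi
    done
    # JSON Lines holds one record per line, so counting non-empty lines
    # counts the assembly records in the data report.
    records=$(grep -c . "$data_dir/assembly_data_report.jsonl" || true)
    # Assumption: FASTA files may be nested, so search recursively.
    fasta=$(find "$data_dir" -type f -name '*_genomic.fna' | wc -l | tr -d ' ')
    echo "assembly records: $records, genomic FASTA files: $fasta"
)
```

The function body runs in a subshell, so its variables do not leak into the calling shell.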
Examples
datasets download genome taxon human --chromosomes 21 --include none
datasets download genome taxon "bos taurus"
datasets download genome taxon human --preview
datasets download genome taxon 10116 --include rna,protein
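The examples above can be chained into a preview, download and extract sequence. The following is a minimal sketch under stated assumptions: the datasets binary and unzip are on PATH, and the function name fetch_taxon and the DRY_RUN variable are hypothetical conveniences rather than features of the CLI.

```shell
#!/bin/sh
# fetch_taxon is a hypothetical wrapper, not part of the datasets CLI.
# Usage: fetch_taxon <taxon> <zip-file> [extra datasets flags...]
# With DRY_RUN=1 the commands are printed instead of executed.
fetch_taxon() (
    if [ "$#" -lt 2 ]; then
        echo "usage: fetch_taxon <taxon> <zip-file> [flags...]" >&2
        exit 2
    fi
    taxon=$1
    zipfile=$2
    shift 2
    run=
    if [ "${DRY_RUN:-0}" = "1" ]; then run=echo; fi

    # Step 1: show information about the requested package without downloading it.
    $run datasets download genome taxon "$taxon" --preview "$@" || exit 1
    # Step 2: download the package under a custom file name.
    $run datasets download genome taxon "$taxon" --filename "$zipfile" "$@" || exit 1
    # Step 3: extract the zip archive into a directory named after it.
    $run unzip -o "$zipfile" -d "${zipfile%.zip}"
)
```

Running it with DRY_RUN=1 first, for example DRY_RUN=1 fetch_taxon 10116 rat.zip --include rna,protein, shows the exact commands before any data is transferred.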
Options
--annotated Limit to annotated genomes
--api-key string Specify an NCBI API key
--assembly-level string Limit to genomes at one or more assembly levels (comma-separated):
* chromosome
* complete
* contig
* scaffold
(default "[]")
--assembly-source string Limit to 'RefSeq' (GCF_) or 'GenBank' (GCA_) genomes (default "all")
--assembly-version string Limit to 'latest' assembly accession version or include 'all' (latest + previous versions)
--chromosomes strings Limit to a specified, comma-delimited list of chromosomes, or 'all' for all chromosomes
--debug Emit debugging info
--dehydrated Download a dehydrated zip archive including the data report and locations of data files (use the rehydrate command to retrieve data files).
--exclude-atypical Exclude atypical assemblies
--exclude-multi-isolate Exclude assemblies from multi-isolate projects
--filename string Specify a custom file name for the downloaded data package (default "ncbi_dataset.zip")
--from-type Only return records with type material
--help Print detailed help about a datasets command
--include string(,string) Specify the data files to include (comma-separated).
* genome: genomic sequence
* rna: transcript
* protein: amino acid sequences
* cds: nucleotide coding sequences
* gff3: general feature file
* gtf: gene transfer format
* gbff: GenBank flat file
* seq-report: sequence report file
* none: do not retrieve any sequence files
(default [genome])
--mag string Limit to metagenome assembled genomes (only) or remove them from the results (exclude) (default "all")
--no-progressbar Hide progress bar
--preview Show information about the requested data package
--reference Limit to reference genomes
--released-after string Limit to genomes released on or after a specified date (MM/DD/YYYY)
--released-before string Limit to genomes released on or before a specified date (MM/DD/YYYY)
--search strings Limit results to genomes with specified text in the searchable fields:
species and infraspecies, assembly name and submitter.
To search multiple strings, use the flag multiple times.
--tax-exact-match Exclude sub-species when a species-level taxon is specified
--version Print version of datasets
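Several of the options above take values in a specific shape (dates as MM/DD/YYYY, a comma-separated --include list, a repeated --search flag), and --dehydrated implies a follow-up rehydrate step. The following is a minimal sketch of helpers for those cases. All function names and the DRY_RUN variable are hypothetical, and the dehydrated workflow assumes that unzip is available and that the rehydrate command accepts the extracted directory through --directory.

```shell
#!/bin/sh
# All functions below are hypothetical helpers, not part of the datasets CLI.

# to_mdy: convert YYYY-MM-DD to the MM/DD/YYYY form documented for
# --released-after and --released-before. Only the shape is checked,
# not whether the month and day are in range.
to_mdy() (
    case $1 in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
        *) echo "expected YYYY-MM-DD, got: $1" >&2; exit 1 ;;
    esac
    IFS=-
    set -- $1            # split into year, month, day
    printf '%s/%s/%s\n' "$2" "$3" "$1"
)

# check_include: accept only the --include values listed in this page.
check_include() (
    [ -n "$1" ] || { echo "empty --include value" >&2; exit 1; }
    IFS=,
    set -f               # no glob expansion while splitting on commas
    for item in $1; do
        case $item in
            genome|rna|protein|cds|gff3|gtf|gbff|seq-report|none) ;;
            *) echo "unknown --include value: $item" >&2; exit 1 ;;
        esac
    done
)

# search_download: pass one --search flag per term.
# Usage: search_download <taxon> <term>...
search_download() (
    taxon=$1
    shift
    for term in "$@"; do
        set -- "$@" --search "$term"   # append the flag and its value
        shift                          # drop the bare term from the front
    done
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print one argument per line so terms with spaces stay visible.
        printf '%s\n' datasets download genome taxon "$taxon" "$@"
    else
        datasets download genome taxon "$taxon" "$@"
    fi
)

# dehydrated_fetch: download a dehydrated package, extract it, rehydrate it.
# Usage: dehydrated_fetch <taxon> <directory>
dehydrated_fetch() (
    run=
    if [ "${DRY_RUN:-0}" = "1" ]; then run=echo; fi
    $run datasets download genome taxon "$1" --dehydrated --filename "$2.zip" || exit 1
    $run unzip -o "$2.zip" -d "$2" || exit 1
    $run datasets rehydrate --directory "$2"
)
```

A call such as datasets download genome taxon human --released-after "$(to_mdy 2023-01-01)" --preview shows how the date helper would slot into a command line.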
Generated May 21, 2024