genome

Download a genome data package

Name

datasets download genome - Download a genome data package

Synopsis

datasets download genome [flags]

Description

Download a genome data package. Genome data packages may include genome, transcript and protein sequences, annotation and one or more data reports. Data packages are downloaded as a zip archive.

The default genome data package includes the following files:

  • <accession>_<assembly_name>_genomic.fna (genomic sequences)
  • assembly_data_report.jsonl (data report with genome assembly and annotation metadata)
  • dataset_catalog.json (a list of files and file types included in the data package)
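To check what a downloaded package actually contains, the archive can be inspected without extracting it. The following is a minimal sketch, not part of the datasets tool itself: show_catalog is a hypothetical helper name, it relies on the standard unzip utility, and it assumes the catalog is stored at ncbi_dataset/data/dataset_catalog.json inside the archive.

```shell
# Hypothetical helper: list the files in a data package and print its catalog.
# Assumes `unzip` is installed and that the catalog lives under
# ncbi_dataset/data/ inside the archive (an assumption about package layout).
show_catalog() {
  archive="${1:-ncbi_dataset.zip}"   # default file name used by `datasets download`

  # List every file stored in the archive.
  unzip -l "$archive" || return 1

  # Print the catalog JSON to standard output without extracting anything.
  unzip -p "$archive" ncbi_dataset/data/dataset_catalog.json
}

# Usage (after a download has produced ncbi_dataset.zip):
# show_catalog ncbi_dataset.zip
```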

Examples

  datasets download genome accession GCF_000001405.40 --chromosomes X,Y --include genome,gff3,rna
  datasets download genome taxon "bos taurus" --dehydrated
  datasets download genome taxon human --assembly-level chromosome,complete --dehydrated
  datasets download genome taxon mouse --search C57BL/6J --search "Broad Institute" --dehydrated
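The --dehydrated examples above fetch only the data report and file locations, so a second step is needed to retrieve the sequence files. One way to chain the steps is sketched below. The function name fetch_dehydrated and the output directory name are illustrative choices, and the rehydrate command's --directory flag is not documented on this page, so check `datasets rehydrate --help` before relying on it.

```shell
# Hypothetical wrapper: download a dehydrated package, unpack it, rehydrate it.
# Assumes `datasets` and `unzip` are on PATH and that `datasets rehydrate`
# accepts --directory (not shown on this page; verify with --help).
fetch_dehydrated() {
  taxon="$1"    # taxon name or NCBI Taxonomy ID
  outdir="$2"   # directory that will hold the unpacked package

  # Step 1: fetch the small dehydrated archive.
  datasets download genome taxon "$taxon" --dehydrated --filename "$outdir.zip" || return 1

  # Step 2: unpack it into the target directory.
  unzip -q "$outdir.zip" -d "$outdir" || return 1

  # Step 3: retrieve the data files listed in the unpacked package.
  datasets rehydrate --directory "$outdir"
}

# Usage (requires network access):
# fetch_dehydrated "bos taurus" bos_taurus_pkg
```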

Options

      --annotated                 Limit to annotated genomes
      --api-key string            Specify an NCBI API key
      --assembly-level string     Limit to genomes at one or more assembly levels (comma-separated):
                                    * chromosome
                                    * complete
                                    * contig
                                    * scaffold
                                     (default "[]")
      --assembly-source string    Limit to 'RefSeq' (GCF_) or 'GenBank' (GCA_) genomes (default "all")
      --assembly-version string   Limit to 'latest' assembly accession version or include 'all' (latest + previous versions)
      --chromosomes strings       Limit to a specified, comma-delimited list of chromosomes, or 'all' for all chromosomes
      --debug                     Emit debugging info
      --dehydrated                Download a dehydrated zip archive including the data report and locations of data files (use the rehydrate command to retrieve data files).
      --exclude-atypical          Exclude atypical assemblies
      --exclude-multi-isolate     Exclude assemblies from multi-isolate projects
      --filename string           Specify a custom file name for the downloaded data package (default "ncbi_dataset.zip")
      --from-type                 Only return records with type material
      --help                      Print detailed help about a datasets command
      --include string(,string)   Specify the data files to include (comma-separated).
                                    * genome:     genomic sequence
                                    * rna:        transcript
                                    * protein:    amino acid sequences
                                    * cds:        nucleotide coding sequences
                                    * gff3:       general feature file
                                    * gtf:        gene transfer format
                                    * gbff:       GenBank flat file
                                    * seq-report: sequence report file
                                    * none:       do not retrieve any sequence files
                                     (default [genome])
      --mag string                Limit to metagenome-assembled genomes ('only') or remove them from the results ('exclude') (default "all")
      --no-progressbar            Hide progress bar
      --preview                   Show information about the requested data package
      --reference                 Limit to reference genomes
      --released-after string     Limit to genomes released on or after a specified date (input format is flexible, YYYY/MM/DD is suggested)
      --released-before string    Limit to genomes released on or before a specified date (input format is flexible, YYYY/MM/DD is suggested)
      --search strings            Limit results to genomes with specified text in the searchable fields:
                                  species and infraspecies, assembly name and submitter.
                                  To search multiple strings, use the flag multiple times.
      --version                   Print version of datasets
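Because filter flags can match a large number of genomes, it can help to run the same request with --preview before downloading. The sketch below uses only flags listed above. preview_then_download is a hypothetical helper name, and the flags in the usage line are just one possible combination.

```shell
# Hypothetical helper: show package information first, then download.
# Uses only flags documented in the Options list; assumes `datasets` is on PATH.
preview_then_download() {
  taxon="$1"
  shift   # remaining arguments are passed through as filter/include flags

  # Show information about the requested data package without downloading it.
  datasets download genome taxon "$taxon" "$@" --preview || return 1

  # Run the same request again to download the package.
  datasets download genome taxon "$taxon" "$@"
}

# Usage (requires network access):
# preview_then_download human --reference --include genome,gff3
```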

Commands


accession

Download a genome data package by Assembly or BioProject accession

taxon

Download a genome data package by taxon (NCBI Taxonomy ID, scientific or common name at any tax rank)

Generated May 30, 2024