NCBI Datasets Taxonomy Package

Taxonomic metadata for a set of requested taxa.

The NCBI Datasets Taxonomy Data Package contains metadata for the requested taxonomic entities, which can be specified by NCBI Taxonomy ID (TaxID), scientific name, or common name. In addition to the taxonomy report, the data package can be customized to include a names report in JSON Lines format and a subset of the metadata in tabular format.

Package Content

NCBI Datasets Taxonomy Data Package

This example shows the contents of the taxonomy data package for the genus Drosophila (taxid 7215):

datasets download taxonomy taxon 7215 --filename 7215.zip
unzip 7215.zip -d 7215
tree 7215

7215
|-- README.md
`-- ncbi_dataset
    `-- data
        |-- dataset_catalog.json
        |-- taxonomy_report.jsonl
        `-- taxonomy_summary.tsv

2 directories, 4 files
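If `unzip` and `tree` are not available, the downloaded archive can also be checked directly. The following is a minimal Python sketch using only the standard library `zipfile` module. It assumes the archive layout shown in the tree above; `EXPECTED`, `list_package_files`, and `missing_files` are hypothetical names for illustration and are not part of NCBI Datasets.

```python
import zipfile

# Files shown in the example package tree, as paths relative to the archive root.
# This set is an assumption based on the default example; optional files
# (such as the names report) are not listed.
EXPECTED = {
    "README.md",
    "ncbi_dataset/data/dataset_catalog.json",
    "ncbi_dataset/data/taxonomy_report.jsonl",
    "ncbi_dataset/data/taxonomy_summary.tsv",
}


def list_package_files(zip_path):
    """Return the set of file paths stored in the archive, ignoring directory entries."""
    with zipfile.ZipFile(zip_path) as zf:
        return {name for name in zf.namelist() if not name.endswith("/")}


def missing_files(zip_path, expected=EXPECTED):
    """Return the expected paths that are absent from the archive, sorted."""
    return sorted(set(expected) - list_package_files(zip_path))
```

For example, `missing_files("7215.zip")` returns an empty list when every file from the tree is present. If you customize the package to include extra files, pass your own `expected` set instead of the default.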

Taxonomy report

The taxonomy report contains metadata describing the taxonomic classification, the parent and child nodes (when applicable), and counts of assemblies, genes, and other genomic features. The file is in JSON Lines format, where each line holds the metadata for one taxonomic entity. Use the dataformat tool to convert selected fields to a tabular format.

  • Path: ncbi_dataset/data/taxonomy_report.jsonl
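The dataformat tool is the supported way to tabulate this report. For quick inspection inside a script, the JSON Lines layout can also be read one record per line. The sketch below only illustrates the idea and does not reproduce dataformat's behaviour: `read_jsonl`, `flatten`, and `select_fields` are hypothetical helpers, and any dotted field names you pass must be checked against the keys that actually appear in your file.

```python
import json


def read_jsonl(path):
    """Yield one parsed record (a dict) per non-blank line of a JSON Lines file."""
    with open(path, encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON") from exc


def flatten(record, prefix=""):
    """Flatten nested dicts into dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}.

    Lists are kept as-is; this is a simplification chosen for the sketch.
    """
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def select_fields(path, fields):
    """Return one row (a list) per record holding the requested dotted fields.

    Fields missing from a record come back as None.
    """
    return [[flatten(rec).get(f) for f in fields] for rec in read_jsonl(path)]
```

For instance, `select_fields("ncbi_dataset/data/taxonomy_report.jsonl", ["taxonomy.tax_id"])` would return one single-column row per taxon, assuming (unverified here) that each record nests its TaxID under a `taxonomy` object.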

Taxonomy summary

The taxonomy summary table is a tabular representation of a subset of the metadata in the taxonomy report. Each row represents one NCBI Taxonomy ID. The table contains the following columns:

  • Taxid
  • Tax name
  • Authority
  • Rank
  • Basionym
  • Basionym authority
  • Curator common name
  • Has type material
  • Group name
  • Superkingdom name
  • Superkingdom taxid
  • Kingdom name
  • Kingdom taxid
  • Phylum name
  • Phylum taxid
  • Class name
  • Class taxid
  • Order name
  • Order taxid
  • Family name
  • Family taxid
  • Genus name
  • Genus taxid
  • Species name
  • Species taxid

  • Path: ncbi_dataset/data/taxonomy_summary.tsv
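Because each row is keyed by a TaxID, the summary table maps naturally onto a dictionary. This minimal sketch uses Python's `csv` module and assumes that the first line of the file is a header whose names match the column list above; `load_summary` and `lineage` are hypothetical helper names.

```python
import csv

# Rank prefixes taken from the "<Rank> name" columns listed above.
RANKS = ("Superkingdom", "Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")


def load_summary(path):
    """Map each Taxid (kept as a string) to its row from the taxonomy summary TSV.

    Assumes a header line whose column names match the documented list.
    QUOTE_NONE treats quote characters as literal text, as in plain TSV.
    """
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)
        return {row["Taxid"]: row for row in reader}


def lineage(row, ranks=RANKS):
    """Return the non-empty '<Rank> name' values in rank order, skipping blanks."""
    names = (row.get(f"{rank} name", "") for rank in ranks)
    return [name for name in names if name]
```

For example, `lineage(load_summary("ncbi_dataset/data/taxonomy_summary.tsv")["7215"])` should list the higher-rank names recorded for the Drosophila row, provided the header matches the documented column names. Taxids are kept as strings so that they can be looked up exactly as they appear in the file.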

Names report

The names report describes the current scientific name, type material, basionym, and authority, as well as the rank and taxonomic ID. The file is in JSON Lines format, where each line describes one NCBI Taxonomy ID.

  • Path: ncbi_dataset/data/names_report.jsonl

README.md

The README contains a general project description common to all data packages.

  • Path: README.md

Dataset catalog

The dataset catalog lists each data file contained within or referenced by the package. Each data file is associated with a content type and location.

  • Path: ncbi_dataset/data/dataset_catalog.json
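The catalog is plain JSON, so it can be inspected programmatically. Its exact schema is not described on this page, so the following sketch makes an explicit assumption: each file entry is a JSON object with a `filePath` key (its location) and, optionally, a `fileType` key (its content type). Rather than hard-coding the nesting, it walks the whole document. `catalog_files` is a hypothetical helper name.

```python
import json


def catalog_files(path):
    """Collect (filePath, fileType) pairs from every JSON object that has a 'filePath' key.

    The key names are an assumption; the walk is schema-agnostic so that it
    does not depend on how deeply the entries are nested.
    """
    with open(path, encoding="utf-8") as fh:
        catalog = json.load(fh)

    found = []

    def walk(node):
        # Depth-first traversal over dicts and lists, in document order.
        if isinstance(node, dict):
            if "filePath" in node:
                found.append((node["filePath"], node.get("fileType")))
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(catalog)
    return found
```

If the assumed key names do not match your catalog, the function simply returns an empty list, which is a quick signal to open the file and adjust the keys.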
Generated May 21, 2024