Organism information

Valid organism
Organism warning
Metagenomes (microbiomes)
Metagenome-assembled genomes

Valid organism

In general, NCBI BioSample requires each record to have a single organism, with a valid taxonomy name to the species level.

In most cases, you should enter the binomial scientific name, with complete genus and species. This includes model organisms and species with well-known common names. Examples:

Homo sapiens instead of "human"
Mus musculus instead of "mouse"
Escherichia coli instead of "E. coli"

If the name is published and is in the NCBI Taxonomy database, it will be automatically processed.
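As a rough illustration of the naming rule above, the following sketch checks whether a string has the shape of a binomial name before you enter it. The pattern and the function name `looks_like_binomial` are assumptions made for this example. The check looks only at the form of the name: it cannot tell whether the name is spelled correctly or present in the NCBI Taxonomy database.

```python
import re

# Assumption: a plain binomial is a capitalized genus followed by a
# lowercase species epithet (optionally hyphenated). This tests only the
# shape of the name, not whether it exists in NCBI Taxonomy.
BINOMIAL = re.compile(r"[A-Z][a-z]+ [a-z]+(?:-[a-z]+)?")


def looks_like_binomial(name: str) -> bool:
    """Return True if `name` has the form 'Genus species'."""
    return BINOMIAL.fullmatch(name.strip()) is not None


for candidate in ["Homo sapiens", "human", "E. coli", "Escherichia coli"]:
    print(candidate, "->", looks_like_binomial(candidate))
```

Common names such as "human" and abbreviated names such as "E. coli" fail the check, while the full binomials pass.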

Organism warning

If the name is not found in the NCBI Taxonomy database, you will see a message that says:

Warning: Submission processing may be delayed due to necessary curator review. 
Please check spelling of organism, current information could not be resolved 
automatically and will require a taxonomy consult.

First, check that you spelled the organism name correctly. For example, you may have typed "Homo spaiens" instead of "Homo sapiens". Make any needed corrections and click "Continue" again to resubmit the page.

If the name is spelled correctly, there are valid reasons for the name triggering the Warning message, including:

A validly published name may not yet be in the NCBI Taxonomy database. The database is authoritative but not comprehensive. We only add species names when we receive a sequence submission for the organism, so not every valid species is there.
You may be submitting a new taxonomy name or new combination that is not yet published.
You may be submitting an organism that you identified to genus or higher taxa, but not species.

If the submitted name is valid but not yet in the NCBI Taxonomy database, enter the binomial name and our taxonomists will add it to the database.

In the case of unpublished or unidentified species, our taxonomists assign a provisional taxID for the organism based on the genus and the strain or isolate name.

If your organism is unpublished or unidentified, be sure to include a unique identifier, such as isolate or strain name or a voucher specimen number. The identifier should be unique among your samples and should not include the organism name or an abbreviation of the organism name. It should be a series of letters or numbers that serve as an identifier for your specimen, for example:

Staphylococcus sp.
strain = abc123
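The identifier rules above can be partly checked before submission. The sketch below is a minimal example under stated assumptions: the function name `check_identifiers` is hypothetical, uniqueness is compared case-insensitively (an assumption, not a stated NCBI rule), and only the full genus name is detected inside an identifier. Abbreviations of the organism name are not detected and still need a manual check.

```python
def check_identifiers(organism: str, identifiers: list[str]) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    # Assumption: the first word of the organism name is the genus or taxon.
    genus = organism.split()[0].lower()
    seen = set()
    for ident in identifiers:
        key = ident.strip().lower()
        # Identifiers should be unique among your samples.
        if key in seen:
            problems.append(f"{ident}: duplicate identifier")
        seen.add(key)
        # Identifiers should not include the organism name.
        if genus in key:
            problems.append(f"{ident}: contains the organism name")
    return problems


print(check_identifiers("Staphylococcus sp.", ["abc123", "abc124"]))
print(check_identifiers("Staphylococcus sp.", ["abc123", "staphylococcus-7"]))
```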

For an organism not identified to genus, use the lowest taxon (phylum, class, order, or family) that you know.

For bacterial or archaeal taxa, append "bacterium" or "archaeon" to the organism name, for example:

Enterobacteriaceae bacterium
Nanoarchaeales archaeon

For all other organisms, append "sp." to the name, for example:

A termite identified only to the subfamily "Rhinotermitinae", but not to genus, would be entered as "Rhinotermitinae sp."
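The naming convention for organisms not identified to species can be summarized in a few lines. This is a sketch, not an NCBI tool: the function `placeholder_name` and its `domain` and `is_genus` arguments are hypothetical names chosen for the example.

```python
def placeholder_name(taxon: str, domain: str, is_genus: bool = False) -> str:
    """Build an organism name for a sample not identified to species.

    `domain` is a hypothetical argument: "bacteria", "archaea", or any
    other string for all remaining organisms.
    """
    # A known genus with an unknown species is entered as "Genus sp."
    if is_genus:
        return f"{taxon} sp."
    # Above genus level, bacterial and archaeal taxa get their own suffix;
    # everything else gets "sp."
    suffixes = {"bacteria": "bacterium", "archaea": "archaeon"}
    return f"{taxon} {suffixes.get(domain.lower(), 'sp.')}"


print(placeholder_name("Enterobacteriaceae", "bacteria"))
print(placeholder_name("Nanoarchaeales", "archaea"))
print(placeholder_name("Rhinotermitinae", "eukaryota"))
print(placeholder_name("Staphylococcus", "bacteria", is_genus=True))
```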

If you submit multiple "[taxon] sp." samples and these are thought to come from more than one putative species, each containing several isolates or strains, please present these as "[taxon] sp. 1," "[taxon] sp. 2," etc., or with other unique identifiers to group them. The common format used in NCBI Taxonomy to ensure these names are unique involves appending the submitter’s initials and the year, for example:

[taxon] sp. 1 ABC-2021
[taxon] sp. 2 ABC-2021
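If you have many putative species to name, the pattern shown above is easy to generate. The following sketch assumes a hypothetical helper, `provisional_names`, that numbers the groups from 1 and appends the initials and year in the format of the example.

```python
def provisional_names(taxon: str, groups: int, initials: str, year: int) -> list[str]:
    """Return one '[taxon] sp. N INITIALS-YEAR' name per putative species."""
    return [f"{taxon} sp. {n} {initials}-{year}" for n in range(1, groups + 1)]


for name in provisional_names("Rhinotermitinae", 2, "ABC", 2021):
    print(name)
# Rhinotermitinae sp. 1 ABC-2021
# Rhinotermitinae sp. 2 ABC-2021
```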

Metagenomes (microbiomes)

We consider any environmental or clinical sample that may contain multiple organisms to be a "metagenome". Typically, metagenome samples are composed of microbial organisms, including archaea, bacteria and fungi, but are not restricted to those. The terminology is partially historical, since the first instances of this sample type were for genomic sequencing, but it now includes any sample of this type, regardless of the type of data that will be generated. You can think of NCBI "metagenome" taxonomy nodes as meaning "microbiomes".

If you are submitting sequences from such a sample, you need to use one of a special set of taxonomy nodes. The metagenome taxonomy nodes are under "unclassified sequences", since there is not a specific lineage. These are mostly divided into ecological metagenomes and organismal metagenomes. They are created on an as-needed basis, so not every imaginable type is present. Current practice is to use an existing node wherever possible and to provide more detailed information in the isolation_source and/or host attribute. The metagenome names reflect the source, not the organisms that will be identified. You should use the same name regardless of the type of sequencing you will be using. For example, whether you are using 16S rRNA primers to target bacterial species or ITS primers to target eukaryotic species, the metagenome name to use remains the same.

Browse the list and use the taxonomy name that best describes your sample. Some judgement is involved in choosing names and you should consider the intention of the study. There are minimal checks on the name you choose, to allow for maximum flexibility, but it must be in the NCBI Taxonomy database. Examples include:

A soil sample could use "soil metagenome" or perhaps "rhizosphere metagenome".
A plant sample would use "plant metagenome" or might use more specific names like "root metagenome" or "leaf metagenome" if only those tissues were sampled.
If the sample is from a specific organ of an animal, use a tissue-specific name where available, such as "skin metagenome" or "liver metagenome", or you can use one of the organism-specific names, such as "mouse metagenome" or "human metagenome", or the generic "insect metagenome" or "mollusk metagenome".
A sample from a goat rumen or from mouse cecum would use "gut metagenome". In such cases, be sure to include the host organism in the "host" field.
Stool or feces samples would also use "gut metagenome" if the target of the study is the intestinal microbiota of the organism. Alternatively, if the bacterial community that develops on weathering feces outside the animal is the target of the study, use "feces metagenome". In either case, provide the source organism name in the host field.
Note that there are a few specific gut metagenomes for commonly studied organisms, including "human gut metagenome", "mouse gut metagenome", and others.
A cyanobacterial enrichment culture would usually use "freshwater metagenome".
An artificial community put together from a set of known organisms as a test sample should use "synthetic metagenome".
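To make the stool example concrete, the sketch below picks between "gut metagenome" and "feces metagenome" according to the target of the study and records the host. It is an illustration only: the function name `stool_sample_record`, the dictionary layout, and the isolation_source value are assumptions. It also ignores the more specific nodes, such as "human gut metagenome", which you should prefer where they apply.

```python
def stool_sample_record(host: str, targets_intestinal_microbiota: bool) -> dict:
    """Return illustrative organism, host, and isolation_source values."""
    # The metagenome name follows the aim of the study, not the sequencing method.
    if targets_intestinal_microbiota:
        organism = "gut metagenome"
    else:
        organism = "feces metagenome"
    # The host is given in both cases; the isolation_source value is illustrative.
    return {"organism": organism, "host": host, "isolation_source": "feces"}


print(stool_sample_record("Capra hircus", targets_intestinal_microbiota=True))
print(stool_sample_record("Capra hircus", targets_intestinal_microbiota=False))
```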

Metagenome-assembled genomes

Metagenome-assembled genomes (MAGs) represent individual organisms computationally binned from samples containing a mixture of organisms. The organism names for MAGs are often assigned using resources such as SILVA or GTDB, which may use unpublished taxonomic names. Please convert unpublished names to the equivalent NCBI taxonomic names. We want names that are taxonomically meaningful, at the lowest rank that is reliable (division, phylum, class, order, family, genus). Use "bacterium" or "archaeon" or "Eukaryota sp." if you don’t have more information. See this FAQ about submitting prokaryotic or eukaryotic MAGs to GenBank.

MAGs also require a unique alphanumeric code to distinguish each organism. Add the identifier as an isolate name, even though each organism was computationally binned rather than isolated. The identifier should be unique and is often the same as the sample_name. It should not include the organism name or an abbreviation of the organism name. It should be a series of letters or numbers that serves as an identifier for your organism assembly.

For any of the above cases where a Warning message is received, you can click the "Continue" button again and the submission will proceed to the next step. At the final step of your submission, you will again be notified that the submission will be delayed for manual review. In most cases, this review takes about two business days.