Preparing a Source Modifiers Table File for All Source Modifiers
BankIt accepts source modifiers (e.g., specimen
voucher and isolate) in two ways: as a tab-delimited text
file containing a Source Modifiers Table (as described below), or by applying
the same source modifier value to all sequences in the submission using the
input form. To change source modifiers, upload a new table to overwrite the
previous table, or correct or remove a previously entered value in the
form. The current values of all source modifiers appear at the bottom of the page.
For multiple sequences, it is recommended that you use only a table file
that contains all the source modifiers you want to add, and that you do not
combine a table with values entered in the input form.
Setting up the Source Modifiers Table
The Source Modifiers Table is a
tab-delimited text file of the source modifiers for all
specimens in a BankIt set.
The following modifiers must have only 'TRUE' as the value reported in a
source modifier table when they are used:
- Germline
- Metagenomic
- Rearranged
- Transgenic
See below for an annotated list of source modifiers.
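The TRUE-only rule above can be checked before upload. The following is a minimal sketch, not part of BankIt: the function name `bad_true_only_values` is hypothetical, and it assumes that an empty cell means the modifier is not used and that the value must match 'TRUE' exactly, including case.

```python
# Modifiers that, per the list above, may only carry the value TRUE.
TRUE_ONLY = {"Germline", "Metagenomic", "Rearranged", "Transgenic"}


def bad_true_only_values(row):
    """Return the TRUE-only columns in `row` that are set to something else.

    `row` maps column labels to cell text. Empty or absent cells are
    treated as "modifier not used" (an assumption, not a BankIt rule).
    """
    return sorted(
        name
        for name in TRUE_ONLY
        if row.get(name, "").strip() not in ("", "TRUE")
    )
```

For example, a row with `Germline` set to `yes` would be reported, while a row with `Germline` set to `TRUE` or left empty would not.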
Contents of the Source Modifiers Table
The first row in the table contains the labels for each column. Each column in the table is a different source modifier. See below for the complete list of source modifiers.
The first column contains the Sequence_IDs. Each specimen is identified in the Source Modifiers Table by the same Sequence_ID used for its sequence in the nucleotide FASTA file.
The heading for the first column must be exactly Sequence_ID as shown in the sample below.
Each specimen in the set must have a line in the source modifiers file, even if there are no modifiers to apply to the specimen.
Each Sequence_ID may appear only once in the source modifier file.
Shown below are the contents of a Sample Source Modifiers Table file. Right-click on the link to save as a tab-delimited text file.
| Sequence_ID | Collected_by | Collection_date | Country | Isolation_source | Isolate | Lat_Lon | Specimen_voucher |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Seq1 | C. Grant | 31-Jan-2001 | USA | soil | A | 13.57 N 24.68 W | MKP 334 |
| Seq2 | S. Tracy | 28-Feb-2002 | Slovakia | contaminated soil | B | 13.24 N 24.35 W | MKP 1230 |
| Seq3 | A. Gardner | 16-Apr-2001 | France | farm soil | C | 43.21 N 56.78 W | 1B-2526 |
| Seq4 | F. McMurray | 26-May-2002 | Germany | farm runoff water | D | 45.32 N 21.34 E | WBM 86-64 |
| Seq5 | V. Leigh | 13-Jun-2003 | Brazil | forest soil | E | 46.80 N 13.57 E | 1B-2518 |
| Seq6 | E. Flynn | 15-Aug-2000 | Australia | river water | F | 68.53 S 57.42 E | WBM 86-65 |
| Seq7 | G. Kelly | 26-Oct-2002 | Mexico | river bed soil | G | 22.44 S 55.77 W | 1B-2355 |
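The structural rules above (exact Sequence_ID heading, one line per specimen, no repeated Sequence_ID) lend themselves to a quick check before upload. This is a minimal sketch under stated assumptions: `check_table` is a hypothetical helper, not a BankIt tool, and the caller is assumed to have already extracted the Sequence_IDs from the FASTA file.

```python
import csv
import io


def check_table(table_text, fasta_ids):
    """Return a list of problems found in a Source Modifiers Table.

    table_text: full contents of the tab-delimited file.
    fasta_ids:  Sequence_IDs taken from the nucleotide FASTA file.
    """
    rows = list(csv.reader(io.StringIO(table_text), delimiter="\t"))
    # The heading of the first column must be exactly Sequence_ID.
    if not rows or not rows[0] or rows[0][0] != "Sequence_ID":
        return ["first column heading must be exactly Sequence_ID"]

    problems = []
    seen = set()
    for row in rows[1:]:
        if not row:  # skip blank lines
            continue
        seq_id = row[0]
        # Each Sequence_ID may appear only once.
        if seq_id in seen:
            problems.append(f"{seq_id}: appears more than once")
        seen.add(seq_id)

    # Every specimen in the set needs a line, even without modifiers.
    for seq_id in fasta_ids:
        if seq_id not in seen:
            problems.append(f"{seq_id}: no line in the table")
    # IDs that match nothing in the FASTA file cannot be applied.
    for seq_id in sorted(seen - set(fasta_ids)):
        problems.append(f"{seq_id}: not found in the FASTA file")
    return problems
```

An empty list means none of these three rules is violated; it does not check the values of individual modifiers.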
Saving the Source Modifiers Table
When using a spreadsheet program,
be sure to save your file as tab-delimited text.
If you are not sure that the "Save" option in your program
will do this for you, use "Save As...".
In Excel, select "Save As..." from the File menu. In the "Save as type:" pull-down menu, select "Text (Tab delimited) (*.txt)."
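If the table is generated by a script rather than a spreadsheet, tab-delimited text can be written directly. The sketch below uses Python's standard `csv` module; the function name `write_modifiers_table`, the UTF-8 encoding, and the `\n` line ending are assumptions for illustration, not BankIt requirements.

```python
import csv


def write_modifiers_table(path, columns, rows):
    """Write `rows` (dicts keyed by column label) as tab-delimited text.

    `columns` gives the column order; the first must be Sequence_ID.
    Modifiers missing from a row are written as empty cells.
    """
    if not columns or columns[0] != "Sequence_ID":
        raise ValueError("the first column must be Sequence_ID")
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle,
            fieldnames=columns,
            delimiter="\t",       # tab-delimited, as BankIt expects
            restval="",           # empty cell for absent modifiers
            lineterminator="\n",
        )
        writer.writeheader()      # first row holds the column labels
        writer.writerows(rows)
```

A call such as `write_modifiers_table("modifiers.txt", ["Sequence_ID", "Country"], [{"Sequence_ID": "Seq1", "Country": "USA"}])` would produce a two-line file with the header row followed by one specimen row.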
Source Modifiers
Commonly used Source Modifiers
The following source modifiers are available to further describe the sequences in a BankIt set:
- Altitude - Altitude in metres above or below sea level of where the sample was collected.
- Authority - The author or authors of the organism name from which sequence was obtained.
- Bio_material - An identifier for the biological material from which the nucleotide sequence was obtained, with optional institution code and collection code for the place where it is currently stored.
This should be provided using the following format 'institution-code:collection-code:material_id'.
material_id is mandatory, institution-code and collection-code are optional; institution-code is mandatory when collection-code is present.
This qualifier should be used to annotate the identifiers of material in biological collections which include zoos and aquaria, stock centers, seed banks, germplasm repositories and DNA banks.
- Biotype - Variety of a species (usually a fungus, bacteria, or virus) characterized by some specific biological property (often geographical, ecological, or physiological). Same as biovar.
- Biovar - See Biotype
- Breed - The named breed from which sequence was obtained (usually applied to domesticated mammals).
- Cell_line - Cell line from which sequence was obtained.
- Cell_type - Type of cell from which sequence was obtained.
- Chemovar - Variety of a species (usually a fungus, bacteria, or virus) characterized by its biochemical properties.
- Clone - Name of clone from which sequence was obtained.
- Collected_by - Name of person who collected the sample.
- Collection_date - Date the specimen was collected.
In format DD-Mon-YYYY, that is, 2-digit day, three-character abbreviation of month, and 4-digit year
(e.g., 11-Feb-2002).
Mon-YYYY and YYYY are alternate formats to use when date information is less complete.
- Country - The country where the sequence's organism was
located. May also be an ocean or major sea. Additional region or locality
information must be after the country name and separated by a ':'. For
example: USA: Riverview Park, Ripkentown, MD
- Cultivar - Cultivated variety of plant from which sequence was obtained.
- Culture_collection - Institution code and identifier for the culture from which the nucleotide sequence was obtained, with optional collection code.
This should be provided using the following format
'institution-code:collection-code:culture-id'. culture-id and institution-code are mandatory.
This qualifier should be used to annotate live microbial and viral cultures, and cell lines that have been deposited in curated culture collections.
- Dev_stage - Developmental stage of organism.
- Ecotype - The named ecotype (population adapted to a local habitat) from which sequence was obtained (customarily applied to populations of Arabidopsis thaliana).
- Forma - The forma (lowest taxonomic unit governed by the nomenclatural codes) of organism from which sequence was obtained. This term is usually applied to plants and fungi.
- Forma_specialis - The physiologically distinct form from which sequence was obtained (usually restricted to certain parasitic fungi).
- Fwd_primer_name - Name of forward PCR primer.
- Fwd_primer_seq - Nucleotide sequence of forward PCR primer.
- Genotype - Genotype of the organism.
- Haplogroup - Name for a group of similar haplotypes that share some sequence variation.
- Haplotype - Haplotype of the organism.
- Host - When the sequence submission is from an organism that exists in a symbiotic, parasitic, or other special relationship with some second organism, the 'host' modifier can be used to identify the name of the host species.
- Identified_by - Name of the person or persons who identified, by taxonomic name, the organism from which the sequence was obtained.
- Isolate - Identification or description of the specific individual from which this sequence was obtained.
- Isolation_source - Describes the local geographical source of the organism from which the sequence was obtained.
- Lab_host - Laboratory host used to propagate the organism from which the sequence was obtained.
- Lat_Lon - Latitude and longitude, in decimal degrees, of where the sample was collected.
- Note - Any additional information that you wish to provide about the sequence.
- Pathovar - Variety of a species (usually a fungus, bacteria or virus) characterized by the biological target of the pathogen. Examples include Pseudomonas syringae pathovar tomato and Pseudomonas syringae pathovar tabaci.
- Pop_variant - Name of the population variant from which the sequence was obtained.
- Rev_primer_name - Name of reverse PCR primer.
- Rev_primer_seq - Nucleotide sequence of reverse PCR primer.
- Segment - Name of viral or phage segment sequenced.
- Serogroup - Variety of a species (usually a fungus, bacteria, or virus) characterized by its antigenic properties. Same as serotype and serovar.
- Serotype - See Serogroup
- Serovar - See Serogroup
- Sex - Sex of the organism from which the sequence was obtained.
- Specimen_voucher - An identifier of the individual or collection of the source organism and the place where it is currently stored, usually an institution.
This should be provided using the following format
'institution-code:collection-code:specimen-id'. specimen-id is mandatory,
collection-code is optional; institution-code is mandatory when collection-code
is provided. Examples:
- 99-SRNP
- UAM:Mamm:52179
- personal collection:Joe Smith:99-SRNP
- AMCC:101706
- Strain - Strain of organism from which sequence was obtained.
- Sub_species - Subspecies of organism from which sequence was obtained.
- Subclone - Name of subclone from which sequence was obtained.
- Subtype - Subtype of organism from which sequence was obtained.
- Substrain - Sub-strain of organism from which sequence was obtained.
- Tissue_lib - Tissue library from which the sequence was obtained.
- Tissue_type - Type of tissue from which sequence was obtained.
- Type - Type of organism from which sequence was obtained.
- Variety - Variety of organism from which sequence was obtained.
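The format rules given above for Collection_date and Specimen_voucher can be checked mechanically. The following is a minimal sketch; `collection_date_ok` and `split_voucher` are hypothetical helper names, English three-letter month abbreviations are assumed, and a two-part voucher is read as institution-code plus specimen-id because the text states that institution-code is mandatory whenever collection-code is present.

```python
import re
from datetime import datetime

MONTHS = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split()


def collection_date_ok(value):
    """True if `value` is DD-Mon-YYYY, Mon-YYYY, or YYYY."""
    parts = value.split("-")
    year = parts[-1]
    if not re.fullmatch(r"\d{4}", year):
        return False
    if len(parts) == 1:                      # YYYY
        return True
    if len(parts) == 2:                      # Mon-YYYY
        return parts[0] in MONTHS
    if len(parts) == 3:                      # DD-Mon-YYYY
        day, mon = parts[0], parts[1]
        if not re.fullmatch(r"\d{2}", day) or mon not in MONTHS:
            return False
        try:
            # Rejects impossible calendar dates such as 31-Feb-2002.
            datetime(int(year), MONTHS.index(mon) + 1, int(day))
        except ValueError:
            return False
        return True
    return False


def split_voucher(value):
    """Split a Specimen_voucher into (institution, collection, specimen_id).

    Parts that are not given are returned as None.
    """
    parts = value.split(":")
    if len(parts) > 3 or not parts[-1]:
        raise ValueError(f"malformed voucher: {value!r}")
    if len(parts) == 1:
        return (None, None, parts[0])
    if len(parts) == 2:
        return (parts[0], None, parts[1])
    return (parts[0], parts[1], parts[2])
```

The same three-part splitting applies in outline to Bio_material and Culture_collection, although Culture_collection additionally requires the institution code, which this sketch does not enforce.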