MicroBIGG-E report
MicroBIGG-E record accession, organism, location, and biosample information
The downloaded MicroBIGG-E package contains a MicroBIGG-E data report in JSON Lines format at the following location in the file:
ncbi_dataset/data/data_report.jsonl
Each line of the MicroBIGG-E data report file is a hierarchical JSON object that represents a single MicroBIGG-E record. The schema of the MicroBIGG-E record is defined in the tables below, where each row describes either a single field in the report or a sub-structure, which is a collection of fields.
The outermost structure of the report is MicroBiggeReport.
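Because the report is in JSON Lines format, it can be read one record at a time rather than loaded as a single JSON document. The following is a minimal sketch using only the Python standard library; the function name `iter_records` is hypothetical and not part of any NCBI tool.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_records(path: str) -> Iterator[dict]:
    """Yield one MicroBIGG-E record (as a dict) per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)


# Example usage (assumes the package has been unzipped in the current directory):
# for record in iter_records("ncbi_dataset/data/data_report.jsonl"):
#     print(record.get("targetAcc"), record.get("element", {}).get("symbol"))
```

Reading lazily keeps memory use roughly constant regardless of how many records the report contains.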
Table fields that include a Table Field Mnemonic can be used with the dataformat command-line tool's --fields option.
Sample report
{
  "amrFinderPlus": {
    "dbVersion": "2023-08-08.2",
    "type": "COMBINED",
    "version": "3.11.26"
  },
  "amrMethod": "BLASTP",
  "biosample": {
    "accession": "SAMN07179453",
    "assembly": "GCA_009287105.1",
    "geographicOrigin": "United Kingdom: United Kingdom",
    "source": "human",
    "type": "clinical"
  },
  "class": "COPPER",
  "closestReferenceSequenceComparison": {
    "accession": "CAA58527.1",
    "alignLength": 126,
    "name": "copper resistance system metallochaperone PcoC",
    "percentCoverage": 100.0,
    "percentIdentical": 99.21
  },
  "element": {
    "length": 126,
    "name": "copper resistance system metallochaperone PcoC",
    "referenceLength": 126,
    "symbol": "pcoC"
  },
  "location": {
    "accessionVersion": "AAMJFE010000009.1",
    "range": [
      {
        "begin": "20304",
        "end": "20684",
        "orientation": "plus"
      }
    ]
  },
  "readToAssemblyCoverage": {
    "assembly": 52,
    "contig": 55,
    "ratio": 1.05769
  },
  "subclass": "COPPER",
  "subtype": "METAL",
  "targetAcc": "PDT000214120.2",
  "taxonomy": {
    "group": "Salmonella enterica",
    "scientificName": "Salmonella enterica subsp. enterica serovar Rissen"
  },
  "type": "STRESS"
}
MicroBiggeReport Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
targetAcc | target-accession | Target accession | string | | |
element | | | Element | | |
location | | | SeqRangeSet | The range of the gene | |
type | type | Type | string | | AMR, STRESS |
subtype | subtype | Subtype | string | | AMR, METAL |
class | class | Class | string | | GLYCOPEPTIDE, COPPER/SILVER |
subclass | subclass | Subclass | string | | VANCOMYCIN, COPPER/SILVER |
amrMethod | amr-method | AMR method | string | | EXACTP |
isPlus | is-plus | Is plus | bool | | |
closestReferenceSequenceComparison | | | ClosestReference | | |
taxonomy | | | Taxonomy | | |
biosample | | | Biosample | | |
readToAssemblyCoverage | | | ReadToAssemblyCoverage | | |
amrFinderPlus | | | AmrFinderPlus | | |
genesOnContig (repeated) | coming soon | coming soon | string | | |
genesOnIsolate (repeated) | coming soon | coming soon | string | | |
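The top-level type, subtype, and class fields lend themselves to simple summaries. As an illustration only, the sketch below counts records per (type, subtype, class) combination; the helper name `tally_by_type_and_class` is hypothetical, and records are assumed to be dicts parsed from the report lines.

```python
from collections import Counter
from typing import Iterable


def tally_by_type_and_class(records: Iterable[dict]) -> Counter:
    """Count records per (type, subtype, class) triple; absent fields count as None."""
    return Counter(
        (rec.get("type"), rec.get("subtype"), rec.get("class")) for rec in records
    )
```

Using `dict.get` avoids a `KeyError` when a record omits one of the fields.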
AmrFinderPlus Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
version | amrfinderplus-version | AMRFinderPlus version | string | ||
type | amrfinderplus-type | AMRFinderPlus type | string | ||
dbVersion | amrfinderplus-db-version | AMRFinderPlus database version | string |
Biosample Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
geographicOrigin | biosample-geo-origin | BioSample geographic origin | string | | Denmark, not determined |
source | biosample-source | BioSample source | string | | |
type | biosample-type | BioSample type | string | | clinical, environmental/other |
accession | biosample-accession | BioSample accession | string | | SAMN00808999 |
assembly | biosample-assembly | BioSample assembly accession | string | | GCA_000395725.1 |
collectionDate | biosample-collection-date | BioSample collection date | string | | |
ClosestReference Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
accession | closest-ref-accession | Closest reference accession | string | ||
name | closest-ref-name | Closest reference name | string | ||
percentCoverage | closest-ref-pct-coverage | Closest reference percent coverage | float | ||
percentIdentical | closest-ref-pct-ident | Closest reference percent identity | float | ||
alignLength | closest-ref-align-len | Closest reference alignment length | int32 |
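The percent identity and percent coverage fields are commonly used to screen hits. The following sketch filters a record on both values; the function name and the default thresholds of 90.0 are illustrative assumptions, not cutoffs defined by MicroBIGG-E or AMRFinderPlus.

```python
def passes_reference_thresholds(
    record: dict,
    min_identity: float = 90.0,  # hypothetical default, choose per analysis
    min_coverage: float = 90.0,  # hypothetical default, choose per analysis
) -> bool:
    """Return True if the closest-reference comparison meets both thresholds."""
    ref = record.get("closestReferenceSequenceComparison", {})
    # A missing value is treated as 0.0, so incomplete records fail the filter.
    identity = ref.get("percentIdentical", 0.0)
    coverage = ref.get("percentCoverage", 0.0)
    return identity >= min_identity and coverage >= min_coverage
```

With the values from the sample report (99.21 percent identity, 100.0 percent coverage), the record passes the default thresholds.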
Element Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
symbol | elem-symbol | Element symbol | string | | vanS-A, copB |
name | elem-name | Element name | string | | VanA-type vancomycin resistance histidine kinase VanS, copper/silver-translocating P-type ATPase CopB |
length | elem-length | Element length | int32 | ||
referenceLength | elem-ref-length | Element reference length | int32 |
Range Structure
A 1-based range on a sequence record.
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
begin | start | Start | uint64 | ||
end | stop | Stop | uint64 | ||
orientation | orientation | Orientation | Orientation | ||
order | order | Order | uint32 | ||
ribosomalSlippage | coming soon | coming soon | int32 | When ribosomal slippage is desired, fill out slippage amount between this and previous range. |
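In the sample report, the uint64 fields begin and end appear as JSON strings, so they need converting before any arithmetic. The sketch below computes the length covered by a location, assuming that the 1-based range includes both endpoints (an assumption; the table states only that the range is 1-based). The function names are hypothetical.

```python
def range_length(rng: dict) -> int:
    """Length of a single range, treating begin and end as 1-based and inclusive."""
    begin = int(rng["begin"])  # values may arrive as strings, e.g. "20304"
    end = int(rng["end"])
    return end - begin + 1


def location_length(location: dict) -> int:
    """Total length over all ranges in a SeqRangeSet-style dict."""
    return sum(range_length(r) for r in location.get("range", []))
```

For the sample report's range (begin 20304, end 20684), this gives 381 under the inclusive-endpoint assumption.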
ReadToAssemblyCoverage Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
contig | read-assm-coverage-contig | Read-to-Assembly-Coverage contig | uint32 | ||
assembly | read-assm-coverage-assembly | Read-to-Assembly-Coverage assembly | uint32 | ||
ratio | read-assm-coverage-ratio | Read-to-Assembly-Coverage ratio | float |
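The table does not define how ratio is derived. In the sample report, however, the value (1.05769) matches contig divided by assembly (55 / 52). The sketch below recomputes the ratio on that assumption; treat it as an inference from one sample rather than a documented formula. The function name is hypothetical.

```python
from typing import Optional


def coverage_ratio(coverage: dict) -> Optional[float]:
    """Recompute contig / assembly coverage; return None if it cannot be computed."""
    assembly = coverage.get("assembly")
    contig = coverage.get("contig")
    if not assembly or contig is None:  # missing values or zero assembly coverage
        return None
    return contig / assembly
```

Comparing the recomputed value with the reported ratio, within a small tolerance for rounding, is one way to sanity-check parsed records.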
SeqRangeSet Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
accessionVersion | accession | Sequence Accession | string | NCBI Accession.version of the sequence | |
range (repeated) | range- | | Range | Series of intervals on above accession_version | |
Taxonomy Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|---|---|---|---|---|
group | tax-group | Taxonomic group | string | | Enterococcus faecium |
scientificName | tax-name | Taxonomic name | string | | Enterococcus faecium EnGen0172 |
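The Table Field Mnemonics in the structure tables above each correspond to one field in the nested JSON record. To illustrate that correspondence, the sketch below flattens a record into a mnemonic-keyed row. The mapping is hand-built from a subset of the tables, and the names `MNEMONIC_TO_PATH`, `lookup`, and `flatten` are hypothetical; this is not how the dataformat tool is implemented.

```python
from typing import Any, Dict, Iterable

# Subset of mnemonics from the tables, mapped to dotted JSON paths (hand-built).
MNEMONIC_TO_PATH: Dict[str, str] = {
    "target-accession": "targetAcc",
    "type": "type",
    "class": "class",
    "amr-method": "amrMethod",
    "elem-symbol": "element.symbol",
    "closest-ref-pct-ident": "closestReferenceSequenceComparison.percentIdentical",
    "biosample-accession": "biosample.accession",
    "tax-name": "taxonomy.scientificName",
}


def lookup(record: dict, dotted_path: str) -> Any:
    """Follow a dotted path through nested dicts; return None if any key is absent."""
    node: Any = record
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def flatten(record: dict, mnemonics: Iterable[str]) -> Dict[str, Any]:
    """Build a flat row keyed by mnemonic for the requested fields."""
    return {m: lookup(record, MNEMONIC_TO_PATH[m]) for m in mnemonics}
```

A row produced this way can be written out with `csv.DictWriter` if a tabular file is needed.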
Orientation Enumeration
Name | Number | Description |
---|---|---|
none | 0 | |
plus | 1 | |
minus | 2 |
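The enumeration above can be mirrored in client code so that both the names seen in the JSON report (for example "plus") and the numeric values resolve to the same constant. This is a minimal sketch; the class and the helper `parse_orientation` are hypothetical and are not generated from the actual protocol buffer definition.

```python
from enum import IntEnum
from typing import Union


class Orientation(IntEnum):
    """Mirror of the Orientation enumeration table."""
    none = 0
    plus = 1
    minus = 2


def parse_orientation(value: Union[str, int]) -> Orientation:
    """Accept either an enum name ("plus") or its number (1)."""
    if isinstance(value, str):
        return Orientation[value]  # raises KeyError for unknown names
    return Orientation(value)      # raises ValueError for unknown numbers
```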
Scalar Value Types
Protocol buffers type | Notes | C++ | Python | Java | Go |
---|---|---|---|---|---|
double | | double | float | double | float64 |
float | | float | float | float | float32 |
int32 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. | int32 | int | int | int32 |
int64 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. | int64 | int/long | long | int64 |
uint32 | Uses variable-length encoding. | uint32 | int/long | int | uint32 |
uint64 | Uses variable-length encoding. | uint64 | int/long | long | uint64 |
sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. | int32 | int | int | int32 |
sint64 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. | int64 | int/long | long | int64 |
fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int | int | uint32 |
fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | int/long | long | uint64 |
sfixed32 | Always four bytes. | int32 | int | int | int32 |
sfixed64 | Always eight bytes. | int64 | int/long | long | int64 |
bool | | bool | boolean | boolean | bool |
string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | str/unicode | String | string |
bytes | May contain any arbitrary sequence of bytes. | string | str | ByteString | []byte |