As part of the latest release (v1.63) of the Conserved Domain Database (CDD), the alignment sets of the KOG¹ database (clusters of euKaryotic Orthologous Groups) have been merged into CDD. The KOG database is essentially a eukaryotic version of the COG database (Clusters of Orthologous Groups), which was integrated into CDD in late 2002 (v1.60). KOGs and COGs cluster eukaryotic and prokaryotic proteins, respectively, into groups containing sequences that are mutual best hits in sequence similarity searches between different species. The KOG database includes proteins from H. sapiens, D. melanogaster, C. elegans, A. thaliana, S. cerevisiae, S. pombe, and E. cuniculi.

With RPS-BLAST searches available for KOGs and COGs in CDD, users can now classify query sequences by similarity to these pre-determined sets alongside the alignments from Pfam, SMART, and the curated NCBI Conserved Domains. Because CDD data is also incorporated into Entrez as the Domains database, KOGs and COGs can be found using standard Entrez queries by fields such as title, organism, or text words.

With KOGs and COGs now included in CDD, the displays of pre-computed RPS-BLAST results have been updated to reflect the different clustering schemes underlying the datasets within CDD. CDD now contains datasets that cluster proteins based on overall sequence similarity (COGs and KOGs) along with those that cluster based on the presence of defined functional domains (Pfam, SMART, curated CDs). Multiple-domain proteins will therefore often have two sets of hits in CDD: hits from COGs and KOGs to large portions of the sequence, and hits to Pfam, SMART, and/or curated CDs for each functional domain. To show both sets of hits in a simple display, each CDD record is now classified as either a "single" or a "multiple" domain record, and the best hits from each set are shown when the Domains link is clicked for a record in Entrez Protein.
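The single/multiple split can be pictured with a small sketch. It is illustrative only: it assumes the record class can be guessed from the accession prefix (COG and KOG as "multiple", pfam, smart, and cd as "single"), whereas CDD assigns the classification itself. The helper names `classify_record` and `split_hits` are hypothetical, and the accessions are the ones reported in this article for the human SRC protein (NP_005408).

```python
# Illustrative sketch only. Assumption: the record class is inferred from the
# accession prefix; CDD stores this classification and does not derive it so.
MULTI_DOMAIN_PREFIXES = ("cog", "kog")            # whole-sequence clusters
SINGLE_DOMAIN_PREFIXES = ("pfam", "smart", "cd")  # domain-based models


def classify_record(accession: str) -> str:
    """Return "multiple" or "single" for a CDD accession (hypothetical helper)."""
    lowered = accession.lower()
    if lowered.startswith(MULTI_DOMAIN_PREFIXES):
        return "multiple"
    if lowered.startswith(SINGLE_DOMAIN_PREFIXES):
        return "single"
    raise ValueError(f"unrecognized accession prefix: {accession}")


def split_hits(accessions):
    """Group hit accessions into the two sets, preserving input order."""
    groups = {"multiple": [], "single": []}
    for accession in accessions:
        groups[classify_record(accession)].append(accession)
    return groups


# Accessions named in this article for the human SRC protein.
src_hits = ["KOG0197", "pfam00018", "pfam00017", "cd00192"]
groups = split_hits(src_hits)
print(groups)
```

Keeping the two sets separate mirrors the display described here, in which the whole-sequence hit and the per-domain hits are reported side by side rather than ranked against one another.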
The Conserved Domain Architecture Retrieval Tool (CDART) uses only single-domain records to group protein sequences by domain architecture.

Figure 1. Graphical overview of Conserved Domain Search results for human SRC protein, RefSeq accession NP_005408, showing hits to KOG0197 and a Pfam-based conserved domain for tyrosine kinases, as well as hits to SH2 and SH3 domains.

In the example shown in Figure 1 for NP_005408, the human SRC protein, hits are shown both to the multiple-domain KOG0197 (tyrosine kinases) and to the single domains pfam00018 (SH3), pfam00017 (SH2), and cd00192 (TyrKc, tyrosine kinase catalytic domain).

¹ Tatusov RL, Fedorova ND, Jackson JJ, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, Smirnov S, Sverdlov AV, Vasudevan S, Wolf YI, Yin JJ, Natale DA. The COG database: an updated version includes eukaryotes. BioMed Central Bioinformatics. 2003 Sep 11 [Epub ahead of print]. PMID: 12969510