What’s the Longest Sequence in GenBank? How About the Largest Protein?

The Entrez search system makes it relatively easy to determine the answers to both of these questions. A bit of trial and error yields:
This query, which ensures that the sequence we find is in the primary database, GenBank, and is not a derivative record from the NCBI RefSeq database, picks up a single record:
This sequence is part of the recently deposited build 3 of the Drosophila melanogaster genome visible in the Map Viewer. The longest protein, found using...
...turns out to be human titin, NP_596869, which is an astounding 34,350 amino acids in length. Titin is a muscle protein whose two ends anchor to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. As part of its processing of this RefSeq, NCBI has identified 274 “Immunoglobulin” and 264 “Fibronectin” domains within this isoform of titin.
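The queries themselves are not reproduced in the text above, so the following is only a minimal sketch of how a length-restricted Entrez search of this kind might be built, not the search the article used. It assumes the Entrez sequence-length field `[SLEN]` with range syntax, the `srcdb_refseq[PROP]` filter for excluding RefSeq records, and the public E-utilities ESearch endpoint. The helper names `length_query` and `esearch_url` and the length thresholds are hypothetical, chosen for illustration. The code only constructs URLs and makes no network request.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def length_query(min_len, max_len, exclude_refseq=False):
    """Build an Entrez term matching sequences whose length is in [min_len, max_len].

    Assumption: the [SLEN] field accepts a "low:high" range, and
    srcdb_refseq[PROP] selects RefSeq records, so NOT-ing it leaves
    only primary (non-RefSeq) records.
    """
    term = f"{min_len}:{max_len}[SLEN]"
    if exclude_refseq:
        term += " NOT srcdb_refseq[PROP]"
    return term


def esearch_url(db, term, retmax=20):
    """Return an ESearch URL for the given database and term (no request is sent)."""
    params = {"db": db, "term": term, "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical thresholds: raise the lower bound by trial and error
    # until only a handful of records remain.
    print(esearch_url("nucleotide", length_query(20_000_000, 999_999_999, exclude_refseq=True)))
    print(esearch_url("protein", length_query(30_000, 99_999)))
```

Raising the lower bound step by step until the result count drops to one mirrors the "trial and error" approach described above. The same term strings can also be pasted directly into the Entrez search box.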