dc.creator: Lockwood, Svetlana
dc.creator: Brayton, Kelly A.
dc.creator: Daily, Jeff A.
dc.creator: Broschat, Shira L.
dc.date.accessioned: 2020-07-13T20:07:24Z
dc.date.available: 2020-07-13T20:07:24Z
dc.date.issued: 2019
dc.identifier.uri: http://hdl.handle.net/2376/17911
dc.description.abstract: We clustered 8.76 M protein sequences deduced from 2,307 completely sequenced Proteobacterial genomes resulting in 707,311 clusters of one or more sequences of which 224,442 ranged in size from 2 to 2,894 sequences. To our knowledge this is the first study of this scale. We were surprised to find that no single cluster contained a representative sequence from all the organisms in the study. Given the minimal genome concept, we expected to find a shared set of proteins. To determine why the clusters did not have universal representation we chose four essential proteins, the chaperonin GroEL, DNA dependent RNA polymerase subunits beta and beta′ (RpoB/RpoB′), and DNA polymerase I (PolA), representing fundamental cellular functions, and examined their cluster distribution. We found these proteins to be remarkably conserved with certain caveats. Although the groEL gene was universally conserved in all the organisms in the study, the protein was not represented in all the deduced proteomes. The genes for RpoB and RpoB′ were missing from two genomes and merged in 88, and the sequences were sufficiently divergent that they formed separate clusters for 18 RpoB proteins (seven clusters) and 14 RpoB′ proteins (three clusters). For PolA, 52 organisms lacked an identifiable sequence, and seven sequences were sufficiently divergent that they formed five separate clusters. Interestingly, organisms lacking an identifiable PolA and those with divergent RpoB/RpoB′ were predominantly endosymbionts. Furthermore, we present a range of examples of annotation issues that caused the deduced proteins to be incorrectly represented in the proteome. These annotation issues made our task of determining protein conservation more difficult than expected and also represent a significant obstacle for high-throughput analyses. [en_US]
dc.language: English
dc.publisher: Frontiers in Microbiology
dc.rights: Creative Commons Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.title: Whole proteome clustering of 2,307 Proteobacterial genomes reveals conserved proteins and significant annotation issues
dc.type: Article
dc.description.version: Published copy
dc.description.citation: Lockwood, S., K. A. Brayton, J. A. Daily, and S. L. Broschat. (2019). Whole proteome clustering of 2,307 Proteobacterial genomes reveals conserved proteins and significant annotation issues. Frontiers in Microbiology, Vol. 10, 383. doi:10.3389/fmicb.2019.00383. PMCID:PMC7041399.
dc.description.note: First publication by Frontiers Media


This item appears in the following Collection(s)

  • Broschat, Shira
    This collection features research and educational materials by Shira Broschat, Professor and Curriculum Coordinator for the School of Electrical Engineering and Computer Science at Washington State University.

Except where otherwise noted, this item's license is described as Creative Commons Attribution 4.0 International