Show simple item record

dc.creator: Ashari, Zhila Esna
dc.creator: Dasgupta, Nairanjana
dc.creator: Brayton, Kelly A.
dc.creator: Broschat, Shira L.
dc.date.accessioned: 2020-07-13T21:06:04Z
dc.date.available: 2020-07-13T21:06:04Z
dc.date.issued: 2018
dc.identifier.uri: http://hdl.handle.net/2376/17916
dc.description.abstract: Type IV secretion systems (T4SS) are multi-protein complexes in a number of bacterial pathogens that can translocate proteins and DNA to the host. Most T4SSs function in conjugation and translocate DNA; however, approximately 13% function to secrete proteins, delivering effector proteins into the cytosol of eukaryotic host cells. Upon entry, these effectors manipulate the host cell’s machinery for their own benefit, which can result in serious illness or death of the host. For this reason, recognition of T4SS effectors has become an important subject. Much previous work has focused on verifying effectors experimentally, a costly endeavor in terms of money, time, and effort. Having good predictions for effectors will help to focus experimental validations and decrease testing costs. In recent years, several scoring and machine learning-based methods have been suggested for the purpose of predicting T4SS effector proteins. These methods have used different sets of features for prediction, and their predictions have been inconsistent. In this paper, an optimal set of features is presented for predicting T4SS effector proteins using a statistical approach. A thorough literature search was performed to find features that have been proposed. Feature values were calculated for datasets of known effectors and non-effectors for T4SS-containing pathogens from four genera with a sufficient number of known effectors: Legionella pneumophila, Coxiella burnetii, Brucella spp., and Bartonella spp. The features were ranked, and less important features were filtered out. Correlations between remaining features were removed, and dimensional reduction was accomplished using principal component analysis and factor analysis. Finally, the optimal features for each pathogen were chosen by building logistic regression models and evaluating each model.
The results based on evaluation of our logistic regression models confirm the effectiveness of our four optimal sets of features, and based on these an optimal set of features is proposed for all T4SS effector proteins. [en_US]
dc.language: English
dc.publisher: PLoS ONE
dc.rights: Creative Commons Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.title: An optimal set of features for predicting type IV secretion system effector proteins for a subset of species based on a multi-level feature selection approach
dc.type: Article
dc.description.version: Published copy
dc.description.citation: Esna Ashari, Z., N. Dasgupta, K.A. Brayton, and S.L. Broschat. (2018). An optimal set of features for predicting type IV secretion system effector proteins for a subset of species based on a multi-level feature selection approach. PLoS ONE, Vol. 13, No. 5, e0197041. doi:10.1371/journal.pone.0197041. PMCID: PMC5942808.


This item appears in the following Collection(s)

  • Broschat, Shira
    This collection features research and educational materials by Shira Broschat, Professor and Curriculum Coordinator for the School of Electrical Engineering and Computer Science at Washington State University.

Except where otherwise noted, this item's license is described as Creative Commons Attribution 4.0 International